ONTAP Hardware
Hi all,
We have a failed hard drive, but there is no “reconstructing” message in /etc/messages as there usually is.
The output also shows that the failed drive was a spare disk. I have never come across this before.
If it is indeed a hot spare, can we simply pull it out, since we still have another FC hot spare? The warranty service on this filer happened to expire last month.
fs03*> aggr status -f
Broken disks
RAID Disk Device HA  SHELF BAY CHAN Pool Type RPM   Used (MB/blks)   Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- --------------   --------------
failed    0d.57  0d  3     9   FC:B -    FCAL 15000 418000/856064000 420156/860480768
fs03*> aggr status -s
Spare disks
RAID Disk Device HA  SHELF BAY CHAN Pool Type RPM   Used (MB/blks)   Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- --------------   --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare     0d.52  0d  3     4   FC:B -    FCAL 15000 418000/856064000 420156/860480768
spare     3a.29  3a  1     13  FC:A -    ATA  7200  423111/866531584 423889/868126304
Some entries from /etc/messages:
Thu Dec 20 15:14:53 PST [fs03: raid.config.spare.disk.failed:error]: Spare Disk 0d.57 Shelf 3 Bay 9 [NETAPP X291_S15K6420F15 NA01] S/N [3QQ0E49Y00009913FZ8N] failed.
Thu Dec 20 15:14:53 PST [fs03: raid.disk.unload.done:info]: Unload of Disk 0d.57 Shelf 3 Bay 9 [NETAPP X291_S15K6420F15 NA01] S/N [3QQ0E49Y00009913FZ8N] has completed successfully
Thu Dec 20 15:14:53 PST [fs03: disk.partner.msgStatus:debug]: The partner/CDO has returned status 0 for msg sent for 0d.57.
Thu Dec 20 15:15:00 PST [fs03: monitor.globalStatus.nonCritical:warning]: Disk on adapter 0d, shelf 3, bay 9, failed.
Thu Dec 20 15:15:11 PST [fs03: asup.smtp.sent:notice]: Cluster Notification mail sent: Cluster Notification from fs03 (SPARE DISK FAILED) WARNING
Thu Dec 20 15:15:13 PST [fs03: asup.post.sent:notice]: Cluster Notification message posted to NetApp: Cluster Notification from fs03 (SPARE DISK FAILED) WARNING
Can anyone help?
I am wondering why a hot spare disk would fail, since there is no read/write activity on a hot spare.
Hi - OK, so disk 0d.57 has failed:
fs03*> aggr status -f
Broken disks
RAID Disk Device HA  SHELF BAY CHAN Pool Type RPM   Used (MB/blks)   Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- --------------   --------------
failed    0d.57  0d  3     9   FC:B -    FCAL 15000 418000/856064000 420156/860480768
Check the status of your aggregates with aggr status. Are any showing as degraded?
If the failed disk hasn't been replaced by a hot spare, have you zeroed the hot spares first? It is worth confirming that zeroing has completed:
disk zero spares
Hi Martin,
Thanks for the heads-up.
The aggregates are okay and none is degraded so far.
It seems the failed disk itself is the hot spare.
fs03> aggr status
Aggr  State  Status        Options
aggr0 online raid_dp, aggr root
aggr1 online raid_dp, aggr raidsize=16
Try reseating the hot spare disk: pull the disk out, then re-insert it.
Alternatively, set the filer to advanced privilege mode (priv set advanced) and manually unfail the disk, then try zeroing the spare again. If the disk fails again, it is most likely at the end of its service life and you will need a replacement hot spare.
Martin
Hi,
Yes, a spare disk can fail. A spare disk runs like any other disk,
so mechanical or electrical problems are possible.
If you look at the shelf, is there a disk with an orange LED lit?
If so, you can simply replace the faulty drive.
Otherwise (if no error LED is lit), try to unfail the disk from the console (you need to be in advanced mode).
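For reference, the unfail route suggested in this thread could look roughly like the following on the console. This is only a sketch, assuming a Data ONTAP 7-Mode system where disk unfail is available at the advanced privilege level; the disk name 0d.57 and the fs03 prompt are taken from the output posted earlier. Check the command's usage on your release before running it, as options and behaviour can differ between versions.

```
fs03> priv set advanced
fs03*> disk unfail 0d.57
fs03*> priv set admin
fs03> aggr status -f
fs03> aggr status -s
fs03> disk zero spares
```

If 0d.57 no longer appears under aggr status -f and is listed again under aggr status -s, zero the spares and keep an eye on /etc/messages. If the disk fails again, treat it as dead and replace it.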