
about "(SPARE DISK FAILED) WARNING" (FAS3140)

Hi all,

We have a failed hard drive, but there is no "reconstructing" message in /etc/messages as there usually is.

The output also shows that the failed drive was a spare disk. I have never come across this before.

If it is indeed a hot spare, can we simply pull it out, since we still have another FC hot spare? I ask because the warranty service on this filer happened to expire last month.

fs03*> aggr status -f     
Broken disks
RAID Disk       Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------  ------------- ---- ---- ---- ----- --------------    --------------
failed          0d.57   0d    3   9   FC:B   -  FCAL 15000 418000/856064000  420156/860480768

fs03*> aggr status -s
Spare disks
RAID Disk       Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare           0d.52   0d    3   4   FC:B   -  FCAL 15000 418000/856064000  420156/860480768
spare           3a.29   3a    1   13  FC:A   -  ATA   7200 423111/866531584  423889/868126304
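To confirm from a script that a like-for-like spare is still available, here is a minimal Python sketch. It assumes the 11-column row layout shown in the `aggr status -f` and `aggr status -s` output above; the function names `parse_disk_rows` and `matching_spares` are made up for illustration and are not part of any NetApp tool.

```python
def parse_disk_rows(output, role):
    """Parse rows of 'aggr status -f' / 'aggr status -s' text whose first
    column equals role ('failed' or 'spare').

    Assumes the 11-column layout pasted in this thread:
    RAID Disk, Device, HA, SHELF, BAY, CHAN, Pool, Type, RPM, Used, Phys.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Header and separator lines are skipped by the exact, case-sensitive role match.
        if len(fields) == 11 and fields[0] == role:
            rows.append({
                "device": fields[1],
                "shelf": int(fields[3]),
                "bay": int(fields[4]),
                "type": fields[7],
                "rpm": int(fields[8]),
            })
    return rows


def matching_spares(failed, spares):
    """Return the spares with the same disk type and RPM as the failed disk."""
    return [s for s in spares
            if s["type"] == failed["type"] and s["rpm"] == failed["rpm"]]
```

Fed the two listings above, this would report 0d.52 as the only remaining spare of the same type and speed as 0d.57; the ATA disk 3a.29 does not match.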

Some entries from /etc/messages:
Thu Dec 20 15:14:53 PST [fs03: raid.config.spare.disk.failed:error]: Spare Disk 0d.57 Shelf 3 Bay 9 [NETAPP   X291_S15K6420F15 NA01] S/N [3QQ0E49Y00009913FZ8N] failed.
Thu Dec 20 15:14:53 PST [fs03: raid.disk.unload.done:info]: Unload of Disk 0d.57 Shelf 3 Bay 9 [NETAPP   X291_S15K6420F15 NA01] S/N [3QQ0E49Y00009913FZ8N] has completed successfully
Thu Dec 20 15:14:53 PST [fs03: disk.partner.msgStatus:debug]: The partner/CDO has returned status 0 for msg sent for 0d.57.
Thu Dec 20 15:15:00 PST [fs03: monitor.globalStatus.nonCritical:warning]: Disk on adapter 0d, shelf 3, bay 9, failed. 
Thu Dec 20 15:15:11 PST [fs03: asup.smtp.sent:notice]: Cluster Notification mail sent: Cluster Notification from fs03 (SPARE DISK FAILED) WARNING
Thu Dec 20 15:15:13 PST [fs03: asup.post.sent:notice]: Cluster Notification message posted to NetApp: Cluster Notification from fs03 (SPARE DISK FAILED) WARNING
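For anyone who wants to pick these events out of a copy of /etc/messages automatically, a rough Python sketch follows. The regular expression is modelled only on the single raid.config.spare.disk.failed line quoted above, so other Data ONTAP releases may format the message differently; `find_failed_spares` is a hypothetical helper name.

```python
import re

# Pattern modelled on the raid.config.spare.disk.failed line quoted in this
# thread; this is an assumption, not a documented message format.
SPARE_FAILED = re.compile(
    r"\[(?P<host>[^:\]]+): raid\.config\.spare\.disk\.failed:(?P<severity>\w+)\]: "
    r"Spare Disk (?P<disk>\S+) Shelf (?P<shelf>\d+) Bay (?P<bay>\d+) "
    r".*?S/N \[(?P<serial>[^\]]+)\] failed"
)


def find_failed_spares(lines):
    """Return one dict per spare-disk failure event found in the log lines."""
    events = []
    for line in lines:
        match = SPARE_FAILED.search(line)
        if match:
            events.append(match.groupdict())
    return events
```

Usage would be along the lines of `find_failed_spares(open("messages"))` against a local copy of the log, where the file name is whatever you saved it as.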

Re: about "(SPARE DISK FAILED) WARNING" (FAS3140)

Can anyone help?

I was wondering why a hot spare disk would fail, since there is no read/write activity on a hot spare.

Re: about "(SPARE DISK FAILED) WARNING" (FAS3140)

Hi - disk 0d.57 has failed, OK.

fs03*> aggr status -f     
Broken disks
RAID Disk       Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------  ------------- ---- ---- ---- ----- --------------    --------------
failed          0d.57   0d    3   9   FC:B   -  FCAL 15000 418000/856064000  420156/860480768

Check the status of your aggregates with aggr status. Are any showing as degraded at all?

If the failed disk hasn't been replaced by a hot spare, have you zeroed the hot spares first, just to confirm that this has been completed?

disk zero spares

Re: about "(SPARE DISK FAILED) WARNING" (FAS3140)

Hi

Yes, a spare disk can fail. A spare disk runs like any other disk, so mechanical or electrical problems are possible.

If you look at the shelf, do you have a disk with an orange LED?

If so, you can just replace the faulty drive.

An alternative option (if no error LED is found) is to try to unfail the disk from the console (you need to be in advanced mode).

Re: about "(SPARE DISK FAILED) WARNING" (FAS3140)

Hi Martin,

Thanks for the heads-up.

The aggregates are okay and none is degraded so far.

It seems the failed disk itself is a hot spare.

fs03> aggr status
           Aggr State           Status            Options
          aggr0 online          raid_dp, aggr     root
          aggr1 online          raid_dp, aggr     raidsize=16
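If you want to run the same degraded check from a script rather than by eye, here is a small Python sketch. It assumes that a degraded aggregate shows the word "degraded" in the Status column, as the earlier question implies, and that each aggregate's row starts with its name followed by one of the states in `STATES`; both the state list and the function name `degraded_aggregates` are assumptions for illustration.

```python
# Aggregate states assumed for spotting the first row of each aggregate.
STATES = {"online", "offline", "restricted"}


def degraded_aggregates(output):
    """Return names of aggregates whose status text contains 'degraded'.

    Rows that do not start with '<name> <state>' are treated as continuation
    lines belonging to the most recent aggregate.
    """
    degraded = []
    current = None
    for line in output.splitlines():
        tokens = line.replace(",", " ").split()
        if not tokens:
            continue
        if len(tokens) >= 2 and tokens[1] in STATES:
            current = tokens[0]
            rest = tokens[2:]
        else:
            rest = tokens
        if current and "degraded" in rest and current not in degraded:
            degraded.append(current)
    return degraded
```

On the output pasted above this would return an empty list, which agrees with both aggregates being healthy.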

Re: about "(SPARE DISK FAILED) WARNING" (FAS3140)

Try taking the hot spare disk out, i.e. pull the disk and then re-insert (re-seat) it.

Alternatively, set the filer to advanced privilege mode (priv set advanced) and manually unfail the disk, then try to zero the spare again. If the disk fails again, it is more than likely at the end of its service life and you will need to get a replacement hot spare.

Martin