There are 3 disk failures, please advise.
Please find the errors below:
[filer02: monitor.shutdown.brokenDisk.pending:warning]: the parity disk and a data disk in RAID group /aggr1/plex0/rg0 are broken. Halting system in 15 hours.
[filer02: raid.config.filesystem.disk.failed:error]: File system Disk /aggr1/plex0/rg0/0a.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V103L] failed.
[filer02: disk.failmsg:error]: Disk 0a.32 (N00V103L): message received, maintenance center recommended.
Fri Jul 7 20:42:36 BNT [filer02: raid.disk.unload.done:info]: Unload of Disk 0a.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V103L] has completed successfully
Fri Jul 7 20:42:46 BNT [filer02/filer01: iscsi.service.startup:info]: iSCSI service startup
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.35 Shelf 2 Bay 3 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00WLAUL] successfully deleted from spare pool
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V103L] successfully deleted from spare pool
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.33 Shelf 2 Bay 1 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00W75PL] successfully deleted from spare pool
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.40 Shelf 2 Bay 8 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V0W8L] successfully deleted from spare pool
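To pull the affected disks out of messages like the ones above, a small script can help. This is only a rough sketch: the regular expression is an assumption based on the "Disk ... Shelf ... Bay ... S/N [...]" wording in the lines quoted in this post, not an official log format, and the name disks_mentioned is made up for illustration.

```python
import re

# Assumed pattern, derived only from the messages quoted in this thread:
#   "Disk [<raid path>/]<name> Shelf <n> Bay <n> [<model>] S/N [<serial>]"
DISK_RE = re.compile(
    r"Disk\s+(?:\S*/)?(?P<disk>\w+\.\d+)"
    r"\s+Shelf\s+(?P<shelf>\d+)\s+Bay\s+(?P<bay>\d+)"
    r".*?S/N\s+\[(?P<serial>\w+)\]"
)


def disks_mentioned(log_text):
    """Map each disk name found in log_text to (shelf, bay, serial)."""
    found = {}
    for line in log_text.splitlines():
        match = DISK_RE.search(line)
        if match:
            found[match["disk"]] = (
                int(match["shelf"]),
                int(match["bay"]),
                match["serial"],
            )
    return found


# Two of the messages from this post, copied as-is.
SAMPLE = """\
filer02: raid.config.filesystem.disk.failed:error]: File system Disk /aggr1/plex0/rg0/0a.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V103L] failed.
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.35 Shelf 2 Bay 3 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00WLAUL] successfully deleted from spare pool
"""

if __name__ == "__main__":
    for disk, (shelf, bay, serial) in sorted(disks_mentioned(SAMPLE).items()):
        print(f"{disk}: shelf {shelf}, bay {bay}, S/N {serial}")
```

Lines that name a disk without a shelf and bay (such as the disk.failmsg entry) are skipped by this pattern, so the result lists only disks with a known physical location.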
Hi Ashwin,
At this point you need to get the disks replaced to avoid a controller shutdown. I see your controller is already running in degraded mode (a RAID group is degraded when it has failed disks that have not yet been rebuilt onto spares).
Could you post the output of these commands?
vol status -f (failed disks) and vol status -s (spare disks)
If this is a clustered Data ONTAP controller, use these commands instead:
storage disk show -broken
storage disk show -spare
Thanks,
Nayab
Hi Nayab,
Thanks for your reply.
Please find the attached files.
Please let us know whether the data is lost, as we cannot access the volume from the server.
Hi Ashwin,
I see from the logs that the takeover failed because there were not enough spares. I suggest replacing the failed disks and powering the system on again; it should come up.
Thanks,
Nayab
Hi Nayab,
Should we replace all 4 disks and power up?
Will any data be lost?
Hi Nayab,
Do you suggest replacing all 4 failed disks in Filer 2?
There are also 2 failed disks in Filer 1.
Please find the attached files.
Hi Nayab,
Please find attached the output of the commands:
vol status -f
vol status -s
Hi Ashwin,
I can see one spare disk available in the spare pool, but I would suggest replacing all the failed disks and powering on the controllers.
Thanks,
Nayab
Hi Nayab,
Thanks for the advice.
However, we are unable to access Filer 02, which has 4 failed disks.
Will the data in Filer 02 be lost? Please advise.
Hi Ashwin,
At this point I cannot confirm that. If no RAID group has lost more disks than its parity can cover (one for RAID4, two for RAID-DP), you can still rebuild your data from parity. If a RAID group has lost more than that, there will definitely be data loss. That is why I asked you to replace the disks ASAP, power on the controller, and let ONTAP rebuild the failed disks. If they all rebuild, you will still have your data.
Thanks,
Nayab
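As a rough illustration of the rebuild question: what matters is how many disks each RAID group has lost, not the total across the filer. The sketch below assumes the usual limits (RAID4 survives one failed disk per RAID group, RAID-DP two). The function name and the example layout are invented for illustration; the thread does not say which RAID group each failed disk belongs to, so the real mapping would have to come from the attached command output.

```python
# Failed disks a single RAID group can lose and still be rebuilt from parity.
TOLERATED_FAILURES = {"raid4": 1, "raid_dp": 2}


def groups_beyond_parity(failed_by_group, raid_type="raid_dp"):
    """Return the RAID groups that lost more disks than parity can cover."""
    limit = TOLERATED_FAILURES[raid_type]
    return sorted(
        group
        for group, failed_disks in failed_by_group.items()
        if len(failed_disks) > limit
    )


# HYPOTHETICAL layout: the thread only shows 0a.32 in rg0. The placement of
# the other disks, and the name rg1, are assumptions made up for this example.
example = {
    "/aggr1/plex0/rg0": ["0a.32", "0a.33"],
    "/aggr1/plex0/rg1": ["0a.35", "0a.40"],
}

if __name__ == "__main__":
    print("RAID-DP groups past recovery:", groups_beyond_parity(example, "raid_dp"))
    print("RAID4 groups past recovery:", groups_beyond_parity(example, "raid4"))
```

An empty result means every RAID group is still within what parity can reconstruct once replacement disks are in place. Any group listed has lost more disks than its RAID type allows.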