Aggregate aggr1: Failed due to multi-disk error - Urgent help

There are 3 disk failures; please advise.

Please find the errors below:

[filer02: monitor.shutdown.brokenDisk.pending:warning]: the parity disk and a data disk in RAID group /aggr1/plex0/rg0 are broken. Halting system in 15 hours.

[filer02: raid.config.filesystem.disk.failed:error]: File system Disk /aggr1/plex0/rg0/0a.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V103L] failed.

[filer02: disk.failmsg:error]: Disk 0a.32 (N00V103L): message received, maintenance center recommended.
Fri Jul 7 20:42:36 BNT [filer02: raid.disk.unload.done:info]: Unload of Disk 0a.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V103L] has completed successfully
Fri Jul 7 20:42:46 BNT [filer02/filer01: iscsi.service.startup:info]: iSCSI service startup
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.35 Shelf 2 Bay 3 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00WLAUL] successfully deleted from spare pool
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V103L] successfully deleted from spare pool
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.33 Shelf 2 Bay 1 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00W75PL] successfully deleted from spare pool
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.40 Shelf 2 Bay 8 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V0W8L] successfully deleted from spare pool
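
For anyone following along, the disks named in messages like these can be pulled out with a small script. This is a minimal sketch in Python, assuming the log has been saved as plain text. The regular expression and the function name `disks_mentioned` are my own and not part of any NetApp tool, and the script only lists disks that appear in the messages; it does not decide whether a disk has actually failed.

```python
import re

# Matches "Disk <name> Shelf <n> Bay <n> [<model info>] S/N [<serial>]".
# The optional group skips a leading RAID path such as /aggr1/plex0/rg0/.
DISK_RE = re.compile(
    r"Disk (?:/\S+/)?(\w+\.\d+) Shelf (\d+) Bay (\d+) \[.*?\] S/N \[(\w+)\]"
)

def disks_mentioned(log_text):
    """Return {disk_name: {"shelf", "bay", "serial"}} for every disk in the text."""
    disks = {}
    for match in DISK_RE.finditer(log_text):
        name, shelf, bay, serial = match.groups()
        disks[name] = {"shelf": int(shelf), "bay": int(bay), "serial": serial}
    return disks

# Four lines copied from the messages above.
SAMPLE = """\
filer02: raid.config.filesystem.disk.failed:error]: File system Disk /aggr1/plex0/rg0/0a.32 Shelf 2 Bay 0 [NETAPP   X269_HJUPI01TSSX NA01] S/N [N00V103L] failed.
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.35 Shelf 2 Bay 3 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00WLAUL] successfully deleted from spare pool
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.33 Shelf 2 Bay 1 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00W75PL] successfully deleted from spare pool
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.40 Shelf 2 Bay 8 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V0W8L] successfully deleted from spare pool
"""

for name, info in sorted(disks_mentioned(SAMPLE).items()):
    print(name, "shelf", info["shelf"], "bay", info["bay"], "S/N", info["serial"])
```

The shelf, bay and serial number printed for each disk are the details you would normally need when arranging replacements.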

Re: Aggregate aggr1: Failed due to multi-disk error - Urgent help

Hi Ashwin,

At this point you need to get the failed disks replaced to avoid a controller shutdown, as I see your controller is already running in degraded mode (NetApp goes into degraded mode if more than 2 disks fail).

Could you please post the output of these commands:

vol status -f
vol status -s

If this is a cluster-mode controller, use these commands instead:

storage disk show -broken
storage disk show -spare

Thanks,

Nayab

Re: Aggregate aggr1: Failed due to multi-disk error - Urgent help

Hi Nayab,

Thanks for your reply.

Please find the attached files.

Please let us know whether the data is lost, as we cannot access the volume from the server.

Re: Aggregate aggr1: Failed due to multi-disk error - Urgent help

Hi Ashwin,

I see from the logs that the takeover failed because there were not enough spares. I suggest replacing the failed disk and powering on the system again; it should then come up.

Thanks,

Nayab

Re: Aggregate aggr1: Failed due to multi-disk error - Urgent help

Hi Nayab,

Should we replace all 4 disks and then power up?

Will there be any data loss?

Re: Aggregate aggr1: Failed due to multi-disk error - Urgent help

Hi Nayab,

Do you suggest replacing all 4 failed disks in Filer 2?

There are also 2 failed disks in Filer 1.

Please find the attachment.

Re: Aggregate aggr1: Failed due to multi-disk error - Urgent help

Hi Nayab,

Please find attached the output of the commands:

vol status -f
vol status -s

Re: Aggregate aggr1: Failed due to multi-disk error - Urgent help

Hi Ashwin,

I can see one spare disk available in the spare pool, but I would suggest replacing all the failed disks and then powering on the controllers.

Thanks,

Nayab

Re: Aggregate aggr1: Failed due to multi-disk error - Urgent help

Hi Nayab,

Thanks for the advice.

However, we are unable to access Filer 02, in which 4 disks have failed.

Will the data on Filer 02 be lost? Please advise.

Re: Aggregate aggr1: Failed due to multi-disk error - Urgent help

Hi Ashwin,

At this point I cannot confirm that. If the number of failed disks in each RAID group is within what its parity can cover, the data can still be rebuilt from parity. But if more disks have failed in a RAID group than its parity protects against, there will definitely be data loss. That is why I asked you to replace the disks as soon as possible, power on the controller, and allow ONTAP to rebuild the failed disks. If they all rebuild, you will still have your data.

Thanks,

Nayab
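
To make the rebuild reasoning a bit more concrete, here is a minimal sketch in Python that compares the number of failed disks in one RAID group with what the RAID type can tolerate. It assumes the commonly documented limits (one failed disk per group for RAID4, two for RAID-DP). The function name `can_rebuild` and the example counts are hypothetical and are not taken from this system's output, so treat it as a way to think about the question rather than a diagnosis.

```python
# Assumed limits: how many disks one RAID group can lose and still rebuild.
TOLERATED_FAILURES = {"raid4": 1, "raid_dp": 2}

def can_rebuild(raid_type, failed_in_group):
    """Return True if the group has no more failed disks than its parity covers."""
    return failed_in_group <= TOLERATED_FAILURES[raid_type]

# Hypothetical counts, for illustration only.
for raid_type, failed in [("raid4", 1), ("raid4", 2), ("raid_dp", 2), ("raid_dp", 3)]:
    verdict = "rebuild possible" if can_rebuild(raid_type, failed) else "data loss expected"
    print(raid_type, "with", failed, "failed disk(s):", verdict)
```

The count that matters is per RAID group, not per controller, so the real answer depends on which RAID group each failed disk belonged to.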