ONTAP Hardware

Aggregate aggr1: Failed due to multi-disk error - Urgent help

ashwin

There are 3 disk failures, please advise.

Please find the errors below:

[filer02: monitor.shutdown.brokenDisk.pending:warning]: the parity disk and a data disk in RAID group /aggr1/plex0/rg0 are broken. Halting system in 15 hours.

[filer02: raid.config.filesystem.disk.failed:error]: File system Disk /aggr1/plex0/rg0/0a.32 Shelf 2 Bay 0 [NETAPP   X269_HJUPI01TSSX NA01] S/N [N00V103L] failed.

[filer02: disk.failmsg:error]: Disk 0a.32 (N00V103L): message received, maintenance center recommended.
Fri Jul 7 20:42:36 BNT [filer02: raid.disk.unload.done:info]: Unload of Disk 0a.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V103L] has completed successfully
Fri Jul 7 20:42:46 BNT [filer02/filer01: iscsi.service.startup:info]: iSCSI service startup
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.35 Shelf 2 Bay 3 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00WLAUL] successfully deleted from spare pool
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V103L] successfully deleted from spare pool
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.33 Shelf 2 Bay 1 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00W75PL] successfully deleted from spare pool
Fri Jul 7 20:42:47 BNT [filer02/filer01: raid.fdr.failed.ok:info]: Disk 0a.40 Shelf 2 Bay 8 [NETAPP X269_HJUPI01TSSX NA01] S/N [N00V0W8L] successfully deleted from spare pool
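To tally which disks the messages above name, here is a minimal Python sketch. It assumes each relevant line contains a "Disk <id> Shelf <n> Bay <n> ... S/N [<serial>]" fragment, as in the lines pasted here. The function name disks_in_log and the regular expression are illustrative only and are not part of any NetApp tool. It only lists the disks that are mentioned and does not interpret what each event means for the aggregate.

```python
import re

# Assumed line shape, taken from the messages pasted above:
#   "... Disk [<raid path>/]<id> Shelf <n> Bay <n> [...] S/N [<serial>] ..."
DISK_RE = re.compile(
    r"Disk\s+(?:\S*/)?(?P<disk>\w+\.\d+)"       # optional /aggr/plex/rg/ prefix, then e.g. 0a.32
    r"\s+Shelf\s+(?P<shelf>\d+)\s+Bay\s+(?P<bay>\d+)"
    r".*?S/N\s+\[(?P<serial>\w+)\]"
)


def disks_in_log(lines):
    """Return {disk_id: (shelf, bay, serial)} for every disk named in the lines."""
    found = {}
    for line in lines:
        match = DISK_RE.search(line)
        if match:
            found[match.group("disk")] = (
                int(match.group("shelf")),
                int(match.group("bay")),
                match.group("serial"),
            )
    return found


# Lines copied (timestamps trimmed) from the post above.
SAMPLE_LINES = [
    "raid.config.filesystem.disk.failed:error]: File system Disk /aggr1/plex0/rg0/0a.32 "
    "Shelf 2 Bay 0 [NETAPP   X269_HJUPI01TSSX NA01] S/N [N00V103L] failed.",
    "raid.fdr.failed.ok:info]: Disk 0a.35 Shelf 2 Bay 3 [NETAPP X269_HJUPI01TSSX NA01] "
    "S/N [N00WLAUL] successfully deleted from spare pool",
    "raid.fdr.failed.ok:info]: Disk 0a.33 Shelf 2 Bay 1 [NETAPP X269_HJUPI01TSSX NA01] "
    "S/N [N00W75PL] successfully deleted from spare pool",
    "raid.fdr.failed.ok:info]: Disk 0a.40 Shelf 2 Bay 8 [NETAPP X269_HJUPI01TSSX NA01] "
    "S/N [N00V0W8L] successfully deleted from spare pool",
]

disks = disks_in_log(SAMPLE_LINES)
for disk_id in sorted(disks):
    shelf, bay, serial = disks[disk_id]
    print(f"{disk_id}: shelf {shelf}, bay {bay}, S/N {serial}")
```

Run against the lines above, this lists four distinct disks (0a.32, 0a.33, 0a.35, 0a.40), which makes it easier to match the serial numbers against the physical shelf and bay before pulling anything.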


NAYABRSK

Hi Ashwin,

At this point you need to get the disks replaced to avoid a controller shutdown, as I see your controller is already running in degraded mode (a RAID group runs degraded once it has failed disks that have not been reconstructed onto spares).

Could you post the output of these commands?

vol status -f and vol status -s

 

If this is a clustered ONTAP (cluster-mode) controller, use these commands instead:

storage disk show -broken

storage disk show -spare

Thanks,

Nayab

ashwin

Hi Nayab,

Thanks for your reply.

Please find the attached files.

Please let us know whether the data is lost, as we cannot access the volume from the server.

NAYABRSK

Hi Ashwin,

I see from the logs that the takeover failed because there were not enough spares. I suggest replacing the failed disks and powering the system on again; it should then come up.

Thanks,

Nayab

ashwin

Hi Nayab,

Should we replace all 4 disks and then power up?

Will any data be lost?

ashwin

Hi Nayab,

Do you suggest replacing all 4 failed disks in Filer 2?

There are also 2 failed disks in Filer 1.

Please find the attached.

ashwin

Hi Nayab,

Please find attached the output of the commands:

vol status -f

vol status -s

NAYABRSK

Hi Ashwin,

I can see one spare disk available in the spare pool, but I would suggest replacing all the failed disks and then powering on the controllers.

Thanks,

Nayab

ashwin

Hi Nayab,

Thanks for the advice.

However, we are unable to access Filer 02, in which 4 disks have failed.

Will the data in Filer 02 be lost? Please advise.

NAYABRSK

Hi Ashwin,

At this point I cannot confirm that. If all the failed disks are data disks, then you can still rebuild your data from parity. But if two parity disks have failed, then there will definitely be data loss. That is why I asked you to replace the disks as soon as possible, power on the controller, and allow ONTAP to rebuild all the failed disks. If they all rebuild, you will still have your data.

Thanks,

Nayab

ashwin

Hi Nayab,

Thanks for the response.

We will replace the disks and update you.

Regards,

Ashwin
