ONTAP Discussions
Just replaced a drive, but one of our aggregates is still showing failed disks. How can we get the status back to normal? We have plenty of spares.
RAID Group /aggr2_sas_clp_lcl_fas8020b/plex0/rg1 (double degraded, block checksums, raid_dp)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  3.33.7                       0   SAS    15000  546.9GB  547.7GB (normal)
     parity   3.32.8                       0   SAS    15000  546.9GB  547.1GB (normal)
     data     3.33.8                       0   SAS    15000  546.9GB  547.7GB (normal)
     data     3.32.9                       0   SAS    15000  546.9GB  547.1GB (normal)
     data     3.33.9                       0   SAS    15000  546.9GB  547.7GB (normal)
     data     FAILED                       -   -          -  546.9GB        - (failed)
     data     3.33.10                      0   SAS    15000  546.9GB  547.7GB (normal)
     data     3.32.11                      0   SAS    15000  546.9GB  547.1GB (normal)
     data     3.33.11                      0   SAS    15000  546.9GB  547.7GB (normal)
     data     FAILED                       -   -          -  546.9GB        - (failed)
     data     3.33.12                      0   SAS    15000  546.9GB  547.7GB (normal)
     data     3.32.13                      0   SAS    15000  546.9GB  547.1GB (normal)
     data     3.33.13                      0   SAS    15000  546.9GB  547.7GB (normal)
     data     3.32.14                      0   SAS    15000  546.9GB  547.1GB (normal)
     data     3.33.14                      0   SAS    15000  546.9GB  547.7GB (normal)
 Pool0
  Spare Pool
                                                             Usable Physical
    Disk             Type   Class          RPM Checksum        Size     Size Status
    ---------------- ------ ----------- ------ -------------- -------- -------- --------
    2.22.17          SAS    performance  10000 block           836.9GB  838.4GB zeroed
    2.22.19          SAS    performance  10000 block           836.9GB  838.4GB zeroed
    2.23.9           SAS    performance  10000 block           836.9GB  838.4GB zeroed
    3.30.22          SAS    performance  15000 block           546.9GB  547.1GB zeroed
    3.31.2           SAS    performance  15000 block           546.9GB  547.7GB zeroed
    3.32.12          SAS    performance  15000 block           546.9GB  547.7GB zeroed

Original Owner: clp-lcl-fas8020b
 Pool0
  Spare Pool
                                                             Usable Physical
    Disk             Type   Class          RPM Checksum        Size     Size Status
    ---------------- ------ ----------- ------ -------------- -------- -------- --------
    2.20.18          SAS    performance  10000 block           836.9GB  838.4GB zeroed
    2.20.23          SAS    performance  10000 block           836.9GB  838.4GB zeroed
    2.21.17          SAS    performance  10000 block           836.9GB  838.4GB zeroed
    3.32.5           SAS    performance  15000 block           546.9GB  547.1GB zeroed
    3.32.7           SAS    performance  15000 block           546.9GB  547.1GB zeroed
    3.32.10          SAS    performance  15000 block           546.9GB  547.7GB zeroed
    3.33.23          SAS    performance  15000 block           546.9GB  547.7GB zeroed
    1.10.9           SSD    solid-state      - block           186.1GB  186.3GB zeroed
14 entries were displayed.
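Before unfailing anything, it can help to confirm which disks the system actually considers broken and how the RAID group looks right now. A minimal check, reusing the aggregate name from the output above (adjust the names to your own environment):

    storage disk show -broken
    storage aggregate show-status -aggregate aggr2_sas_clp_lcl_fas8020b

The first command lists disks sitting in the broken pool; the second shows the RAID group layout again, so you can see whether a reconstruction onto a spare has started.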
Have you tried manually unfailing the disk? The storage disk unfail command can be used to do this.
The following command (at the advanced privilege level) should unfail the disk and make it a spare.
cluster1::*> storage disk unfail -disk <disk name> -s true
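As a rough end-to-end sketch (the disk name 3.32.6 below is only a placeholder; use whichever disk is actually listed as broken on your system):

    cluster1::> set -privilege advanced
    cluster1::*> storage disk unfail -disk 3.32.6 -s true
    cluster1::*> storage aggregate show-spare-disks

As I understand it, -s true returns the unfailed disk to the spare pool rather than putting it straight back into its old RAID group; the last command is just there to confirm it shows up as a spare again.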
After the case was escalated with NetApp, this was resolved.
It ended up being CIFS locks preventing giveback.
storage failover show-giveback
               Partner
Node           Aggregate         Giveback Status
-------------- ----------------- ---------------------------------------------
<node>
               CFO Aggregates    Done
               aggr2_sas_fas8020b
                                 Failed: Operation was vetoed by lock_manager.
                                 Giveback vetoed: Giveback cannot proceed because
                                 non-continuously available (non-CA) CIFS locks
                                 are present on the volume. Gracefully close the
                                 CIFS sessions over which non-CA locks are
                                 established. Use the "vserver cifs session file
                                 show -hosting-aggregate <aggregate list>
                                 -continuously-available No" command to view the
                                 open files that have CIFS sessions with non-CA
                                 locks established. <aggregate list> is the list
                                 of aggregates sent home as a result of the
                                 giveback operation. If lock state disruption for
                                 all existing non-CA locks is acceptable, retry
                                 the giveback operation by specifying
                                 "-override-vetoes true". Warning: Overriding
                                 vetoes to perform a giveback can be disruptive.
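Before overriding, you can list the offending sessions using the command the veto message itself points to, filling in the aggregate name from the output above:

    vserver cifs session file show -hosting-aggregate aggr2_sas_fas8020b -continuously-available No

That shows which open files hold the non-CA locks, in case you would rather close those sessions gracefully instead of breaking the locks with an override.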
Once I overrode the vetoes, the aggregate started rebuilding:
storage failover giveback -ofnode <node> -override-vetoes true
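After the override, the same commands from earlier can be used to confirm progress, for example:

    storage failover show-giveback
    storage aggregate show-status -aggregate aggr2_sas_fas8020b

The giveback should report Done for the aggregate, and the previously failed positions in the RAID group should show spares reconstructing in their place.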