failed aggregate

sdbxfr · 2022-10-10

Hello

Is there a way to recover this aggregate (power outage during rebuild...)?

AFF8040::*> storage aggregate show -r -aggregate data 

Owner Node: AFF8040-01
 Aggregate: data (failed, raid_dp, partial) (block checksums)
  Plex: /data/plex0 (offline, failed, inactive)
   RAID Group /data/plex0/rg0 (partial, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  3.52.0                       0   SSD        -   3.49TB   3.49TB (normal)
     parity   3.52.1                       0   SSD        -   3.49TB   3.49TB (normal)
     data     3.52.2                       0   SSD        -   3.49TB   3.49TB (normal)
     data     3.52.3                       0   SSD        -   3.49TB   3.49TB (normal)
     data     FAILED                       -   -          -   3.49TB       0B (failed)
     data     3.52.17                      0   SSD        -   3.49TB   3.49TB (normal)
     data     FAILED                       -   -          -   3.49TB       0B (failed)
     data     3.52.5                       0   SSD        -   3.49TB   3.49TB (reconstruction stalled)
     data     3.52.14                      0   SSD        -   3.49TB   3.49TB (normal)
     data     3.52.15                      0   SSD        -   3.49TB   3.49TB (normal)
     data     3.52.16                      0   SSD        -   3.49TB   3.49TB (normal)
     data     3.53.0                       0   SSD        -   3.49TB   3.49TB (normal)
     data     3.53.3                       0   SSD        -   3.49TB   3.49TB (normal)
     data     3.53.1                       0   SSD        -   3.49TB   3.49TB (normal)

  Unassimilated data disks
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     orphan   3.52.4                       0   SSD        -   3.49TB   3.49TB (normal)
     orphan   3.52.21                      0   SSD        -   3.49TB   3.49TB (normal)
16 entries were displayed.

AFF8040::*> storage disk show  -spare
Original Owner: AFF8040-01                                            
  Checksum Compatibility: block
                                                                         Usable Physical
    Disk            HA Shelf Bay Chan   Pool   Type Class          RPM     Size     Size Owner
    --------------- ------------ ---- ------ ------ ----------- ------ -------- -------- --------
    3.52.12         0a    52  12    A  Pool0    SSD solid-state      -   3.49TB   3.49TB AFF8040-01
    3.52.13         0a    52  13    A  Pool0    SSD solid-state      -   3.49TB   3.49TB AFF8040-01
    3.52.20         0a    52  20    A  Pool0    SSD solid-state      -  186.1GB  186.3GB AFF8040-01
    3.53.4          3a    53   4    B  Pool0    SSD solid-state      -  186.1GB  186.3GB AFF8040-01

The orphan disks should be in the 'data' aggregate ...

Any help would be appreciated.

Ontapforrum · 2022-10-10

I hope you have already raised a ticket and are in contact with NetApp by now. If not, please contact NetApp Technical Support.

When multiple disks fail one after another and take the RAID group offline, and the first failed disk is then unfailed manually, that disk becomes an orphaned disk. An orphaned disk still contains data, so if the RAID group had a multiple-drive failure, the orphaned disk may help to recover the failed RAID group.

https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Systems/FAS_Systems/What_is_an_orphaned_disk%3F
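
To make the arithmetic behind the "partial" state concrete, here is a minimal Python sketch that tallies the rows of the `storage aggregate show -r` output posted above. It is not an ONTAP tool and it recovers nothing. The names `summarize`, `ROW` and `SAMPLE` are made up for this illustration. It assumes that any RAID group member whose status is not `(normal)` counts as unavailable (including the `reconstruction stalled` disk), and it relies on the fact that RAID-DP tolerates at most two unavailable disks per RAID group.

```python
import re

# Rows copied from the `storage aggregate show -r` output in this thread.
SAMPLE = """\
     dparity  3.52.0                       0   SSD        -   3.49TB   3.49TB (normal)
     parity   3.52.1                       0   SSD        -   3.49TB   3.49TB (normal)
     data     3.52.2                       0   SSD        -   3.49TB   3.49TB (normal)
     data     3.52.3                       0   SSD        -   3.49TB   3.49TB (normal)
     data     FAILED                       -   -          -   3.49TB       0B (failed)
     data     3.52.17                      0   SSD        -   3.49TB   3.49TB (normal)
     data     FAILED                       -   -          -   3.49TB       0B (failed)
     data     3.52.5                       0   SSD        -   3.49TB   3.49TB (reconstruction stalled)
     data     3.52.14                      0   SSD        -   3.49TB   3.49TB (normal)
     data     3.52.15                      0   SSD        -   3.49TB   3.49TB (normal)
     data     3.52.16                      0   SSD        -   3.49TB   3.49TB (normal)
     data     3.53.0                       0   SSD        -   3.49TB   3.49TB (normal)
     data     3.53.3                       0   SSD        -   3.49TB   3.49TB (normal)
     data     3.53.1                       0   SSD        -   3.49TB   3.49TB (normal)
     orphan   3.52.4                       0   SSD        -   3.49TB   3.49TB (normal)
     orphan   3.52.21                      0   SSD        -   3.49TB   3.49TB (normal)
"""

# RAID-DP (double parity) survives at most two unavailable disks per RAID group.
RAID_DP_TOLERANCE = 2

# Position, disk name, and the status in the trailing parentheses.
ROW = re.compile(r"^\s*(dparity|parity|data|orphan)\s+(\S+)\s.*\((.+)\)\s*$")


def summarize(text):
    """Count RAID group members, non-normal members, and orphan disks."""
    summary = {"members": 0, "unavailable": [], "orphans": []}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip headers, separators and blank lines
        position, disk, status = match.groups()
        if position == "orphan":
            summary["orphans"].append(disk)
            continue
        summary["members"] += 1
        # Assumption: anything other than "normal" is treated as unavailable.
        if status != "normal":
            summary["unavailable"].append((position, disk, status))
    return summary


if __name__ == "__main__":
    result = summarize(SAMPLE)
    missing = len(result["unavailable"])
    print(f"members: {result['members']}, not normal: {missing}, "
          f"orphans: {len(result['orphans'])}")
    if missing > RAID_DP_TOLERANCE:
        print("More members are unavailable than RAID-DP can tolerate.")
```

With the rows from this thread, the tally is 14 RAID group members, three of them not in a normal state (two `FAILED` slots plus the stalled reconstruction), and two orphans. Three unavailable members is one more than RAID-DP can absorb, which is consistent with the group showing as partial. The sketch cannot tell which slot each orphan belonged to; that is not visible in this output.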

sdbxfr · 2022-10-11

Hello

Thank you. This system has no support, so I need to find a solution by myself... I had already found this link, and it seems to be the only result available about "orphan disk".

Thanks

Ontapforrum · 2022-10-11

What is the current output of these commands?

cluster::> node run -node <Node> -command sysconfig -r
cluster::> event log show

Ideally, you would want NetApp to handle this situation. If you cannot recover the aggregate, you will simply have to restore the data from backups (hopefully these are in place).

Have a look at this thread.
https://www.reddit.com/r/netapp/comments/ki4ix7/failure_of_3_disks_in_the_same_raidgroup/