ONTAP Discussions
Greetings, everyone. Any assistance, insight, or advice on bringing a failed aggregate online would be greatly appreciated. We recently encountered multiple disk failures, and all of the failed disks have been replaced. However, it looks as though a media scrub and/or zeroing process on the replacement spare disks must complete before the failed aggregate will consume a spare and reconstruct the partial RAID group. Is there any truth to this assumption? Some relevant details are outlined below. Thank you in advance.
NetApp Release 7.3.2P6
FilerView > Aggregates > Manage displays the following status for aggr3: (failed, raid_dp, partial)
Attempting to place the aggregate online displays the following error: Requested operation failed on aggregate 'aggr3': Aggregate 'aggr3' has failed and cannot be brought online.
From a command line, aggr status -r displays:
Aggregate aggr3 (failed, raid_dp, partial) (block checksums)
Plex /aggr3/plex0 (offline, failed, inactive)
RAID group /aggr3/plex0/rg0 (normal)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0a.89 0a 5 9 FC:A - ATA 7200 635555/1301618176 635858/1302238304
parity 0a.75 0a 4 11 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.58 0a 3 10 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.86 0a 5 6 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.48 0a 3 0 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.54 0a 3 6 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.55 0a 3 7 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.81 0a 5 1 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.87 0a 5 7 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.93 0a 5 13 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.90 0a 5 10 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.82 0a 5 2 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.84 0a 5 4 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.92 0a 5 12 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.64 0a 4 0 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.85 0a 5 5 FC:A - ATA 7200 635555/1301618176 635858/1302238304
RAID group /aggr3/plex0/rg1 (partial)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity FAILED N/A 635555/1301618176
parity 0a.49 0a 3 1 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.18 0a 1 2 FC:A - ATA 7200 274400/561971200 274540/562258784
data 0a.45 0a 2 13 FC:A - ATA 7200 423111/866531584 423889/868126304
data 0a.80 0a 5 0 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.66 0a 4 2 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.88 0a 5 8 FC:A - ATA 7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)
data 0a.83 0a 5 3 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.67 0a 4 3 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.51 0a 3 3 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.91 0a 5 11 FC:A - ATA 7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)
data 0a.52 0a 3 4 FC:A - ATA 7200 635555/1301618176 635858/1302238304
Raid group is missing 1 disk.
Spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare 0a.65 0a 4 1 FC:A - ATA 7200 635555/1301618176 635858/1302238304
spare 0a.68 0a 4 4 FC:A - ATA 7200 635555/1301618176 635858/1302238304
spare 0a.76 0a 4 12 FC:A - ATA 7200 635555/1301618176 635858/1302238304
aggr media_scrub status displays:
aggr media_scrub /aggr1/plex0/rg0 is 20% complete
aggr media_scrub /aggr2/plex0/rg0 is 12% complete
aggr media_scrub /aggr0/plex0/rg0 is 31% complete
aggr media_scrub 0a.65 is 42% complete
aggr media_scrub 0a.76 is 42% complete
aggr media_scrub 0a.68 is 42% complete
Once the media_scrub completes, will the failed aggregate consume a spare disk and reconstruct the partial raid group?
Go to the My AutoSupport site and locate the aggregate and its associated disks. Find the missing drive and its ID, then add it back to the aggregate.
That should bring the aggregate back online. If not, call support.
thank you
aKG
Hi,
Hope this article helps https://kb.netapp.com/support/index?page=content&id=2015763&actp=LIST_RECENT&viewlocale=en_US&searchid=1416288732535
Thanks
You have a triple disk failure. Your only option is to try to bring the disk that failed last back online and hope that this allows reconstruction to complete. Do not attempt to unfail the disk from Data ONTAP; that will make it a spare and unsuitable for reconstruction. Open a case with NetApp and let them guide you. It is very easy to lose data in this situation.
^^
What he said.
I've had a triple disk failure, and it doesn't end well. I hope you have a good SnapMirror copy. I would call support ASAP. They will probably have you run wafliron, but you need to call support for this issue.
Thank all of you for your responses, links, and suggestions.
Unfortunately, our Support Agreement/Warranty expired and was not renewed. Ultimately, we executed the steps listed below:
-In SnapMirror > Manage, deleted all entries that pointed at the failed, offline aggregate and were in an unknown state.
-Destroyed failed aggr3
-Added new aggregate (disks initialized and zeroed)
-Created and configured new volumes on the new aggregate
-Created new SnapMirror entries for the new volumes
-Restricted the new volumes and initialized the SnapMirrors
All SnapMirrors are now either transferring or in a snapmirrored state.
^^^^
You have a triple disk failure. Your only option is to try to bring the disk that failed last back online and hope that this allows reconstruction to complete. Do not attempt to unfail the disk from Data ONTAP; that will make it a spare and unsuitable for reconstruction. Open a case with NetApp and let them guide you. It is very easy to lose data in this situation.
===
Sorry, but I cannot see the triple failure, and that worries me. All I can see is a single failed disk in this aggregate:
RAID group /aggr3/plex0/rg1 (partial)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity FAILED N/A 635555/1301618176
parity 0a.49 0a 3 1 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.18 0a 1 2 FC:A - ATA 7200 274400/561971200 274540/562258784
data 0a.45 0a 2 13 FC:A - ATA 7200 423111/866531584 423889/868126304
data 0a.80 0a 5 0 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.66 0a 4 2 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.88 0a 5 8 FC:A - ATA 7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)
data 0a.83 0a 5 3 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.67 0a 4 3 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.51 0a 3 3 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.91 0a 5 11 FC:A - ATA 7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)
data 0a.52 0a 3 4 FC:A - ATA 7200 635555/1301618176 635858/1302238304
Raid group is missing 1 disk.
Where can you see the other 2 failed disks???
Much appreciated!
Where can you see the other 2 failed disks???
Those that are currently Reconstructing.
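To make that count explicit, here is a minimal Python sketch that tallies, per RAID group, the disks that are either FAILED or still reconstructing in pasted `aggr status -r` output. It is not a NetApp tool. The function name `degraded_disks_by_group`, the constant `RAID_DP_TOLERANCE`, and the line-matching rules are assumptions based only on the output format shown in this thread, and counting a reconstructing disk as unavailable follows the reply above rather than any official definition. The sample is a subset of the lines posted earlier.

```python
import re

# RAID-DP is designed to survive two unavailable disks per RAID group.
RAID_DP_TOLERANCE = 2

# Disk roles that appear in the first column of the posted output.
DISK_ROLES = ("dparity", "parity", "data")

# Subset of the `aggr status -r` output posted in this thread.
SAMPLE = """\
RAID group /aggr3/plex0/rg0 (normal)
dparity 0a.89 0a 5 9 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.58 0a 3 10 FC:A - ATA 7200 635555/1301618176 635858/1302238304
RAID group /aggr3/plex0/rg1 (partial)
dparity FAILED N/A 635555/1301618176
parity 0a.49 0a 3 1 FC:A - ATA 7200 635555/1301618176 635858/1302238304
data 0a.88 0a 5 8 FC:A - ATA 7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)
data 0a.91 0a 5 11 FC:A - ATA 7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)
"""


def degraded_disks_by_group(status_text):
    """Return {raid_group: count of FAILED or reconstructing disks}."""
    counts = {}
    group = None
    for raw in status_text.splitlines():
        line = raw.strip()
        match = re.match(r"RAID group (\S+)", line)
        if match:
            group = match.group(1)
            counts[group] = 0
            continue
        if line.startswith("Spare disks"):
            group = None  # spares belong to no RAID group
            continue
        fields = line.split()
        if group is None or not fields or fields[0] not in DISK_ROLES:
            continue
        if "FAILED" in fields or "(reconstruction" in fields:
            counts[group] += 1
    return counts


if __name__ == "__main__":
    for name, count in degraded_disks_by_group(SAMPLE).items():
        verdict = "exceeds" if count > RAID_DP_TOLERANCE else "within"
        print(f"{name}: {count} degraded disk(s), {verdict} RAID-DP tolerance")
```

On the sample, rg0 counts 0 and rg1 counts 3: one FAILED dparity disk plus the two data disks (0a.88 and 0a.91) that never finished reconstructing. That is one more than RAID-DP can absorb, which is why the group is partial and the aggregate is marked failed.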
Has this issue been solved?