ONTAP Discussions

Aggregate has failed and cannot be brought online. Raid group is missing 1 disk.

MT_Back_Office

Greetings, everyone. Any assistance, insight, or advice on bringing a failed aggregate online would be greatly appreciated. We recently encountered multiple disk failures, and all of the failed disks have been replaced. However, it looks as though a media scrub and/or zeroing process on the replacement spare disks must complete before the failed aggregate will consume a spare and reconstruct the partial RAID group. Is there any truth to this assumption? Some relevant details are outlined below. Thank you in advance...

 

NetApp Release 7.3.2P6

 

FilerView > Aggregates > Manage displays the following status for aggr3: (failed, raid_dp, partial)

 

Attempting to place the aggregate online displays the following error: Requested operation failed on aggregate 'aggr3': Aggregate 'aggr3' has failed and cannot be brought online.

 

From a command line, aggr status -r displays:

 

Aggregate aggr3 (failed, raid_dp, partial) (block checksums)
  Plex /aggr3/plex0 (offline, failed, inactive)
    RAID group /aggr3/plex0/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0a.89   0a    5   9   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      parity    0a.75   0a    4   11  FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.58   0a    3   10  FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.86   0a    5   6   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.48   0a    3   0   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.54   0a    3   6   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.55   0a    3   7   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.81   0a    5   1   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.87   0a    5   7   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.93   0a    5   13  FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.90   0a    5   10  FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.82   0a    5   2   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.84   0a    5   4   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.92   0a    5   12  FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.64   0a    4   0   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.85   0a    5   5   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304

    RAID group /aggr3/plex0/rg1 (partial)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   FAILED          N/A                   635555/1301618176
      parity    0a.49   0a    3   1   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.18   0a    1   2   FC:A   -  ATA   7200 274400/561971200  274540/562258784
      data      0a.45   0a    2   13  FC:A   -  ATA   7200 423111/866531584  423889/868126304
      data      0a.80   0a    5   0   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.66   0a    4   2   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.88   0a    5   8   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)
      data      0a.83   0a    5   3   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.67   0a    4   3   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.51   0a    3   3   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.91   0a    5   11  FC:A   -  ATA   7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)
      data      0a.52   0a    3   4   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      Raid group is missing 1 disk.

 

Spare disks

RAID Disk       Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare           0a.65   0a    4   1   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
spare           0a.68   0a    4   4   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
spare           0a.76   0a    4   12  FC:A   -  ATA   7200 635555/1301618176 635858/1302238304

 

aggr media_scrub status displays:

 

aggr media_scrub /aggr1/plex0/rg0 is 20% complete
aggr media_scrub /aggr2/plex0/rg0 is 12% complete
aggr media_scrub /aggr0/plex0/rg0 is 31% complete
aggr media_scrub 0a.65 is 42% complete
aggr media_scrub 0a.76 is 42% complete
aggr media_scrub 0a.68 is 42% complete
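
For reference, the status lines above can be split into RAID-group targets and individual disk targets. This is a minimal Python sketch that assumes the line format is exactly as pasted; the regular expression and the function name `parse_media_scrub` are illustrative and not part of Data ONTAP.

```python
import re

# Assumed line format, taken from the pasted output:
#   aggr media_scrub <target> is <N>% complete
SCRUB_LINE = re.compile(r"^aggr media_scrub (\S+) is (\d+)% complete$")


def parse_media_scrub(lines):
    """Split scrub targets into RAID groups (paths starting with '/') and single disks."""
    raid_groups = {}
    disks = {}
    for line in lines:
        match = SCRUB_LINE.match(line.strip())
        if not match:
            continue  # ignore anything that is not a status line
        target, percent = match.group(1), int(match.group(2))
        if target.startswith("/"):
            raid_groups[target] = percent
        else:
            disks[target] = percent
    return raid_groups, disks


status_lines = [
    "aggr media_scrub /aggr1/plex0/rg0 is 20% complete",
    "aggr media_scrub /aggr2/plex0/rg0 is 12% complete",
    "aggr media_scrub /aggr0/plex0/rg0 is 31% complete",
    "aggr media_scrub 0a.65 is 42% complete",
    "aggr media_scrub 0a.76 is 42% complete",
    "aggr media_scrub 0a.68 is 42% complete",
]

raid_groups, disks = parse_media_scrub(status_lines)
print("RAID groups being scrubbed:", sorted(raid_groups))
print("Individual disks being scrubbed:", sorted(disks))
```

The three individual disk targets are the same three devices listed under "Spare disks", and no aggr3 RAID group appears in the list. The sketch only organizes the output; it does not by itself show whether the scrub is what prevents reconstruction.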

 

Once the media_scrub completes, will the failed aggregate consume a spare disk and reconstruct the partial RAID group?

 


AGUMADAVALLI

Go to the My AutoSupport site and locate the aggregate and its associated disks. Find the missing drive and its ID, and add it back to the aggregate.

That should bring the aggregate back online. If not, call support.

 

thank you

aKG

hariprak

Hi,

 

I hope this article helps: https://kb.netapp.com/support/index?page=content&id=2015763&actp=LIST_RECENT&viewlocale=en_US&searchid=1416288732535

 

Thanks


aborzenkov

You have a triple disk failure. Your only option is to try to bring the disk that failed last back online and hope that this allows reconstruction to complete. Do not attempt to unfail the disk from Data ONTAP; that will make it a spare and unsuitable for reconstruction. Open a case with NetApp and let them guide you. It is very easy to lose data in this situation.

 

JGPSHNTAP

^^

 

What he said.

 

I've had a triple disk failure, and it doesn't end well. I hope you have a good SnapMirror copy. I would call support ASAP. They will probably have you run wafliron, but you need to call support for this issue.

MT_Back_Office

Thank you all for your responses, links, and suggestions.

 

Unfortunately, our Support Agreement/Warranty expired and was not renewed. Ultimately, we executed the steps listed below:

 

- In SnapMirror > Manage, deleted all entries that pointed at the failed, offline aggregate and were in an unknown state.
- Destroyed the failed aggregate, aggr3.
- Added a new aggregate (disks initialized and zeroed).
- Created and configured new volumes on the new aggregate.
- Created new SnapMirror entries for the new volumes.
- Restricted the new volumes and initialized the SnapMirror relationships.

 

All SnapMirrors are now either transferring or in a snapmirrored state.

Aficionado

^^^^

 

You have a triple disk failure. Your only option is to try to bring the disk that failed last back online and hope that this allows reconstruction to complete. Do not attempt to unfail the disk from Data ONTAP; that will make it a spare and unsuitable for reconstruction. Open a case with NetApp and let them guide you. It is very easy to lose data in this situation.

 

===

 

Sorry, but I cannot see the triple failure, and that worries me. All I can see is a single failed disk in this aggregate:

 

RAID group /aggr3/plex0/rg1 (partial)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   FAILED          N/A                   635555/1301618176
      parity    0a.49   0a    3   1   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.18   0a    1   2   FC:A   -  ATA   7200 274400/561971200  274540/562258784
      data      0a.45   0a    2   13  FC:A   -  ATA   7200 423111/866531584  423889/868126304
      data      0a.80   0a    5   0   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.66   0a    4   2   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.88   0a    5   8   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)
      data      0a.83   0a    5   3   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.67   0a    4   3   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.51   0a    3   3   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      data      0a.91   0a    5   11  FC:A   -  ATA   7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)
      data      0a.52   0a    3   4   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
      Raid group is missing 1 disk.

 

Where can you see the other 2 failed disks???

 

Much appreciated!

aborzenkov

Where can you see the other 2 failed disks???


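The two that are currently reconstructing: 0a.88 and 0a.91, both shown at "reconstruction 99% completed" in rg1. Together with the failed dparity disk, that makes three disks in one RAID group without complete data.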
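
To make the count explicit, here is a minimal Python sketch that tallies failed and reconstructing disks from `aggr status -r` rows. It assumes the row layout pasted in this thread and assumes that a disk still under reconstruction should be counted as unavailable. The function name `count_unavailable` and the constant `RAID_DP_TOLERANCE` are illustrative, not anything provided by Data ONTAP.

```python
import re

# Assumption: RAID-DP protects against at most two unavailable disks per RAID group.
RAID_DP_TOLERANCE = 2

RECONSTRUCTING = re.compile(r"\(reconstruction \d+% completed\)")
DISK_ROLES = ("dparity", "parity", "data")


def count_unavailable(rows):
    """Return (failed, reconstructing) counts for one RAID group listing."""
    failed = 0
    reconstructing = 0
    for row in rows:
        fields = row.split()
        # Skip headers, separators, and trailing notes.
        if len(fields) < 2 or fields[0] not in DISK_ROLES:
            continue
        if fields[1] == "FAILED":
            failed += 1
        elif RECONSTRUCTING.search(row):
            reconstructing += 1
    return failed, reconstructing


# A subset of the rg1 rows from the original post (most healthy data disks trimmed).
rg1_rows = [
    "dparity   FAILED          N/A                   635555/1301618176",
    "parity    0a.49   0a    3   1   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304",
    "data      0a.88   0a    5   8   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)",
    "data      0a.91   0a    5   11  FC:A   -  ATA   7200 635555/1301618176 635858/1302238304 (reconstruction 99% completed)",
    "data      0a.52   0a    3   4   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304",
    "Raid group is missing 1 disk.",
]

failed, reconstructing = count_unavailable(rg1_rows)
unavailable = failed + reconstructing
print(f"failed={failed} reconstructing={reconstructing} unavailable={unavailable}")
print("exceeds RAID-DP tolerance:", unavailable > RAID_DP_TOLERANCE)
```

With the rg1 rows above, this counts one failed disk and two reconstructing disks, which is one more than RAID-DP can absorb in a single RAID group.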

SCOTT_V_RONTALE

Has this issue been solved?

 
