
FAILED aggregate, reconstruction stuck

JOHN_L_COMBOCAR

Hi,

We have an aggregate that is not finishing its reconstruction; it is stuck while reconstructing. The aggr status -r output is below.

luneta> aggr status -r aggr10
Aggregate aggr10 (failed, raid_dp, partial) (block checksums)
Plex /aggr10/plex0 (offline, failed, inactive)
RAID group /aggr10/plex0/rg0 (partial, block checksums)

RAID Disk Device    HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
--------- ------    ------------- ---- ---- ---- ----- --------------    --------------
dparity   0b.03.8   0b    3   8   SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
parity    0b.02.7   0b    2   7   SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.02.8   0b    2   8   SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.03.9   0b    3   9   SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      FAILED    N/A                                560000/ -
data      0b.02.11  0b    2   11  SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.02.12  0b    2   12  SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.02.13  0b    2   13  SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.02.14  0b    2   14  SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.02.15  0b    2   15  SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      FAILED    N/A                                560000/ -
data      0b.02.17  0b    2   17  SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.02.18  0b    2   18  SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0a.02.12  0a    2   12  SA:B   0   SAS 15000 560000/1146880000 560208/1147307688
data      0b.02.20  0b    2   20  SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.02.21  0b    2   21  SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0a.02.17  0a    2   17  SA:B   0   SAS 15000 560000/1146880000 560208/1147307688 (reconstruction 99% completed)
data      0b.03.0   0b    3   0   SA:A   0   SAS 15000 560000/1146880000 560208/1147307688
data      0b.02.6   0b    2   6   SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.02.23  0b    2   23  SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.03.2   0b    3   2   SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.03.3   0b    3   3   SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.03.4   0b    3   4   SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.03.5   0b    3   5   SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.03.6   0b    3   6   SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
data      0b.03.7   0b    3   7   SA:A   0   SAS 15000 560000/1146880000 560879/1148681096
Raid group is missing 2 disks.

It says 2 disks are missing, and there is one reconstruction that is stuck at 99%. I tried to manually fail or replace the disk, but no luck.

Does anyone know this issue?

Thanks.

John

5 REPLIES

JGPSHNTAP

This looks like a 7-Mode box with 3 failed disks in one RAID group. That is more than RAID-DP can tolerate (it protects against two disk failures per RAID group), which means you lost the aggregate.

If that is the case, you need to call NetApp support; they might be able to help you recover with wafliron.
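
To make the "3 failed disks" count concrete, here is a minimal Python sketch that tallies the disk rows of an aggr status -r listing. It assumes that a disk still under reconstruction does not yet count as a valid member of the RAID group, which is the reading used in this reply. The function name and the returned keys are made up for illustration; the sample rows are copied from the output posted above.

```python
RAID_DP_MAX_FAILURES = 2  # RAID-DP protects against at most two failed disks per RAID group


def summarize_raid_group(status_text):
    """Count disk states in the disk rows of an `aggr status -r` listing."""
    failed = reconstructing = healthy = 0
    for line in status_text.splitlines():
        fields = line.split()
        # Disk rows start with the RAID role: dparity, parity or data.
        if not fields or fields[0] not in ("dparity", "parity", "data"):
            continue
        if len(fields) > 1 and fields[1] == "FAILED":
            failed += 1
        elif "(reconstruction" in line:
            reconstructing += 1
        else:
            healthy += 1
    # Assumption: a disk that is still reconstructing is not yet usable.
    unavailable = failed + reconstructing
    return {
        "failed": failed,
        "reconstructing": reconstructing,
        "healthy": healthy,
        "unavailable": unavailable,
        "exceeds_raid_dp": unavailable > RAID_DP_MAX_FAILURES,
    }


# A few rows taken from the output in the original post.
sample = """\
dparity 0b.03.8 0b 3 8 SA:A 0 SAS 15000 560000/1146880000 560879/1148681096
data FAILED N/A 560000/ -
data FAILED N/A 560000/ -
data 0a.02.17 0a 2 17 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688 (reconstruction 99% completed)
"""

print(summarize_raid_group(sample))
```

Run against the full listing, the same counting gives two FAILED rows plus one row still reconstructing, i.e. three unavailable disks in rg0.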

JOHN_L_COMBOCAR

Unfortunately, this machine doesn't have an MA anymore. But the question is: why is this aggregate not reconstructing, or why did it stop reconstructing, even though it has plenty of spares? I will look into wafliron; it may help me with this issue.

JOHN_L_COMBOCAR

I already checked wafliron, but I think it is not applicable to this issue.

Based on the KB article that I read, these are the conditions that must be met before running wafliron:

  • RAID must be in an online or restricted/degraded state.
  • The WAFL file system must be mounted.
  • The file system may be wafl_inconsistent.

So I don't think wafliron can help me with this.
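
As a rough illustration of the first condition in that list, the Python sketch below reads the state flags from the "Aggregate ..." header line of the aggr status -r output posted above. The helper names are hypothetical, and mapping "online or restricted/degraded" to the flags online and restricted is an assumption made here, not something the KB states. The other two conditions cannot be read from this line.

```python
import re


def aggregate_states(header_line):
    """Return the flags in the first parenthesised group of an
    'Aggregate <name> (...)' header line."""
    match = re.search(r"^Aggregate\s+\S+\s+\(([^)]*)\)", header_line.strip())
    if not match:
        raise ValueError("not an aggregate header line")
    return {flag.strip() for flag in match.group(1).split(",")}


def raid_state_allows_wafliron(states):
    """Check only the first listed condition: RAID online or restricted."""
    # Assumption: a 'failed' aggregate never satisfies the condition.
    return "failed" not in states and bool(states & {"online", "restricted"})


# Header line copied from the output in the original post.
header = "Aggregate aggr10 (failed, raid_dp, partial) (block checksums)"
states = aggregate_states(header)
print(sorted(states))
print(raid_state_allows_wafliron(states))
```

For aggr10 the flags include failed, so under these assumptions the first condition is not met.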

 

 

JGPSHNTAP

You lost three disks, not two. I doubt you lost three disks simultaneously, so perhaps failed drives weren't replaced in time, which is why you lost three and the aggregate.

Check your messages file and look for when each of the disks failed.
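
If the messages file is large, a small script can narrow it down. The Python sketch below is only a starting point: it assumes you have copied the messages file off the filer to a local path, and it assumes that failure-related lines mention both "disk" and "fail". Exact event names differ between Data ONTAP releases, so the pattern will likely need adjusting. The function names are made up for illustration.

```python
import re

# Assumption: relevant lines contain both "disk" and "fail" in some order.
DISK_FAILURE = re.compile(r"disk.*fail|fail.*disk", re.IGNORECASE)


def find_disk_failures(lines, pattern=DISK_FAILURE):
    """Yield (line_number, line) for every line that matches the pattern."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


def scan_file(path):
    """Scan a local copy of the messages file and return all matches."""
    with open(path, errors="replace") as handle:
        return list(find_disk_failures(handle))
```

For example, scan_file("messages") returns the matching lines with their line numbers; the timestamps on those lines show how far apart the failures were.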

JOHN_L_COMBOCAR

This machine was never out of spare disks. Is disk reconstruction done one disk at a time? I ask because I see only one disk reconstructing, and it is stuck at 99%; I don't know why.
