ONTAP Hardware
Hi All,
we have a problem with the following storage system:
NetApp Release 8.1.2P4 7-Mode: Fri Apr 26 19:57:25 PDT 2013
Model Name: FAS3140
Currently, there are 4 failed disks:
7) vendor=NETAPP , model=X269_HJUPI01TSSX, serialno=N03479DL, uniqueid=08aa7269-6efd95ce-da7af66b-d37a2c34
timefailed=1548853439 (30Jan2019 14:03:59), timelastseen=1548858671 (30Jan2019 15:31:11), device=0c.34
8) vendor=NETAPP , model=X269_HJUPI01TSSX, serialno=N12V1VZL, uniqueid=439e9052-770044dd-3891c1dd-2fa23231
timefailed=1548854040 (30Jan2019 14:14:00), timelastseen=1548858671 (30Jan2019 15:31:11), device=0c.44
9) vendor=NETAPP , model=X269_HJUPI01TSSX, serialno=J80PGUBL, uniqueid=a2206950-31e01634-c597a1e9-e7309e7c
timefailed=1548855335 (30Jan2019 14:35:35), timelastseen=1548858671 (30Jan2019 15:31:11), device=0c.42
10) vendor=NETAPP , model=X269_HJUPI01TSSX, serialno=J80PH3UL, uniqueid=28bc6823-93a74d8f-5464c40a-b3f47c4d
timefailed=1548855658 (30Jan2019 14:40:58), timelastseen=1548858671 (30Jan2019 15:31:11), device=0c.32
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
failed 0c.32 0c 2 0 FC:B - ATA 7200 847555/1735794176 847827/1736350304
failed 0c.34 0c 2 2 FC:B - ATA 7200 847555/1735794176 847827/1736350304
failed 0c.42 0c 2 10 FC:B - ATA 7200 847555/1735794176 847827/1736350304
failed 1a.44 1a 2 12 FC:A - ATA 7200 847555/1735794176 847827/1736350304
Note: all of the failed disks belong to Shelf 1.
We tried to replace disk 0c.32 by applying the following commands:
disk unfail -s 1a.32
disk zero spares
But it seems to have failed:
Fri Feb 1 13:12:01 CET [netapp1:raid.config.disk.bad.label:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has bad label.
Fri Feb 1 13:19:58 CET [netapp1:raid.assim.disk.nolabels:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has no valid labels. It will be taken out of service to prevent possible data loss.
Fri Feb 1 13:19:58 CET [netapp1:raid.config.disk.bad.label:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has bad label.
Fri Feb 1 13:19:58 CET [netapp1:callhome.dsk.label:CRITICAL]: Call home for DISK BAD LABEL
Fri Feb 1 13:25:11 CET [netapp1:raid.assim.disk.nolabels:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has no valid labels. It will be taken out of service to prevent possible data loss.
Fri Feb 1 13:25:11 CET [netapp1:raid.config.disk.bad.label:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has bad label.
Fri Feb 1 13:25:11 CET [netapp1:raid.config.disk.bad.label:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has bad label.
Fri Feb 1 13:25:32 CET [netapp1:raid.disk.unfail.done:info]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] unfailed, and is now a spare
Fri Feb 1 13:25:47 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Fri Feb 1 13:25:47 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 1a.32 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Fri Feb 1 13:25:48 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 1a.32
Fri Feb 1 13:28:58 CET [netapp1:scsi.path.excessiveErrors:error]: Excessive errors encountered by adapter 1a on disk device 1a.32.
Fri Feb 1 13:43:33 CET [netapp1:shm.threshold.highIOLatency:error]: Disk 1a.32 exceeds the average IO latency threshold and will be recommended for failure.
Fri Feb 1 13:43:48 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/0c.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] failed.
Fri Feb 1 13:43:48 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
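Note: 0c.32 and 1a.32 are the two paths (A and B loops) to the same physical disk (Shelf 2 Bay 0, S/N Z1N1QJWZ in the messages above). If it helps, we can send the path mapping from:
storage show disk -p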
In addition, we got a large "loss of sync" count on Shelf 1 when we tried to replace the disk there:
Loop Link Transport Loss of Invalid Frame In Frame Out
ID Failure error sync CRC count count
count count count count
0c.14 0 0 6982 3 7970 15894
0c.15 0 0 6982 3 552 1211
0c.16 0 0 6982 3 1250276 2194998
0c.17 0 0 6982 3 955798 1884761
0c.18 0 0 6982 3 150197 1175046
0c.19 0 0 6982 3 271737 1389313
0c.20 0 0 6982 3 120787 1167596
0c.21 0 0 6982 3 370856 1633971
0c.22 0 0 6982 3 371410 1628533
0c.23 0 0 6982 3 370144 1625221
0c.24 0 0 6982 3 505295 1836872
0c.25 0 0 6982 3 120763 1167330
0c.26 0 0 6982 3 255631 1364559
0c.27 0 0 6982 3 255062 1370523
0c.28 0 0 6982 3 256094 1373862
0c.29 0 0 6982 3 136208 1182068
0c.30 0 0 1 1 19998 40024
0c.31 0 0 1 1 666 1521
0c.32 0 0 1 1 9908 20834
0c.33 0 0 1 1 7508 25477
0c.34 0 0 1 1 531154 56365
0c.35 0 0 1 1 4918 16840
0c.36 0 0 1 1 5294 18075
0c.37 0 0 1 1 12235 41756
0c.38 0 0 1 1 7445 25337
0c.39 0 0 1 1 12280 41902
0c.40 0 0 1 1 7191 24377
0c.41 0 0 1 1 4985 16987
0c.42 0 0 1 1 573026 30906
0c.43 0 0 1 1 7194 24386
0c.44 0 0 1 1 32422 31833
0c.45 0 0 1 1 2723 9297
0c.46 0 0 1 20 3850 7637
0c.47 0 0 1 20 436 886
0c.48 0 0 1 20 3037 13010
0c.49 0 0 1 20 108756 334244
0c.50 0 0 1 20 1385 5969
0c.51 0 0 1 20 2992 12900
0c.52 0 0 1 20 2992 12900
0c.53 0 0 1 20 3230 13852
0c.54 0 0 1 20 6138 26390
0c.55 0 0 1 20 5900 25438
0c.56 0 0 1 20 1424 6061
0c.57 0 0 1 20 4332 18599
0c.58 0 0 1 20 4531 19459
0c.59 0 0 1 20 3191 13760
0c.60 0 0 1 20 4492 19367
0c.61 0 0 1 20 3230 13852
0c.62 0 0 7 11 12469 25009
0c.63 0 0 7 11 483 1110
0c.64 0 0 7 11 1385 5969
0c.65 0 0 7 11 4492 19367
0c.66 0 0 7 11 80574 247613
0c.67 0 0 7 11 4293 18507
0c.68 0 0 7 11 3031 12992
0c.69 0 0 7 11 3031 12992
0c.70 0 0 7 11 2992 12900
0c.71 0 0 7 11 3031 12992
0c.72 0 0 7 11 5812 25059
0c.73 0 0 7 11 2992 12900
0c.74 0 0 7 11 3191 13760
0c.75 0 0 7 11 4293 18507
0c.76 0 0 7 11 3031 12992
0c.77 0 0 7 11 3167 13656
0c.78 0 0 1 7 19777 39693
0c.79 0 0 1 7 471 1078
0c.80 0 0 1 7 6099 26298
0c.81 0 0 1 7 1424 6061
0c.82 0 0 1 7 3191 13760
0c.83 0 0 1 7 7458 32082
0c.84 0 0 1 7 4293 18507
0c.85 0 0 1 7 2992 12900
0c.86 0 0 1 7 3031 12992
0c.87 0 0 1 7 4492 19367
0c.88 0 0 1 7 2992 12900
0c.89 0 0 1 7 4492 19367
0c.90 0 0 1 7 3031 12992
0c.91 0 0 1 7 1599 6817
0c.92 0 0 1 7 1385 5969
0c.93 0 0 1 7 6099 26298
0c.ha 0 0 0 0 22715776 7008913
Loop Link Transport Loss of Invalid Frame In Frame Out
ID Failure error sync CRC count count
count count count count
1a.14 0 0 5815 4 12504 12504
1a.15 0 0 5815 4 679 679
1a.16 0 0 5815 4 285279 1136892
1a.17 0 0 5815 4 686577 1447479
1a.18 0 0 5815 4 608908 1723524
1a.19 0 0 5815 4 349771 1488279
1a.20 0 0 5815 4 496669 1693145
1a.21 0 0 5815 4 252215 1248341
1a.22 0 0 5815 4 252981 1257138
1a.23 0 0 5815 4 252404 1255141
1a.24 0 0 5815 4 119328 1055650
1a.25 0 0 5815 4 496661 1694509
1a.26 0 0 5815 4 365019 1508160
1a.27 0 0 5815 4 363786 1499459
1a.28 0 0 5815 4 364769 1502375
1a.29 0 0 5815 4 481790 1680784
1a.30 0 0 1 1 584 584
1a.31 0 0 1 1 574 574
1a.32 0 0 1 1 70366 4254
1a.33 0 0 1 1 4991 13199
1a.34 0 0 1 1 16245 2807
1a.35 0 0 1 1 7582 19784
1a.36 0 0 1 1 7199 18807
1a.37 0 0 1 1 260 692
1a.38 0 0 1 1 5050 13322
1a.39 0 0 1 1 215 583
1a.40 0 0 1 1 5308 14050
1a.41 0 0 1 1 7515 19653
1a.42 0 0 1 1 720388 14154
1a.43 0 0 1 1 5305 14043
1a.44 0 0 1 1 165735 18289
1a.45 0 0 1 1 9788 25656
1a.46 0 0 1 3 16592 16592
1a.47 0 0 1 3 795 795
1a.48 0 0 1 3 4549 16561
1a.49 0 0 1 3 24672 74932
1a.50 0 0 1 3 6195 22491
1a.51 0 0 1 3 4594 16654
1a.52 0 0 1 3 4588 16636
1a.53 0 0 1 3 4350 15834
1a.54 0 0 1 3 1442 5242
1a.55 0 0 1 3 1680 6044
1a.56 0 0 1 3 6156 22416
1a.57 0 0 1 3 3248 11824
1a.58 0 0 1 3 3049 11097
1a.59 0 0 1 3 4389 15909
1a.60 0 0 1 3 3088 11172
1a.61 0 0 1 3 4350 15834
1a.62 0 0 4 4 8003 8003
1a.63 0 0 4 4 719 719
1a.64 0 0 4 4 6195 22491
1a.65 0 0 4 4 3088 11172
1a.66 0 0 4 4 52854 160470
1a.67 0 0 4 4 3287 11899
1a.68 0 0 4 4 4549 16561
1a.69 0 0 4 4 4549 16561
1a.70 0 0 4 4 4588 16636
1a.71 0 0 4 4 4549 16561
1a.72 0 0 4 4 1774 6382
1a.73 0 0 4 4 4588 16636
1a.74 0 0 4 4 4389 15909
1a.75 0 0 4 4 3287 11899
1a.76 0 0 4 4 4549 16561
1a.77 0 0 4 4 4413 15997
1a.78 0 0 1 5 718 718
1a.79 0 0 1 5 730 730
1a.80 0 0 1 5 1481 5317
1a.81 0 0 1 5 6156 22416
1a.82 0 0 1 5 4389 15909
1a.83 0 0 1 5 122 434
1a.84 0 0 1 5 3287 11899
1a.85 0 0 1 5 4588 16636
1a.86 0 0 1 5 4549 16561
1a.87 0 0 1 5 3088 11172
1a.88 0 0 1 5 4588 16636
1a.89 0 0 1 5 3088 11172
1a.90 0 0 1 5 4549 16561
1a.91 0 0 1 5 5981 21777
1a.92 0 0 1 5 6195 22491
1a.93 0 0 1 5 1481 5317
1a.ha 1 0 20 0 21224756 6680563
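For reference, these are the per-loop FC link statistics for adapters 0c and 1a; we can re-collect them at any time if newer numbers would help (syntax from memory):
fcstat link_stats 0c
fcstat link_stats 1a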
So at this point we are unable to replace any failed disk. Could the root cause of this behaviour be a component in the loop (ESH module, FC cable, ...)?
Please let me know if you need any command output and/or more explanation about it.
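For example, we could send the output of any of the following 7-Mode commands (the adapters on our system are 0c and 1a):
sysconfig -r
aggr status -r
storage show disk -p
storage show hub
fcadmin device_map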
Thanks in advance,
Regards.
Cristian
Hi there!
I'm going to guess you've obtained these disks from eBay or a similar secondhand market.
Unfortunately, it looks like this one may also be failed or faulty.
Sorry, trying again would be my suggestion.
Hard drives are mechanical devices with a finite lifespan due to wear on their internal structures, and these drives are now 7-12 years old.
Hi Alex,
first of all, thank you very much for your reply.
But it seems we have a different problem here; let me explain it better this time:
-- The system had a problem and it was down:
Mon Jan 28 07:04:59 GMT [localhost: rc:notice]: The system was down for 155109 seconds
-- After that, the panic string appeared:
Mon Jan 28 07:50:08 CET [netapp1:sk.panic:ALERT]: Panic String: NVRAM contents are invalid... in SK process rc on release 8.1.2P4
-- I don't know the reason, but all aggregates showed this issue as well:
messages.0:Mon Jan 28 08:04:35 CET [netapp1:raid.vol.reparity.issue:warning]: Aggregate aggr3_fc has invalid NVRAM contents.
messages.0:Mon Jan 28 08:04:35 CET [netapp1:raid.vol.reparity.issue:warning]: Aggregate aggr1 has invalid NVRAM contents.
messages.0:Mon Jan 28 08:04:35 CET [netapp1:raid.vol.reparity.issue:warning]: Aggregate aggr0 has invalid NVRAM contents.
To sum up, aggr0 has been unable to complete a reconstruction with several different disks; all of them failed in the end:
Mon Jan 28 08:04:35 CET [netapp1:raid.vol.reparity.issue:warning]: Aggregate aggr0 has invalid NVRAM contents.
Mon Jan 28 08:04:36 CET [netapp1:wafl.aggr.btiddb.build:info]: Buftreeid database for aggregate 'aggr0' UUID '4d5c0920-2ac7-11df-8f8f-00a0982321ca' was built in 0 msec, after scanning 0 inodes and restarting -1 times with a final result of starting.
Mon Jan 28 08:04:36 CET [netapp1:wafl.aggr.btiddb.build:info]: Buftreeid database for aggregate 'aggr0' UUID '4d5c0920-2ac7-11df-8f8f-00a0982321ca' was built in 112 msec, after scanning 36 inodes and restarting 25 times with a final result of success.
Mon Jan 28 08:05:10 CET [netapp1:raid.rg.recons.resume:debug]: /aggr0/plex0/rg0: resuming reconstruction, using disk 1a.32 (block 529792, 0% complete)
Mon Jan 28 08:16:03 CET [netapp1:raid.rg.recons.done:debug]: /aggr0/plex0/rg0: reconstruction completed for 0c.32 in 10:52.56
Mon Jan 28 08:38:10 CET [netapp1:raid.rg.spares.low:warning]: /aggr0/plex0/rg0
Mon Jan 28 10:20:10 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/0c.32 Shelf 2 Bay 0 [NETAPP X269_WMARS01TSSX NA00] S/N [WD-WMATV4808646] failed.
Mon Jan 28 10:20:24 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Mon Jan 28 10:20:24 CET [netapp1:raid.rg.recons.cantStart:warning]: The reconstruction cannot start in RAID group /aggr0/plex0/rg0: No matching disks available in spare pool
Mon Jan 28 11:00:00 CET [netapp1:monitor.raid.brokenDisk:warning]: data disk in RAID group /aggr0/plex0/rg0 is broken.
Mon Jan 28 16:47:53 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Mon Jan 28 16:47:53 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 0c.32 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Mon Jan 28 16:47:53 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 0c.32
Mon Jan 28 16:52:49 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/0c.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [HZ3723PL] failed.
Mon Jan 28 16:52:49 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
Mon Jan 28 16:52:49 CET [netapp1:raid.rg.recons.aborted:notice]: /aggr0/plex0/rg0: reconstruction aborted at disk block 5248 after 4:56.04
Mon Jan 28 16:52:49 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Mon Jan 28 16:52:49 CET [netapp1:raid.rg.recons.cantStart:warning]: The reconstruction cannot start in RAID group /aggr0/plex0/rg0: No matching disks available in spare pool
Mon Jan 28 17:00:00 CET [netapp1:monitor.raiddp.vol.singleDegraded:warning]: data disk in RAID group /aggr0/plex0/rg0 is broken.
Mon Jan 28 23:13:29 CET [netapp1:wafl.aggr.btiddb.build:info]: Buftreeid database for aggregate 'aggr0' UUID '4d5c0920-2ac7-11df-8f8f-00a0982321ca' was built in 0 msec, after scanning 0 inodes and restarting -1 times with a final result of starting.
Mon Jan 28 23:13:29 CET [netapp1:wafl.aggr.btiddb.build:info]: Buftreeid database for aggregate 'aggr0' UUID '4d5c0920-2ac7-11df-8f8f-00a0982321ca' was built in 79 msec, after scanning 36 inodes and restarting 23 times with a final result of success.
Mon Jan 28 23:13:45 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Mon Jan 28 23:13:45 CET [netapp1:raid.rg.recons.cantStart:warning]: The reconstruction cannot start in RAID group /aggr0/plex0/rg0: No matching disks available in spare pool
Mon Jan 28 23:14:02 CET [netapp1:monitor.raid.brokenDisk:warning]: data disk in RAID group /aggr0/plex0/rg0 is broken.
Mon Jan 28 23:19:00 CET [netapp1:raid.rg.spares.low:warning]: /aggr0/plex0/rg0
Tue Jan 29 00:00:00 CET [netapp1:monitor.raiddp.vol.singleDegraded:warning]: data disk in RAID group /aggr0/plex0/rg0 is broken.
Wed Jan 30 13:54:18 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Wed Jan 30 13:54:18 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 0c.34 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Wed Jan 30 13:54:18 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 0c.34
Wed Jan 30 14:03:59 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/1a.34 Shelf 2 Bay 2 [NETAPP X269_HJUPI01TSSX NA01] S/N [N03479DL] failed.
Wed Jan 30 14:03:59 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
Wed Jan 30 14:03:59 CET [netapp1:raid.rg.recons.aborted:notice]: /aggr0/plex0/rg0: reconstruction aborted at disk block 5248 after 9:41.17
Wed Jan 30 14:04:10 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Wed Jan 30 14:04:10 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 1a.44 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Wed Jan 30 14:04:10 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 1a.44
Wed Jan 30 14:13:29 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/1a.44 Shelf 2 Bay 12 [NETAPP X269_HJUPI01TSSX NA01] S/N [N12V1VZL] failed.
Wed Jan 30 14:13:29 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
Wed Jan 30 14:13:29 CET [netapp1:raid.rg.recons.aborted:notice]: /aggr0/plex0/rg0: reconstruction aborted at disk block 5248 after 9:18.82
Wed Jan 30 14:13:29 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Wed Jan 30 14:13:29 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 1a.42 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Wed Jan 30 14:13:59 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 0c.42
Wed Jan 30 14:35:35 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/0c.42 Shelf 2 Bay 10 [NETAPP X269_HJUPI01TSSX NA01] S/N [J80PGUBL] failed.
Wed Jan 30 14:35:35 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
Wed Jan 30 14:35:35 CET [netapp1:raid.rg.recons.aborted:notice]: /aggr0/plex0/rg0: reconstruction aborted at disk block 5248 after 21:35.78
Wed Jan 30 14:35:35 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Wed Jan 30 14:35:35 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 0c.32 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Wed Jan 30 14:35:35 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 0c.32
Wed Jan 30 14:40:58 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/0c.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [J80PH3UL] failed.
Wed Jan 30 14:40:58 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
Wed Jan 30 14:40:58 CET [netapp1:raid.rg.recons.aborted:notice]: /aggr0/plex0/rg0: reconstruction aborted at disk block 5248 after 5:23.22
Wed Jan 30 14:40:58 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Wed Jan 30 14:40:58 CET [netapp1:raid.rg.recons.cantStart:warning]: The reconstruction cannot start in RAID group /aggr0/plex0/rg0: No matching disks available in spare pool
Wed Jan 30 15:00:00 CET [netapp1:monitor.raiddp.vol.singleDegraded:warning]: data disk in RAID group /aggr0/plex0/rg0 is broken.
So we have the aggr0 as follows:
Aggregate aggr0 (online, raid_dp, degraded) (block checksums)
Plex /aggr0/plex0 (online, normal, active)
RAID group /aggr0/plex0/rg0 (degraded, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 1a.16 1a 1 0 FC:A - ATA 7200 847555/1735794176 847827/1736350304
parity 0c.17 0c 1 1 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 1a.18 1a 1 2 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data 1a.19 1a 1 3 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data 0c.20 0c 1 4 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 0c.21 0c 1 5 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 1a.22 1a 1 6 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data 1a.23 1a 1 7 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data 0c.24 0c 1 8 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 0c.25 0c 1 9 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 1a.26 1a 1 10 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data 0c.27 0c 1 11 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 0c.28 0c 1 12 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 1a.29 1a 1 13 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data FAILED N/A 847555/ -
But every time we try to replace a disk, we see the message "has bad label", so it's my understanding that I have to apply the following commands:
disk unfail -s 1a.32
disk zero spares
And it seems the reconstruction process starts but fails in the end, as you can see in the messages above.
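To be explicit, the full sequence we use on each attempt is roughly the following (1a.32 is just the last device we tried; the status check is only to confirm the disk comes back as a spare, and the privilege level is from memory):
priv set advanced
disk unfail -s 1a.32
aggr status -s
disk zero spares
priv set admin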
Please, do you have any idea, new approach, or comment about this behaviour?
Thanks in advance!
Regards
Cristian