ONTAP Hardware
Hi All,
we have a problem with the following storage system:
NetApp Release 8.1.2P4 7-Mode: Fri Apr 26 19:57:25 PDT 2013
Model Name: FAS3140
Currently, there are 4 failed disks:
7) vendor=NETAPP , model=X269_HJUPI01TSSX, serialno=N03479DL, uniqueid=08aa7269-6efd95ce-da7af66b-d37a2c34
timefailed=1548853439 (30Jan2019 14:03:59), timelastseen=1548858671 (30Jan2019 15:31:11), device=0c.34
8) vendor=NETAPP , model=X269_HJUPI01TSSX, serialno=N12V1VZL, uniqueid=439e9052-770044dd-3891c1dd-2fa23231
timefailed=1548854040 (30Jan2019 14:14:00), timelastseen=1548858671 (30Jan2019 15:31:11), device=0c.44
9) vendor=NETAPP , model=X269_HJUPI01TSSX, serialno=J80PGUBL, uniqueid=a2206950-31e01634-c597a1e9-e7309e7c
timefailed=1548855335 (30Jan2019 14:35:35), timelastseen=1548858671 (30Jan2019 15:31:11), device=0c.42
10) vendor=NETAPP , model=X269_HJUPI01TSSX, serialno=J80PH3UL, uniqueid=28bc6823-93a74d8f-5464c40a-b3f47c4d
timefailed=1548855658 (30Jan2019 14:40:58), timelastseen=1548858671 (30Jan2019 15:31:11), device=0c.32
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
failed 0c.32 0c 2 0 FC:B - ATA 7200 847555/1735794176 847827/1736350304
failed 0c.34 0c 2 2 FC:B - ATA 7200 847555/1735794176 847827/1736350304
failed 0c.42 0c 2 10 FC:B - ATA 7200 847555/1735794176 847827/1736350304
failed 1a.44 1a 2 12 FC:A - ATA 7200 847555/1735794176 847827/1736350304
Note: all of the failed disks belong to Shelf 1.
We tried to replace disk 0c.32 by applying the following commands:
disk unfail -s 1a.32
disk zero spares
But it seems to have failed:
Fri Feb 1 13:12:01 CET [netapp1:raid.config.disk.bad.label:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has bad label.
Fri Feb 1 13:19:58 CET [netapp1:raid.assim.disk.nolabels:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has no valid labels. It will be taken out of service to prevent possible data loss.
Fri Feb 1 13:19:58 CET [netapp1:raid.config.disk.bad.label:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has bad label.
Fri Feb 1 13:19:58 CET [netapp1:callhome.dsk.label:CRITICAL]: Call home for DISK BAD LABEL
Fri Feb 1 13:25:11 CET [netapp1:raid.assim.disk.nolabels:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has no valid labels. It will be taken out of service to prevent possible data loss.
Fri Feb 1 13:25:11 CET [netapp1:raid.config.disk.bad.label:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has bad label.
Fri Feb 1 13:25:11 CET [netapp1:raid.config.disk.bad.label:error]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] has bad label.
Fri Feb 1 13:25:32 CET [netapp1:raid.disk.unfail.done:info]: Disk 1a.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] unfailed, and is now a spare
Fri Feb 1 13:25:47 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Fri Feb 1 13:25:47 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 1a.32 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Fri Feb 1 13:25:48 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 1a.32
Fri Feb 1 13:28:58 CET [netapp1:scsi.path.excessiveErrors:error]: Excessive errors encountered by adapter 1a on disk device 1a.32.
Fri Feb 1 13:43:33 CET [netapp1:shm.threshold.highIOLatency:error]: Disk 1a.32 exceeds the average IO latency threshold and will be recommended for failure.
Fri Feb 1 13:43:48 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/0c.32 Shelf 2 Bay 0 [NETAPP X269_SMSKP01TSSX NA00] S/N [Z1N1QJWZ] failed.
Fri Feb 1 13:43:48 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
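Note: 0c.32 and 1a.32 are the two paths (A and B loops) to the same physical disk (Shelf 2 Bay 0, S/N Z1N1QJWZ in the messages above). If it helps, we can send the path mapping from:
storage show disk -p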
In addition, we got a large "loss of sync" count on Shelf 1 when we tried to replace the disk there:
Loop Link Transport Loss of Invalid Frame In Frame Out
ID Failure error sync CRC count count
count count count count
0c.14 0 0 6982 3 7970 15894
0c.15 0 0 6982 3 552 1211
0c.16 0 0 6982 3 1250276 2194998
0c.17 0 0 6982 3 955798 1884761
0c.18 0 0 6982 3 150197 1175046
0c.19 0 0 6982 3 271737 1389313
0c.20 0 0 6982 3 120787 1167596
0c.21 0 0 6982 3 370856 1633971
0c.22 0 0 6982 3 371410 1628533
0c.23 0 0 6982 3 370144 1625221
0c.24 0 0 6982 3 505295 1836872
0c.25 0 0 6982 3 120763 1167330
0c.26 0 0 6982 3 255631 1364559
0c.27 0 0 6982 3 255062 1370523
0c.28 0 0 6982 3 256094 1373862
0c.29 0 0 6982 3 136208 1182068
0c.30 0 0 1 1 19998 40024
0c.31 0 0 1 1 666 1521
0c.32 0 0 1 1 9908 20834
0c.33 0 0 1 1 7508 25477
0c.34 0 0 1 1 531154 56365
0c.35 0 0 1 1 4918 16840
0c.36 0 0 1 1 5294 18075
0c.37 0 0 1 1 12235 41756
0c.38 0 0 1 1 7445 25337
0c.39 0 0 1 1 12280 41902
0c.40 0 0 1 1 7191 24377
0c.41 0 0 1 1 4985 16987
0c.42 0 0 1 1 573026 30906
0c.43 0 0 1 1 7194 24386
0c.44 0 0 1 1 32422 31833
0c.45 0 0 1 1 2723 9297
0c.46 0 0 1 20 3850 7637
0c.47 0 0 1 20 436 886
0c.48 0 0 1 20 3037 13010
0c.49 0 0 1 20 108756 334244
0c.50 0 0 1 20 1385 5969
0c.51 0 0 1 20 2992 12900
0c.52 0 0 1 20 2992 12900
0c.53 0 0 1 20 3230 13852
0c.54 0 0 1 20 6138 26390
0c.55 0 0 1 20 5900 25438
0c.56 0 0 1 20 1424 6061
0c.57 0 0 1 20 4332 18599
0c.58 0 0 1 20 4531 19459
0c.59 0 0 1 20 3191 13760
0c.60 0 0 1 20 4492 19367
0c.61 0 0 1 20 3230 13852
0c.62 0 0 7 11 12469 25009
0c.63 0 0 7 11 483 1110
0c.64 0 0 7 11 1385 5969
0c.65 0 0 7 11 4492 19367
0c.66 0 0 7 11 80574 247613
0c.67 0 0 7 11 4293 18507
0c.68 0 0 7 11 3031 12992
0c.69 0 0 7 11 3031 12992
0c.70 0 0 7 11 2992 12900
0c.71 0 0 7 11 3031 12992
0c.72 0 0 7 11 5812 25059
0c.73 0 0 7 11 2992 12900
0c.74 0 0 7 11 3191 13760
0c.75 0 0 7 11 4293 18507
0c.76 0 0 7 11 3031 12992
0c.77 0 0 7 11 3167 13656
0c.78 0 0 1 7 19777 39693
0c.79 0 0 1 7 471 1078
0c.80 0 0 1 7 6099 26298
0c.81 0 0 1 7 1424 6061
0c.82 0 0 1 7 3191 13760
0c.83 0 0 1 7 7458 32082
0c.84 0 0 1 7 4293 18507
0c.85 0 0 1 7 2992 12900
0c.86 0 0 1 7 3031 12992
0c.87 0 0 1 7 4492 19367
0c.88 0 0 1 7 2992 12900
0c.89 0 0 1 7 4492 19367
0c.90 0 0 1 7 3031 12992
0c.91 0 0 1 7 1599 6817
0c.92 0 0 1 7 1385 5969
0c.93 0 0 1 7 6099 26298
0c.ha 0 0 0 0 22715776 7008913
Loop Link Transport Loss of Invalid Frame In Frame Out
ID Failure error sync CRC count count
count count count count
1a.14 0 0 5815 4 12504 12504
1a.15 0 0 5815 4 679 679
1a.16 0 0 5815 4 285279 1136892
1a.17 0 0 5815 4 686577 1447479
1a.18 0 0 5815 4 608908 1723524
1a.19 0 0 5815 4 349771 1488279
1a.20 0 0 5815 4 496669 1693145
1a.21 0 0 5815 4 252215 1248341
1a.22 0 0 5815 4 252981 1257138
1a.23 0 0 5815 4 252404 1255141
1a.24 0 0 5815 4 119328 1055650
1a.25 0 0 5815 4 496661 1694509
1a.26 0 0 5815 4 365019 1508160
1a.27 0 0 5815 4 363786 1499459
1a.28 0 0 5815 4 364769 1502375
1a.29 0 0 5815 4 481790 1680784
1a.30 0 0 1 1 584 584
1a.31 0 0 1 1 574 574
1a.32 0 0 1 1 70366 4254
1a.33 0 0 1 1 4991 13199
1a.34 0 0 1 1 16245 2807
1a.35 0 0 1 1 7582 19784
1a.36 0 0 1 1 7199 18807
1a.37 0 0 1 1 260 692
1a.38 0 0 1 1 5050 13322
1a.39 0 0 1 1 215 583
1a.40 0 0 1 1 5308 14050
1a.41 0 0 1 1 7515 19653
1a.42 0 0 1 1 720388 14154
1a.43 0 0 1 1 5305 14043
1a.44 0 0 1 1 165735 18289
1a.45 0 0 1 1 9788 25656
1a.46 0 0 1 3 16592 16592
1a.47 0 0 1 3 795 795
1a.48 0 0 1 3 4549 16561
1a.49 0 0 1 3 24672 74932
1a.50 0 0 1 3 6195 22491
1a.51 0 0 1 3 4594 16654
1a.52 0 0 1 3 4588 16636
1a.53 0 0 1 3 4350 15834
1a.54 0 0 1 3 1442 5242
1a.55 0 0 1 3 1680 6044
1a.56 0 0 1 3 6156 22416
1a.57 0 0 1 3 3248 11824
1a.58 0 0 1 3 3049 11097
1a.59 0 0 1 3 4389 15909
1a.60 0 0 1 3 3088 11172
1a.61 0 0 1 3 4350 15834
1a.62 0 0 4 4 8003 8003
1a.63 0 0 4 4 719 719
1a.64 0 0 4 4 6195 22491
1a.65 0 0 4 4 3088 11172
1a.66 0 0 4 4 52854 160470
1a.67 0 0 4 4 3287 11899
1a.68 0 0 4 4 4549 16561
1a.69 0 0 4 4 4549 16561
1a.70 0 0 4 4 4588 16636
1a.71 0 0 4 4 4549 16561
1a.72 0 0 4 4 1774 6382
1a.73 0 0 4 4 4588 16636
1a.74 0 0 4 4 4389 15909
1a.75 0 0 4 4 3287 11899
1a.76 0 0 4 4 4549 16561
1a.77 0 0 4 4 4413 15997
1a.78 0 0 1 5 718 718
1a.79 0 0 1 5 730 730
1a.80 0 0 1 5 1481 5317
1a.81 0 0 1 5 6156 22416
1a.82 0 0 1 5 4389 15909
1a.83 0 0 1 5 122 434
1a.84 0 0 1 5 3287 11899
1a.85 0 0 1 5 4588 16636
1a.86 0 0 1 5 4549 16561
1a.87 0 0 1 5 3088 11172
1a.88 0 0 1 5 4588 16636
1a.89 0 0 1 5 3088 11172
1a.90 0 0 1 5 4549 16561
1a.91 0 0 1 5 5981 21777
1a.92 0 0 1 5 6195 22491
1a.93 0 0 1 5 1481 5317
1a.ha 1 0 20 0 21224756 6680563
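For reference, these are the per-loop FC link statistics for adapters 0c and 1a; we can re-collect them at any time if newer numbers would help (syntax from memory):
fcstat link_stats 0c
fcstat link_stats 1a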
So at this point we are unable to replace any failed disk. Could the root cause of this behaviour be a component in the loop (ESH module, FC cable, ...)?
Please let me know if you need any command output and/or more explanation about it.
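For example, we could send the output of any of the following 7-Mode commands (the adapters on our system are 0c and 1a):
sysconfig -r
aggr status -r
storage show disk -p
storage show hub
fcadmin device_map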
Thanks in advance,
Regards.
Cristian
Hi there!
I'm going to guess you've obtained these disks from eBay or a similar secondhand market.
Unfortunately, it looks like this one may also be failed or faulty.
Sorry, trying again would be my suggestion.
Hard drives are mechanical devices with a finite lifespan due to wear on their internal structures, and these drives are now 7-12 years old.
Hi Alex,
first of all, thank you very much for your reply.
But it seems we have a different problem here; let me explain it better this time:
-- The system had a problem and it was down:
Mon Jan 28 07:04:59 GMT [localhost: rc:notice]: The system was down for 155109 seconds
-- After that, the panic string appeared:
Mon Jan 28 07:50:08 CET [netapp1:sk.panic:ALERT]: Panic String: NVRAM contents are invalid... in SK process rc on release 8.1.2P4
-- I don't know the reason, but all aggregates showed this issue as well:
messages.0:Mon Jan 28 08:04:35 CET [netapp1:raid.vol.reparity.issue:warning]: Aggregate aggr3_fc has invalid NVRAM contents.
messages.0:Mon Jan 28 08:04:35 CET [netapp1:raid.vol.reparity.issue:warning]: Aggregate aggr1 has invalid NVRAM contents.
messages.0:Mon Jan 28 08:04:35 CET [netapp1:raid.vol.reparity.issue:warning]: Aggregate aggr0 has invalid NVRAM contents.
To sum up, aggr0 has been unable to complete a reconstruction with several different disks; all of them failed in the end:
Mon Jan 28 08:04:35 CET [netapp1:raid.vol.reparity.issue:warning]: Aggregate aggr0 has invalid NVRAM contents.
Mon Jan 28 08:04:36 CET [netapp1:wafl.aggr.btiddb.build:info]: Buftreeid database for aggregate 'aggr0' UUID '4d5c0920-2ac7-11df-8f8f-00a0982321ca' was built in 0 msec, after scanning 0 inodes and restarting -1 times with a final result of starting.
Mon Jan 28 08:04:36 CET [netapp1:wafl.aggr.btiddb.build:info]: Buftreeid database for aggregate 'aggr0' UUID '4d5c0920-2ac7-11df-8f8f-00a0982321ca' was built in 112 msec, after scanning 36 inodes and restarting 25 times with a final result of success.
Mon Jan 28 08:05:10 CET [netapp1:raid.rg.recons.resume:debug]: /aggr0/plex0/rg0: resuming reconstruction, using disk 1a.32 (block 529792, 0% complete)
Mon Jan 28 08:16:03 CET [netapp1:raid.rg.recons.done:debug]: /aggr0/plex0/rg0: reconstruction completed for 0c.32 in 10:52.56
Mon Jan 28 08:38:10 CET [netapp1:raid.rg.spares.low:warning]: /aggr0/plex0/rg0
Mon Jan 28 10:20:10 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/0c.32 Shelf 2 Bay 0 [NETAPP X269_WMARS01TSSX NA00] S/N [WD-WMATV4808646] failed.
Mon Jan 28 10:20:24 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Mon Jan 28 10:20:24 CET [netapp1:raid.rg.recons.cantStart:warning]: The reconstruction cannot start in RAID group /aggr0/plex0/rg0: No matching disks available in spare pool
Mon Jan 28 11:00:00 CET [netapp1:monitor.raid.brokenDisk:warning]: data disk in RAID group /aggr0/plex0/rg0 is broken.
Mon Jan 28 16:47:53 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Mon Jan 28 16:47:53 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 0c.32 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Mon Jan 28 16:47:53 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 0c.32
Mon Jan 28 16:52:49 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/0c.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [HZ3723PL] failed.
Mon Jan 28 16:52:49 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
Mon Jan 28 16:52:49 CET [netapp1:raid.rg.recons.aborted:notice]: /aggr0/plex0/rg0: reconstruction aborted at disk block 5248 after 4:56.04
Mon Jan 28 16:52:49 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Mon Jan 28 16:52:49 CET [netapp1:raid.rg.recons.cantStart:warning]: The reconstruction cannot start in RAID group /aggr0/plex0/rg0: No matching disks available in spare pool
Mon Jan 28 17:00:00 CET [netapp1:monitor.raiddp.vol.singleDegraded:warning]: data disk in RAID group /aggr0/plex0/rg0 is broken.
Mon Jan 28 23:13:29 CET [netapp1:wafl.aggr.btiddb.build:info]: Buftreeid database for aggregate 'aggr0' UUID '4d5c0920-2ac7-11df-8f8f-00a0982321ca' was built in 0 msec, after scanning 0 inodes and restarting -1 times with a final result of starting.
Mon Jan 28 23:13:29 CET [netapp1:wafl.aggr.btiddb.build:info]: Buftreeid database for aggregate 'aggr0' UUID '4d5c0920-2ac7-11df-8f8f-00a0982321ca' was built in 79 msec, after scanning 36 inodes and restarting 23 times with a final result of success.
Mon Jan 28 23:13:45 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Mon Jan 28 23:13:45 CET [netapp1:raid.rg.recons.cantStart:warning]: The reconstruction cannot start in RAID group /aggr0/plex0/rg0: No matching disks available in spare pool
Mon Jan 28 23:14:02 CET [netapp1:monitor.raid.brokenDisk:warning]: data disk in RAID group /aggr0/plex0/rg0 is broken.
Mon Jan 28 23:19:00 CET [netapp1:raid.rg.spares.low:warning]: /aggr0/plex0/rg0
Tue Jan 29 00:00:00 CET [netapp1:monitor.raiddp.vol.singleDegraded:warning]: data disk in RAID group /aggr0/plex0/rg0 is broken.
Wed Jan 30 13:54:18 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Wed Jan 30 13:54:18 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 0c.34 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Wed Jan 30 13:54:18 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 0c.34
Wed Jan 30 14:03:59 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/1a.34 Shelf 2 Bay 2 [NETAPP X269_HJUPI01TSSX NA01] S/N [N03479DL] failed.
Wed Jan 30 14:03:59 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
Wed Jan 30 14:03:59 CET [netapp1:raid.rg.recons.aborted:notice]: /aggr0/plex0/rg0: reconstruction aborted at disk block 5248 after 9:41.17
Wed Jan 30 14:04:10 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Wed Jan 30 14:04:10 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 1a.44 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Wed Jan 30 14:04:10 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 1a.44
Wed Jan 30 14:13:29 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/1a.44 Shelf 2 Bay 12 [NETAPP X269_HJUPI01TSSX NA01] S/N [N12V1VZL] failed.
Wed Jan 30 14:13:29 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
Wed Jan 30 14:13:29 CET [netapp1:raid.rg.recons.aborted:notice]: /aggr0/plex0/rg0: reconstruction aborted at disk block 5248 after 9:18.82
Wed Jan 30 14:13:29 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Wed Jan 30 14:13:29 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 1a.42 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Wed Jan 30 14:13:59 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 0c.42
Wed Jan 30 14:35:35 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/0c.42 Shelf 2 Bay 10 [NETAPP X269_HJUPI01TSSX NA01] S/N [J80PGUBL] failed.
Wed Jan 30 14:35:35 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
Wed Jan 30 14:35:35 CET [netapp1:raid.rg.recons.aborted:notice]: /aggr0/plex0/rg0: reconstruction aborted at disk block 5248 after 21:35.78
Wed Jan 30 14:35:35 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Wed Jan 30 14:35:35 CET [netapp1:raid.rg.recons.info:notice]: Spare disk 0c.32 will be used to reconstruct one missing disk in RAID group /aggr0/plex0/rg0.
Wed Jan 30 14:35:35 CET [netapp1:raid.rg.recons.start:notice]: /aggr0/plex0/rg0: starting reconstruction, using disk 0c.32
Wed Jan 30 14:40:58 CET [netapp1:raid.config.filesystem.disk.failed:error]: File system Disk /aggr0/plex0/rg0/0c.32 Shelf 2 Bay 0 [NETAPP X269_HJUPI01TSSX NA01] S/N [J80PH3UL] failed.
Wed Jan 30 14:40:58 CET [netapp1:raid.rg.disk.reconstruction.failed:notice]: /aggr0/plex0/rg0: reconstruction failed for a disk in the raidgroup
Wed Jan 30 14:40:58 CET [netapp1:raid.rg.recons.aborted:notice]: /aggr0/plex0/rg0: reconstruction aborted at disk block 5248 after 5:23.22
Wed Jan 30 14:40:58 CET [netapp1:raid.rg.recons.missing:notice]: RAID group /aggr0/plex0/rg0 is missing 1 disk(s).
Wed Jan 30 14:40:58 CET [netapp1:raid.rg.recons.cantStart:warning]: The reconstruction cannot start in RAID group /aggr0/plex0/rg0: No matching disks available in spare pool
Wed Jan 30 15:00:00 CET [netapp1:monitor.raiddp.vol.singleDegraded:warning]: data disk in RAID group /aggr0/plex0/rg0 is broken.
So we have the aggr0 as follows:
Aggregate aggr0 (online, raid_dp, degraded) (block checksums)
Plex /aggr0/plex0 (online, normal, active)
RAID group /aggr0/plex0/rg0 (degraded, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 1a.16 1a 1 0 FC:A - ATA 7200 847555/1735794176 847827/1736350304
parity 0c.17 0c 1 1 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 1a.18 1a 1 2 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data 1a.19 1a 1 3 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data 0c.20 0c 1 4 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 0c.21 0c 1 5 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 1a.22 1a 1 6 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data 1a.23 1a 1 7 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data 0c.24 0c 1 8 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 0c.25 0c 1 9 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 1a.26 1a 1 10 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data 0c.27 0c 1 11 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 0c.28 0c 1 12 FC:B - ATA 7200 847555/1735794176 847827/1736350304
data 1a.29 1a 1 13 FC:A - ATA 7200 847555/1735794176 847827/1736350304
data FAILED N/A 847555/ -
But every time we try to replace a disk, we see the message "has bad label", so it's my understanding that I have to apply the following commands:
disk unfail -s 1a.32
disk zero spares
And it seems the reconstruction process starts but fails in the end, as you can see in the messages above.
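To be explicit, the full sequence we use on each attempt is roughly the following (1a.32 is just the last device we tried; the status check is only to confirm the disk comes back as a spare, and the privilege level is from memory):
priv set advanced
disk unfail -s 1a.32
aggr status -s
disk zero spares
priv set admin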
Please, do you have any idea, new approach, or comment about this behaviour?
Thanks in advance!
Regards
Cristian