
Re-used DS4243 disk failures

Hi Admins,

I could have taken this up with NetApp support directly, but I thought I would post the question to the community first. Note that we bought the shelf from a reseller for testing.

We connected the shelf and booted it up, and found that it still held two existing aggregates, named aggr0 and agg0(1). We deleted them, then assigned all the drives to controller 1 so that we could create a new aggregate and add the drives to it. Once the drives were assigned, we noticed that a few drives started failing, and zeroing did not start on some of the spares as it should have.

When we manually started zeroing the spares, some of them got as far as 30% and then moved into the failed state. We let the zeroing finish to see how many drives would end up in the broken disks list. We then "unfailed" those drives again and restarted zeroing on the remaining spares.

As far as I understand, a drive that is genuinely faulty should not come back when we try to unfail it (I am not 100% sure this is correct). Below is the output showing how things look on my end.

==========================================
Output from "environment status shelf":
==========================================

Channel: 0a
Shelf: 22
SES device path: local access: 0b.22.99
Module type: IOM3; monitoring is active
Shelf status: normal condition
SES Configuration, shelf 22:
logical identifier=0x50050cc10200646f
vendor identification=NETAPP
product identification=DS4243
product revision level=0172
Vendor-specific information:
Product Serial Number: xxxxxxxxxxxxxxxx
Status reads attempted: 118258; failed: 0
Control writes attempted: 553; failed: 0
Shelf bays with disk devices installed:
23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
with error: none

Shelf mapping (shelf-assigned addresses) for channel 0a:
Shelf 21: 23 22 21 20 19 18 XXX 16 15 14 13 12 XXX XXX 9 8 7 6 5 4 3 2 1 0
Shelf 22: 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
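To make the shelf mapping easier to read, here is a small Python sketch (not a NetApp tool) that lists which bay positions show XXX instead of an address. It assumes that each row lists bays 23 down to 0 in order and that XXX marks a bay that did not report a shelf-assigned address; `MAPPING` and `missing_bays` are hypothetical names, and the two rows are copied from the output above.

```python
# Hypothetical helper (not a NetApp tool): find bay positions shown as "XXX"
# in the "Shelf mapping (shelf-assigned addresses)" section.
# Assumption: each row lists bays 23 down to 0, and "XXX" marks a position
# that did not report a shelf-assigned address.
import re

MAPPING = """\
Shelf 21: 23 22 21 20 19 18 XXX 16 15 14 13 12 XXX XXX 9 8 7 6 5 4 3 2 1 0
Shelf 22: 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
"""


def missing_bays(mapping: str, bays_per_shelf: int = 24) -> dict[int, list[int]]:
    """Return {shelf_id: [bay numbers shown as XXX]}."""
    result: dict[int, list[int]] = {}
    for line in mapping.splitlines():
        m = re.match(r"Shelf (\d+):\s+(.*)", line.strip())
        if not m:
            continue  # ignore anything that is not a mapping row
        shelf = int(m.group(1))
        tokens = m.group(2).split()
        # Position 0 in the row is bay 23, position 23 is bay 0.
        result[shelf] = [
            bays_per_shelf - 1 - pos
            for pos, tok in enumerate(tokens)
            if tok == "XXX"
        ]
    return result


if __name__ == "__main__":
    for shelf, bays in missing_bays(MAPPING).items():
        print(f"Shelf {shelf}: bays without an address: {bays or 'none'}")
```

For the two rows above this reports bays 17, 11 and 10 on shelf 21 and none on shelf 22.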


==========================================
Output from "vol status -f":
==========================================

Broken disks

RAID Disk  Device    HA  SHELF  BAY  CHAN  Pool  Type  RPM   Used (MB/blks)     Phys (MB/blks)
---------  --------  --  -----  ---  ----  ----  ----  ----  -----------------  -----------------
failed     0a.21.15  0a  21     15   SA:B  0     BSAS  7200  847555/1735794176  847884/1736466816
failed     0b.22.12  0b  22     12   SA:A  0     BSAS  7200  847555/1735794176  847884/1736466816
failed     0b.22.19  0b  22     19   SA:A  0     BSAS  7200  847555/1735794176  847884/1736466816
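The broken disks table can also be summarised with a short Python sketch (not a NetApp tool) that groups the failed drives by shelf and checks that all three report the same sizes. It assumes the column order shown in the header, and that the "blks" figures are 512-byte blocks; that assumption fits the numbers shown, since 1735794176 / 2048 rounds down to 847555 MB. `BROKEN`, `parse_broken`, `failed_by_shelf` and `sizes_consistent` are hypothetical names.

```python
# Hypothetical helper (not a NetApp tool): summarise the "Broken disks" rows
# copied from "vol status -f". Assumes the column order in the header and
# that "blks" values are 512-byte blocks, so MB (2**20 bytes) = blks // 2048.
from collections import defaultdict

BROKEN = """\
failed 0a.21.15 0a 21 15 SA:B 0 BSAS 7200 847555/1735794176 847884/1736466816
failed 0b.22.12 0b 22 12 SA:A 0 BSAS 7200 847555/1735794176 847884/1736466816
failed 0b.22.19 0b 22 19 SA:A 0 BSAS 7200 847555/1735794176 847884/1736466816
"""


def parse_broken(text: str) -> list[dict]:
    """Turn each 11-field 'failed' row into a dict of its columns."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11 or fields[0] != "failed":
            continue  # skip headers, separators and unexpected lines
        used_mb, used_blks = map(int, fields[9].split("/"))
        phys_mb, phys_blks = map(int, fields[10].split("/"))
        rows.append({
            "device": fields[1],
            "shelf": int(fields[3]),
            "bay": int(fields[4]),
            "type": fields[7],
            "rpm": int(fields[8]),
            "used_mb": used_mb,
            "used_blks": used_blks,
            "phys_mb": phys_mb,
            "phys_blks": phys_blks,
        })
    return rows


def failed_by_shelf(rows: list[dict]) -> dict[int, list[int]]:
    """Group failed bay numbers by shelf ID."""
    grouped: dict[int, list[int]] = defaultdict(list)
    for row in rows:
        grouped[row["shelf"]].append(row["bay"])
    return dict(grouped)


def sizes_consistent(rows: list[dict]) -> bool:
    """True if MB == blks // 2048 for every row (i.e. 512-byte blocks)."""
    return all(
        r["used_mb"] == r["used_blks"] // 2048
        and r["phys_mb"] == r["phys_blks"] // 2048
        for r in rows
    )


if __name__ == "__main__":
    rows = parse_broken(BROKEN)
    print(failed_by_shelf(rows))   # {21: [15], 22: [12, 19]}
    print(sizes_consistent(rows))  # True
```

For the three rows above, this shows one failed drive on shelf 21 (bay 15) and two on shelf 22 (bays 12 and 19), all BSAS 7200 RPM with identical used and physical sizes.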

==========================================
Output from sysconfig -a:
==========================================
sysconfig -a | more

NetApp Release 8.1.4 7-Mode

System Rev: F6
System Storage Configuration: Multi-Path HA
System ACP Connectivity: Partial Connectivity
slot 0: System Board 2.3 GHz (System Board XVI F6)
Model Name: FAS3240
Part Number: 111-00693
Revision: F6
Serial Number: xxxxxxxxxxxxx
BIOS version: 5.2.1
Loader version: 3.4
Processors: 4
Processor ID: 0x1067a
Microcode Version: 0xa0b
Processor type: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz
Memory Size: 8192 MB

slot 0: SAS Host Adapter 0a (PMC-Sierra PM8001 rev. C, SAS, <UP>)
Firmware rev: 01.11.07.00
Base WWN: 5:00a098:0012:48
Phy State: [0] Enabled, 3.0 Gb/s
           [1] Enabled, 3.0 Gb/s
           [2] Enabled, 3.0 Gb/s
           [3] Enabled, 3.0 Gb/s
QSFP Vendor: Molex Inc.
QSFP Part Number: 112-00178+A0
QSFP Type: Passive Copper 5m ID:00
QSFP Serial Number: xxxxxxxxxxxx

..............................

Output truncated (drive details omitted)

Shelf 21: IOM3 Firmware rev. IOM3 A: 0172 IOM3 B: 0172
Shelf 22: IOM3 Firmware rev. IOM3 A: 0172 IOM3 B: 0172

slot 0: SAS Host Adapter 0b (PMC-Sierra PM8001 rev. C, SAS, <UP>)
Firmware rev: 01.11.07.00
Base WWN: 5:00a098:012:4c
Phy State: [4] Enabled, 3.0 Gb/s
           [5] Enabled, 3.0 Gb/s
           [6] Enabled, 3.0 Gb/s
           [7] Enabled, 3.0 Gb/s
QSFP Vendor: Molex Inc.
QSFP Part Number: 112-00178+A0
QSFP Type: Passive Copper 5m ID:01
QSFP Serial Number: xxxxxxxxxxxxx

..............................

Output truncated (drive details omitted)

Shelf 21: IOM3 Firmware rev. IOM3 A: 0172 IOM3 B: 0172
Shelf 22: IOM3 Firmware rev. IOM3 A: 0172 IOM3 B: 0172

=========================================================================

I would be grateful if you could share any thoughts or suggestions on this issue. I suspect the drives are not actually faulty and that something happening in the background is causing the failures.

Regards

Imtiaz