
What is happening to our NetApp? How can we recover it?

Hello,

Recently we carried out a detailed study of our NetApp and found about 6 disks in a bad state.

We replaced those 6 disks, and when the 6 new disks were inserted, at least 9 more disks went into the FAILED state. We now have 12 disks in the FAILED state.

Before replacing those 6 disks, we had changed the cables that connect a shelf to our Stors.

I will try to explain the change:

Stor A, port 3b, was connected to the lower part of a shelf.

Stor B, port 3b, was connected to the upper part of the same shelf.

I thought it would be better to have Stor A connected to the upper part and Stor B to the lower part.

After this change we had to halt both Stors (because of other, unrelated problems).

Two weeks later I booted both Stors. Stor B was working fine, but Stor A kept rebooting. Then the new disks were installed, and 9 more disks went into the FAILED state (just after the change).

Why do you think 9 disks could have failed all at once?

What do you recommend? I don't want to replace 12 disks without any plausible explanation.

After replacing the 6 disks, Stor A keeps rebooting all the time.

I tried to enter the Maintenance shell, and this is what I see:

*> disk show -v

Local System ID: 15xxx

  DISK       OWNER                    POOL   SERIAL NUMBER         DR HOME            CHKSUM

------------ -------------            -----  -------------    -------------            ------- 

0c.27                                   FAILED                      Block

0a.45                                   FAILED                      Block

0c.24                                   FAILED                      Block

0a.35                                   FAILED                      Block

0a.26        Stor01    (151697145)    Pool0  3LM5PDB200009909GPWQ  Stor01    (151697145)    Block

0a.42        Stor01    (151697145)    Pool0  3LM5P33K000099090SLV  Stor01    (151697145)    Block

0c.38        Stor01    (151697145)    Pool0  3LM5P33Q000099090NTN  Stor01    (151697145)    Block

0c.36        Stor01    (151697145)    Pool0  3LM5P4JD00009909GPXX  Stor01    (151697145)    Block

0a.40        Stor01    (151697145)    Pool0  3LM5P4F800009909GTJ4  Stor01    (151697145)    Block

0a.32        Stor01    (151697145)    Pool0  3LM5P4WG00009909NXNT  Stor01    (151697145)    Block

0c.28        Stor01    (151697145)    Pool0  3LM5QGVY00009909P06D  Stor01    (151697145)    Block

0c.34        Stor01    (151697145)    Pool0  3LM5P4Q800009909PXP5  Stor01    (151697145)    Block

0a.28        Stor01    (151697145)    Pool0  3LM5PC7000009909FZ3D  Stor01    (151697145)    Block

0a.22        Stor01    (151697145)    Pool0  3LM5PC9Q00009907552X  Stor01    (151697145)    Block

0a.20        Stor01    (151697145)    Pool0  JHW2869C              Stor01    (151697145)    Block

0c.20        Stor01    (151697145)    Pool0  JLVGSU1C              Stor01    (151697145)    Block

0a.25        Stor02    (151697133)    Pool0  3LM5PC1000009909GPC7  Stor02    (151697133)    Block

0a.29        Stor02    (151697133)    Pool0  3LM5PDZZ00009909FZPW  Stor02    (151697133)    Block

0c.41        Stor02    (151697133)    Pool0  3LM5P4RJ00009909G0BG  Stor02    (151697133)    Block

0c.35        Stor02    (151697133)    Pool0  3LM5P39V000099090UJS  Stor02    (151697133)    Block

0c.44        Stor02    (151697133)    Pool0  3LM5QG7000009909PXP3  Stor02    (151697133)    Block

0a.44        Stor02    (151697133)    Pool0  3LM5P30A00009909GSE2  Stor02    (151697133)    Block

0c.25        Stor02    (151697133)    Pool0  3LM5NYC500009909FZ7G  Stor02    (151697133)    Block

0c.37        Stor02    (151697133)    Pool0  3LM5P2WN00009909GSEG  Stor02    (151697133)    Block

0c.45        Stor02    (151697133)    Pool0  3LM5QDGG00009909PX9E  Stor02    (151697133)    Block

0c.32        Stor01    (151697145)    Pool0  3LM5P3EA000099077F9C  Stor01    (151697145)    Block

0c.21        Stor02    (151697133)    Pool0  3LM5PDWE00009909PTBF  Stor02    (151697133)    Block

0c.43        Stor02    (151697133)    Pool0  3LM1H0B900009745VZAJ  Stor02    (151697133)    Block

0a.16        Stor01    (151697145)    Pool0  3LM5PC0J00009909FYSV  Stor01    (151697145)    Block

0c.33        Stor02    (151697133)    Pool0  3LM5P3JM00009908AL3X  Stor02    (151697133)    Block

0a.21        Stor02    (151697133)    Pool0  3LM5PC6X00009909FWQ4  Stor02    (151697133)    Block

0c.29        Stor02    (151697133)    Pool0  3LM5NYC700009909N0U7  Stor02    (151697133)    Block

0a.23        Stor02    (151697133)    Pool0  3LM5PBYV00009909GPFB  Stor02    (151697133)    Block

0a.33        Stor02    (151697133)    Pool0  3LM5PB2Y00009909FZEN  Stor02    (151697133)    Block

0a.19        Stor02    (151697133)    Pool0  3LM5PDHL00009909GPVT  Stor02    (151697133)    Block

0c.19        Stor02    (151697133)    Pool0  3LM5P2X600009909PXDH  Stor02    (151697133)    Block

0c.23        Stor02    (151697133)    Pool0  3LM5QDJH00009909MZ9W  Stor02    (151697133)    Block

0a.17        Stor02    (151697133)    Pool0  3LM5P3JC00009908ANSP  Stor02    (151697133)    Block

0a.37        Stor02    (151697133)    Pool0  3LM5P30Z000099090P4M  Stor02    (151697133)    Block

0c.39        Stor02    (151697133)    Pool0  3LM5QG4900009909FV89  Stor02    (151697133)    Block

0a.39        Stor02    (151697133)    Pool0  3LM5P4LX00009909PTJA  Stor02    (151697133)    Block

3b.18        Stor01    (151697145)    Pool0  J813LWGL              Stor01    (151697145)    Block

3b.16        Stor01    (151697145)    Pool0  J817RR5L              Stor01    (151697145)    Block

3b.20        Stor01    (151697145)    Pool0  J81623XL              Stor01    (151697145)    Block

3b.26        Stor01    (151697145)    Pool0  J80Z1LML              Stor01    (151697145)    Block

4b.38        Stor01    (151697145)    Pool0  PBJ10JZE              Stor01    (151697145)    Block

0c.17                  (101183107)    Pool0  JLVE95JC                        (101183107)    Block

3b.22        Stor01    (151697145)    Pool0  J815GWYL              Stor01    (151697145)    Block

4b.40        Stor01    (151697145)    Pool0  PBJ1T6YE              Stor01    (151697145)    Block

4b.34        Stor01    (151697145)    Pool0  PBJ1J6BE              Stor01    (151697145)    Block

3b.24        Stor01    (151697145)    Pool0  J814TLBL              Stor01    (151697145)    Block

4b.36        Stor01    (151697145)    Pool0  PBJ1KUYE              Stor01    (151697145)    Block

4b.32        Stor01    (151697145)    Pool0  PBJ0Y8KE              Stor01    (151697145)    Block

3b.28        Stor01    (151697145)    Pool0  J815GWRL              Stor01    (151697145)    Block

4b.42        Stor01    (151697145)    Pool0  PBJ1U8YE              Stor01    (151697145)    Block

4b.44        Stor01    (151697145)    Pool0  PBJ109VE              Stor01    (151697145)    Block

4b.18        Stor01    (151697145)    Pool0  PAKUD3HE              Stor01    (151697145)    Block

4b.16        Stor01    (151697145)    Pool0  PAKSAXZE              Stor01    (151697145)    Block

4b.22        Stor01    (151697145)    Pool0  PAKUBLDE              Stor01    (151697145)    Block

4b.24        Stor01    (151697145)    Pool0  PAKUDY5E              Stor01    (151697145)    Block

4b.20        Stor01    (151697145)    Pool0  PAKU2BLE              Stor01    (151697145)    Block

4b.26        Stor01    (151697145)    Pool0  PAKUG05E              Stor01    (151697145)    Block

4b.23        Stor02    (151697133)    Pool0  PAKS9RHE              Stor02    (151697133)    Block

4b.21        Stor02    (151697133)    Pool0  PAKUDTNE              Stor02    (151697133)    Block

4b.43        Stor02    (151697133)    Pool0  PBJ1NN2E              Stor02    (151697133)    Block

4b.37        Stor02    (151697133)    Pool0  PBJ0Y8NE              Stor02    (151697133)    Block

4b.25        Stor02    (151697133)    Pool0  PAKUD17E              Stor02    (151697133)    Block

4b.19        Stor02    (151697133)    Pool0  PAKUBZ6E              Stor02    (151697133)    Block

3b.27        Stor02    (151697133)    Pool0  J815GW0L              Stor02    (151697133)    Block

4b.45        Stor02    (151697133)    Pool0  PBJ10LXE              Stor02    (151697133)    Block

4b.41        Stor02    (151697133)    Pool0  PBJ1U8XE              Stor02    (151697133)    Block

4b.17        Stor02    (151697133)    Pool0  PAKSKJTE              Stor02    (151697133)    Block

4b.35        Stor02    (151697133)    Pool0  PBJ0Y8EE              Stor02    (151697133)    Block

4b.39        Stor02    (151697133)    Pool0  PBJ1S2ZE              Stor02    (151697133)    Block

3b.17        Stor02    (151697133)    Pool0  J814TKXL              Stor02    (151697133)    Block

4b.33        Stor02    (151697133)    Pool0  PBJ10J0E              Stor02    (151697133)    Block

3b.25        Stor02    (151697133)    Pool0  J811YHTL              Stor02    (151697133)    Block

3b.19        Stor02    (151697133)    Pool0  J814V9GL              Stor02    (151697133)    Block

3b.23        Stor02    (151697133)    Pool0  J815GY8L              Stor02    (151697133)    Block

3b.21        Stor02    (151697133)    Pool0  J813J9GL              Stor02    (151697133)    Block

3b.29        Stor02    (151697133)    Pool0  J813LELL              Stor02    (151697133)    Block

0a.27        Stor02    (151697133)    Pool0  JLVGH9UC              Stor02    (151697133)    Block

0c.22        Stor02    (151697133)    Pool0  JLVGGUEC              Stor02    (151697133)    Block

0c.18                  (101183107)    Pool0  JLVEKWSC                        (101183107)    Block

0a.43        Stor02    (151697133)    Pool0  JLVGH1VC              Stor02    (151697133)    Block

0c.16                  (101183107)    Pool0  JLVGSPSC                        (101183107)    Block

0c.26                  (101183107)    Pool0  JLVGPASC                        (101183107)    Block

0a.41                                   FAILED 3LM5P398000099090U5Y Block

0a.34                                   FAILED 3LM5P3JN00009909GQ6K Block

0c.42                                   FAILED 3LM5P4NJ00009909GQYE Block

0a.24                                   FAILED 3LM5P4G6000099090PSC Block

0a.36                                   FAILED 3LM5P4RD00009909PTMM Block

0a.18                                   FAILED 3LM5PBZK00009909GPCY Block

0a.38                                   FAILED 3LM5P3FZ00009909FZNT Block

0c.40                                   FAILED 3LM5QGHA00009909NZ4F Block
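To make the pattern in the listing above easier to see, here is a minimal Python sketch that tallies the FAILED rows per adapter (the part of the disk name before the dot). The function name `failed_by_adapter` and the assumed row shape (disk name first, the word FAILED somewhere after it) are illustrative only and are not part of any NetApp tool.

```python
import re
from collections import Counter

# Assumed row shape: "<adapter>.<id>  ... FAILED ...", as in the listing above.
ROW = re.compile(r"^\s*(\w+)\.(\d+)\s+.*\bFAILED\b")

def failed_by_adapter(text):
    """Count FAILED rows per adapter (the part before the dot in the disk name)."""
    counts = Counter()
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# A few rows copied from the listing above, used as sample input.
sample = """\
0c.27                                   FAILED                      Block
0a.45                                   FAILED                      Block
0a.26        Stor01    (151697145)    Pool0  3LM5PDB200009909GPWQ  Stor01    (151697145)    Block
0a.41                                   FAILED 3LM5P398000099090U5Y Block
"""

print(failed_by_adapter(sample))
```

Run against the full listing above, it should report 8 failed disks on adapter 0a and 4 on 0c, with none on 3b or 4b.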

*> aggr status

Jul 14 11:46:09 [localhost:fmmb.current.lock.disk:info]: Disk 0a.16 is a local HA mailbox disk.

Jul 14 11:46:09 [localhost:fmmb.instStat.change:info]: normal mailbox instance on local side.

           Aggr State           Status                Options

Jul 14 11:46:09 [localhost:coredump.host.spare.none:info]: No sparecore disk was found for host 0.

          aggr0 failed          raid_dp, aggr         diskroot, lost_write_protect=off

Jul 14 11:46:09 [localhost:raid.assim.rg.missingChild:error]: Aggregate aggr1, rgobj_verify: RAID object 0 has only 7 valid children, expected 12.

                                partial              

Jul 14 11:46:09 [localhost:raid.assim.plex.missingChild:error]: Aggregate aggr1, plexobj_verify: Plex 0 only has 0 working RAID groups (1 total) and is being taken offline

          aggr1 failed          raid_dp, aggr         lost_write_protect=off

Jul 14 11:46:09 [localhost:raid.assim.mirror.noChild:ALERT]: Aggregate aggr1, mirrorobj_verify: No operable plexes found.

                                partial              

Jul 14 11:46:09 [localhost:raid.assim.rg.missingChild:error]: Aggregate aggr0, rgobj_verify: RAID object 0 has only 7 valid children, expected 12.

          aggr2 online          raid_dp, aggr         nosnap=on

Jul 14 11:46:09 [localhost:raid.assim.plex.missingChild:error]: Aggregate aggr0, plexobj_verify: Plex 0 only has 0 working RAID groups (1 total) and is being taken offline

                                32-bit               

Jul 14 11:46:09 [localhost:raid.assim.mirror.noChild:ALERT]: Aggregate aggr0, mirrorobj_verify: No operable plexes found.

          aggr3 online          raid_dp, aggr         nosnap=on

                                32-bit               

No root aggregate or root traditional volume found.

You must specify a root aggregate or traditional volume with

From the latest logs, captured while it is stuck in the reboot loop:

Jul 14 10:18:59 [localhost:fci.adapter.link.online:info]: Fibre Channel adapter 0c link online.

Jul 14 10:19:01 [localhost:fci.device.quiesce:debug]: Adapter 0a encountered a command timeout on Disk device 0a.24 (0x05000018) LUN 0 cdb 0x28:0000a3f0:0008 retry: 2 Quiescing the device.

Jul 14 10:19:01 [localhost:scsi.cmd.checkCondition:debug]: Disk device 0a.24: Check Condition: CDB 0x28:0000a3f0:0008: Sense Data SCSI:aborted command -  (0xb - 0x8 0x1 0x81)(40439).

Jul 14 10:19:01 [localhost:scsi.cmd.noMorePaths:debug]: Disk device 0a.24: No more paths to device: cdb 0x28:0000a3f0:0008. All retries have failed.

Jul 14 10:19:01 [localhost:disk.senseError:error]: Disk 0a.24: op 0x28:0000a3f0:0008 sector 41968 SCSI:aborted command -  (b 8 1 81)

Jul 14 10:19:01 [localhost:cf.nm.nicReset:warning]: HA interconnect: Initiating soft reset on card 0 due to rendezvous reset.

Jul 14 10:19:01 [localhost:rv.connection.torndown:info]: HA interconnect: cfo_rv is torn down on NIC 0.

Jul 14 10:19:01 [localhost:cf.rv.notConnected:error]: HA interconnect: Connection for 'cfo_rv' failed.

Jul 14 10:19:01 [localhost:cf.nm.nicTransitionDown:warning]: HA interconnect: Link down on NIC 0.

Jul 14 10:19:01 [localhost:cf.rv.notConnected:error]: HA interconnect: Connection for 'cfo_rv' failed.

Jul 14 10:19:03 [localhost:cf.nm.nicTransitionUp:info]: HA interconnect: Link up on NIC 0.

Jul 14 10:19:04 [localhost:rv.connection.established:info]: HA interconnect: cfo_rv is connected on NIC 0.

Jul 14 10:19:04 [localhost:fmmb.current.lock.disk:info]: Disk 0a.16 is a local HA mailbox disk.

Jul 14 10:19:04 [localhost:fmmb.instStat.change:info]: normal mailbox instance on local side.

Jul 14 10:19:08 [localhost:fmmb.current.lock.disk:info]: Disk 0c.19 is a partner HA mailbox disk.

Jul 14 10:19:08 [localhost:fmmb.current.lock.disk:info]: Disk 0a.23 is a partner HA mailbox disk.

Jul 14 10:19:08 [localhost:fmmb.instStat.change:info]: normal mailbox instance on partner side.

Jul 14 10:19:08 [localhost:cf.fm.partner:info]: Failover monitor: partner 'Stor02'

Jul 14 10:19:08 [localhost:coredump.host.spare.none:info]: No sparecore disk was found for host 0.

Jul 14 10:19:08 [localhost:raid.assim.rg.missingChild:error]: Aggregate aggr0, rgobj_verify: RAID object 0 has only 7 valid children, expected 12.

Jul 14 10:19:08 [localhost:raid.assim.plex.missingChild:error]: Aggregate aggr0, plexobj_verify: Plex 0 only has 0 working RAID groups (1 total) and is being taken offline

Jul 14 10:19:08 [localhost:raid.assim.mirror.noChild:ALERT]: Aggregate aggr0, mirrorobj_verify: No operable plexes found.

Uptime: 8m7s

Jul 14 10:19:08 [localhost:raid.assim.rg.missingChild:error]: Aggregate aggr1, rgobj_verify: RAID object 0 has only 7 valid children, expected 12.

Jul 14 10:19:08 [localhost:raid.assim.plex.missingChild:error]: Aggregate aggr1, plexobj_verify: Plex 0 only has 0 working RAID groups (1 total) and is being taken offline

Jul 14 10:19:08 [localhost:raid.assim.mirror.noChild:ALERT]: Aggregate aggr1, mirrorobj_verify: No operable plexes found.

System rebooting...

Phoenix TrustedCore(tm) Server    

Copyright 1985-2006 Phoenix Technologies Ltd.
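As a rough way to summarise the `raid.assim.rg.missingChild` messages in the log above, the following Python sketch extracts how many members each RAID group is missing. The function name `missing_children` is hypothetical, and the regular expression assumes the message wording is exactly as it appears in this log.

```python
import re

# Assumed message shape, copied from the log lines above:
# "Aggregate <name>, rgobj_verify: RAID object <n> has only <v> valid children, expected <e>."
MISSING = re.compile(
    r"Aggregate (\w+), rgobj_verify: RAID object (\d+) "
    r"has only (\d+) valid children, expected (\d+)"
)

def missing_children(log_text):
    """Return {(aggregate, raid_object): missing_disk_count} from missingChild messages."""
    result = {}
    for aggr, rg, valid, expected in MISSING.findall(log_text):
        result[(aggr, int(rg))] = int(expected) - int(valid)
    return result

# Two messages copied from the log above, used as sample input.
log = """\
Jul 14 10:19:08 [localhost:raid.assim.rg.missingChild:error]: Aggregate aggr0, rgobj_verify: RAID object 0 has only 7 valid children, expected 12.
Jul 14 10:19:08 [localhost:raid.assim.rg.missingChild:error]: Aggregate aggr1, rgobj_verify: RAID object 0 has only 7 valid children, expected 12.
"""

print(missing_children(log))
```

For the two messages above this gives 5 missing members in the RAID group of aggr0 and 5 in that of aggr1. That is more than the two disk failures per RAID group that raid_dp can tolerate, which is consistent with both aggregates being reported as failed.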

Thank you very much