FAS3020 will not boot - any ideas

akhtara

All,

I have a FAS3020 that isn't booting. 

Here is what I am getting...any ideas to get this filer to boot?

NetApp Release 7.3.4: Thu May 27 17:52:48 PDT 2010

Copyright (c) 1992-2010 NetApp.

Starting boot on Fri Jul 13 18:34:03 GMT 2012

Fri Jul 13 18:34:46 GMT [fci.initialization.failed:error]: Initialization failed on Fibre Channel adapter 0b.

Fri Jul 13 18:34:54 GMT [diskown.isEnabled:info]: software ownership has been enabled for this system

add net 127.0.0.0: gateway 127.0.0.1

Fri Jul 13 18:34:57 GMT [config.noPartnerDisks:CRITICAL]: No disks were detected for the partner; this node will be unable to takeover correctly

Fri Jul 13 18:34:57 GMT [fmmb.current.lock.disk:info]: Disk 0a.18 is a local HA mailbox disk.

Fri Jul 13 18:34:57 GMT [fmmb.current.lock.disk:info]: Disk 0a.32 is a local HA mailbox disk.

Fri Jul 13 18:34:57 GMT [fmmb.instStat.change:info]: normal mailbox instance on local side.

Fri Jul 13 18:34:58 GMT [fmmb.instStat.change:info]: no mailbox instance on partner side.

Fri Jul 13 18:34:58 GMT [monitor.chassisPower.degraded:notice]: Chassis power is degraded:

Fri Jul 13 18:34:58 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.60 Shelf ? Bay ? [NETAPP   X272_SCHT6073F10 NA16] S/N [3HZ7VJLV00007502BD5H] (21273/1351414276) in plex aggr0/0, because it is out-of-date (21607).

Fri Jul 13 18:34:58 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.42 Shelf ? Bay ? [NETAPP   X272_HJURE073F10 NA14] S/N [41360296] (20851/1350424692) in plex aggr0/0, because it is out-of-date (21607).

Fri Jul 13 18:34:58 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.41 Shelf ? Bay ? [NETAPP   X272_HJURE073F10 NA14] S/N [41360830] (20965/1350435816) in plex aggr0/0, because it is out-of-date (21607).

Root volume is failed (all disks may not yet be visible).

Rebooting... (ctrl-c to break reboot loop)

scottgelb

It shows you are missing drives in the root aggregate (and likely others). I would boot to maintenance mode and check disk assignment and that all disk paths are available. I'm not sure whether any hardware was swapped, but something changed: disks or a shelf are either not visible or not assigned.

akhtara

Nothing has changed. It was up yesterday, and this morning it showed "system inactive". I rebooted it and it's stuck in this loop.

storage show disk displays:

DISK    SHELF  BAY  SERIAL                VENDOR  MODEL       REV
------  -----  ---  --------------------  ------  ----------  ----
0a.16   1      0    41317181              NETAPP  X272_HJURE  NA14
0a.17   1      1    41316538              NETAPP  X272_HJURE  NA14
0a.18   1      2    41327496              NETAPP  X272_HJURE  NA14
0a.19   1      3    412W4548              NETAPP  X272_HJURE  NA14
0a.20   1      4    3HZ7SWGJ000073369BCP  NETAPP  X272_SCHT6  NA16
0a.21   1      5    41360845              NETAPP  X272_HJURE  NA14
0a.22   1      6    41320011              NETAPP  X272_HJURE  NA14
0a.23   1      7    41316772              NETAPP  X272_HJURE  NA14
0a.24   1      8    41316694              NETAPP  X272_HJURE  NA14
0a.25   1      9    412W4347              NETAPP  X272_HJURE  NA14
0a.26   1      10   412Y8249              NETAPP  X272_HJURE  NA14
0a.27   1      11   41324284              NETAPP  X272_HJURE  NA14
0a.28   1      12   41304561              NETAPP  X272_HJURE  NA14
0a.29   1      13   41304745              NETAPP  X272_HJURE  NA14
0a.32   2      0    412W4612              NETAPP  X272_HJURE  NA14
0a.33   2      1    41355514              NETAPP  X272_HJURE  NA14
0a.35   2      3    41332956              NETAPP  X272_HJURE  NA14
0a.36   2      4    41356920              NETAPP  X272_HJURE  NA14
0a.37   2      5    41356723              NETAPP  X272_HJURE  NA14
0a.38   2      6    412K6952              NETAPP  X272_HJURE  NA14
0a.39   2      7    412W2029              NETAPP  X272_HJURE  NA14
0a.40   2      8    41352558              NETAPP  X272_HJURE  NA14
0a.41   2      9    41360830              NETAPP  X272_HJURE  NA14
0a.42   2      10   41360296              NETAPP  X272_HJURE  NA14
0a.43   2      11   412Z6774              NETAPP  X272_HJURE  NA14
0a.45   2      13   414J9147              NETAPP  X272_HJURE  NA14
0a.48   3      0    41320062              NETAPP  X272_HJURE  NA14
0a.49   3      1    41332516              NETAPP  X272_HJURE  NA14
0a.50   3      2    414J8914              NETAPP  X272_HJURE  NA14
0a.51   3      3    414J9151              NETAPP  X272_HJURE  NA14
0a.52   3      4    41444805              NETAPP  X272_HJURE  NA14
0a.53   3      5    3HZ74EKY00007336AHCD  NETAPP  X272_SCHT6  NA16
0a.54   3      6    414G9462              NETAPP  X272_HJURE  NA14
0a.55   3      7    414F4103              NETAPP  X272_HJURE  NA14
0a.56   3      8    414G0947              NETAPP  X272_HJURE  NA14
0a.57   3      9    414J7534              NETAPP  X272_HJURE  NA14
0a.59   3      11   3HZX5JQV000073482KAF  NETAPP  X272_SCHT6  NA16
0a.60   3      12   3HZ7VJLV00007502BD5H  NETAPP  X272_SCHT6  NA16
0a.61   3      13   414G0265              NETAPP  X272_HJURE  NA14

Looks like there are 2 disks missing on Shelf 2: bay 2 and bay 12. I tried to reseat the drives and they still aren't showing up.
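For anyone who wants to check a listing like this mechanically rather than by eye, here is a minimal Python sketch. It is not a NetApp tool. It assumes 14-bay shelves and that the device ID is 16 * shelf + bay, which is only inferred from the listing itself (for example, 0a.41 is reported as Shelf 2 Bay 9). The names `device_id`, `missing_disks`, and `SEEN` are made up for illustration.

```python
# Hypothetical helper, not part of Data ONTAP.
# Assumption: 14 bays per shelf, and device ID = 16 * shelf + bay,
# inferred from the listing (0a.41 is shelf 2, bay 9).
BAYS_PER_SHELF = 14


def device_id(shelf, bay):
    """Return the loop ID implied by a shelf/bay pair."""
    return 16 * shelf + bay


def missing_disks(seen_ids, shelves):
    """Return (shelf, bay, device_id) for every bay with no disk listed."""
    seen = set(seen_ids)
    missing = []
    for shelf in shelves:
        for bay in range(BAYS_PER_SHELF):
            dev = device_id(shelf, bay)
            if dev not in seen:
                missing.append((shelf, bay, dev))
    return missing


# Device IDs on adapter 0a, copied from the "storage show disk" output above.
SEEN = (
    list(range(16, 30))                      # shelf 1
    + [32, 33] + list(range(35, 44)) + [45]  # shelf 2
    + list(range(48, 58)) + [59, 60, 61]     # shelf 3
)

for shelf, bay, dev in missing_disks(SEEN, [1, 2, 3]):
    print(f"0a.{dev}: shelf {shelf}, bay {bay} not listed")
```

On the listing as pasted, this reports shelf 2 bays 2 and 12, and also shelf 3 bay 10 (0a.58). Whether that last bay was ever populated cannot be told from the output alone.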

scottgelb

It sounds like a hardware failure somewhere... I would check SFP and modules... make sure all paths are dual path with "storage show disk -p" ... also check that shelves are connected with fcstat device_map and/or sasadmin expander_map depending on shelf type.

AGUMADAVALLI

It seems to be a disk shelf pathing problem. Please also check the power on the shelf and the controller.

On the active node, check sysconfig and disk show -v, which will show the disk pathing information.

thank you,

AK G

AGUMADAVALLI

Download WireGauge and run it. It will show which drives, loops, or hardware have problems with multipathing or loop cabling.

Thank you,

AK G

scottgelb

Config Advisor (WireGauge) needs the system up and running to run its commands. I haven't tried it in maintenance mode; it might work, but not all the commands will be available there, though it might still produce a report.

akhtara

Paths look OK. I can see all 3 disk shelves. I am getting the following error a lot:

Fri Jul 13 20:08:35 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.43 Shelf 2 Bay 11 [NETAPP   X272_HJURE073F10 NA14] S/N [412Z6774] (20885/1350427798) in plex aggr0/0, because it is out-of-date (21607).

Fri Jul 13 20:08:35 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.60 Shelf 3 Bay 12 [NETAPP   X272_SCHT6073F10 NA16] S/N [3HZ7VJLV00007502BD5H] (21273/1351414276) in plex aggr0/0, because it is out-of-date (21607).

Fri Jul 13 20:08:35 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.41 Shelf 2 Bay 9 [NETAPP   X272_HJURE073F10 NA14] S/N [41360830] (20965/1350435816) in plex aggr0/0, because it is out-of-date (21607).

Fri Jul 13 20:08:35 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.42 Shelf 2 Bay 10 [NETAPP   X272_HJURE073F10 NA14] S/N [41360296] (20851/1350424692) in plex aggr0/0, because it is out-of-date (21607).
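When there are a lot of these messages, it can help to pull the fields out and list each orphaned disk once. The sketch below is an illustration only: the regular expression is written against the message text pasted above, not against any documented format, and the names `ORPHAN_RE` and `parse_orphans` are made up. It subtracts the first number in parentheses from the number after "out-of-date"; treating that difference as how far the disk lags behind is an assumption about what those numbers mean.

```python
import re

# Pattern written against the raid.assim.cls.outOfDate lines in this thread.
# Assumption: the message layout is stable; this is not a documented format.
ORPHAN_RE = re.compile(
    r"\[raid\.assim\.cls\.outOfDate:error\]: Orphaning Disk (?P<disk>\S+) "
    r"Shelf (?P<shelf>\S+) Bay (?P<bay>\S+) .*?"
    r"S/N \[(?P<serial>[^\]]+)\] \((?P<first>\d+)/(?P<second>\d+)\) "
    r"in plex (?P<plex>\S+), because it is out-of-date \((?P<expected>\d+)\)"
)


def parse_orphans(log_text):
    """Return one dict per distinct orphaned disk found in log_text."""
    disks = {}
    for match in ORPHAN_RE.finditer(log_text):
        info = match.groupdict()
        # Assumption: 'expected' minus 'first' indicates how stale the disk is.
        info["lag"] = int(info["expected"]) - int(info["first"])
        disks[info["disk"]] = info  # later duplicates overwrite earlier ones
    return list(disks.values())


SAMPLE = (
    "Fri Jul 13 20:08:35 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk "
    "0a.43 Shelf 2 Bay 11 [NETAPP   X272_HJURE073F10 NA14] S/N [412Z6774] "
    "(20885/1350427798) in plex aggr0/0, because it is out-of-date (21607).\n"
    "Fri Jul 13 20:08:35 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk "
    "0a.60 Shelf 3 Bay 12 [NETAPP   X272_SCHT6073F10 NA16] S/N "
    "[3HZ7VJLV00007502BD5H] (21273/1351414276) in plex aggr0/0, because it is "
    "out-of-date (21607).\n"
)

for disk in parse_orphans(SAMPLE):
    print(disk["disk"], disk["plex"], "lag:", disk["lag"])
```

This only summarizes what the console already printed. It does not explain why the disks are out of date, so it is a reading aid, not a diagnosis.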

scottgelb

I would open a case. Could be a firmware or disk qualification file update needed.

AGUMADAVALLI

Hi there:

If you visit your NetApp "My AutoSupport" site, you will see the disk upgrade patch; apply it. If not, send the drives to NetApp and get them replaced. It is essentially a bug:

http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=144216

thank you,

AK G

pmeidell

You will need to give a little more information, like the background for this system. For example:

1. I upgraded Data ONTAP and now it won't boot, or

2. We had a power failure, and now it won't start, or

3. We moved the system from one data center to another, and it won't boot after the move, or

4. I found a FAS3020 controller in a back room and an unused disk shelf in a storage facility, and when I connect them together, it won't boot, or

5. insert your story here.

Normally it's useful to know the system's serial number. The path forward will depend on your answers.

akhtara

Hey Philip...Fair enough.

The system was up and running yesterday and had been for quite some time. No software or hardware changes were made. This morning the system was down ("inactive system"). I powered down the head and the 3 disk shelves. I powered on the 3 disk shelves and then the head, and noticed it's not booting and can't see the aggr0/root volume.

I noticed I had 4 failed disks. I unfailed the disks and rebooted again without any luck.

In maintenance mode, I noticed that none of my disks were assigned, so I assigned them to pool0, and still no luck.

That's how I got to where I am now.

aborzenkov

This is a FAS3020, which is likely to use hardware-based disk ownership. Do you know for sure it was using software-based disk ownership before?

akhtara

I am almost 100% sure it was using hardware-based disk ownership before.

aborzenkov

So why did you assign disks in this case?

You really need to open a support case and wait for them to guide you through further steps. At this point any incorrect move can result in data loss.

akhtara

So it looks like I lost too many disks in my aggr0/root volume. There is a volume I need in aggr1. Does anyone know of a way to rebuild or recreate aggr0/root and recover the volumes in aggr1?

scottgelb

Definitely work with support on this. Multiple failed drives at once may point to more than just failed drives. For root, you can mark another aggregate as root, and Data ONTAP creates a volume called AUTOROOT. From maintenance mode you can check whether aggr1 is online. You could also assign the disks to the other node, but again, work with support. Any system in a failure mode with possible data loss is risky to troubleshoot in a forum.

MICKINCH_PATEL

What does the sysconfig -r output show? Do those drives appear as failed or orphaned?

Have you opened a support case? Do that first, before experimenting.
