ONTAP Hardware
All,
I have a FAS3020 that isn't booting.
Here is what I am getting...any ideas to get this filer to boot?
NetApp Release 7.3.4: Thu May 27 17:52:48 PDT 2010
Copyright (c) 1992-2010 NetApp.
Starting boot on Fri Jul 13 18:34:03 GMT 2012
Fri Jul 13 18:34:46 GMT [fci.initialization.failed:error]: Initialization failed on Fibre Channel adapter 0b.
Fri Jul 13 18:34:54 GMT [diskown.isEnabled:info]: software ownership has been enabled for this system
add net 127.0.0.0: gateway 127.0.0.1
Fri Jul 13 18:34:57 GMT [config.noPartnerDisks:CRITICAL]: No disks were detected for the partner; this node will be unable to takeover correctly
Fri Jul 13 18:34:57 GMT [fmmb.current.lock.disk:info]: Disk 0a.18 is a local HA mailbox disk.
Fri Jul 13 18:34:57 GMT [fmmb.current.lock.disk:info]: Disk 0a.32 is a local HA mailbox disk.
Fri Jul 13 18:34:57 GMT [fmmb.instStat.change:info]: normal mailbox instance on local side.
Fri Jul 13 18:34:58 GMT [fmmb.instStat.change:info]: no mailbox instance on partner side.
Fri Jul 13 18:34:58 GMT [monitor.chassisPower.degraded:notice]: Chassis power is degraded:
Fri Jul 13 18:34:58 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.60 Shelf ? Bay ? [NETAPP X272_SCHT6073F10 NA16] S/N [3HZ7VJLV00007502BD5H] (21273/1351414276) in plex aggr0/0, because it is out-of-date (21607).
Fri Jul 13 18:34:58 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.42 Shelf ? Bay ? [NETAPP X272_HJURE073F10 NA14] S/N [41360296] (20851/1350424692) in plex aggr0/0, because it is out-of-date (21607).
Fri Jul 13 18:34:58 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.41 Shelf ? Bay ? [NETAPP X272_HJURE073F10 NA14] S/N [41360830] (20965/1350435816) in plex aggr0/0, because it is out-of-date (21607).
Root volume is failed (all disks may not yet be visible).
Rebooting... (ctrl-c to break reboot loop)
It shows you are missing drives in the root aggregate (and likely others). I would boot to maintenance mode and check disk assignment and that all disk paths are available. Not sure if anything changed or if some hardware was swapped, but either something changed or some disks or a shelf are not visible or not assigned.
Nothing has changed. It was up yesterday, and this morning it showed "system inactive". I rebooted it and it's stuck in this loop.
storage show disk displays:
DISK    SHELF  BAY  SERIAL                VENDOR  MODEL       REV
------  -----  ---  --------------------  ------  ----------  ----
0a.16   1      0    41317181              NETAPP  X272_HJURE  NA14
0a.17   1      1    41316538              NETAPP  X272_HJURE  NA14
0a.18   1      2    41327496              NETAPP  X272_HJURE  NA14
0a.19   1      3    412W4548              NETAPP  X272_HJURE  NA14
0a.20   1      4    3HZ7SWGJ000073369BCP  NETAPP  X272_SCHT6  NA16
0a.21   1      5    41360845              NETAPP  X272_HJURE  NA14
0a.22   1      6    41320011              NETAPP  X272_HJURE  NA14
0a.23   1      7    41316772              NETAPP  X272_HJURE  NA14
0a.24   1      8    41316694              NETAPP  X272_HJURE  NA14
0a.25   1      9    412W4347              NETAPP  X272_HJURE  NA14
0a.26   1      10   412Y8249              NETAPP  X272_HJURE  NA14
0a.27   1      11   41324284              NETAPP  X272_HJURE  NA14
0a.28   1      12   41304561              NETAPP  X272_HJURE  NA14
0a.29   1      13   41304745              NETAPP  X272_HJURE  NA14
0a.32   2      0    412W4612              NETAPP  X272_HJURE  NA14
0a.33   2      1    41355514              NETAPP  X272_HJURE  NA14
0a.35   2      3    41332956              NETAPP  X272_HJURE  NA14
0a.36   2      4    41356920              NETAPP  X272_HJURE  NA14
0a.37   2      5    41356723              NETAPP  X272_HJURE  NA14
0a.38   2      6    412K6952              NETAPP  X272_HJURE  NA14
0a.39   2      7    412W2029              NETAPP  X272_HJURE  NA14
0a.40   2      8    41352558              NETAPP  X272_HJURE  NA14
0a.41   2      9    41360830              NETAPP  X272_HJURE  NA14
0a.42   2      10   41360296              NETAPP  X272_HJURE  NA14
0a.43   2      11   412Z6774              NETAPP  X272_HJURE  NA14
0a.45   2      13   414J9147              NETAPP  X272_HJURE  NA14
0a.48   3      0    41320062              NETAPP  X272_HJURE  NA14
0a.49   3      1    41332516              NETAPP  X272_HJURE  NA14
0a.50   3      2    414J8914              NETAPP  X272_HJURE  NA14
0a.51   3      3    414J9151              NETAPP  X272_HJURE  NA14
0a.52   3      4    41444805              NETAPP  X272_HJURE  NA14
0a.53   3      5    3HZ74EKY00007336AHCD  NETAPP  X272_SCHT6  NA16
0a.54   3      6    414G9462              NETAPP  X272_HJURE  NA14
0a.55   3      7    414F4103              NETAPP  X272_HJURE  NA14
0a.56   3      8    414G0947              NETAPP  X272_HJURE  NA14
0a.57   3      9    414J7534              NETAPP  X272_HJURE  NA14
0a.59   3      11   3HZX5JQV000073482KAF  NETAPP  X272_SCHT6  NA16
0a.60   3      12   3HZ7VJLV00007502BD5H  NETAPP  X272_SCHT6  NA16
0a.61   3      13   414G0265              NETAPP  X272_HJURE  NA14
Looks like there are 2 disks missing on shelf 2: bays 2 and 12. I tried to reseat the drives and they still aren't showing up.
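For anyone who wants to check a listing like this for gaps without counting rows by eye, here is a minimal Python sketch. It assumes the FC loop ID in the disk name equals 16 * shelf + bay, with 14 bays (0 to 13) per shelf. That mapping is inferred from the listing above (0a.16 is shelf 1 bay 0, 0a.61 is shelf 3 bay 13) and is not taken from NetApp documentation. The function name missing_bays is made up for this example.

```python
# Assumption: loop ID = 16 * shelf + bay, 14 bays per shelf (inferred from the listing).
BAYS_PER_SHELF = 14


def missing_bays(disk_names, shelves):
    """Return sorted (shelf, bay) pairs that have no disk in the listing."""
    seen = set()
    for name in disk_names:
        loop_id = int(name.split(".")[1])  # "0a.34" -> 34
        seen.add(divmod(loop_id, 16))      # (shelf, bay)
    return [
        (shelf, bay)
        for shelf in sorted(shelves)
        for bay in range(BAYS_PER_SHELF)
        if (shelf, bay) not in seen
    ]


# Device IDs copied from the shelf 2 rows of the listing.
shelf2 = ["0a.%d" % i for i in (32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 45)]
print(missing_bays(shelf2, shelves=[2]))  # -> [(2, 2), (2, 12)]
```

Passing the disk names from all three shelves with shelves=[1, 2, 3] checks the whole listing in one pass.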
It sounds like a hardware failure somewhere. I would check the SFPs and shelf modules, and make sure all disks are dual-pathed with "storage show disk -p". Also check that the shelves are connected with fcstat device_map and/or sasadmin expander_map, depending on shelf type.
It seems to be the disk shelf pathing. Please also check the power on the shelf and the controller.
On the active node, check sysconfig and disk show -v, which will show the disk pathing information.
thank you,
AK G
Download wiregauge and run it; it will show which drives, loops, or other hardware have issues with multipathing or cabling.
Thank you,
AK G
Config Advisor (wiregauge) needs the system up and running to run its commands. I haven't tried it in maintenance mode; it might work, but not all the commands will be available, so it may only give a partial report.
Paths look OK. I can see all 3 disk shelves. I am getting the following error a lot:
Fri Jul 13 20:08:35 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.43 Shelf 2 Bay 11 [NETAPP X272_HJURE073F10 NA14] S/N [412Z6774] (20885/1350427798) in plex aggr0/0, because it is out-of-date (21607).
Fri Jul 13 20:08:35 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.60 Shelf 3 Bay 12 [NETAPP X272_SCHT6073F10 NA16] S/N [3HZ7VJLV00007502BD5H] (21273/1351414276) in plex aggr0/0, because it is out-of-date (21607).
Fri Jul 13 20:08:35 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.41 Shelf 2 Bay 9 [NETAPP X272_HJURE073F10 NA14] S/N [41360830] (20965/1350435816) in plex aggr0/0, because it is out-of-date (21607).
Fri Jul 13 20:08:35 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.42 Shelf 2 Bay 10 [NETAPP X272_HJURE073F10 NA14] S/N [41360296] (20851/1350424692) in plex aggr0/0, because it is out-of-date (21607).
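When the console repeats these messages many times, it can help to reduce them to one entry per disk. The following is a rough Python sketch that assumes the line layout is exactly as in the messages quoted above. The field names disk_gen and expected_gen are illustrative labels for the first number in parentheses and the number after "out-of-date"; they are not official NetApp terms, and the sketch does not interpret what the values mean.

```python
import re

# Pattern built only from the message layout quoted in this thread (an assumption).
ORPHAN_RE = re.compile(
    r"Orphaning Disk (?P<disk>\S+) Shelf (?P<shelf>\S+) Bay (?P<bay>\S+) "
    r"\[.*?\] S/N \[(?P<serial>[^\]]+)\] \((?P<disk_gen>\d+)/\d+\) "
    r"in plex (?P<plex>[^,\s]+), because it is out-of-date \((?P<expected_gen>\d+)\)"
)


def orphaned_disks(log_text):
    """Return {disk name: parsed fields}, one entry per distinct orphaned disk."""
    found = {}
    for match in ORPHAN_RE.finditer(log_text):
        found[match.group("disk")] = match.groupdict()
    return found


sample = (
    "Fri Jul 13 20:08:35 GMT [raid.assim.cls.outOfDate:error]: Orphaning Disk 0a.43 "
    "Shelf 2 Bay 11 [NETAPP X272_HJURE073F10 NA14] S/N [412Z6774] (20885/1350427798) "
    "in plex aggr0/0, because it is out-of-date (21607)."
)
for disk, fields in orphaned_disks(sample).items():
    print(disk, fields["serial"], fields["disk_gen"], fields["expected_gen"])
```

Pasting the whole console capture into log_text gives the set of distinct disks being orphaned, which is easier to compare against the storage show disk listing than the raw repeated messages.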
I would open a case. Could be a firmware or disk qualification file update needed.
Hi there:
If you visit your NetApp "My AutoSupport" site, you will see the disk upgrade patch; apply it. If not, send the drives to NetApp and get them replaced. It is essentially a bug:
http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=144216
thank you,
AK G
You will need to give a little more information, like the background for this system. For example
1. I upgraded Data ONTAP and now it won't boot, or
2. We had a power failure, and now it won't start, or
3. We moved the system from one data center to another, and it won't boot after the move, or
4. I found a FAS3020 controller in a back room, and an unused disk shelf in a storage facility, and when I connect them together, it won't boot, or
5. insert your story here.
Normally it's useful to know the system's serial number. The path forward will depend on your answers.
Hey Philip...Fair enough.
The system was up and running yesterday and had been for quite some time. No software or hardware changes were made. This morning the system was down ("inactive system"). I powered down the head and the 3 disk shelves. I powered on the 3 disk shelves and then the head, and noticed it's not booting or able to see the aggr0/root volume.
I noticed I had 4 failed disks. I unfailed the disks and rebooted again, without any luck.
In maintenance mode, I noticed my disks weren't assigned, so I assigned them to pool0, and still no luck.
That's how I got to where I am now.
This is a FAS3020, which is likely to use hardware-based disk ownership. Do you know for sure it was using software-based disk ownership before?
I am almost 100% sure it was using hardware-based disk ownership before.
So why did you assign disks in this case?
You really need to open a support case and wait for them to guide further steps. At this point any incorrect move can result in data loss.
So it looks like I lost too many disks in my aggr0/root volume. There is a volume I need in aggr1. Does anyone know of a way to rebuild/recreate aggr0/root and recover the volumes in aggr1?
Definitely work with support on this. With multiple failed drives, there may be more going on than just failed drives. For root, you can mark another aggr as root and ONTAP creates a volume called AUTOROOT. From maintenance mode you can check if aggr1 is online. You could also assign the disks to the other node, but again, work with support. I think working on any system in a failure mode with possible loss of data gets risky in a forum.
What does the sysconfig -r output show? Do those failed drives appear as failed or orphaned?
Have you opened a support case? Do that first, before any experiments.