ONTAP Hardware

FAS2240-2 controller-A fails to boot.

Arukado

Hello,

 

After a power outage, one of the controllers on my FAS2240-2 won't boot.

It looks like this:

 

LOADER-A> boot_backup (also tried autoboot / boot_ontap / boot_primary / boot_diag)
Loading X86_64/freebsd/image1/kernel:0x100000/8454976 0x910340/1278344 Entry at 0x80158990
Loading X86_64/freebsd/image1/platform.ko:0xa49000/655152 0xb97c40/694752 0xae8f40/39656 0xc41620/43152 0xaf2a28/86316 0xb07b54/63858 0xb174e0/140640 0xc4beb0/159120 0xb39a40/2024 0xc72c40/6072 0xb3a228/304 0xc743f8/912 0xb3a358/1680 0xc74788/5040 0xb3a9e8/960 0xc75b38/2880 0xb3ada8/184 0xc76678/552 0xb3ae60/448 0xb6f000/12918 0xb97b53/237 0xb72278/84120 0xb86b10/69699
Starting program at 0x80158990
NetApp Data ONTAP 8.1.4 7-Mode

PANIC : The 0 is not a supported platform

version: 8.1.4: Wed Nov 20 16:16:17 PST 2013
conf : x86_64
cpuid = 0
Uptime: 1s

The operating system has halted.
Please press any key to reboot.

System halting...
cpu_reset called on cpu#0

Phoenix SecureCore(tm) Server
Copyright 1985-2008 Phoenix Technologies Ltd.
All Rights Reserved
BIOS version: 8.3.0
Portions Copyright (c) 2008-2014 NetApp, Inc. All Rights Reserved

CPU = 1 Processors Detected, Cores per Processor = 2
Intel(R) Xeon(R) CPU C3528 @ 1.73GHz
Testing RAM
512MB RAM tested
6144MB RAM installed
256 KB L2 Cache per Processor Core
4096K L3 Cache Detected
System BIOS shadowed
USB 2.0: MICRON eUSB DISK
BIOS is scanning PCI Option ROMs, this may take a few seconds...
WARNING
02A1: SP Not Found
ERROR
No Response to Controller FRU ID Read Request via IPMI
ERROR
No Response to Midplane FRU ID Read Request via IPMI


Boot Loader version 4.3
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2014 NetApp, Inc. All Rights Reserved.

CPU Type: Intel(R) Xeon(R) CPU C3528 @ 1.73GHz
BIOS POST Failure(s) detected. Abort AUTOBOOT
LOADER-A>

 

The "02A1: SP Not Found" warning is worrying, and initially I thought the SP was faulty, but it works when I connect to it via SSH and when I use Ctrl+G on the console.

Also, from LOADER-A I can check its status:

 

LOADER-A> sp status
Firmware Version: 2.2.3
Ethernet Link: up, 100 Mb, full duplex, auto-neg complete
Mgmt MAC Address: 00:A0:98:3F:D5:2E
IPv4 Settings
Using DHCP: NO
IP Address: 192.168.100.10
Netmask: 255.255.255.0
Gateway: 192.168.100.1
IPv6: Disabled
LOADER-A>

 

I've powered down the whole chassis, unplugged the controller module and plugged it back in, but that didn't help.

After the complete reboot, the SP and the array lost the current time. I was able to set the time via LOADER-A and the SP picked it up instantly, so there is communication between them.

 

Output of the show devices command:

LOADER-A*> show devices
Device Name Description
----------- ---------------------------------------------------------
sp0a Service Processor: Console at 0x3F8, PSI at 0x2F8
clock0a ISA RTC at 0x70 (index) and 0x71 (target)
kcs0a KCS at 0xca3 (command) and 0xca2 (data)
u0a.0 MICRON eUSB DISK-(USB 2.0)
boot0 u0a.0 alias boot device
boot_i u0a.0 alias boot device
e0M IBA GE Slot 0500 v1353 (00-A0-98-3F-D5-2D)
e0P IBA GE Slot 0700 v1353 (00-A0-98-3F-D5-2C)
e0a IBA GE Slot 0402 v1353 (00-A0-98-3F-D5-28)
e0b IBA GE Slot 0403 v1353 (00-A0-98-3F-D5-29)
e0c IBA GE Slot 0400 v1353 (00-A0-98-3F-D5-2A)
e0d IBA GE Slot 0401 v1353 (00-A0-98-3F-D5-2B)

 

I tried sp reset from the loader, but that didn't work; resetting from the SP itself with the sp reboot command did work.

 

 

From the SP command line:

 

SP netapp1> system fru list

FRU ID Name
==============================================
IPMI session creation failed - err(0x0001)

 

SP netapp1> system sensors
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
Error: Unable to establish LAN session
Get Device ID command failed
Unable to open SDR for reading

 

I don't know what else to check, but I have access to LOADER and the SP, so if someone can guide me I can check other things. I must admit that my knowledge of NetApp is pretty limited, so I'm seeking help from more experienced users here. Any ideas would be appreciated.

Regards.

 


pedro_rocha

Hello,

 

Why are you using boot_backup instead of boot_ontap?

 

Regards,

Pedro

pedro_rocha

From the following NetApp KB

 https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Systems/FAS_Systems/FAS22xx_fail_to_boot_with_error_%22The_0_is_not_a_supported_platform...

 

Solutions:

  1. Reseat the motherboard of the affected node.
  2. Log in to the SP of the affected node, run the command "sp reboot", then check the SP status with the command "sp status".
  3. Make sure the SP status is normal, then try to boot ONTAP with the command "boot_ontap".
  4. If the above steps do not resolve the issue, contact NetApp Support for further analysis.

Arukado

Yeah, I tried that before I posted my question here.

It's an old archival system and we haven't had active support since 2019, which is why I'm asking here, hoping that some kind person will help.

 

As I mentioned, I've tried all the available boot options and none of them worked.

 

When the machine starts from a cold state (after being powered down completely), it doesn't give me the "PANIC : The 0 is not a supported platform" message, only the SP not found error.

 

BIOS is scanning PCI Option ROMs, this may take a few seconds...
WARNING
02A1: SP Not Found
ERROR
No Response to Controller FRU ID Read Request via IPMI
ERROR
No Response to Midplane FRU ID Read Request via IPMI
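For anyone capturing the serial console to a file, the failure lines quoted in this thread can be picked out automatically. The following is only a minimal sketch in Python: the marker strings are copied from the console output posted above, while the function name `find_markers` and the short labels are hypothetical and informal, not NetApp terminology.

```python
# Failure strings copied verbatim from the console output in this thread.
# The labels are informal descriptions chosen for this sketch only.
MARKERS = {
    "02A1: SP Not Found": "BIOS warning about the SP",
    "No Response to Controller FRU ID Read Request via IPMI": "controller FRU read failed",
    "No Response to Midplane FRU ID Read Request via IPMI": "midplane FRU read failed",
    "BIOS POST Failure(s) detected. Abort AUTOBOOT": "autoboot aborted after POST",
    "The 0 is not a supported platform": "panic while loading ONTAP",
}


def find_markers(console_log: str) -> list[tuple[int, str, str]]:
    """Return (line_number, marker, label) for every known marker in the log."""
    hits = []
    for lineno, line in enumerate(console_log.splitlines(), start=1):
        for marker, label in MARKERS.items():
            if marker in line:
                hits.append((lineno, marker, label))
    return hits


# Sample taken from the cold-boot output quoted above.
sample = """BIOS is scanning PCI Option ROMs, this may take a few seconds...
WARNING
02A1: SP Not Found
ERROR
No Response to Controller FRU ID Read Request via IPMI
ERROR
No Response to Midplane FRU ID Read Request via IPMI
"""

hits = find_markers(sample)
for lineno, marker, label in hits:
    print(f"line {lineno}: {marker} ({label})")
```

Run against a full capture, this makes it easier to compare a cold boot (only the SP and FRU messages) with a warm reboot (where the panic line also appears).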

 

Also, since the second controller is working, is there a way to force it to serve the volumes that were on the broken one?

Right now it's saying netapp2(takeover), but I can't see the volumes from the broken one on it, and I can't log in via the NetApp web-based tool because it complains that netapp1 isn't responding.

 

pedro_rocha

Hi,

 

OK then. What is the output of the following commands (run on the online controller)?

 

cf status

cf monitor

partner

 

Regards,

Pedro

Arukado

It looks like this:

 

netapp2(takeover)> cf status
netapp2 has taken over netapp1.

 

netapp2(takeover)> cf monitor
current time: 06May2021 23:32:17
TAKEOVER 07:36:09, partner 'netapp1', CF monitor enabled

 

Login to partner shell: netapp1
netapp1/netapp2> Thu May 6 23:32:46 CEST [netapp2:cf.partner.login:info]: Login to partner shell: netapp1

 

 

 

 

pedro_rocha

As you can see, takeover has already happened.

 

When you type partner, you get into the partner shell.

 

Possibly you have a network issue and that is why you cannot access the volumes. Verify that the network interfaces are up, running and reachable:

 

> partner

> ifconfig -a

Arukado

Ah OK, I can see the volumes now, but like you said there's something wrong with the network. All network cards are missing.

netapp1/netapp2> ifconfig -a
lo: flags=0x1be8049<UP,LOOPBACK,RUNNING,MULTICAST,MULTIHOST,PARTNER_UP,TCPCKSUM> mtu 9188
inet 127.0.0.1 netmask 0xff000000 broadcast 127.0.0.1
takeover mode (lo)

This is how it looks on the working controller:

 

netapp2(takeover)> ifconfig -a
e0a: flags=0x170c866<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
ether 00:a0:98:3c:4f:20 (auto-unknown-down) flowcontrol full
e0b: flags=0x170c866<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
ether 00:a0:98:3c:4f:21 (auto-unknown-down) flowcontrol full
e0c: flags=0x170c866<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
ether 00:a0:98:3c:4f:22 (auto-unknown-down) flowcontrol full
e0d: flags=0x170c866<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
ether 00:a0:98:3c:4f:23 (auto-unknown-down) flowcontrol full
e0M: flags=0x2b4c867<UP,BROADCAST,RUNNING,MULTICAST,TCPCKSUM,MGMT_PORT> mtu 1500
inet 192.168.100.20 netmask 0xffffff00 broadcast 172.18.3.255 noddns
ether 00:a0:98:3c:4f:25 (auto-100tx-fd-up) flowcontrol full
e0P: flags=0x2b4c867<UP,BROADCAST,RUNNING,MULTICAST,TCPCKSUM,ACP_PORT> mtu 1500 PRIVATE
inet 192.168.1.32 netmask 0xfffffc00 broadcast 192.168.3.255 noddns
ether 00:a0:98:3c:4f:24 (auto-100tx-fd-up) flowcontrol full
lo: flags=0x1be8049<UP,LOOPBACK,RUNNING,MULTICAST,MULTIHOST,PARTNER_UP,TCPCKSUM> mtu 9188
inet 127.0.0.1 netmask 0xff000000 broadcast 127.0.0.1
losk: flags=0x40a400c9<UP,LOOPBACK,RUNNING> mtu 9188
inet 127.0.20.1 netmask 0xff000000 broadcast 127.0.20.1
netapp2(takeover)>
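The ifconfig output above prints netmasks in hexadecimal, which makes it awkward to check an interface's subnet by eye. As a rough aid, here is a small Python sketch that converts the hex netmask and derives the broadcast address; the helper name `broadcast_for` is hypothetical, and it uses only the standard `ipaddress` module.

```python
import ipaddress


def broadcast_for(ip: str, hex_netmask: str) -> str:
    """Compute the broadcast address from an IP and a hex netmask such as 0xfffffc00."""
    # int(..., 16) accepts the leading "0x" prefix used by ifconfig.
    mask = ipaddress.IPv4Address(int(hex_netmask, 16))
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.IPv4Network(f"{ip}/{mask}", strict=False)
    return str(network.broadcast_address)


# e0P values from the output above.
print(broadcast_for("192.168.1.32", "0xfffffc00"))
# e0M values from the output above.
print(broadcast_for("192.168.100.20", "0xffffff00"))
```

For e0P the result matches the 192.168.3.255 shown in the output. For e0M the same calculation gives 192.168.100.255 rather than the 172.18.3.255 that was posted; the thread does not say why these differ.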

 

Another question: since those volumes were used as boot volumes for my Dell R715 machines via fiber, should I change them in the boot options?

 

pedro_rocha
Send the output of

rdfile /etc/rc

while on the partner controller (the one that's impaired).

Regards
Pedro


Arukado

After switching to the partner and typing this command, I get this:

 

netapp1/netapp2> rdfile /etc/rc

#Auto-generated by setup Tue Jun 4 07:29:23 GMTme netapp1
ifconfig e0M `hostname`-e0M netmask 255.255.255.0 partner 192.168.100.20 mtusize 1500
route add default 192.168.100.1 1
routed on
options dns.domainname arukado.int
options dns.enable on
options nis.enable off
savecore

 

But that's the IP of the working controller.

pedro_rocha
Wait...

Do you use FCP as the protocol for data access?

Arukado

Yes, all the Dell machines are connected via fiber.

pedro_rocha
In that case the Ethernet network is not a big problem.

You have to check your SAN.

Also, your hosts must have multipathing in place to find the LUNs through the partner controller (the online one).

Is there actually any issue with your LUN access?

Regarding the Ethernet network, you won't see the adapters as they were before. If you are missing any IP address from the impaired controller, you must configure it manually with the ifconfig command and add it to /etc/rc.

I am a bit confused now. Your first issue was the controller not booting. If you have followed the KB, you would now need NetApp assistance.

Besides that, what else is not working as you need?

Regards
Pedro




Arukado

Yup, it's not booting and that's the major issue, but like I said it's archival infrastructure, so if I'm able to run it with only one controller, that will be fine for now.

Of course it would be nice to fix it, even just for the learning experience, but in most similar posts the solution is to contact support and/or replace the hardware, which in my case is impossible because no one will pay for that.

 

Regarding LUN access, there was an issue. My Dell machines didn't boot, but I managed to fix the WWPN and LUN ID in the Emulex cards and bring up the whole environment.

 

 

 

 

 

 

AlexDawson

Hi there,

 

If you swap the controllers between slots, does the same slot fail to boot the other controller, while the failing controller works OK in the other slot?

 

If that is the case, replace the chassis. While doing this is not supported in production, you can use any DS2246 chassis.

 

If the error moves with the controller, first open it up, remove the coin cell battery and then reinstall it. If the error persists, replace the controller. They're about $300-400 on eBay, and with 8.1 the licenses aren't locked to the serial number. You will need to reassign disks from maintenance mode, destroy the mailboxes and re-establish HA, but that should be it.
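
The swap test described above boils down to a short decision sequence. Purely as an illustration, it could be written down like this in Python; the function name `next_step` and its parameters are hypothetical, and the sketch assumes the swap gives a clear answer (the fault either stays with the slot or follows the controller).

```python
from typing import Optional


def next_step(fault_follows_controller: bool,
              fixed_by_battery_reseat: Optional[bool] = None) -> str:
    """Suggest the next action after swapping the two controllers between slots.

    fault_follows_controller: True if the same controller still fails in the other slot.
    fixed_by_battery_reseat: None if the coin cell battery has not been reseated yet.
    """
    if not fault_follows_controller:
        # The same slot fails with the other controller, so suspect the chassis.
        return "replace the chassis"
    if fixed_by_battery_reseat is None:
        return "remove and reinstall the coin cell battery, then retest"
    if fixed_by_battery_reseat:
        return "no further action"
    return ("replace the controller, then reassign disks from maintenance mode, "
            "destroy mailboxes and re-establish HA")


print(next_step(fault_follows_controller=True))
```

This only restates the advice in code form; it does not cover cases such as both controllers failing in both slots.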
