ONTAP Discussions

Bringing up FAS2552 after holidays fails with volume offline.

dutsnekcirf

We gracefully powered the NetApp off before the holidays, then came back in today and powered it on. It came up with an error about the battery being drained and needing to be charged. I typed "c" and pressed Enter to override the wait period and force it to continue booting. It PANIC'd, dumped, and then rebooted. After that it appeared to get further in the boot process, but I'm still getting errors and still cannot access the storage. See below:

 

Army_Sustainment_NetApp::> reboot
(system node reboot)

Warning: Internal error. Failed to get cluster HA information when validating
reboot / halt command.
Do you want to continue? {y|n}: y


SP-login: Terminated
.
Uptime: 34m59s
Top Shutdown Times (ms): {shutdown_raid=3747, if_reset=500, shutdown_wafl=150(multivol=0, sfsr=0, abort_scan=0, snapshot=0, hit_update=0, start=58, sync1=4, sync2=1, mark_fs=87), wafl_sync_tagged=27}
Shutdown duration (ms): {CIFS=5435, NFS=5435, ISCSI=5434, FCP=5434}
System rebooting...

Phoenix SecureCore(tm) Server
Copyright 1985-2008 Phoenix Technologies Ltd.
All Rights Reserved
BIOS version: 8.3.0
Portions Copyright (c) 2008-2014 NetApp, Inc. All Rights Reserved

CPU = 1 Processors Detected, Cores per Processor = 2
Intel(R) Xeon(R) CPU C3528 @ 1.73GHz
Testing RAM
512MB RAM tested
18432MB RAM installed
256 KB L2 Cache per Processor Core
4096K L3 Cache Detected
System BIOS shadowed
USB 2.0: MICRON eUSB DISK
BIOS is scanning PCI Option ROMs, this may take a few seconds...
...................


Boot Loader version 4.3
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2014 NetApp, Inc. All Rights Reserved.

CPU Type: Intel(R) Xeon(R) CPU C3528 @ 1.73GHz


Starting AUTOBOOT press Ctrl-C to abort...
Loading X86_64/freebsd/image1/kernel:0x100000/7950592 0x895100/4206472 Entry at 0x80171230
Loading X86_64/freebsd/image1/platform.ko:0xc99000/1987543 0xe7f000/288800 0xec5820/272560
Starting program at 0x80171230
NetApp Data ONTAP 8.3P1
Copyright (C) 1992-2015 NetApp.
All rights reserved.
*******************************
* *
* Press Ctrl-C for Boot Menu. *
* *
*******************************
original max threads=40, original heap size=41943040
bip_nitro Virtual Size Limit=167074201 Bytes
bip_nitro: user memory=2029756416, actual max threads=115, actual heap size=121215385
qla_init_hw: CRBinit running ok: 8c633f
NIC FW version in flash: 5.4.9
qla_init_hw: CRBinit running ok: 8c633f
NIC FW version bundled: 5.4.51
qla_init_hw: CRBinit running ok: 8c633f
NIC FW version in flash: 5.4.9
qla_init_hw: CRBinit running ok: 8c633f
NIC FW version bundled: 5.4.51
WAFL CPLEDGER is enabled. Checklist = 0x7ff841ff
Module Type 10GE Passive Copper(Compliant)[3 m]
Module Type 10GE Passive Copper(Compliant)[3 m]
Module Type 10GE Passive Copper(Compliant)[3 m]
add host 127.0.10.1: gateway 127.0.20.1
Jan 02 10:55:46 [Army_Sustainment_NetApp-02:cf.fm.notkoverClusterDisable:warning]: Failover monitor: takeover disabled (restart)
Jan 02 10:55:46 [Army_Sustainment_NetApp-02:LUN.nvfail.vol.proc.started:warning]: LUNs in volume lun_21092016_154842_vol (DSID 1030) have been brought offline because an inconsistency was detected in the nvlog during boot or takeover.
Army_Sustainment_NetApp-02
Jan 02 10:55:46 [Army_Sustainment_NetApp-02:LUN.nvfail.vol.proc.started:warning]: LUNs in volume lun_21092016_144306_vol (DSID 1028) have been brought offline because an inconsistency was detected in the nvlog during boot or takeover.
Jan 02 10:55:46 [Army_Sustainment_NetApp-02:LUN.nvfail.vol.proc.complete:warning]: LUNs in volume lun_21092016_154842_vol (DSID 1030) have been brought offline because an inconsistency was detected in the nvlog during boot or takeover.
Jan 02 10:55:46 [Army_Sustainment_NetApp-02:LUN.nvfail.vol.proc.complete:warning]: LUNs in volume lun_21092016_144306_vol (DSID 1028) have been brought offline because an inconsistency was detected in the nvlog during boot or takeover.
Jan 02 10:55:46 [Army_Sustainment_NetApp-02:kern.syslog.msg:notice]: The system was down for 145 seconds
Jan 02 10:55:47 [Army_Sustainment_NetApp-02:cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of Army_Sustainment_NetApp-01 disabled (Controller Failover takeover disabled).
Jan 02 10:55:47 [Army_Sustainment_NetApp-02:snmp.agent.msg.access.denied:warning]: Permission denied for SNMPv3 requests from root. Reason: Password is too short (SNMPv3 requires at least 8 characters).
Jan 02 10:55:47 [Army_Sustainment_NetApp-02:clam.invalid.config:warning]: Local node (name=unknown, id=0) is in an invalid configuration for providing CLAM functionality. CLAM cannot determine the identity of the HA partner.
Ipspace "acp-ipspace" created
Jan 02 10:55:52 [Army_Sustainment_NetApp-02:cf.fsm.partnerNotResponding:notice]: Failover monitor: partner not responding
Jan 02 10:56:00 [Army_Sustainment_NetApp-02:monitor.globalStatus.critical:CRITICAL]: Controller failover of Army_Sustainment_NetApp-01 is not possible: Controller Failover takeover disabled.
Jan 02 10:56:01 [Army_Sustainment_NetApp-02:ha.takeoverImpNotDef:error]: Takeover of the partner node is impossible due to reason Controller Failover takeover disabled.
Jan 02 10:57:25 [Army_Sustainment_NetApp-02:mgmtgwd.rootvol.recovery.changed:EMERGENCY]: The contents of the root volume might have changed and the local management databases might be out of sync with the replicated databases. This node is not fully operational. Contact technical support to obtain the root volume recovery procedures.
Jan 02 10:57:25 [Army_Sustainment_NetApp-02:callhome.root.vol.recovery.reqd:EMERGENCY]: Call home for ROOT VOLUME NOT WORKING PROPERLY: RECOVERY REQUIRED.

Tue Jan 2 10:57:26 MST 2018
login: SP-login: admin
Password:

 

I'm concerned about the statement: "LUNs in volume lun_21092016_154842_vol (DSID 1030) have been brought offline because an inconsistency was detected in the nvlog during boot or takeover."  I suspect that's why I still can't access storage.
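
For anyone triaging a similar boot log, here is a minimal sketch of how the affected volumes could be pulled out of the console output. It is not a NetApp tool; the function name `nvfail_volumes` and the regular expressions are my own assumptions, inferred only from the line format visible in the log above ("timestamp [node:event:severity]: message").

```python
import re

# Assumed layout of a console event line, inferred from the log above:
#   "Jan 02 10:55:46 [node:event.name:severity]: message"
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: "
    r"(?P<msg>.*)$"
)

# Volume name and DSID as they appear in the nvfail message text.
VOLUME = re.compile(r"LUNs in volume (?P<vol>\S+) \(DSID (?P<dsid>\d+)\)")


def nvfail_volumes(lines):
    """Return {volume_name: dsid} for every LUN.nvfail.* event in lines.

    Hypothetical helper: it only reads text and changes nothing on the filer.
    """
    found = {}
    for line in lines:
        m = EMS_LINE.match(line.strip())
        if not m or not m.group("event").startswith("LUN.nvfail."):
            continue  # not an event line, or not an nvfail event
        v = VOLUME.search(m.group("msg"))
        if v:
            found[v.group("vol")] = int(v.group("dsid"))
    return found


# Three lines copied from the boot log in this post.
sample = [
    "Jan 02 10:55:46 [Army_Sustainment_NetApp-02:LUN.nvfail.vol.proc.started:warning]: "
    "LUNs in volume lun_21092016_154842_vol (DSID 1030) have been brought offline "
    "because an inconsistency was detected in the nvlog during boot or takeover.",
    "Jan 02 10:55:46 [Army_Sustainment_NetApp-02:LUN.nvfail.vol.proc.started:warning]: "
    "LUNs in volume lun_21092016_144306_vol (DSID 1028) have been brought offline "
    "because an inconsistency was detected in the nvlog during boot or takeover.",
    "Jan 02 10:55:46 [Army_Sustainment_NetApp-02:kern.syslog.msg:notice]: "
    "The system was down for 145 seconds",
]

if __name__ == "__main__":
    for vol, dsid in nvfail_volumes(sample).items():
        print(f"{vol} (DSID {dsid})")
```

This only lists which volumes were flagged. It says nothing about how to bring the LUNs back online, which is a question for support.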

 

Also, I get the following message when I log in: "The contents of the root volume may have changed and the local management configuration may be inconsistent and/or the local management databases may be out of sync with the replicated databases. This node is not fully operational. Contact support personnel for the root volume recovery procedures."

 

I don't see anything on the interwebs about root volume recovery procedures.

 

Any help would be appreciated.


dutsnekcirf

NetApp support came and saved the day for us. I couldn't possibly document the steps we took to recover.
