FAS2040 stuck in loop, won't boot

JOEHIRTH11
19,975 Views

Hello,

I'm very new to NetApp and have very limited knowledge, so go easy.

Basically I am trying to re-IP the second head of our FAS2040. The first head ran setup fine; however, head two gets stuck in some kind of loop that I can't seem to break.

I've been looking around and I can see that there is a command to factory reset the filer but I can't get to the CLI to be able to do this. (I am using a console cable).

I've also tried working from the partner by using cf takeover -f, but certain commands cannot be invoked in takeover mode.

I do get some kind of error saying that it can't write to /etc/hosts.new.

Anyone have any ideas??

Thanks Joe

12 REPLIES

scottgelb
19,928 Views

Likely no disks are assigned to this head. All disks may be assigned to the other controller, or you may have unassigned disks. From maintenance mode, check disk show -n for unassigned disks and assign them. Or you may need to unassign a spare from the partner node first.

JOEHIRTH11
19,928 Views

Hi Scott,

Thanks for your response. I did a disk show -n and it reports back no unassigned disks.

Disk show, however, shows that my second head owns 4 of the 12 disks. Before I tried accessing this filer, I'm pretty sure it was set up with 8 + 4 disks assigned to the respective heads, with one spare for each.

Any further input?

aborzenkov
19,928 Views

Please copy and paste exact console output after filer power on and until next reboot.

JOEHIRTH11
19,928 Views

Fairly long but this is the output I get:

Loading X86/freebsd/image1/kernel:..0x200000/3325948 0x52d000/3131512 0x829878/434316 0x894000/675840 Entry at 0x0023b8f0
Loading X86/freebsd/image1/platform.ko:0x939000/410584 0x99e3d8/30564 0x9a5b3c/111700
Starting program at 0x0023b8f0
NetApp Data ONTAP 8.0.2 7-Mode
Copyright (C) 1992-2011 NetApp.
All rights reserved.
*******************************
*                             *
* Press Ctrl-C for Boot Menu. *
*                             *
*******************************


(mtl_log): Layer=MTL_MODULE Severity=1: XHH_hob_open_hca: open hca for InfiniHost_III_Lx0 succeeded
Fri Jun 29 15:29:00 GMT [cf.nm.nicTransitionUp:info]: Interconnect link 0 is UP
Fri Jun 29 15:29:07 GMT [fci.initialization.failed:error]: Initialization failed on Fibre Channel adapter 0b.
Fri Jun 29 15:29:07 GMT [fci.initialization.failed:error]: Initialization failed on Fibre Channel adapter 0a.
Fri Jun 29 15:29:10 GMT [sas.initialization.failed:error]: Initialization failed on SAS adapter 0d.
nvmem2_dblade_initialization: nv_base=0x96800000, nv_size=0MB
Fri Jun 29 15:29:13 GMT [diskown.isEnabled:info]: software ownership has been enabled for this system
ReservatiFri Jun 29 15:29:13 GMT [ses.giveback.wait:info]: Enclosure Services will be unavailable while waiting for giveback.
on conflict founarp_rtrequest: bad gateway 127.0.20.1 (!AF_LINK)
d on this node's disks!
Local System ID: 135111367
Press Ctrl-C for Maintenance menu to release disks.
add host 127.0.10.1: gateway 127.0.20.1
Fri Jun 29 15:29:16 GMT [netif.linkUp:info]: Ethernet e0a: Link up.
Disk rFri Jun 29 15:29:17 GMT [ses.giveback.restartAfter:info]: Enclosure Services restarting after release of reservations.
Fri Jun 29 15:29:17 GMT [cf.nm.nicReset:warning]: Initiating soft reset on Cluster Interconnect card 0 due to rendezvous reset
eservations haveFri Jun 29 15:29:17 GMT [cf.rv.notConnected:error]: Connection for 'cfo_rv' failed.
been released
Fri Jun 29 15:29:18 GMT [fmmb.current.lock.disk:info]: Disk 0c.00.0 is a local HA mailbox disk.
Fri Jun 29 15:29:18 GMT [fmmb.current.lock.disk:info]: Disk 0c.00.2 is a local HA mailbox disk.
Fri Jun 29 15:29:18 GMT [fmmb.instStat.change:info]: normal mailbox instance on local side.
Fri Jun 29 15:29:18 GMT [fmmb.current.lock.disk:info]: Disk 0c.00.1 is a partner HA mailbox disk.
Fri Jun 29 15:29:18 GMT [fmmb.current.lock.disk:info]: Disk 0c.00.3 is a partner HA mailbox disk.
Fri Jun 29 15:29:18 GMT [fmmb.instStat.change:info]: normal mailbox instance on partner side.
Fri Jun 29 15:29:18 GMT [netif.linkDown:info]: Ethernet e0P: Link down, check cable.
Fri Jun 29 15:29:18 GMT [cf.fm.partner:info]: Failover monitor: partner 'ddstnpp001'
Fri Jun 29 15:29:18 GMT [cf.fm.timeMasterStatus:info]: Acting as time slave
Fri Jun 29 15:29:19 GMT [netif.linkDown:info]: Ethernet e0c: Link down, check cable.
Fri Jun 29 15:29:19 GMT [netif.linkDown:info]: Ethernet e0b: Link down, check cable.
Fri Jun 29 15:29:19 GMT [netif.linkDown:info]: Ethernet e0d: Link down, check cable.
Fri Jun 29 15:29:20 GMT [shelf.config.spha:info]: System is using single path HA attached storage only.
Waiting for giveback...(Press Ctrl-C to abort wait)Continuing boot...
Fri Jun 29 15:30:04 GMT [coredump.spare.none:info]: No sparecore disk was found.
Fri Jun 29 15:30:04 GMT [raid.cksum.replay.summary:info]: Replayed 0 checksum blocks.
Fri Jun 29 15:30:04 GMT [raid.stripe.replay.summary:info]: Replayed 0 stripes.
Fri Jun 29 15:30:05 GMT [localhost: cf.fm.launch:info]: Launching failover monitor
Fri Jun 29 15:30:05 GMT [localhost: cf.fm.partner:info]: Failover monitor: partner 'ddstnpp001'
Fri Jun 29 15:30:05 GMT [localhost: cf.fm.discardNvram:notice]: Failover monitor: node was previously taken over, nvram may be discarded
Fri Jun 29 15:30:05 GMT [localhost: fcp.service.adapter:warning]: FCP started but no adapter is present!
cannot open /etc/syslog.conf
Adding config files for console for 0
Fri Jun 29 15:30:05 GMT [localhost: wafl.vol.full:notice]: file system on volume vol0 is full
filter sync'd
add net 127.0.0.0: gateway 127.0.0.1
Fri Jun 29 15:30:06 GMT [localhost: cf.nm.nicReset:warning]: Initiating soft reset on Cluster Interconnect card 0 due to rendezvous reset

Fri Jun 29 15:30:07 UTC 2012
RDB-HA ending primary
syslogd: Cannot write to file /etc/messages: No space left on device
Fri Jun 29 15:30:30 GMT [syslogd_print]: message that couldn't be logged was:
Fri Jun 29 15:30:06 GMT [localhost: cf.nm.nicReset:warning]: Initiating soft reset on Cluster Interconnect card 0 due to rendezvous reset
Fri Jun 29 15:30:06 GMT [localhost: rc:notice]: The system was down for 11722 seconds
Fri Jun 29 15:30:06 GMT [localhost: cf.fsm.takeoverOfPartnerDisabled:notice]: Failover monitor: takeover of ddstnpp001 disabled (interconnect error).
Fri Jun 29 15:30:06 GMT [localhost: cf.fsm.takeoverByPartnerDisabled:notice]: Failover monitor: takeover of  by ddstnpp001 disabled (interconnect error).
Fri Jun 29 15:30:22 GMT [localhost: cf.fsm.takeoverOfPartnerEnabled:notice]: Failover monitor: takeover of ddstnpp001 enabled
Fri Jun 29 15:30:25 GMT [localhost: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of  by ddstnpp001 enabled
Fri Jun 29 15:30:30 GMT [localhost: tar.csum.match:info]: Stored checksum matches, not extracting local://tmp/prestage/mroot.tgz.
Fri Jun 29 15:30:30 GMT [localhost: tar.csum.match:info]: Stored checksum matches, not extracting local://tmp/prestage/pmroot.tgz.
Fri Jun 29 15:30:30 GMT [ddstndsr021: reg.file.createFail:warning]: registry: Cannot create /etc/registry.local.0 file. No space left on device.
Fri Jun 29 15:30:30 GMT [ddstndsr021: rc:error]: Unable to copy /etc/registry to /etc/registry.lastgood.   You should make a manual backup of /etc/registry.
Fri Jun 29 15:30:30 GMT [ddstndsr021: reg.file.writeFail:error]: registry: Cannot write to /etc/registry.lastgood file. No space left on device.
Fri Jun 29 15:30:31 GMT [ddstndsr021: snapmirror.log.writeErr:error]: Failed to write to SnapMirror log file: No space left on device.
hostname: error in setting registry
ifgrp: command "vif" is deprecated in favor of command "ifgrp"
Fri Jun 29 15:30:31 GMT [ddstndsr021: rc:error]: registry: regx_commit() failed. errors = Error: Registry persistence error (name=file) (value=/etc/registry.local) Error: Registry persistence error (name=file) (value=/etc/registry)
Fri Jun 29 15:30:31 GMT [ddstndsr021: perf.archive.file.close.fail:warning]: Performance archiver failed to close file: /etc/log/stats/archive/.preset/default. (11488)
Fri Jun 29 15:30:31 GMT [ddstndsr021: dfu.firmwareUpToDate:info]: Firmware is up-to-date on all disk drives
Fri Jun 29 15:30:31 GMT [ddstndsr021: perf.archive.registry.fail:warning]: The performance archiver could not modify registry: 'options.stats.archive.max_disk_space_percent' to '2'.
Fri Jun 29 15:30:37 GMT [ddstndsr021: perf.archive.file.close.fail:warning]: Performance archiver failed to close file: /etc/log/stats/archive/.preset/default. (10197)
Fri Jun 29 15:30:37 GMT [ddstndsr021: perf.archive.preset.load.fail:warning]: Performance archiver failed to load preset level 'default'. (10257)
Fri Jun 29 15:30:37 GMT [ddstndsr021: perf.archive.start.fail:warning]: Performance archiver failed to start: unable to load preset.
add net default: gateway 10.9.1.1
Fri Jun 29 15:30:4Failed to write /etc files. Changes may not be persistent.
0 GMT [ddstndsr021: rc:warning]: registry: Unable to update options.dns.domainname in /etc/registry.local
Fri Jun 29 15:30:40 GMT [ddstndsr021: rc:warning]: registry: Unable to update options.dns.domainname in /etc/registry
Fri Jun 29 15:30:40 GMT [ddstndsr021: rc:warning]: registry: Unable to update options.dns.enable in /etc/registry.local
Fri Jun 29 15:30:40 GMT [ddstndsr021: rc:warning]: registry: Unable to update options.dns.enable in /etc/registry
Fri Jun 29 15:30:40 GMT [ddstndsr021: iscsi.service.startup:info]: iSCSI service startup
Fri Jun 29 15:30:40 GMT [ddstndsr021: iscsi.warning:warning]: ISCSI: Unable to update iSCSI tpgroup config file
Fri Jun 29 15:30:40 GMT last message repeated 2 times
Fri Jun 29 15:30:40 GMT [ddstndsr021: reg.file.updateFail:warning]: registry: Cannot update /etc/registry.local file.
Fri Jun 29 15:30:40 GMT [ddstndsr021: reg.file.updateFail:warning]: registry: Cannot update /etc/registry.local file.
Fri Jun 29 15:30:40 GMT [ddstndsr021: rc:error]: read_config_store open failure on /etc/snmppersist.conf
Fri Jun 29 15:30:40 GMT last message repeated 2 times
Fri Jun 29 15:30:40 GMT [ddstndsr021: perf.archive.file.close.fail:warning]: Performance archiver failed to close file: /etc/log/stats/archive/.preset/default. (10197)
option stats.archive.enable: unable to start archiver.
Fri Jun 29 15:30:40 GMT [ddstndsr021: perf.archive.preset.load.fail:warning]: Performance archiver failed to load pres

Data ONTAP (ddstndsr021.ppen.local)
login: ait)Continuing boot...
Password:
Login incorrect

Data ONTAP (ddstndsr021.ppen.local)
login: Fri Jun 29 15:30:04 GMT [raid.cksu
Data ONTAP (ddstndsr021.ppen.csc-dynamics.locaet level 'defaull)
login: m.replay.summary:info]: Replayed 0
Data ONTAP (ddstndsr021.ppen.local)
login:  checksum blocks.
Password:
Login incorrect

Data ONTAP (ddstndsr021.local)
login: Fri Jun 29 15:30:05 GMT [=l=ocal>hsost
smicta lONfT name 0c shAPe (lddsftnd_sr02i1.dppe n.
.loca=l=)
l>ogisn: e: cnf.fsm.olaunrch :infco]o: Luanuncthin g f1a
NTa ta
  AP= (=d>d0stn:ds r0t21.ypppene.l(oc1al))
lorgien:a idloivenr gmon(it1or)
ePassswtorad:t
tc og(in1 i)nco rr

==>1: type(2) id(2) reading(1) state (1)
==>2: type(3) id(1) reading(5460) state (1)
==>3: type(3) id(2) reading(5450) state (1)
==>4: type(4) id(1) reading(32) state (1)
==>5: type(4) id(2) reading(28) state (1)
==>6: type(4) id(3) reading(30) state (1)
==>7: type(4) id(4) reading(32) state (1)
==>8: type(18) id(1) reading(12210) state (1)
==>9: type(18) id(2) reading(5110) state (1)
==>10: type(18) id(3) reading(3550) state (1)
==>11: type(18) id(4) reading(12160) state (1)
==>12: type(18) id(5) reading(5090) state (1)
==>13: type(18) id(6) reading(3530) state (1)
t'. (10257)(19) id(1) reading(6900) state (1)
F
==>15: type(19) id(2) reading(2020) state (1)
==>16: type(19) id(3) reading(6790) state (1)
==>17: type(19) id(4) reading(1830) state (1)
ses_dblade_init success
ri Jun 29 15:30:40 GMT [ddstndsr021: perf.archive.start.fail:warning]: Performance archiver failed to start: unable to load preset.
Fri Jun 29 15:30:40 GMT [ddstndsr021: rc:warning]: registry: Unexpected error in rollback in regx_commit - Error: Registry key does not exist (name=options.coredump.metadata_only) Error: Registry key does not exist (name=options.rmc.lan.smtp_ip) Error: Registry key does not exist (name=options.rmc.setup) Error: Registry key does not exist (name=options.bootfs.
Fri Jun 29 15:30:40 GMT [ddstndsr021: reg.transaction.commitFail:warning]: registry: Cannot commit transaction in 'Periodic config update'. Error: Registry persistence error (name=file) (value=/etc/registry)
Unable to open /etc/configs/tmp.685707507.new file in write mode: (No space left on device).
Fri Jun 29 15:30:40 GMT [ddstndsr021: rc:error]: registry_config_audit: Error occurred while taking the backup of the filer's current configuration: (Input/output error).
sysconfig: Unless directed by NetApp Global Services volume vol0 should have the volume option create_ucode set to On.
Fri Jun 29 15:30:40 GMT [ddstndsr021: cmds.sysconf.logErr:error]: sysconfig: Unless directed by NetApp Global Services volume vol0 should have the volume option create_ucode set to On. .
Fr
i Jun 29 15:30:4
Data ONTAP (ddstndsr021.ppen.local)
login: Fri Jun 29 15:30:05 GMT [localhost
Data ONTAP (ddstndsr021.ppen.local)
login: : cf.fm.discardNvram:notice]: Fail
Data ONTAP (ddstndsr021.ppen.local)
login: over monit0 GMT [ddstndsr0or: node was previously
Data ONTAP (ddstndsr021.ppen.local)
login: taken over, nvram may be discarded
Data ONTAP (ddstndsr021.ppen.local)
login:
Data ONTAP (ddstndsr021.ppen.local)
login: Fri Jun 29 15:30:05 21: callhome.sysGMT [localhost
Data ONTAP (ddstndsr021.ppen.local)
login: : fcp.service.adapter:warning]: FC
Data ONTAP (ddstndsr021.ppen.local)
login: P started but no adapter is presen
Data ONTAP (ddstndsr021.ppen.local)
log.config:error]: in: t!
Password:
Login incorrect

Data ONTAP (ddstndsr021.ppen.local)
login: Adding config files for console fo
Data ONTAP (ddstndsr021.ppen.local)
login: r 0
Password:
Login incorrect

Data ONTAP (ddstndsr021.ppen.Call home for SYlocal)
login: filter sync'd
Password:
Login incorrect
STEM CONFIGURATION WARNING
Fri Jun 29 15:30:40 GMT [ddstndsr021: mgr.boot.disk_done:info]: NetApp Release 8.0.2 7-Mode boot complete. Last disk update written at Fri Jun 29 15:22:24 GMT 2012
Fri Jun 29 15:30:40 GMT [ddstndsr021: cf.hwassist.noMgmtIntfFound:info]: Cannot find an Ethernet interface with the name e0M.
Fri Jun 29 15:30:40 GMT [ddstndsr021: mgr.boot.reason_ok:notice]: System rebooted after a giveback.
Fri Jun 29 15:30:40 GMT [ddstndsr021: callhome.reboot.giveback:info]: Call home for REBOOT (after giveback)
Fri Jun 29 15:30:40 GMT [ddstndsr021: wafl.vol.autoSize.fail:info]: Unable to grow volume 'vol0' to recover space: Cannot grow root volume to more than 95% of the available aggregate size which is currently 225008048k. An attempt was made to set the root volume size to 237298240k.
Fri Jun 29 15:30:41 GMT [ddstndsr021: net.ifconfig.createFail:warning]: ifconfig: can't create /.ha/macaddrs-temp: No space left on device
Fri Jun 29 15:30:42 GMT [ddstndsr021: ip.drd.vfiler.info:info]: Although vFiler units are licensed, the routing daemon runs in the default IP space only.
Ipspace "acp-ipspace" created
too many login attempts..... sleeping for a bit!

Data ONTAP failed to initialize swap space in /mroot/etc/swapfile due to error code: 1
Please pick another volume/aggregate as root or contact technical support.
The amount of space available in the root volume is as follows:
Filesystem                    1K-blocks      Used Avail Capacity  Mounted on
localhost:0x80000000,0x69944c 180795672 180795
672     0   100%
Data ONTAP (ddstndsr021.ppen.local)
login: Fri Jun 29 15:30:06 GMT [localhost
Data ONTAP (ddstndsr021.ppen.local)
login: : cf.nm.nicReset:warning]: Initiat
Data ONTAP (ddstndsr021.ppen.local)
login: ing soft r    /mroot
At leset on Cluster Intercon
Data ONTAP (ddstndsr021.ppen.local)
login: nect card 0 due to rendezvous rese
Data ONTAP (ddstndsr021.ppen.local)
login: t
Password:
Login incorrect

Data ONTAP (ddstndsr021.ppen.loceast 1GB of freeal)
login: Fri Jun 29 15:30:07 UTC 2012
Password:
Login incorrect

Data ONTAP (ddstndsr021.ppen.local)
login:  syslogd: Cannot write to file /et
Data ONTAP (ddstndsr021.ppen.local)
login: c/messages: No space left on dev space is requiric
Data ONTAP (ddstndsr021.ppen.local)
login: e
Password:
Login incorrect
ed to initialize the swap space.
Fri Jun 29 15:30:45 GMT [ddstndsr021: kern.time.commit.


error:error]: Unable to commit updated Timekeeping options to the registry.
Fri Jun 29 16:30:45 BST [ddstndsr021: kern.time.commit.error:error]: Unable to commit updated Timekeeping options to the registry.
Fri Jun 29 16:30:45 BST [ddstndsr021: console_login_mgr:warning]: too many bad logins on console
too many login attempts..... sleeping for a bit!
Fri Jun 29 16:30:46 BST [ddstndsr021: pvif.upLinkTimer:warning]: ifgrp: e0b: Link timed out on coming up
Fri Jun 29 16:30:48 BST [ddstndsr021: console_login_mgr:warning]: too many bad logins on console
too many login attempts..... sleeping for a bit!
Fri Jun 29 16:30:51 BST [ddstndsr021: nbt.nbns.socketError:error]: NBT: Cannot send broadcast message on NBNS socket. Error 0x32: Network is down.
Fri Jun 29 16:30:51 BST [ddstndsr021: console_login_mgr:warning]: too many bad logins on console
cp: /mroot/etc/tmpvarfs.tgz: No space left on device
Failed to backup /var to mroot
.
Fri Jun 29 16:30:58 BST [ddstndsr021: cifs.startup.local.succeeded:info]: CIFS: CIFS local server is running.
Fri Jun 29 16:30:58 BST [ddstndsr021: httpd.config.mime.missing:warning]: /etc/httpd.mimetypes file is missing.
Fri Jun 29 16:31:00 BST [ddstndsr021: monitor.globalStatus.nonCritical:warning]: /vol/vol0 is full (using or reserving 100% of space and 0% of inodes, using 100% of reserve).
Fri Jun 29 16:31:04 BST [ddstndsr021: nbt.nbns.registrationComplete:info]: NBT: All CIFS name registrations have completed for the local server.
Uptime: 3m12s
Fri Jun 29 16:31:07 BST [ddstndsr021: cifs.terminationNotice:warning]: CIFS: shutting down: CIFS local server is shutting down.
Fri Jun 29 16:31:07 BST [ddstndsr021: kern.shutdown:notice]: System shut down because : "D-blade Shutdown".
Fri Jun 29 16:31:07 BST [ddstndsr021: snapmirror.log.writeErr:error]: Failed to write to SnapMirror log file: No space left on device.
Fri Jun 29 16:31:07 BST [ddstndsr021: cf.fsm.takeoverOfPartnerDisabled:notice]: Failover monitor: takeover of ddstnpp001 disabled (local halt in progress).
Fri Jun 29 16:31:07 BST [ddstndsr021: iscsi.warning:warning]: ISCSI: Unable to update iSCSI tpgroup config file
Fri Jun 29 16:31:07 BST [ddstndsr021: iscsi.service.shutdown:info]: iSCSI service shutdown
Fri Jun 29 16:31:09 BST [ddstndsr021: pvif.allLinksDown:CRITICAL]: ppen_vif: all links down

aborzenkov
19,928 Views

Do you have any user data on this filer? If not, reinitializing (^C for Boot Menu and select option 4) would be the simplest way. This has to be done when the filers are not in takeover state. Make sure to have your licenses to hand, as you will need to re-enter them.

Otherwise, you need to open a case with NetApp support to make sure you do not accidentally lose data.

JOEHIRTH11
19,928 Views

I have no data that I need to keep, so a factory reset/reinitialise is fine with me. I have done cf disable on the working head, then chosen option 4 on the corrupt head, but it still says 'waiting for giveback'.

I can't get to the CLI on the corrupt head in order to do anything.

aborzenkov
11,421 Views

Please

1. Re-enable cf for now

2. Disable options cf.takeover.on_panic on good partner

3. Perform cf giveback on good partner

4. Make sure no takeover happened

5. Disable cf again

6. Try option 4 once more

aborzenkov
11,421 Views

Actually, I would disable all cf.takeover.on_* options just to be sure to avoid accidental takeover. Write down current values to restore them afterwards.
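
For the "write down current values" part, here is a minimal Python sketch of one way to keep a record. It assumes you have pasted the output of `options cf.takeover` from the good partner into a string; the sample text, the option values, and the helper names `parse_takeover_options` and `restore_commands` are illustrative and not taken from this filer. It only builds the `options <name> <value>` lines you would type later; it does not talk to the filer.

```python
import re

# Hypothetical paste of `options cf.takeover` output from the good partner.
# Option names and values below are illustrative only.
SAMPLE_OUTPUT = """\
cf.takeover.on_failure       on
cf.takeover.on_panic         on
cf.giveback.auto.enable      off
"""


def parse_takeover_options(text):
    """Return {option_name: value} for every cf.takeover.on_* line."""
    saved = {}
    for line in text.splitlines():
        match = re.match(r"\s*(cf\.takeover\.on_\S+)\s+(\S+)", line)
        if match:
            saved[match.group(1)] = match.group(2)
    return saved


def restore_commands(saved):
    """Build the `options <name> <value>` lines that put the old values back."""
    return [f"options {name} {value}" for name, value in sorted(saved.items())]


saved = parse_takeover_options(SAMPLE_OUTPUT)
for command in restore_commands(saved):
    print(command)
```

Lines that do not start with `cf.takeover.on_` are skipped, so unrelated options in the paste are ignored.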

scottgelb
19,928 Views

Registry errors. Open a case. They can talk you through a registry restore or, if there is no data to keep, zeroing out the disks.


aborzenkov
19,928 Views

Registry errors.

Well ...

Filesystem                    1K-blocks      Used Avail Capacity  Mounted on
localhost:0x80000000,0x69944c 180795672 180795672     0     100%  /mroot

The root volume is indeed full. It may be possible to recover, but I would not trust myself to guide someone through it via a forum.
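
To put a number on it, here is a rough Python sketch that reads the df line quoted above and compares the available space with the 1GB that the boot message says is needed for the swap file. The helper names are made up for illustration, and treating 1GB as 1024 * 1024 1K-blocks is an assumption. It only does the arithmetic and is not a recovery procedure.

```python
# Values copied from the df line in the console output.
DF_ROW = "localhost:0x80000000,0x69944c 180795672 180795672 0 100% /mroot"

# Assumption: "1GB" in the boot message means 1024 * 1024 1K-blocks.
SWAP_MIN_KB = 1024 * 1024


def parse_df_row(row):
    """Split a df row into named fields; sizes are in 1K-blocks."""
    filesystem, blocks, used, avail, capacity, mount = row.split()
    return {
        "filesystem": filesystem,
        "blocks_kb": int(blocks),
        "used_kb": int(used),
        "avail_kb": int(avail),
        "capacity_pct": int(capacity.rstrip("%")),
        "mounted_on": mount,
    }


def shortfall_kb(row, required_kb=SWAP_MIN_KB):
    """Return how many 1K-blocks must be freed to reach required_kb."""
    return max(0, required_kb - parse_df_row(row)["avail_kb"])


info = parse_df_row(DF_ROW)
print(info["mounted_on"], "avail:", info["avail_kb"], "KB")
print("need to free at least", shortfall_kb(DF_ROW), "KB")
```

With zero blocks available, the whole 1GB has to be freed on /mroot before the swap file can be created.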

scottgelb
19,928 Views

Agreed, me neither. This is definitely one for support.
