ONTAP Hardware

filer reboot

lasonsysadmn

We have a clustered FAS3240 filer running ONTAP 8.0. One of the filers panicked and rebooted; the SP logs follow. Please help us identify the issue.

 

 

[IPMI Event.critical]: NMI

Record 2474: Mon Mar 31 21:01:07 2014 [IPMI.notice]: 7f04 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"

Record 2475: Mon Mar 31 21:01:09 2014 [IPMI Event.critical]: L2 watchdog timeout hard reset

Record 2476: Mon Mar 31 21:01:09 2014 [Trap Event.critical]: hwassist l2_watchdog_reset (29)

Record 2477: Mon Mar 31 21:01:09 2014 [Trap Event.critical]: SNMP l2_watchdog_reset (29)

Record 2478: Mon Mar 31 21:01:09 2014 [IPMI Event.critical]: System reset

Record 2479: Mon Mar 31 21:01:09 2014 [IPMI Event.critical]: L2 watchdog action completed

Record 2480: Mon Mar 31 21:01:09 2014 [IPMI.notice]: L2 to L1 is 3(s) 195021(us)

Record 2481: Mon Mar 31 21:01:11 2014 [IPMI.notice]: 8004 | 02 | EVT: 6fc202ff | System_FW_Status | Assertion Event, "NVMEM initialization"

Record 2482: Mon Mar 31 21:01:11 2014 [IPMI.notice]: 8104 | 02 | EVT: 6fc104ff | System_Watchdog | Assertion Event, "Hard reset"

Record 2483: Mon Mar 31 21:01:11 2014 [IPMI.notice]: 8204 | 02 | EVT: 0301ffff | System_Fault | Assertion Event, "State Asserted"

Record 2484: Mon Mar 31 21:01:11 2014 [IPMI.notice]: 8304 | 02 | EVT: 0301ffff | Controller_Fault | Assertion Event, "State Asserted"

Record 2485: Mon Mar 31 21:01:11 2014 [SP.notice]: Delaying L2_WDOG ASUP email for 120 seconds

Record 2486: Mon Mar 31 21:01:52 2014 [SP.critical]: Filer Reboots

Record 2487: Mon Mar 31 21:02:11 2014 [IPMI.notice]: 8404 | 02 | EVT: 6fc213ff | System_FW_Status | Assertion Event, "System boot initiated"

Record 2488: Mon Mar 31 21:02:15 2014 [IPMI.notice]: 8504 | 02 | EVT: 6fc220ff | System_FW_Status | Assertion Event, "Bootloader is running"

Record 2489: Mon Mar 31 21:02:19 2014 [IPMI.notice]: 8604 | 02 | EVT: 6fc22fff | System_FW_Status | Assertion Event, "OnTap Kernel Started"

Record 2490: Mon Mar 31 21:02:19 2014 [IPMI.notice]: 8704 | 02 | EVT: 0300ffff | System_Fault | Assertion Event, "State Deasserted"

Record 2491: Mon Mar 31 21:02:20 2014 [IPMI.notice]: 8804 | 02 | EVT: 0300ffff | Controller_Fault | Assertion Event, "State Deasserted"

Record 2492: Mon Mar 31 21:04:05 2014 [ASUP.notice]: First notification email | (REBOOT (watchdog reset)) CRITICAL | Sent

Record 2493: Mon Mar 31 21:04:30 2014 [SP.normal]: Heartbeat received
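To pick the critical events out of an SP log like the one above, a small parser can help. This is only a sketch: the record layout is inferred from the lines pasted in this post, and the names `parse_sp_records` and `critical_records` are made up for illustration, not part of any NetApp tool.

```python
import re
from datetime import datetime

# Layout inferred from the pasted log, e.g.:
# Record 2475: Mon Mar 31 21:01:09 2014 [IPMI Event.critical]: L2 watchdog timeout hard reset
RECORD_RE = re.compile(
    r"^Record (?P<num>\d+): (?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) "
    r"\[(?P<source>[^\]]*)\.(?P<severity>\w+)\]: (?P<message>.*)$"
)

# A few records copied from the SP log in this post.
SAMPLE = """\
Record 2474: Mon Mar 31 21:01:07 2014 [IPMI.notice]: 7f04 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 2475: Mon Mar 31 21:01:09 2014 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 2476: Mon Mar 31 21:01:09 2014 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
Record 2486: Mon Mar 31 21:01:52 2014 [SP.critical]: Filer Reboots
"""


def parse_sp_records(text):
    """Parse 'Record N: <timestamp> [source.severity]: message' lines."""
    records = []
    for line in text.splitlines():
        match = RECORD_RE.match(line.strip())
        if match is None:
            continue  # skip blank or truncated lines
        records.append({
            "num": int(match.group("num")),
            "time": datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y"),
            "source": match.group("source"),
            "severity": match.group("severity"),
            "message": match.group("message"),
        })
    return records


def critical_records(records):
    """Keep only the records whose severity is 'critical'."""
    return [r for r in records if r["severity"] == "critical"]


if __name__ == "__main__":
    for rec in critical_records(parse_sp_records(SAMPLE)):
        print(rec["num"], rec["time"], rec["source"], rec["message"])
```

On the sample lines this keeps records 2475, 2476 and 2486, which are the watchdog reset and the reboot notice.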

3 REPLIES

fabrice_berrier

Hi,

Open /etc/messages on the other node and find the message that explains why the partner node panicked.

Please find below:

Tue Apr  1 02:23:26 IST [gun-nas-1: wafl.quota.qtree.exceeded:notice]: tid 17: tree quota exceeded on volume vol9. Additional warnings will be suppressed for approximately 60 minutes or until a 'quota resize' is performed.

Tue Apr  1 02:31:08 IST [gun-nas-1: cf.fsm.partnerNotResponding:notice]: Failover monitor: partner not responding

Tue Apr  1 02:31:08 IST [gun-nas-1: cf.fsm.takeoverCountdown:info]: Failover monitor: takeover scheduled in 10 seconds

Tue Apr  1 02:31:09 IST [gun-nas-1: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(gun-nas-2), system_down because l2_watchdog_reset.

Tue Apr  1 02:31:09 IST [gun-nas-1: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(gun-nas-2), system_down because l2_watchdog_reset.

Tue Apr  1 02:31:10 IST [iwarp-vfiler@gun-nas-1: ctrl.rdma.heartBeat:info]: High-availability interconnect status: Missed heartbeat to 192.168.1.240

Tue Apr  1 02:31:10 IST [gun-nas-1: cf.ic.xferTimedOut:error]: wafl interconnect transfer timed out

Tue Apr  1 02:31:10 IST [gun-nas-1: cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER

Tue Apr  1 02:31:10 IST [gun-nas-1: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started

Tue Apr  1 02:31:10 IST [gun-nas-1: netif.linkDown:info]: Ethernet c0a: Link down, check cable.

Tue Apr  1 02:31:10 IST [gun-nas-1: netif.linkDown:info]: Ethernet c0b: Link down, check cable.

Tue Apr  1 02:31:10 IST [iwarp-vfiler@gun-nas-1: ctrl.rdma.heartBeat:info]: High-availability interconnect status: Missed heartbeat to 192.168.2.62

Tue Apr  1 02:31:10 IST [gun-nas-1: scsitarget.vtic.down:notice]: The VTIC is down.

Tue Apr  1 02:31:11 IST [gun-nas-2/gun-nas-1: coredump.host.spare.none:info]: No sparecore disk was found for host 1.

Tue Apr  1 02:31:12 IST [gun-nas-1: raid.vol.replay.nvram:info]: Performing raid replay on volume(s)

Tue Apr  1 02:31:12 IST [gun-nas-1: raid.replay.partner.nvram:notice]: Replaying partner NVRAM.

Tue Apr  1 02:31:12 IST [gun-nas-1: raid.cksum.replay.summary:info]: Replayed 0 checksum blocks.

Tue Apr  1 02:31:12 IST [gun-nas-1: raid.stripe.replay.summary:info]: Replayed 0 stripes.

Tue Apr  1 02:31:16 IST [gun-nas-1: wafl.replay.done:info]: WAFL log replay completed, 3 seconds

Tue Apr  1 02:31:18 IST [gun-nas-2/gun-nas-1: fcp.service.startup:info]: FCP service startup

Tue Apr  1 02:31:18 IST [gun-nas-2/gun-nas-1: httpd.config.mime.missing:warning]: /etc/httpd.mimetypes file is missing.

Tue Apr  1 02:31:18 IST [gun-nas-2/gun-nas-1: iscsi.service.startup:info]: iSCSI service startup

Tue Apr  1 02:31:18 IST [gun-nas-2/gun-nas-1: net.ifconfig.noPartner:error]: ifconfig: 'c0a' cannot be configured: Address does not match any partner interface.

Tue Apr  1 02:31:18 IST [gun-nas-2/gun-nas-1: net.ifconfig.noPartner:error]: ifconfig: 'c0b' cannot be configured: Address does not match any partner interface.

Tue Apr  1 02:31:19 IST [gun-nas-2/gun-nas-1: net.ifconfig.noPartner:error]: ifconfig: 'e0P' cannot be configured: Address does not match any partner interface.

Tue Apr  1 02:31:19 IST [gun-nas-2/gun-nas-1: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e0a.

Tue Apr  1 02:31:19 IST [gun-nas-2/gun-nas-1: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e0b.

Tue Apr  1 02:31:19 IST [gun-nas-2/gun-nas-1: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e1a.

Tue Apr  1 02:31:19 IST [gun-nas-2/gun-nas-1: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e1b.

Tue Apr  1 02:31:19 IST [gun-nas-2/gun-nas-1: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e4a.

Tue Apr  1 02:31:19 IST [gun-nas-2/gun-nas-1: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e4b.

Tue Apr  1 02:31:19 IST [gun-nas-2/gun-nas-1: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e4e4d.

Tue Apr  1 02:31:19 IST [gun-nas-2/gun-nas-1: ip.drd.vfiler.info:info]: Although vFiler units are licensed, the routing daemon runs in the default IP space only.

Tue Apr  1 02:31:19 IST [gun-nas-2/gun-nas-1: cf_takeover:info]: relog syslog Tue Apr  1 02:30:14 IST [gun-nas-2: api_mpool_03:debug]: root@10.20.32.39:API:https in:<?xml version='1.0' encoding='u

Tue Apr  1 02:31:19 IST [gun-nas-2/gun-nas-1: cf_takeover:info]: relog syslog Tue Apr  1 02:30:14 IST [gun-nas-2: api_mpool_06:debug]: root@10.20.32.39:API:https in:<?xml version='1.0' encoding='u

Tue Apr  1 02:31:20 IST [gun-nas-2/gun-nas-1: cf_takeover:ALERT]: Warning: license setting for snapmanager_sharepoint is not the same on both systems

Tue Apr  1 02:31:20 IST [gun-nas-1: cf.rsrc.takeoverOpFail:error]: Failover monitor: takeover during license_check failed; takeover continuing...

Tue Apr  1 02:31:20 IST [gun-nas-1: net.ifconfig.takeoverError:warning]: WARNING: 10 errors detected during network takeover processing WARNING: Some network clients may not be able to access the cluster during takeover

Tue Apr  1 02:31:20 IST [gun-nas-1: cf.rsrc.takeoverOpFail:error]: Failover monitor: takeover during ifconfig_2 failed; takeover continuing...

Tue Apr  1 02:31:20 IST [gun-nas-2/gun-nas-1: cifs.startup.partner.succeeded:info]: CIFS: CIFS partner server is running.

Tue Apr  1 02:31:20 IST [gun-nas-2/gun-nas-1: proto_init03:info]: Vfiler discovery complete

Tue Apr  1 02:31:20 IST [gun-nas-1 (takeover): cf.rsrc.transitTime:notice]: Top Takeover transit times wafl_replay=3148 {replay_log=3118, mark_replaying=29, enable_log=1, init=0, catalog_init=0, replay_log_missing=0, nvfail=0, partner_log=0, destroy_vvol=0}, wafl_sync=2145, rc=1111 {ifconfig=155, ifconfig=133, always_do_just_after_etc_rc=127, ifconfig=93, ifconfig=75, hostname=59, ifconfig=58, ifconfig=51, ifconfig=50, ifconfig=49}, wafl=550 {paggrs_to_done=228, prvol_to_done=194, pvvols_to_done=120, part_

Tue Apr  1 02:31:20 IST [gun-nas-1 (takeover): callhome.sfo.takeover:CRITICAL]: Call home for CONTROLLER TAKEOVER COMPLETE AUTOMATIC

Tue Apr  1 02:31:20 IST [gun-nas-1 (takeover): callhome.reboot.takeover:error]: Call home for PARTNER REBOOT (CONTROLLER TAKEOVER)

Tue Apr  1 02:31:20 IST [gun-nas-1 (takeover): cf.fm.takeoverComplete:notice]: Failover monitor: takeover completed

Tue Apr  1 02:31:20 IST [gun-nas-1 (takeover): cf.fm.takeoverDuration:info]: Failover monitor: takeover duration time is 10 seconds

Tue Apr  1 02:31:22 IST [gun-nas-1 (takeover): asup.general.collect.error:notice]: AutoSupport (callhome.sfo.takeover) might be missing content due to truncation in the (dblade) AutoSupport collection module.

Tue Apr  1 02:31:28 IST [gun-nas-1 (takeover): asup.general.collect.error:notice]: AutoSupport (callhome.reboot.takeover) might be missing content due to truncation in the (dblade) AutoSupport collection module.

Tue Apr  1 02:31:43 IST [gun-nas-2/gun-nas-1: nbt.nbns.registrationComplete:info]: NBT: All CIFS name registrations have completed for the partner server.

Tue Apr  1 02:31:51 IST [gun-nas-1 (takeover): asup.general.collect.error:notice]: AutoSupport (callhome.sfo.takeover) might be missing content due to truncation in the (dblade) AutoSupport collection module.

Tue Apr  1 02:32:01 IST [gun-nas-1 (takeover): monitor.globalStatus.critical:CRITICAL]: This node has taken over gun-nas-2.

Tue Apr  1 02:32:10 IST [gun-nas-1 (takeover): asup.general.collect.error:notice]: AutoSupport (callhome.reboot.takeover) might be missing content due to truncation in the (dblade) AutoSupport collection module.

Tue Apr  1 02:32:14 IST [gun-nas-1 (takeover): asup.general.collect.error:notice]: AutoSupport (callhome.sfo.takeover) might be missing content due to truncation in the (dblade) AutoSupport collection module.

Tue Apr  1 02:32:22 IST [gun-nas-1 (takeover): callhome.performance.snap:info]: Call home for PERFORMANCE SNAPSHOT

Tue Apr  1 02:32:32 IST [gun-nas-1 (takeover): asup.general.collect.error:notice]: AutoSupport (callhome.reboot.takeover) might be missing content due to truncation in the (dblade) AutoSupport collection module.

Tue Apr  1 02:32:50 IST [gun-nas-1 (takeover): asup.general.collect.error:notice]: AutoSupport (callhome.sfo.takeover) might be missing content due to truncation in the (dblade) AutoSupport collection module.

Tue Apr  1 02:33:20 IST [gun-nas-2/gun-nas-1: asup.general.file.missing:error]: Unable to find file: /etc/log/cm_asup_stats

Tue Apr  1 02:33:26 IST [gun-nas-1 (takeover): netif.linkUp:info]: Ethernet c0a: Link up.

Tue Apr  1 02:33:32 IST [gun-nas-1 (takeover): netif.linkUp:info]: Ethernet c0b: Link up.

Tue Apr  1 02:33:36 IST [iwarp-vfiler@gun-nas-1 (takeover): ctrl.rdma.heartBeat:info]: High-availability interconnect status: Starting heartbeat to 192.168.2.237

Tue Apr  1 02:33:36 IST [iwarp-vfiler@gun-nas-1 (takeover): ctrl.rdma.heartBeat:info]: High-availability interconnect status: Starting heartbeat to 192.168.1.116

Tue Apr  1 02:33:42 IST [gun-nas-1 (takeover): cf.fsm.releasingReservations:info]: Failover monitor: Releasing disk reservations in preparation for giveback

Tue Apr  1 02:33:42 IST [gun-nas-1 (takeover): cf.fm.diskRelease:info]: Failover monitor: released disk reservations.

Tue Apr  1 02:33:46 IST [gun-nas-1 (takeover): scsitarget.vtic.up:notice]: The VTIC is up.

Tue Apr  1 03:00:00 IST [gun-nas-1 (takeover): kern.uptime.filer:info]:   3:00am up 64 days,  8:29 18732153008 NFS ops, 5277830570 CIFS ops, 48 HTTP ops, 10499862474 FCP ops, 418578985 iSCSI ops

Tue Apr  1 03:00:12 IST [gun-nas-1 (takeover): cf.partner.ready.giveback:info]: Partner is booted and ready for giveback.

Tue Apr  1 03:31:37 IST [gun-nas-1 (takeover): rlmauth_login_mgr:info]: root logged in from SP

Tue Apr  1 03:36:38 IST [gun-nas-1 (takeover): cf.misc.operatorGiveback:info]: Failover monitor: giveback initiated by operator

Tue Apr  1 03:36:38 IST [gun-nas-1: cf.fm.givebackStarted:notice]: Failover monitor: giveback started

Tue Apr  1 03:36:40 IST [gun-nas-2/gun-nas-1: iscsi.service.shutdown:info]: iSCSI service shutdown

Tue Apr  1 03:36:40 IST [gun-nas-2/gun-nas-1: fcp.service.shutdown:info]: FCP service shutdown

Tue Apr  1 03:36:45 IST [gun-nas-1: cf.rsrc.transitTime:notice]: Top Giveback transit times wafl=4958 {drain_msgs=2008, sync_clean=1090, finish=1066, giveback_sync=437, forget=353, vol_refs=3, abort_scans=1, mark_abort=0, wait_offline=0, wait_create=0}, snapmirror=635, wafl_gb_sync=553, ndmpd=366, nfsd=303, raid=232, registry_giveback=204, sanown_replay=164, vdisk=93, exports=34

Tue Apr  1 03:36:45 IST [gun-nas-1: callhome.sfo.giveback:info]: Call home for CONTROLLER GIVEBACK COMPLETE

Tue Apr  1 03:36:46 IST [gun-nas-1: cf.fm.givebackComplete:notice]: Failover monitor: giveback completed

Tue Apr  1 03:36:46 IST [gun-nas-1: cf.fm.givebackDuration:notice]: Failover monitor: giveback duration time is 8 seconds
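The SP log and /etc/messages use different clocks, so it helps to line them up before comparing events. The sketch below assumes the SP timestamps are in UTC (the thread does not say so) and that IST is UTC+5:30; `sp_time_to_ist` is a name made up for this example.

```python
from datetime import datetime, timedelta, timezone

# India Standard Time, UTC+5:30.
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def sp_time_to_ist(sp_timestamp):
    """Convert an SP log timestamp (ASSUMED to be UTC) to IST."""
    utc = datetime.strptime(sp_timestamp, "%a %b %d %H:%M:%S %Y")
    return utc.replace(tzinfo=timezone.utc).astimezone(IST)


# Timestamp of the "L2 watchdog timeout hard reset" record in the SP log.
reset = sp_time_to_ist("Mon Mar 31 21:01:09 2014")
print(reset.strftime("%a %b %d %H:%M:%S %Z %Y"))
```

Under that assumption the watchdog reset maps to 02:31:09 IST on Apr 1, the same second at which gun-nas-1 logged the hw_assist alert "system_down because l2_watchdog_reset", so the two logs appear to describe the same event.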

Search /etc/messages on both nodes for the word "panic".
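If the messages files have been copied off the filers, the search can be scripted. This is a rough sketch: it assumes the files are available locally under whatever paths you pass on the command line, and the second sample line is hypothetical, made up only to show a match.

```python
import re
import sys

# Case-insensitive so that "panic", "Panic" and "PANIC" all match.
PANIC_RE = re.compile(r"panic", re.IGNORECASE)

SAMPLE_LINES = [
    # Copied from the log pasted in this thread (no match expected).
    "Tue Apr  1 02:31:10 IST [gun-nas-1: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started",
    # HYPOTHETICAL line, not from any real log; it only demonstrates a hit.
    "example line (hypothetical): PANIC string would appear here",
]


def find_panic_lines(lines):
    """Return (line_number, text) for every line that mentions 'panic'."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PANIC_RE.search(line)
    ]


def scan_file(path):
    """Scan one local copy of a messages file."""
    with open(path, "r", errors="replace") as handle:
        return find_panic_lines(handle)


if __name__ == "__main__":
    # Usage: python find_panic.py <messages copy of node 1> <messages copy of node 2>
    for path in sys.argv[1:]:
        for number, text in scan_file(path):
            print(f"{path}:{number}: {text}")
```

If nothing matches on either node, that fits a watchdog reset where the node hung without logging a panic string, though that is a guess and not something the logs here confirm.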
