ONTAP Hardware
We are having an issue with one of our FAS8080 systems: it has been rebooting intermittently for the past 48 hours. We have tried powering it down and reseating components. It boots up fine and runs for a few minutes, then reboots on its own again. The console log shows a number of items. The logs are below; any help would be greatly appreciated.
Aug 13 13:49:04 [vhapthnas2-01:sas.cable.degraded:warning]: Cable attached to SAS port "7b" is functioning in a degraded mode.
link speed 10Gps
Module Type 10GBase-SR
link speed 10Gps
Module Type 10GBase-SR
link speed 10Gps
Module Type 10GBase-SR
Aug 13 13:49:09 [vhapthnas2-01:sas.cable.degraded:warning]: Cable attached to SAS port "9b" is functioning in a degraded mode.
link speed 10Gps
Module Type 10GE Passive Copper (Legacy, Best Effort)[7 m]
Reservation conflict found on this node's disks!
Local System ID: 536977233
Press Ctrl-C for Maintenance menu to release disks.
WAFL CPLEDGER is enabled. Checklist = 0x7ff841ff
Disk reservations have been released
Waiting for giveback...(Press Ctrl-C to abort wait)Continuing boot...
add host 127.0.10.1: gateway 127.0.20.1
Aug 13 13:50:16 [vhapthnas2-01:cf.fm.discardNvram:notice]: Failover monitor: node was previously taken over, nvram may be discarded
Aug 13 13:50:17 [vhapthnas2-01:kern.syslog.msg:notice]: The system was down for 13 seconds
option nfs.ifc.rcv.high: Value must be between 154002 and 8388608.
Aug 13 13:50:19 [vhapthnas2-01:snmp.agent.msg.access.denied:warning]: Permission denied for SNMPv3 requests from root. Reason: Password is too short (SNMPv3 requires at least 8 characters).
Aug 13 13:50:19 [vhapthnas2-01:cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of vhapthnas2-01 by vhapthnas2-02 enabled
option stats.archive.enable: unable to start archiver.
Ipspace "acp-ipspace" created
Creating trace file /etc/log/rastrace/RAID_0_20210813_17:50:20:829181.dmp
Creating trace file /etc/log/rastrace/RAID_0_20210813_17:50:20:886181.dmp
Creating trace file /etc/log/rastrace/RAID_0_20210813_17:50:21:026182.dmp
Can't dump! module 52 instance 0 handle 0xffffff20ae786e00, cnt1 3 cnt2 0
Aug 13 13:50:21 [vhapthnas2-01:cf.fsm.takeoverOfPartnerEnabled:notice]: Failover monitor: takeover of vhapthnas2-02 enabled
Aug 13 13:50:27 [vhapthnas2-01:extCache.rw.io.readLimit:notice]: WAFL external cache warming encountered too many read errors: Too many serious read errors (23).
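With a console log this long, it can help to pull out only the EMS events at warning level and above before deciding what to chase. Below is a minimal Python sketch. It assumes every EMS line follows the "Mon DD HH:MM:SS [node:event.name:severity]: message" shape seen in the lines above. That format is inferred from these samples, not taken from NetApp documentation. The helper names parse_ems and events_at_or_above are hypothetical, as is the default severity set.

```python
import re
from collections import Counter

# Assumed EMS line shape, inferred from the sample log above, e.g.
# "Aug 13 13:49:04 [vhapthnas2-01:sas.cable.degraded:warning]: Cable attached ..."
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: "
    r"(?P<message>.*)$"
)

# Hypothetical default: severities we choose to flag; adjust to taste.
FLAG_SEVERITIES = ("warning", "error", "critical", "alert", "emergency")


def parse_ems(lines):
    """Yield a dict per line that looks like an EMS event; skip all other lines."""
    for line in lines:
        m = EMS_LINE.match(line.strip())
        if m:
            yield m.groupdict()


def events_at_or_above(lines, severities=FLAG_SEVERITIES):
    """Return only the parsed EMS events whose severity is in `severities`."""
    return [e for e in parse_ems(lines) if e["severity"] in severities]


sample = [
    'Aug 13 13:49:04 [vhapthnas2-01:sas.cable.degraded:warning]: Cable attached to SAS port "7b" is functioning in a degraded mode.',
    "link speed 10Gps",
    'Aug 13 13:49:09 [vhapthnas2-01:sas.cable.degraded:warning]: Cable attached to SAS port "9b" is functioning in a degraded mode.',
    "Aug 13 13:50:17 [vhapthnas2-01:kern.syslog.msg:notice]: The system was down for 13 seconds",
]

flagged = events_at_or_above(sample)
print(Counter(e["event"] for e in flagged))
# Counter({'sas.cable.degraded': 2})
```

Non-EMS lines such as "link speed 10Gps" and the SP event log records are simply skipped, so the script only narrows down what to look at. It does not diagnose the reboot.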
What is the SP firmware version?
Have you rebooted the SP? If not, try that and check whether the system then stays up longer than it has been.
Aug 13 18:54:23 (none) seld[519]: asup sent: (REBOOT (watchdog reset)) CRITICAL, err=0
Record 2016: Wed Aug 11 13:18:16.829044 2021 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
Could be this: https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Systems/FAS_Systems/System_Rebooted_Due_to_L2_Watchdog_Reset
Unfortunately, there are several possible causes. I will look for more and post them here.
Hi Pedro, I appreciate you finding that. Yes, I do see it in the System Event log. Please let me know what else you find. Thanks!
I'm not sure which SP firmware version we're running, but I will look into updating it. And yes, we did reboot the SP. I'll keep you posted.
Thanks again.
Please raise a ticket with NetApp Technical Support and reference this article for further assistance. The solution might require a part replacement.
AFF or FAS 80x0 node is taken over with power good de-asserted SP events:
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Systems/FAS_Systems/AFF_or_FAS_80x0_node_is_taken_over_with_power_good_de-asserted_SP_ev...
Bug ID: 836533
https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/836533