ONTAP Hardware

FAS 8080 Unexpected Reboots

mseufert

We are having an issue with one of our FAS 8080s: it has been rebooting intermittently for the past 48 hours. We have tried powering it down and re-seating components. It boots up fine, runs for a while, and then reboots on its own again a few minutes later. The console log shows a number of items. The logs are attached; any help would be greatly appreciated.

Aug 13 13:49:04 [vhapthnas2-01:sas.cable.degraded:warning]: Cable attached to SAS port "7b" is functioning in a degraded mode.

link speed 10Gps
Module Type 10GBase-SR
link speed 10Gps
Module Type 10GBase-SR
link speed 10Gps
Module Type 10GBase-SR
Aug 13 13:49:09 [vhapthnas2-01:sas.cable.degraded:warning]: Cable attached to SAS port "9b" is functioning in a degraded mode.

link speed 10Gps
Module Type 10GE Passive Copper (Legacy, Best Effort)[7 m]
Reservation conflict found on this node's disks!
Local System ID: 536977233
Press Ctrl-C for Maintenance menu to release disks.
WAFL CPLEDGER is enabled. Checklist = 0x7ff841ff
Disk reservations have been released
Waiting for giveback...(Press Ctrl-C to abort wait)Continuing boot...
add host 127.0.10.1: gateway 127.0.20.1
Aug 13 13:50:16 [vhapthnas2-01:cf.fm.discardNvram:notice]: Failover monitor: node was previously taken over, nvram may be discarded

Aug 13 13:50:17 [vhapthnas2-01:kern.syslog.msg:notice]: The system was down for 13 seconds

option nfs.ifc.rcv.high: Value must be between 154002 and 8388608.
Aug 13 13:50:19 [vhapthnas2-01:snmp.agent.msg.access.denied:warning]: Permission denied for SNMPv3 requests from root. Reason: Password is too short (SNMPv3 requires at least 8 characters).

Aug 13 13:50:19 [vhapthnas2-01:cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of vhapthnas2-01 by vhapthnas2-02 enabled

option stats.archive.enable: unable to start archiver.
Ipspace "acp-ipspace" created
Creating trace file /etc/log/rastrace/RAID_0_20210813_17:50:20:829181.dmp
Creating trace file /etc/log/rastrace/RAID_0_20210813_17:50:20:886181.dmp
Creating trace file /etc/log/rastrace/RAID_0_20210813_17:50:21:026182.dmp
Can't dump! module 52 instance 0 handle 0xffffff20ae786e00, cnt1 3 cnt2 0
Aug 13 13:50:21 [vhapthnas2-01:cf.fsm.takeoverOfPartnerEnabled:notice]: Failover monitor: takeover of vhapthnas2-02 enabled

Aug 13 13:50:27 [vhapthnas2-01:extCache.rw.io.readLimit:notice]: WAFL external cache warming encountered too many read errors: Too many serious read errors (23).
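For anyone triaging a console log like the excerpt above, a quick tally of events by name and severity can help separate recurring warnings from one-off notices. The following is only a minimal sketch: it assumes the bracketed lines follow the `[node:event.name:severity]: message` layout seen in this excerpt, and the names `parse_ems_lines` and `count_by_event` are hypothetical helpers, not part of any NetApp tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt in this thread:
#   Aug 13 13:49:04 [node:event.name:severity]: message
EMS_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^:\]]+)\]: "
    r"(?P<message>.*)$"
)


def parse_ems_lines(text):
    """Return one dict per line that matches the bracketed event layout."""
    events = []
    for line in text.splitlines():
        match = EMS_LINE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


def count_by_event(events):
    """Count occurrences of each (event name, severity) pair."""
    return Counter((e["event"], e["severity"]) for e in events)


# Lines copied from the console log in this thread; the unbracketed line
# is included to show that non-matching output is skipped.
SAMPLE = """\
Aug 13 13:49:04 [vhapthnas2-01:sas.cable.degraded:warning]: Cable attached to SAS port "7b" is functioning in a degraded mode.
link speed 10Gps
Aug 13 13:49:09 [vhapthnas2-01:sas.cable.degraded:warning]: Cable attached to SAS port "9b" is functioning in a degraded mode.
Aug 13 13:50:17 [vhapthnas2-01:kern.syslog.msg:notice]: The system was down for 13 seconds
"""

for (event, severity), count in count_by_event(parse_ems_lines(SAMPLE)).most_common():
    print(f"{count:3d}  {severity:8s}  {event}")
```

Running this over a full saved console log rather than the short sample would show which events repeat across reboots.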

1 ACCEPTED SOLUTION

pedro_rocha

What is the SP FW version?

Did you reboot the SP? If not, try that and check whether the system stays up longer than it has been.

7 REPLIES

pedro_rocha

Aug 13 18:54:23 (none) seld[519]: asup sent: (REBOOT (watchdog reset)) CRITICAL, err=0

Record 2016: Wed Aug 11 13:18:16.829044 2021 [Trap Event.critical]: hwassist l2_watchdog_reset (29)

This could be the cause: https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Systems/FAS_Systems/System_Rebooted_Due_to_L2_Watchdog_Reset

Unfortunately, there could be several causes. I will try to find some and post them here.
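To see how often the watchdog fired, the SP event log can be scanned for reset records. This is a rough sketch only: it assumes every record follows the same `Record N: <timestamp> [source]: message` layout as the single line quoted above, and `find_watchdog_resets` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed layout, based on the one record quoted in this thread:
#   Record 2016: Wed Aug 11 13:18:16.829044 2021 [Trap Event.critical]: message
RECORD = re.compile(
    r"^Record (?P<record>\d+): "
    r"(?P<when>\w{3} \w{3}\s+\d{1,2} [\d:.]+ \d{4}) "
    r"\[(?P<source>[^\]]+)\]: (?P<message>.*)$"
)


def find_watchdog_resets(text):
    """Return (record number, timestamp, message) for each watchdog reset record."""
    hits = []
    for line in text.splitlines():
        match = RECORD.match(line.strip())
        if match and "watchdog_reset" in match["message"]:
            # Collapse repeated spaces so single-digit days parse cleanly.
            stamp = " ".join(match["when"].split())
            when = datetime.strptime(stamp, "%a %b %d %H:%M:%S.%f %Y")
            hits.append((int(match["record"]), when, match["message"]))
    return hits


# Both lines are copied from this thread; only the second is an SP event record.
SAMPLE = """\
Aug 13 18:54:23 (none) seld[519]: asup sent: (REBOOT (watchdog reset)) CRITICAL, err=0
Record 2016: Wed Aug 11 13:18:16.829044 2021 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
"""

for record, when, message in find_watchdog_resets(SAMPLE):
    print(record, when.isoformat(), message)
```

Comparing the timestamps of the hits against the reboot times in the console log would show whether every reboot lines up with a watchdog reset.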

mseufert

Hi Pedro, I appreciate you finding that. Yes, I do see it in the System Event log. Please let me know what else you find. Thanks!

pedro_rocha

What is the SP FW version?

Did you reboot the SP? If not, try that and check whether the system stays up longer than it has been.

mseufert

I'm not sure what the SP FW version is, but I will look at updating it. And yes, we did reboot the SP. I'll keep you posted.

Thanks again.

Ontapforrum

Please raise a ticket with NetApp Technical Support and reference this article for further assistance. The solution might require a part replacement.

AFF or FAS 80x0 node is taken over with power good de-asserted SP events:
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Systems/FAS_Systems/AFF_or_FAS_80x0_node_is_taken_over_with_power_good_de-asserted_SP_ev...

Bug ID: 836533
https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/836533

pedro_rocha

Judging by the ONTAP version, I would guess he does not have a support contract, but who knows.