Service Processor Can Trigger Watchdog Reset After 500+ Days of Uptime

WARNING

The SP in this system has been running for 1260 days.

 

 

How should I resolve this?

Re: Service Processor Can Trigger Watchdog Reset After 500+ Days of Uptime

Hi

 

First of all, initiate an "sp reboot" on the nodes ASAP. By design the operation is not disruptive to users; however, I believe it is theoretically possible that the reboot itself might trigger the bug.

 

 

After doing that, you can decide whether you want to upgrade the SP or wait another 500 days (preferably running the next "sp reboot" at around the 400-day mark).
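
To make the timing above concrete, here is a minimal Python sketch that turns the date of the last SP reboot into a planned reboot date (400 days) and a risk date (500 days). The function names and the example date are hypothetical; the two thresholds are simply the figures mentioned in this thread, not values read from the system.

```python
from datetime import date, timedelta
from typing import Tuple

WATCHDOG_RISK_DAYS = 500   # uptime figure from the thread title
PLANNED_REBOOT_DAYS = 400  # safety margin suggested in this reply


def sp_reboot_window(last_sp_reboot: date) -> Tuple[date, date]:
    """Return (planned_reboot_date, risk_date) for a given last SP reboot."""
    planned = last_sp_reboot + timedelta(days=PLANNED_REBOOT_DAYS)
    risk = last_sp_reboot + timedelta(days=WATCHDOG_RISK_DAYS)
    return planned, risk


def sp_uptime_status(uptime_days: int) -> str:
    """Classify an SP uptime (in days) against the two thresholds."""
    if uptime_days >= WATCHDOG_RISK_DAYS:
        return "overdue"
    if uptime_days >= PLANNED_REBOOT_DAYS:
        return "due"
    return "ok"


if __name__ == "__main__":
    # Hypothetical example: SP last rebooted on 1 August 2017.
    planned, risk = sp_reboot_window(date(2017, 8, 1))
    print("plan next sp reboot by:", planned)
    print("500-day mark:", risk)
    # The uptime quoted in the original question.
    print("status at 1260 days:", sp_uptime_status(1260))
```

With the 1260 days reported in the question, the status comes out as "overdue", which matches the advice to reboot the SP as soon as possible.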

 

If you have a FAS32XX/FAS62XX/FAS22XX and are running a non-fixed BIOS version (see below), that upgrade or reboot can trigger another bug: https://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=702464

There is a workaround in the bug report to fix it if you hit it. The fix for this issue is in the following BIOS versions (taken from the BIOS version download pages):

BIOS 7.3 for FAS62XX

BIOS 8.3 for FAS22XX

BIOS 5.3 for FAS32XX
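
As a rough illustration of the check described above, the following Python sketch compares an installed BIOS version against the fixed versions listed in this reply. It assumes that BIOS versions are plain dotted numbers and that any later version also carries the fix; both are assumptions, and the helper names are hypothetical. Confirm against the bug report before relying on it.

```python
# Fixed BIOS versions as listed in this reply, keyed by platform family.
FIXED_BIOS = {
    "FAS62XX": (7, 3),
    "FAS22XX": (8, 3),
    "FAS32XX": (5, 3),
}


def parse_version(text):
    """Turn a dotted version string such as '7.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_bios_fix(platform, installed):
    """Return True/False for a listed platform, or None if it is not listed.

    Assumption: versions at or above the listed one contain the fix.
    """
    minimum = FIXED_BIOS.get(platform.upper())
    if minimum is None:
        return None
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    print(has_bios_fix("FAS32XX", "5.2"))  # older than the listed 5.3
    print(has_bios_fix("FAS62XX", "7.3"))  # exactly the listed version
```

Comparing tuples of integers rather than strings avoids treating a version like 7.10 as older than 7.3.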

 

Gidi

Re: Service Processor Can Trigger Watchdog Reset After 500+ Days of Uptime

Well, here I have a case.

The node got rebooted and its services were taken over by the partner node. Looking at the SP log below, is this related to the automatic reset that happens after 500 days of SP uptime?

 

 

Log Collection Time: Fri Jul 28 00:31:17 GMT 2017
======================================
=============================
/usr/local/bin/sel rd_last 500
===============================
Record 420: Sun Jan 26 19:12:47 2014 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Sun Jan 26 19:12:49 2014. New time: Sun Jan 26 19:12:47 2014.
Record 421: Mon Jan 27 07:27:45 2014 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Mon Jan 27 07:27:47 2014. New time: Mon Jan 27 07:27:45 2014.
Record 422: Mon Jan 27 19:42:43 2014 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Mon Jan 27 19:42:45 2014. New time: Mon Jan 27 19:42:43 2014.
Record 423: Tue Jan 28 07:55:41 2014 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Tue Jan 28 07:55:43 2014. New time: Tue Jan 28 07:55:41 2014.
Record 424: Tue Jan 28 20:17:39 2014 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Tue Jan 28 20:17:41 2014. New time: Tue Jan 28 20:17:39 2014.
Record 425: Wed Jan 29 08:39:37 2014 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Wed Jan 29 08:39:39 2014. New time: Wed Jan 29 08:39:37 2014.
Record 426: Wed Jan 29 20:55:35 2014 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Wed Jan 29 20:55:37 2014. New time: Wed Jan 29 20:55:35 2014.
Record 427: Thu Jan 30 09:10:33 2014 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Thu Jan 30 09:10:35 2014. New time: Thu Jan 30 09:10:33 2014.
Record 428: Thu Jan 30 21:28:31 2014 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Thu Jan 30 21:28:33 2014. New time: Thu Jan 30 21:28:31 2014.
.
.
.
.
.
.
.
.
Record 885: Sun Apr 16 10:43:28 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Sun Apr 16 10:43:26 2017. New time: Sun Apr 16 10:43:28 2017.
Record 886: Mon Apr 24 05:37:44 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Mon Apr 24 05:37:42 2017. New time: Mon Apr 24 05:37:44 2017.
Record 887: Mon May  1 05:22:26 2017 [IPMI.notice]: e200 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization"
Record 888: Mon May  1 05:22:26 2017 [IPMI.notice]: e300 | 02 | EVT: 0301ffff | System_Fault | Assertion Event, "State Asserted"
Record 889: Mon May  1 05:22:26 2017 [IPMI.notice]: e400 | 02 | EVT: 0301ffff | Controller_Fault | Assertion Event, "State Asserted"
Record 890: Mon May  1 05:23:14 2017 [SP.critical]: Filer Reboots
Record 891: Mon May  1 05:23:14 2017 [Trap Event.critical]: hwassist abnormal_reboot (28)
Record 892: Mon May  1 05:23:20 2017 [IPMI.notice]: e500 | 02 | EVT: 6fc220ff | System_FW_Status | Assertion Event, "Bootloader is running"
Record 893: Mon May  1 05:23:29 2017 [IPMI.notice]: e600 | 02 | EVT: 6fc22fff | System_FW_Status | Assertion Event, "OnTap Kernel Running"
Record 894: Mon May  1 05:23:29 2017 [IPMI.notice]: e700 | 02 | EVT: 0300ffff | System_Fault | Assertion Event, "State Deasserted"
Record 895: Mon May  1 05:23:29 2017 [IPMI.notice]: e800 | 02 | EVT: 0300ffff | Controller_Fault | Assertion Event, "State Deasserted"
Record 896: Mon May  1 05:24:25 2017 [ASUP.notice]: First notification email | (REBOOT (abnormal)) WARNING | Sent
Record 897: Mon May  1 05:34:08 2017 [SP.critical]: Heartbeat stopped
Record 898: Mon May  1 05:38:17 2017 [ASUP.notice]: Reminder email | (REBOOT (abnormal)) WARNING | Sent
Record 899: Wed May  3 05:57:32 2017 [SP.normal]: Heartbeat started
Record 900: Wed May  3 05:57:32 2017 [Heartbeat.notice]: Heartbeat start: Set SP time. Old time: Wed May  3 05:57:32 2017. New time: Wed May  3 05:57:20 2017.
Record 901: Wed May  3 05:57:20 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Wed May  3 05:57:32 2017. New time: Wed May  3 05:57:20 2017.
Record 902: Wed May  3 06:04:32 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Wed May  3 06:04:20 2017. New time: Wed May  3 06:04:32 2017.
Record 903: Thu May  4 07:35:35 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Thu May  4 07:35:33 2017. New time: Thu May  4 07:35:35 2017.
Record 904: Tue May  9 21:39:47 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Tue May  9 21:39:45 2017. New time: Tue May  9 21:39:47 2017.
Record 905: Tue May 16 10:17:00 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Tue May 16 10:16:58 2017. New time: Tue May 16 10:17:00 2017.
Record 906: Mon May 22 21:25:13 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Mon May 22 21:25:11 2017. New time: Mon May 22 21:25:13 2017.
Record 907: Mon May 29 08:20:26 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Mon May 29 08:20:24 2017. New time: Mon May 29 08:20:26 2017.
Record 908: Sun Jun  4 14:54:39 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Sun Jun  4 14:54:37 2017. New time: Sun Jun  4 14:54:39 2017.
Record 909: Sun Jun 11 15:16:53 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Sun Jun 11 15:16:51 2017. New time: Sun Jun 11 15:16:53 2017.
Record 910: Mon Jun 19 02:12:08 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Mon Jun 19 02:12:06 2017. New time: Mon Jun 19 02:12:08 2017.
Record 911: Fri Jul  7 18:09:42 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Fri Jul  7 18:09:44 2017. New time: Fri Jul  7 18:09:42 2017.
Record 912: Fri Jul  7 18:10:43 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Fri Jul  7 18:10:41 2017. New time: Fri Jul  7 18:10:43 2017.
Record 913: Thu Jul 20 18:28:06 2017 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Thu Jul 20 18:28:08 2017. New time: Thu Jul 20 18:28:06 2017.
Record 914: Fri Jul 28 00:29:42 2017 [IPMI.notice]: e900 | 02 | EVT: 6fc200ff | System_FW_Status | Assertion Event, "Unspecified"
Record 915: Fri Jul 28 00:29:46 2017 [IPMI.notice]: ea00 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization"
Record 916: Fri Jul 28 00:29:46 2017 [IPMI.notice]: eb00 | 02 | EVT: 0301ffff | System_Fault | Assertion Event, "State Asserted"
Record 917: Fri Jul 28 00:29:46 2017 [IPMI.notice]: ec00 | 02 | EVT: 0301ffff | Controller_Fault | Assertion Event, "State Asserted"
Record 918: Fri Jul 28 00:31:15 2017 [SP.critical]: Filer Reboots
Record 919: Fri Jul 28 00:31:15 2017 [Trap Event.critical]: hwassist abnormal_reboot (28)
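
For anyone reading a log like the one above, here is a small Python sketch that pulls the "Filer Reboots" records out of the SP event log and reports the gap between them. The record layout ("Record N: date [tag]: message") is inferred only from this pasted output, not from any documented format, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Layout inferred from the pasted log: "Record N: <date> [<tag>]: <message>"
RECORD_RE = re.compile(
    r"^Record (\d+): "
    r"(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"\[([^\]]+)\]: (.*)$"
)


def parse_record(line):
    """Return (number, timestamp, tag, message), or None if no match."""
    match = RECORD_RE.match(line.strip())
    if not match:
        return None
    number, stamp, tag, message = match.groups()
    # Collapse the double space used for single-digit days ("May  1").
    stamp = " ".join(stamp.split())
    when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return int(number), when, tag, message


def filer_reboots(lines):
    """Collect timestamps of '[SP.critical]: Filer Reboots' records."""
    times = []
    for line in lines:
        record = parse_record(line)
        if record and record[2] == "SP.critical" and record[3] == "Filer Reboots":
            times.append(record[1])
    return times


# Three records copied from the log above.
SAMPLE = [
    "Record 890: Mon May  1 05:23:14 2017 [SP.critical]: Filer Reboots",
    "Record 891: Mon May  1 05:23:14 2017 [Trap Event.critical]: hwassist abnormal_reboot (28)",
    "Record 918: Fri Jul 28 00:31:15 2017 [SP.critical]: Filer Reboots",
]

reboots = filer_reboots(SAMPLE)
gaps = [(later - earlier).days for earlier, later in zip(reboots, reboots[1:])]

if __name__ == "__main__":
    for when in reboots:
        print("Filer reboot at", when)
    print("Full days between reboots:", gaps)
```

On the sample records this reports 87 full days between the two node reboots. Note that this only measures the spacing between node reboots; it does not by itself show how long the SP had been up, so it cannot confirm or rule out the 500-day issue.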