ONTAP Hardware

At what point does the degraded status in "system health subsystem show" get cleared?

hojun

Hello, I am a NetApp partner.
I received an inquiry from a customer, so I would like to ask the community for advice.

Is it possible to know the exact timing of when alerts are generated or cleared in system health subsystem show on NetApp?

As shown below, for Subsystem: Environment, the Subsystem Refresh Interval is displayed as 10m, 10m, 10m:

 

Subsystem: Environment
Health: ok
Initialization State: initialized
Number of Outstanding Alerts: 0
Number of Suppressed Alerts: 0
Node: aff300-rtp-14a, aff300-rtp-14a, aff300-rtp-14b
Subsystem Refresh Interval: 10m, 10m, 10m

 

I believe there must be a difference between showing a single 10m and showing 10m, 10m, 10m.

For example, if a PSU goes into a fault state and then returns to normal, it can take much longer than 10 minutes for the System health subsystem to reflect the recovered status.

 

Does this mean that the system checks the state three times at 10-minute intervals before updating?
To clarify my understanding, I created an algorithm flow diagram.

I would appreciate any advice on this matter.

 

 

algorithm flow.jpg

  

 

1 ACCEPTED SOLUTION

chamfer

Hi @hojun ,

 

This is actually simpler than it seems. The 10m, 10m, 10m shown in your CLI output lines up with the nodes that contribute to the "Environment" health subsystem: there is one refresh interval per listed node.

  • There are 3 nodes listed.
  • Each node has a subsystem refresh interval of 10m for the "Environment" subsystem.
  • Some subsystems, such as "IO", will have 2 nodes listed (in your case) and therefore two subsystem refresh intervals.
  • The "Switch-Health" subsystem should be owned by one node and therefore have one subsystem refresh interval.

In conclusion, the Environment subsystem in your cluster is refreshed (checked) every 10 minutes.
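
To make the node-to-interval pairing concrete, here is a minimal Python sketch that reads the fields from the output quoted in the question and pairs each entry on the "Node" line with the entry in the same position on the "Subsystem Refresh Interval" line. The function name pair_nodes_with_intervals is hypothetical, and the positional pairing is an assumption based on the explanation above, not documented ONTAP behaviour.

```python
def pair_nodes_with_intervals(show_output: str) -> list[tuple[str, str]]:
    """Pair each listed node with the refresh interval in the same position.

    Assumption: the Nth value on the "Node" line corresponds to the Nth
    value on the "Subsystem Refresh Interval" line.
    """
    fields = {}
    for line in show_output.splitlines():
        if ":" in line:
            # Split on the first colon only: "Key: value".
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()

    nodes = [n.strip() for n in fields["Node"].split(",")]
    intervals = [i.strip() for i in fields["Subsystem Refresh Interval"].split(",")]
    if len(nodes) != len(intervals):
        raise ValueError("node count and interval count differ")
    return list(zip(nodes, intervals))


# Field values copied from the output quoted in the question.
sample = """\
Subsystem: Environment
Health: ok
Node: aff300-rtp-14a, aff300-rtp-14a, aff300-rtp-14b
Subsystem Refresh Interval: 10m, 10m, 10m
"""

for node, interval in pair_nodes_with_intervals(sample):
    print(f"{node}: refreshed every {interval}")
```

Each printed line shows one node with a single 10m interval, which is the reading described above: three entries, not three consecutive checks.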

 

I hope that this helps!


3 REPLIES

chamfer

If a device fails just before a refresh cycle and then recovers just after the next refresh cycle, the recovery is not picked up until the refresh after that, so the subsystem will show as unhealthy for about 20 minutes, which would be the maximum in that scenario.
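
The timing above can be checked with a small Python sketch. It assumes a simple model that is not confirmed by NetApp documentation: the subsystem status changes only at refresh points that fall on exact multiples of the interval, and nothing updates it in between. The function name displayed_unhealthy_window and the minute values are hypothetical.

```python
import math


def displayed_unhealthy_window(fail_at, recover_at, interval=10.0):
    """Return (shown_unhealthy_from, shown_healthy_from) in minutes.

    Assumed model: refreshes happen at 0, interval, 2*interval, ... and
    the displayed status only changes at those points.
    Returns None if the fault starts and ends between two refreshes,
    so no refresh ever observes it.
    """
    # First refresh at or after the failure.
    detected = math.ceil(fail_at / interval) * interval
    if detected >= recover_at:
        return None
    # First refresh at or after the recovery.
    cleared = math.ceil(recover_at / interval) * interval
    return detected, cleared


# Fails just before the refresh at minute 10, recovers just after minute 20.
window = displayed_unhealthy_window(fail_at=9.9, recover_at=20.1)
print(window)                 # (10.0, 30.0)
print(window[1] - window[0])  # 20.0 minutes shown as unhealthy
```

In this model a fault lasting only slightly more than one interval is displayed for two full intervals, which matches the roughly 20 minutes described above.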

hojun

I ran a test based on chamfer's advice.
As you described, it behaved as expected, and I have shared the result with the customer.
Thank you very much for your help.  😚
