OnCommand Performance Manager - "Node HA pair over-utilized": what does it mean?

lines_tim · ‎2015-09-29

I get these messages from OPM pretty frequently, but I don't understand the metric that throws the alarm. There's lots of CPU, NVRAM is flushing nicely, disks aren't busy ... yet something in the node pair is being over-utilized.

Does anyone know what's being measured?

coreywanless · ‎2015-10-02

Hello Tim,

You may already know this, but that particular alert measures the combined utilization of both nodes in the HA pair. It warns you that if one of the nodes were to fail, you may run into a performance problem until the failure is resolved. Below is an excerpt from the OPM user guide.

Identifies situations where nodes in an HA pair are operating above the bounds of the HA pair operational efficiency. It does this by looking at the CPU and RAM usage for the two nodes in the HA pair. If the combined node utilization of the two nodes exceeds 100%, then a controller failover will impact workload latencies.

Reference: https://library.netapp.com/ecm/ecm_download_file/ECMP12406790
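
To make the rule in the excerpt concrete, here is a minimal Python sketch of the check. It is not OPM's implementation: the guide does not say how CPU and RAM usage are folded into a single figure, so the sketch assumes each node already has one utilization percentage. The function name and parameters are hypothetical.

```python
def ha_pair_overutilized(node_a_pct, node_b_pct, threshold_pct=100.0):
    """Return (combined utilization, True if it exceeds the threshold).

    Assumption: each argument is a single per-node utilization percentage.
    How OPM derives that number from CPU and RAM usage is not documented
    in the excerpt, so it is treated as an input here.
    """
    combined = node_a_pct + node_b_pct
    # "Exceeds 100%" is read as strictly greater than the threshold.
    return combined, combined > threshold_pct


# After a failover the surviving node carries both workloads, so a
# combined figure above 100% means it would be asked for more than it has.
print(ha_pair_overutilized(45.0, 40.0))
print(ha_pair_overutilized(65.0, 55.0))
```

With these made-up inputs, the first pair leaves headroom for a takeover and the second does not, even though neither node looks busy on its own. That would explain an alert firing while CPU, NVRAM, and disks each appear healthy.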

-Corey

netappmagic · ‎2015-10-08

Hello All,

I am trying to modify the threshold value for "Node HA pair over-utilized". Can somebody please tell me how to locate the policy? I want to change it to a different value.

Please help me out.

ruijuan · ‎2015-10-08

This is a system-defined threshold and cannot be modified by the user.

niels · ‎2015-10-08

RFE: Allow users to override system-defined thresholds.

Overprovisioning node utilization in HA configurations is especially common, e.g. 120% combined utilization, because reduced performance during HA events is often accepted.

regards, Niels

ruijuan · ‎2015-10-08

QA already logged this RFE internally, and we have received the same feedback from others; it is all captured in one burt. We need to work with PM on this.

netappmagic · ‎2015-10-10

Does that mean we cannot modify it for now?