I am continuously getting OCUM alerts like this: "Perf. Capacity Used value of 212% on nodename1 has triggered a WARNING event based on threshold setting of 100%."
When I check the performance view I see things like 25K IOPS, ~500 MB/s throughput and latency <1 ms... so, in my opinion, nothing is obviously grinding to a halt here.
The OCUM manual tries to explain what "Perf Capacity" is. But what is the community's take on these alerts and how best to address/resolve them in the real world? If this is a case of over-alerting I'd like to know that as well.
Performance Capacity Used reaches levels where a warning is generated only if a latency increase is observed at the node processing level. However, that increase in latency is a relative rather than an absolute measure. As a result, whether this warning is a real concern for the environment depends on the workloads that are running. Some workloads are sensitive to such changes and others are not. Hence, the warning is generated as a precaution and the admin needs to make the final judgement.
There may be cases where the latency increase is temporary, due to either a temporary load increase or a temporary change in workload demand. If the warning persists or recurs periodically, it should be taken into consideration, because it is a proactive warning that cautions the user about performance issues if further load is added to the node.
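If it helps to separate temporary spikes from persistent breaches, here is a rough Python sketch. It is not an OCUM feature or API: the function name, the (timestamp, percent) sample format, and the 30-minute cutoff are all assumptions made for illustration, and it presumes you have already exported the Perf. Capacity Used values for the node by some means.

```python
from datetime import datetime, timedelta


def classify_breaches(samples, threshold=100.0,
                      min_duration=timedelta(minutes=30)):
    """Group consecutive over-threshold samples into runs.

    samples: iterable of (timestamp, perf_capacity_used_pct), sorted by time.
             (Hypothetical export format, not something OCUM produces as-is.)
    Returns a list of (start, end, sustained) tuples, where sustained is
    True when the run lasted at least min_duration.
    """
    runs = []
    start = last = None
    for ts, pct in samples:
        if pct > threshold:
            if start is None:
                start = ts          # a new run begins here
            last = ts               # most recent sample still above threshold
        elif start is not None:
            runs.append((start, last, last - start >= min_duration))
            start = None
    if start is not None:           # run still open at the end of the data
        runs.append((start, last, last - start >= min_duration))
    return runs


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    values = [80, 212, 90, 150, 160, 170, 180, 190, 200, 210, 95]
    data = [(t0 + timedelta(minutes=5 * i), v) for i, v in enumerate(values)]
    for start, end, sustained in classify_breaches(data):
        print(start, end, "sustained" if sustained else "transient")
```

Runs flagged as sustained, or transient runs that show up at the same time every night, are the ones worth a closer look before adding more load to the node. The cutoff is arbitrary and should be tuned to the workload.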
Since we upgraded to CDOT I have been getting these alerts, at least 5-10 each night. Imagine the weekends. I have logged tickets with support which have lasted weeks. According to all the perfstats they have collected, there is no performance issue on the system whatsoever. We have a FAS8060 AFF. Eventually a bug was created for me, and it has not been fixed to date. I was advised that 7.2 should fix it. I am not able to install that version yet as we aren't on ESX 6+.
I upgraded to 7.3. The alerts have stopped coming through without me having to change anything which is great. I think it is still interesting though that the Perf. Capacity Used is still showing as quite high, sometimes up to 200%. When I logged tickets for this previously, I was advised there was no performance issue from the perfstat which was collected. Quite a confusing counter then really.