Active IQ Unified Manager Discussions
I want to use Performance Advisor to see how many hours a day a filer is idle.
I've set up a threshold that generates a perf:cpu_busy event if cpu_busy stays
under 10% for an hour.
I know the filer has been idle for 8 hours straight, but only one event has been
generated. I thought I'd see an event for each of the 60 minute intervals where
the filer was running below 10% cpu_busy.
How can I get an event generated for each qualifying interval?
I don't think the alarm repeat notification feature addresses this problem.
The way I read the repeat notification feature is it keeps sending a
reminder alarm for a specific event.
-Marlon
Hello Marlon,
Events are meant to persist as long as the condition associated with them remains on the object on which the event was generated. No new events (same event type but with a new event ID) are generated at each monitoring cycle while the condition holds. Once the condition no longer exists, the event is moved to history. This is by design and is in fact a good thing. I don't think there is any option to change this behavior.
Why do you need multiple events for the same thing? I think the CPU monitoring interval is 5 minutes (default) and not 60, but let's assume it was changed to 60 by the user. For your case you don't need an event generated every 60 minutes. The current events of severity "warning or worse" are always shown for the filer. The event has a "Triggered" column that tells you when the event was generated. If after 8 hours (and 8 monitoring cycles) the event is still there, it means it was busy for 8 hours and still is.
If you need to know how long the filer's CPU was busy, it should be the difference in time between the "CPU too busy" and "CPU Load Normal" events in the history page. The accuracy of this information depends on your CPU monitoring interval. If you make it 60 minutes, this will be highly inaccurate.
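To illustrate the time-difference idea above, here is a minimal Python sketch. It assumes you have copied the history rows into a list of (timestamp, event name) pairs; the function name `condition_duration` and the sample timestamps are made up for illustration and are not real output from the history page.

```python
from datetime import datetime, timedelta

def condition_duration(history):
    """Sum the time between each raised event and the normal event that clears it.

    history: list of (timestamp, event_name) tuples, sorted by time.
    """
    total = timedelta(0)
    raised_at = None
    for ts, name in history:
        if name == "CPU too busy" and raised_at is None:
            raised_at = ts                 # condition starts
        elif name == "CPU Load Normal" and raised_at is not None:
            total += ts - raised_at        # condition ends, add the span
            raised_at = None
    return total

# Hypothetical history rows, for illustration only
history = [
    (datetime(2012, 1, 1, 1, 0), "CPU too busy"),
    (datetime(2012, 1, 1, 4, 30), "CPU Load Normal"),
    (datetime(2012, 1, 1, 10, 0), "CPU too busy"),
    (datetime(2012, 1, 1, 11, 0), "CPU Load Normal"),
]
print(condition_duration(history))  # 4:30:00
```

A condition that is still open (raised but not yet cleared) is not counted here; you would need to add the time from the last raised event to now.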
I hope this helps.
warm regards,
Abhishek
Hi Abhishek,
Marlon is using Performance Advisor, and you are talking about Operations Manager CPU monitoring, which by default runs every 5 minutes.
Regards
adai
AFAIK older duplicate events go into the history table. Can you check the history events report?
Also set the alarm without repeat notification.
For every event that is generated, an alarm is sent before a duplicate forces the older current event into history.
You'll know more clearly how many events were generated.
-Prasad
@
older duplicate events go into history table.
---------------------------------------------------------------------------------------------------------------------
What do you mean by duplicate events? Do you mean that if an event is generated and the condition still exists at the next monitoring cycle, another event (a duplicate with a new event ID) gets generated, and the older one gets moved to history?
If this is what you mean, then I don't think it is right, but I would need to verify to be absolutely certain. There are no duplicate events like this. If the event condition persists at the second (and subsequent) monitoring cycle(s), no new events are generated, and the first event is still shown with the Triggered column showing the time it was generated.
Events of severity warning or worse never go to history until monitoring detects that the condition that generated them no longer exists (or they are deleted by the user).
Hi Marlon,
You probably already know this: cpu_busy is collected in PA every 1 minute by default. The threshold interval prevents you from getting alerts on spikes; an event is generated only if the value stays past the threshold for the whole interval specified.
So as per your threshold, you are alerted when cpu_busy falls below 10% and stays there for 1 hour (i.e. for 60 samples at the default collection interval). What happens is that as soon as the value falls below 10% for the first time, a counter is started to see whether the value stays at or below 10% for the threshold interval specified. If even one sample in between rises above 10%, the counter is stopped. When the value drops below 10% again, the counter restarts and counts for 60 minutes; if the value stays below 10% throughout, an event is generated. This sets an event status on the object on which the threshold is set. In this case it's the CPU.
So until the status of this object changes (i.e. from error to normal when cpu_busy rises above 10%), a new event will not be generated for cpu_busy every 60 minutes. A new one can only happen after a normal event is raised, which resets the event status of the object; if cpu_busy then falls below 10% again for the interval, one more event is generated, otherwise not.
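The counter and status behavior described above can be sketched roughly as follows. This is only a simulation of the logic as explained in this post, not Performance Advisor code; the function name `threshold_events`, its parameters, and the sample data are assumptions for illustration.

```python
def threshold_events(samples, limit=10.0, interval=60):
    """Return (sample_index, event) pairs for a 'stays at or below limit' threshold.

    samples: cpu_busy percentages, one per collection cycle
             (one per minute, assuming the default collection interval).
    """
    events = []
    below_count = 0    # consecutive samples at or below the limit
    breached = False   # current event status of the object
    for i, value in enumerate(samples):
        if value <= limit:
            below_count += 1
            # Raise an event only once, when the interval is first satisfied
            if not breached and below_count >= interval:
                breached = True
                events.append((i, "breached"))
        else:
            below_count = 0            # a single high sample stops the counter
            if breached:
                breached = False       # status goes back to normal
                events.append((i, "normal"))
    return events

# 8 idle hours at 5%, one busy sample, then 2 more idle hours
samples = [5.0] * 480 + [50.0] + [5.0] * 120
print(threshold_events(samples))
# [(59, 'breached'), (480, 'normal'), (540, 'breached')]
```

Note that the 8 idle hours produce a single breached event, which matches what Marlon observed.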
If your need is to see whether the filer is idle for 8 hours, it's better to set the threshold interval to 8h instead of 60m.
Another thing you can do is run the following report to see how long the threshold was violated.
[root@oncommand ~]# dfm report view storage-system-performance-summary
Object ID Type                     Status   Storage System        Model     CPU Busy (%) Total Ops/Sec Net Throughput (MB/Sec) Disk Throughput (KB/Sec) Perf Threshold Violation Count Perf Threshold Violation Period (Sec)
--------- ------------------------ -------- --------------------- --------- ------------ ------------- ----------------------- ------------------------ ------------------------------ -------------------------------------
91        Controller               Error    fas-sim-1.localdomain Simulator 2.47         0.00          0.00                    85.73                    1                              900
90        Controller               Error    fas-sim-2.localdomain Simulator 1.27         0.00          0.00                    66.03
92        Controller               Critical fas-sim-3.localdomain Simulator
[root@oncommand ~]#
You can also set repeat notification, which will send you the details again; the event ID stays the same, but the cpu_busy value in the condition may be different.
Say cpu_busy was 8% when the first repeat notification was sent and 6% at the next one: this value is reflected in the condition, but the event ID and the source of the event remain the same.
Hope this helps.
Regards
adai
Thanks Adai, and everyone else that replied.
The thing I didn't understand was that once the 'breached' event happens, I won't see
another breached event for that object until the object has gone back to 'normal'.
It works like a toggle: once the object is set to breached it stays in that state, without
generating any more events. The only possible next event for the object is to go
normal. Once normal, it stays in that state, without generating any more events, and the
only possible next event for the object is to go to breached... and so on.
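Since the toggle behavior means events cannot be used to count idle hours directly, one alternative is to count them from the raw samples. A minimal sketch, assuming you already have one cpu_busy sample per minute for the day in a list (how you export them is not covered here); the function name `idle_hours` and the data are hypothetical.

```python
def idle_hours(samples, limit=10.0, samples_per_hour=60):
    """Count whole-hour blocks in which every sample is below the limit."""
    count = 0
    # Step through the samples one full hour at a time, ignoring a partial last hour
    for start in range(0, len(samples) - samples_per_hour + 1, samples_per_hour):
        block = samples[start:start + samples_per_hour]
        if all(value < limit for value in block):
            count += 1
    return count

# Hypothetical day: 8 idle hours at 5%, then 16 busy hours at 40%
day = [5.0] * 480 + [40.0] * 960
print(idle_hours(day))  # 8
```

This counts fixed clock-hour blocks rather than any rolling 60 minute window, so an idle stretch that straddles two blocks without filling either is not counted.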
