
OCUM Version 6.4RC1 - Duplicate alerts (going from 95% to 94%, etc)


I am working to get alerting set up for OCUM 6.4RC1 and have noticed an annoying issue.

 

If I have a volume (we'll call it testvol) at 89% and it goes to 90%+, it alerts as you would expect. The same happens when it goes from 94% to 95%+. The issue comes when going the other way. We have some very large volumes (20-40TB) that sit between 90-95% full all the time due to file rotation, and we do not want to waste a ton of space by moving them below 90%. If a volume is at, say, 96% and we bring it down to 94%, we get another alert. This generates additional incident tickets, and we waste a bunch of time closing useless tickets. Is there a way to get OCUM 6.x to act like previous versions of DFM, so that it only re-alerts after the volume has gone below a threshold? Basically, if it is at 96% and we lower it to 94%, I don't want another alert. I only want the alert if we add space to drop it to, say, 88%, and then it goes back over 90%.

 

 

UPDATE / CLARIFICATION:

 

Hopefully this helps everyone to understand. 

 

1) VolumeA goes from 89 to 90%
          Triggers the 90% threshold and creates an alert
2) VolumeA goes from 94 to 95%
          Marks the 90% threshold alert as obsolete
          Triggers the 95% threshold and creates an alert
3) We add space (or a snapshot falls off) and the volume shrinks a little
4) VolumeA goes from 95 to 94%
          Marks the 95% threshold alert as obsolete
          Triggers the 90% threshold and creates an alert


My issue is that in #4, it should not trigger another 90% alert. It should only do that if it drops BELOW 90% and then moves back over 90%.
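
To make the requested behavior concrete, here is a minimal Python sketch of "latched" thresholds. It is only an illustration of the logic described above, not how OCUM or DFM is implemented; the function name `latched_alerts` and its arguments are made up for this example. Each threshold fires once when usage reaches it and is re-armed only after usage drops below that same threshold.

```python
def latched_alerts(samples, thresholds=(90, 95)):
    """Return (usage, threshold) pairs for every alert that would fire.

    Hypothetical model: a threshold fires when usage reaches it while it
    is armed, and is re-armed only once usage falls below that threshold.
    """
    armed = {t: True for t in thresholds}
    fired = []
    for pct in samples:
        for t in sorted(thresholds):
            if pct >= t and armed[t]:
                fired.append((pct, t))
                armed[t] = False      # stay quiet while usage remains >= t
            elif pct < t:
                armed[t] = True       # re-arm only after dropping below t
    return fired


# Steps 1-4 above: 89 -> 90 -> 95 -> 94 fires twice, not three times.
print(latched_alerts([89, 90, 95, 94]))
# Dropping to 88 and climbing back over 90 fires the 90% alert again.
print(latched_alerts([89, 90, 95, 94, 88, 91]))
```

With this logic, step #4 (95 to 94%) produces no new 90% alert because usage never went below 90%.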

 

 

UPDATE #2 - This is what we see when a volume goes over 90%, then over 95%, and then back down to between 90% and 95%.

 

Risk - Volume Space Nearly Full
Impact Area - Capacity
Severity - Warning
State - New
Source - nasname:/volumename
Trigger Condition - The nearly full threshold set at 90% is breached. 4.35 TB (90.62%) of 4.80 TB is used.

 

Risk - Volume Space Nearly Full
Impact Area - Capacity
Severity - Warning
State - Obsolete
Source - nasname:/volumename
Trigger Condition - The nearly full threshold set at 90% is breached. 4.51 TB (93.85%) of 4.80 TB is used.

 

Risk - Volume Space Full
Impact Area - Capacity
Severity - Error
State - New
Source - nasname:/volumename
Trigger Condition - The full threshold set at 95% is breached. 4.57 TB (95.11%) of 4.80 TB is used.

 

Risk - Volume Space Full
Impact Area - Capacity
Severity - Error
State - Obsolete
Source - nasname:/volumename
Trigger Condition - The full threshold set at 95% is breached. 4.57 TB (95.11%) of 4.80 TB is used.

 

Risk - Volume Space Nearly Full
Impact Area - Capacity
Severity - Warning
State - New
Source - nasname:/volumename
Trigger Condition - The nearly full threshold set at 90% is breached. 4.60 TB (92%) of 5.00 TB is used.

Re: OCUM Version 6.4RC1 - Duplicate alerts (going from 95% to 94%, etc)

Hi,

 

It looks like you have configured two events, i.e. "Volume Space Nearly Full" (once the volume reaches 90%) and "Volume Space Full" (once the volume reaches 95%).

 

When the volume goes from 89% to 90%, UM generates a "Volume Space Nearly Full" warning event.

When the volume goes from 94% to 95%, UM generates a "Volume Space Full" error event, and the old warning event becomes obsolete.

When the volume drops back to 94%, UM obsoletes the error event and creates a new warning event.
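
The sequence described above can be modeled with a short Python sketch. This is only a model of the behavior reported in this thread, not UM's actual code; `band` and `band_events` are hypothetical names. The idea is that each volume is in exactly one severity band at a time, and any change of band obsoletes the old event and raises a new one, regardless of direction.

```python
NEARLY_FULL = "Volume Space Nearly Full"
FULL = "Volume Space Full"


def band(pct, nearly_full=90, full=95):
    """Map a usage percentage to the event name for its band, or None."""
    if pct >= full:
        return FULL
    if pct >= nearly_full:
        return NEARLY_FULL
    return None


def band_events(samples):
    """Return (usage, event name, state) tuples for each band change."""
    events = []
    current = None
    for pct in samples:
        new = band(pct)
        if new != current:
            if current is not None:
                events.append((pct, current, "Obsolete"))
            if new is not None:
                events.append((pct, new, "New"))
            current = new
    return events


for event in band_events([89, 90, 95, 94]):
    print(event)
```

Under this model, the drop from 95% to 94% is a band change like any other, which is why a second "Nearly Full" event appears.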

 

If you don't want the second warning event, do not add the "Volume Space Nearly Full" event to the UM alert.

 

Thanks,

KJag

Re: OCUM Version 6.4RC1 - Duplicate alerts (going from 95% to 94%, etc)

In all previous versions of DFM, an event was triggered only when usage passed the threshold on the way up.

 

I.E:

 

Going from 89 -> 90% = triggers warning

Going from 94 -> 95% = triggers warning

 

But not when going from 95 -> 94%. 

 

Is there any way to make OCUM recognize that the volume has not gone below the threshold yet, so that it takes going below the threshold to reset the alert? Otherwise, we lose the "almost full" alerting capability and are left with only the "full" alert, which either wastes a LOT of space or causes space issues. A 20TB volume at 96% is not as critical as a 10GB volume at 91%. A 20TB volume moving between 94% and 95% is going to constantly create 2 alerts (i.e. 2 incident tickets in our system, and 2 pages to NAS admins), which is a waste of time and resources. We should only get paged once, when it goes from 94->95, not when it goes from 95->94 and then again when it goes from 94->95.

Re: OCUM Version 6.4RC1 - Duplicate alerts (going from 95% to 94%, etc)

Now I understand the issue more clearly. Currently there is no way to stop an event from being raised when the capacity goes from 95% to less than 95%. The only option I can think of is to create an alert script, associate it with the 'Volume nearly full' alert, and have the script resolve the 'Volume nearly full' event. Note that the event will still be raised, but it will move to resolved as soon as it is raised.

 

I also tried a similar scenario in DFM 5.x, and I still see a warning event generated when space moves from 95% to 94%.

 

-KJag

Re: OCUM Version 6.4RC1 - Duplicate alerts (going from 95% to 94%, etc)

I have to disagree with that statement. We have 2 instances of DFM 5.2.1.25273 (5.2.1P2) running: one in our clustered OnTap environment and one in our 7Mode environment. Both have had multiple instances of volumes going from 95%+ to between 90-95%, and neither created alerts on the way down. But my 6.x test machine created alerts every time a volume went from 95%+ to between 90-94%.