We have a large number of unmanaged SnapMirror and Vault relationships that are monitored by OCUM 9.5. Most of these are vaults that have a 12 hour lag, but we have some mirror volumes that replicate every 5 minutes.
Unfortunately, the global "Lag Thresholds for Unmanaged Relationships" setting doesn't fit both types of relationship. For example, with:
Warning is set to 150%
Error is set to 250%
Our 5 minute SnapMirror relationship produces an error if the lag exceeds 12 min 30 sec.
Our 12 hour Vault relationship would produce an error after 30 hours.
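To make the arithmetic explicit, here is a small Python sketch. It assumes, as the figures above imply, that each threshold is simply the percentage multiplied by the relationship's schedule interval. The names `THRESHOLDS_PCT`, `RELATIONSHIPS` and `threshold_minutes` are made up for illustration; this is not an OCUM API.

```python
# OCUM lag thresholds as percentages of the schedule interval (values from above).
THRESHOLDS_PCT = {"warning": 150, "error": 250}

# Schedule intervals in minutes for the two kinds of relationship (illustrative).
RELATIONSHIPS = {"5 minute mirror": 5, "12 hour vault": 12 * 60}


def threshold_minutes(interval_min, pct):
    """Lag (in minutes) at which a percentage threshold is crossed."""
    return interval_min * pct / 100


for name, interval in RELATIONSHIPS.items():
    for level, pct in THRESHOLDS_PCT.items():
        mins = threshold_minutes(interval, pct)
        print(f"{name}: {level} after {mins:g} min ({mins / 60:g} h)")
```

This reproduces the 12.5 minute error point for the mirror and the 30 hour error point for the vault.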
There's one particular mirror relationship that exceeds the error threshold overnight whilst it processes data. We aren't concerned by the lag, but we do find it onerous that it generates an alert every day, which in turn goes to our incident ticketing system.
If we increased the error threshold so that the 5 minute relationship only alerts after 30 mins (600%), we would only receive a notification for a failed 12 hour vault after 3 days!
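The trade-off can be checked the same way. This hypothetical helper (`pct_for_target`, for illustration only) works out which global percentage gives a 30 minute error threshold on the 5 minute mirror, then applies that same percentage to a 12 hour vault, under the same percentage-times-interval assumption.

```python
def pct_for_target(target_min, interval_min):
    """Global error percentage needed for a relationship to alert after target_min."""
    return target_min / interval_min * 100


pct = pct_for_target(30, 5)               # percentage for a 30 min alert on the mirror
vault_error_min = 12 * 60 * pct / 100     # same global percentage on the 12 hour vault

print(f"Error threshold: {pct:g}%")
print(f"12 hour vault would alert after {vault_error_min / 60:g} h "
      f"({vault_error_min / 1440:g} days)")
```

With these numbers it prints 600% and 72 h (3 days): any single percentage that suits the mirror is far too loose for the vault.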
Is there a way to set thresholds per relationship? If not, are there any plans to introduce this in future versions?
I agree that the threshold is pretty generic. Nonetheless it follows a rule.
150% means you missed one update, 200% means you missed two. That's irrespective of the time between updates (as you correctly noted). Unfortunately there is nothing you can do about it in OCUM.
Now - you say the lag is expected as updates just take longer during the night.
So you can't change the thresholds in OCUM on a per-relationship basis, but how about accounting for the longer updates in your SnapMirror schedule?
I assume you currently have a pretty simple schedule, meaning just every 5min.
How about building a slightly more sophisticated schedule that accounts for the longer update times at night?
With that, OCUM will apply the 150% and 200% thresholds to each interval individually. So let's say you keep 5 min from 8am to 8pm, but change to 20 min from 8pm to 8am; then alerts would only be triggered by lag times of 30 min or more at night, but of 7.5 min and 10 min respectively during the day.
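The day/night idea can be sketched like this, assuming OCUM applies the percentages to whichever interval is in effect at the time, as described above (not verified here). The 08:00-20:00 window, the 5/20 minute intervals and the 150%/200% figures come from the example; the constant and function names are hypothetical.

```python
# Example values from the thread (hypothetical plan, not a recommendation).
DAY_START, DAY_END = 8, 20             # "day" runs 08:00-20:00
DAY_INTERVAL, NIGHT_INTERVAL = 5, 20   # minutes between scheduled updates


def interval_at(hour):
    """Scheduled replication interval (minutes) in effect at a given hour."""
    return DAY_INTERVAL if DAY_START <= hour < DAY_END else NIGHT_INTERVAL


def thresholds_at(hour, warn_pct=150, err_pct=200):
    """(warning, error) lag thresholds in minutes, assuming they scale with the interval."""
    interval = interval_at(hour)
    return interval * warn_pct / 100, interval * err_pct / 100


for hour in (10, 23, 3):
    warn, err = thresholds_at(hour)
    print(f"{hour:02d}:00 -> warning after {warn:g} min, error after {err:g} min")
```

At 10:00 this gives 7.5/10 minutes; at 23:00 and 03:00 it gives 30/40 minutes, matching the figures above.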
Kind regards, Niels
I'm intrigued by your suggestion of creating a more sophisticated schedule, but I'm not sure how it can be done.
From what I can see, a SnapMirror relationship can only have a single cron schedule associated with it, so I guess the schedule itself is where we need to apply the sophistication. I'd love to see some examples, please, because at the moment the only solution I can see is a very long schedule that specifies every start time, e.g. 8:05, 8:10, 8:15, 8:20, 8:25, 8:30, 8:35, 8:40, 8:45, 8:50, 8:55, 9:00, 9:05, etc.
That is in fact what I meant. Unfortunately it's not as easy as "8-20@0,5,10,15,20,25,30,35,40,45,50,55 + 20-8@0,20,40". That would be nice, though.
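For what it's worth, the "very long" list of start times doesn't have to be typed by hand. This sketch only generates the HH:MM start times for a 5 minute day / 20 minute night plan; it does not produce ONTAP command syntax, and the day window and intervals are just the example values from this thread. The function name `start_times` is hypothetical.

```python
def start_times(day_start=8, day_end=20, day_every=5, night_every=20):
    """All HH:MM replication start times in one day for a day/night plan."""
    times = []
    for hour in range(24):
        step = day_every if day_start <= hour < day_end else night_every
        for minute in range(0, 60, step):
            times.append(f"{hour:02d}:{minute:02d}")
    return times


times = start_times()
print(len(times))            # 12 day hours * 12 + 12 night hours * 3
print(times[:3], times[-3:])
```

The list could then be pasted into whatever schedule definition you end up using; how best to express it in ONTAP is left open here.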
So either you create that long cron schedule and specify each time individually, or you might want to try out the "job schedule interval create" documented here:
With that the next update will only start X minutes after the previous one finishes.
I have no idea though how the OCUM thresholds react to that as the actual lag will be quite dynamic.
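As a rough illustration of why the lag would be dynamic, here is a simplified model. It is an assumption, not how OCUM is documented to calculate lag: each transfer's snapshot is taken when the transfer starts, the next transfer starts a fixed gap after the previous one finishes, and lag is the time since the snapshot of the last completed transfer. The transfer durations and the `peak_lags` name are made up.

```python
def peak_lags(durations, gap):
    """Peak lag (minutes) seen just before each transfer after the first completes.

    Simplified, hypothetical model:
    - each transfer's snapshot is taken when the transfer starts;
    - the next transfer starts `gap` minutes after the previous one finishes;
    - lag = time since the snapshot of the last *completed* transfer.
    """
    peaks = []
    start = 0.0
    prev_start = None
    for duration in durations:
        finish = start + duration
        if prev_start is not None:
            # Until this transfer finishes, the newest complete copy on the
            # destination is still the one from the previous transfer.
            peaks.append(finish - prev_start)
        prev_start = start
        start = finish + gap
    return peaks


# Made-up durations: quick daytime transfers, then two slow overnight ones.
peaks = peak_lags([1, 1, 30, 30, 1], gap=5)
print(peaks)
print([f"{p / 5:.0%}" for p in peaks])
```

Here the peaks are 7, 36, 65 and 36 minutes, i.e. roughly 140% to 1300% of the nominal 5 minute gap, so a fixed percentage threshold would still fire during the slow overnight transfers.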
Using job schedule was a great idea, but unfortunately having just tried to use this in a SnapMirror relationship, it gave an error saying "SnapMirror does not support interval schedules".
Is there perhaps support for this in a later version of ONTAP as we're running 9.1?
Pretty sure that the "job schedule interval create" command has been in all ONTAP 9 versions, including yours... see p151 in the 9.1 manual here
Sounds like it might be time to open a support case to check that?
Apologies, I should have said that I can create an interval job schedule, but when I try to attach it to a SnapMirror relationship using "snapmirror modify -destination-path <path> -schedule <interval_name>", it gives me that error message.
Thanks for your help with this! I'm just setting up mirrors between different versions of sims on my laptop to see if I can replicate the error on 9.1 and 9.3, just in case there's a setting on our prod clusters preventing it from working with intervals.
So far I've had the same error on 9.3 and 9.5, so just downloading 9.6 to test on there too. The output below is from 9.5.
clusc000::> snapmirror create -source-path clusb1h0:testvol -destination-path clusc1h0:testvol_clusb_sm -policy MirrorAllSnapshots -schedule int_5m -type XDP
Error: command failed: Schedule "int_5m" is an interval schedule. SnapMirror does not support interval schedules.

clusc000::> snapmirror create -source-path clusb1h0:testvol -destination-path clusc1h0:testvol_clusb_sm -policy DPDefault -schedule int_5m -type DP
Error: command failed: Schedule "int_5m" is an interval schedule. SnapMirror does not support interval schedules.

clusc000::> job schedule interval show int_5m
Cluster       Name        Description
------------- ----------- -----------------------------------------------------
clusc000      int_5m      Every 5m
Sorry Paul. I should have been more diligent.
First, I did not create a new relationship, but I modified an existing one as I was too lazy to create a new one.
Also it was not a volume relationship, but an SVM DR relationship.
I tested again and verified your findings:
- create a new volume relationship --> interval schedules not supported
- modify a volume relationship --> interval schedule not supported
- create a new SVM DR relationship --> interval schedule was accepted (but is it indeed supported? Bug?)
- modify SVM DR relationship --> interval schedule was accepted (but is it indeed supported? Bug?)
Therefore it indeed looks as if interval schedules are not working for you.
Also, I wouldn't be sure whether OCUM would actually be able to apply the thresholds correctly, as the lag between transfers would vary each time.
That leaves my first suggestion to create a more "sophisticated" schedule, which you rightfully said is just a "very long" schedule to define each start of a replication individually.
Sorry if I created confusion.
Kind regards, Niels
Phew, thank you for confirming you also see the same problem, because I still had the error on my 9.6 sim! I've since been looking through the support pages to try to see what I was doing wrong before raising a support case. No problem about the confusion, I'm glad it wasn't just me!
Hopefully this thread can be used as evidence to request an RFE with OCUM Product Management for SnapMirror Lag Thresholds per Relationship please?
Chiming in here too. This feature is not currently on the roadmap for Active IQ Unified Manager. I have raised an RFE (1258908) for consideration by engineering and have also briefly spoken to the PM. All of that said, this definitely won't make 9.7, which means 9.8/9.9 at the earliest! So you may want to try those long schedules for the tricky individual relationships. You can play about with the "Advanced" schedule pane under "Schedules" in System Manager; it lets you be very specific with Month/Day/Week/Hour/Minute, and these schedules can definitely be applied to individual volume relationships.
Let us know if this gives you a temporary workaround!