
SnapMirror Lag Thresholds per Relationship

We have a large number of unmanaged SnapMirror and Vault relationships that are monitored by OCUM 9.5. Most of these are vaults that have a 12 hour lag, but we have some mirror volumes that replicate every 5 minutes.

 

Unfortunately, the global "Lag Thresholds for Unmanaged Relationships" setting doesn't fit both types of relationship. For example:

 

Warning is set to 150%

Error is set to 250%

 

Our 5 minute SnapMirror relationship produces an error if the lag exceeds 12 min 30 sec.

Our 12 hour Vault relationship would produce an error after 30 hours.
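
As a rough sketch of the arithmetic above, assuming the threshold is simply a percentage of each relationship's update interval (the function name `lag_limit` is made up for illustration and is not an OCUM API):

```python
from datetime import timedelta

def lag_limit(interval: timedelta, percent: float) -> timedelta:
    """Lag at which a threshold of `percent` trips for a given update interval."""
    return interval * (percent / 100)

mirror = timedelta(minutes=5)   # mirror that replicates every 5 minutes
vault = timedelta(hours=12)     # vault that updates every 12 hours

for name, interval in (("5 min mirror", mirror), ("12 hour vault", vault)):
    print(name, "warning:", lag_limit(interval, 150), "error:", lag_limit(interval, 250))
```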

 

There's one particular mirror relationship that exceeds the error threshold overnight whilst it processes data. We aren't concerned by the lag, but we do find it onerous that it generates an alert every day which in turn goes to our incident ticketing system. 

 

If we increased the error threshold to only alert after 30 mins on the 5 min relationship (600%), it would mean we only receive a notification for failed backups after almost 3 days!
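
To make the trade-off concrete, here is a small sketch under the same assumption that one global percentage is applied to each relationship's own interval. `percent_for` is a hypothetical helper, not part of OCUM:

```python
from datetime import timedelta

def percent_for(target_lag: timedelta, interval: timedelta) -> float:
    """Global percentage needed so that `interval` only alerts at `target_lag`."""
    return target_lag / interval * 100

mirror = timedelta(minutes=5)
vault = timedelta(hours=12)

# Percentage required to tolerate 30 minutes of lag on the 5 minute mirror.
percent = percent_for(timedelta(minutes=30), mirror)

# The same global percentage applied to the 12 hour vault.
vault_error_lag = vault * (percent / 100)

print(percent, vault_error_lag)
```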

 

Is there a way to set thresholds per relationship? If not, are there any plans to introduce this in future versions?

 

Thanks,

Paul.


Re: SnapMirror Lag Thresholds per Relationship

Hi Paul,

 

I agree that the threshold is pretty generic. Nonetheless it follows a rule.

150% means you missed one update, 200% means you missed two. That's irrespective of the time between updates (as you correctly noted). Unfortunately there is nothing you can do about it in OCUM.

 

Now - you say the lag is expected as updates just take longer during the night.

So you can't change the thresholds in OCUM on a per-relationship basis, but how about accounting for the longer updates in your SnapMirror schedule?

I assume you currently have a pretty simple schedule, meaning just every 5 min.

How about building a slightly more sophisticated schedule that accounts for the longer update times at night?

With that, OCUM will apply the 150% and 200% thresholds to each interval individually. So let's say you keep 5 min from 8am to 8pm, but change to 20 min from 8pm to 8am: then alerts would only be triggered by lag times of 30 min or more at night, but 7.5 min and 10 min respectively during the day.
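
A minimal sketch of that idea, assuming the 150% and 200% figures used in this reply and that the threshold follows whichever interval is active at the time. The names `interval_at` and `thresholds_at` are invented for illustration:

```python
from datetime import time, timedelta

# Hypothetical two-part schedule: 5 min by day, 20 min by night.
DAY_START, DAY_END = time(8, 0), time(20, 0)

def interval_at(t: time) -> timedelta:
    """Update interval in force at wall-clock time `t`."""
    if DAY_START <= t < DAY_END:
        return timedelta(minutes=5)
    return timedelta(minutes=20)

def thresholds_at(t: time, warning_pct: float = 150, error_pct: float = 200):
    """Return (warning lag, error lag) for the interval active at `t`."""
    interval = interval_at(t)
    return interval * (warning_pct / 100), interval * (error_pct / 100)

print("noon:", thresholds_at(time(12, 0)))
print("11pm:", thresholds_at(time(23, 0)))
```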

 

Kind regards, Niels

 


Re: SnapMirror Lag Thresholds per Relationship

Hi Niels,

 

I'm intrigued by your suggestion of creating a more sophisticated schedule, but I'm not sure how it can be done.

 

From what I can see, a SnapMirror relationship can only have a single cron schedule associated with it, so I guess the schedule is where we need to apply the sophistication. I'd love to see some examples please, because at the moment the only solution I can see is to have a very long schedule that specifies every time option, e.g. 8:05, 8:10, 8:15, 8:20, 8:25, 8:30, 8:35, 8:40, 8:45, 8:50, 8:55, 9:00, 9:05, etc.
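
For a sense of scale, the start times such a schedule would have to cover can be generated rather than typed by hand. This sketch assumes the 5 min (8am to 8pm) and 20 min (8pm to 8am) split suggested earlier; `start_times` is a made-up helper, and it only lists times, it does not produce ONTAP schedule syntax:

```python
def start_times(day_step=5, night_step=20, day_start=8, day_end=20):
    """List every HH:MM start time for a day/night split schedule."""
    times = []
    for hour in range(24):
        # Daytime hours use the short step, all other hours the long one.
        step = day_step if day_start <= hour < day_end else night_step
        for minute in range(0, 60, step):
            times.append(f"{hour:02d}:{minute:02d}")
    return times

times = start_times()
print(len(times), "start times per day")
print(times[:6])
```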

 

Thanks,

Paul.

Re: SnapMirror Lag Thresholds per Relationship

Hi Paul,

 

That is in fact what I meant. Unfortunately it's not as easy as "8-20@0,5,10,15,20,25,30,35,40,45,50,55 + 20-8@0,20,40". That would be nice though.

So either you create that long cron schedule and specify each time individually, or you might want to try out the "job schedule interval create" documented here:

https://docs.netapp.com/ontap-9/topic/com.netapp.doc.dot-cm-cmpr-960/job__schedule__interval__create.html

With that the next update will only start X minutes after the previous one finishes.

I have no idea though how the OCUM thresholds react to that as the actual lag will be quite dynamic.
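
One way to reason about that is a toy model. It assumes, as described above, that each update starts a fixed wait after the previous one finishes, and it further assumes that lag is measured from the start of the last completed transfer. Both the model and the transfer durations below are illustrative assumptions, not measured OCUM or ONTAP behavior:

```python
def peak_lags(transfer_minutes, wait_minutes):
    """Peak lag (minutes) reached just before each transfer after the first completes."""
    peaks = []
    for prev, cur in zip(transfer_minutes, transfer_minutes[1:]):
        # Lag runs from the start of the previous transfer
        # until the current transfer finishes.
        peaks.append(prev + wait_minutes + cur)
    return peaks

# Hypothetical durations: quick updates by day, slow ones overnight.
print(peak_lags([1, 1, 15, 25, 2], wait_minutes=5))
```

In this model the peak lag depends on two consecutive transfer durations, so a percentage of one fixed interval has no stable value to compare against.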

 

regards, Niels

Re: SnapMirror Lag Thresholds per Relationship

Hi Niels,

 

Using an interval job schedule was a great idea, but unfortunately, when I tried to use it in a SnapMirror relationship, it gave an error saying "SnapMirror does not support interval schedules".

 

Is there perhaps support for this in a later version of ONTAP as we're running 9.1?

Re: SnapMirror Lag Thresholds per Relationship

Hi Paul

Pretty sure that the "job schedule interval create" command has been in all ONTAP 9 versions, including yours. See p151 in the 9.1 manual here

Sounds like a time to open a support case to check that?
Cheers

 

 

Re: SnapMirror Lag Thresholds per Relationship

Hi Mike,

 

Apologies, I should have said that I can create a job schedule interval, but when I try to attach this to a SnapMirror relationship's schedule using "snapmirror modify -destination-path <path> -schedule <interval_name>", it gives me that error message.

 

 


Re: SnapMirror Lag Thresholds per Relationship

Hmmm... did you open a case to find out why? If not, could you?

Re: SnapMirror Lag Thresholds per Relationship

Then it looks to be related to the ONTAP version you are using.

I tried on 9.6 and the snapmirror modify command accepted the interval schedule.

 

regards, Niels

Re: SnapMirror Lag Thresholds per Relationship

Hi Mike/Niels,

 

Thanks for your help with this! I'm just setting up mirrors between different versions of sims on my laptop to see if I can replicate the error on 9.1 and 9.3, just in case there's a setting on our prod clusters preventing it from working with intervals.

 

Cheers,

Paul.
