Protmgr primary backup failing

sswain123

Hi All,

I was hoping I could ask a question about primary backups managed by Protection Manager.

I’m using PM to protect all my datasets. As part of this, I’ve stopped all snapshot schedules on the filer and am now relying on my protection policy to manage local snaps. What I’ve been seeing is that a large number of my datasets are failing to create the local snapshot on my defined schedule. I'm seeing an error message stating: "datasetxxx: timed out while waiting for a protection job "Back up data from node Primary data to node Backup of dataset datasetxxx with daily retention to finish".

I don’t know what’s causing the timeout; I just know that without local snaps I’m limiting the restore points available to my users.

Does anyone have any thoughts on what would cause the timeout, or on what I could do to reduce how often the timeouts occur?

Thanks for your help!

Scott

1 ACCEPTED SOLUTION

smoot

You're hitting a lock used by Protection Manager to keep our data structures intact.

A backup job from your primary node to the backup node is still running when the schedule calls for a new primary snapshot. The backup job has locked the primary node storage, so the snapshot job cannot run. By default, the local backup job waits one hour for the lock and then gives up, which is the timeout error you are seeing.
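To make the failure mode concrete, here is a minimal Python sketch of a job that waits on a lock with a timeout. It is only an illustration of the general pattern, not Protection Manager's actual code: the names `run_snapshot_job` and `simulate` are hypothetical, and the durations are fractions of a second standing in for the real one-hour wait.

```python
import threading
import time


def run_snapshot_job(node_lock: threading.Lock, wait_timeout: float) -> bool:
    """Wait up to wait_timeout seconds for the node lock, then give up."""
    acquired = node_lock.acquire(timeout=wait_timeout)
    if not acquired:
        # Analogous to the "timed out while waiting for a protection job" error.
        return False
    try:
        # A real job would create the local snapshot here.
        return True
    finally:
        node_lock.release()


def simulate(backup_duration: float, wait_timeout: float) -> bool:
    """Start a backup that holds the lock, then try to run a snapshot job."""
    node_lock = threading.Lock()
    backup_started = threading.Event()

    def backup_job() -> None:
        with node_lock:
            backup_started.set()
            time.sleep(backup_duration)  # stand-in for the data transfer

    worker = threading.Thread(target=backup_job)
    worker.start()
    backup_started.wait()  # make sure the backup really holds the lock first
    succeeded = run_snapshot_job(node_lock, wait_timeout)
    worker.join()
    return succeeded
```

With a short backup and a long wait, `simulate` returns `True`; when the backup outlasts the wait, it returns `False`, which corresponds to the skipped local snapshot.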

You have a few choices:

1.  Change your local backup job schedule to not run while the primary to secondary backups are running.

2.  Somehow make your backup jobs run faster (perhaps by splitting some datasets into multiple datasets?)

3.  Change the dataset node lock timeout. The option name is "dpScheduledJobExpiration".

4.  Wait for our next release, where we've relaxed the locking. Backup jobs will no longer lock out local backup jobs.
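For choice 1, a quick way to spot conflicts is to check which local snapshot times fall inside the window when the primary to secondary backup is usually running. The Python below is a rough sketch under assumptions: the helper names and all times are made up for illustration, and you would have to estimate the backup window yourself from your job history.

```python
from datetime import time


def in_window(t: time, start: time, end: time) -> bool:
    """Return True if t falls in [start, end), allowing wrap past midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # window spans midnight


def conflicting_snapshots(snapshot_times, backup_start, backup_end):
    """List the local snapshot times that land inside the backup window."""
    return [t for t in snapshot_times if in_window(t, backup_start, backup_end)]


# Hypothetical schedule: local snapshots at 20:00, 23:00, 02:00 and 08:00,
# with the backup to the secondary assumed to run from 22:00 to 03:00.
local_snapshots = [time(20, 0), time(23, 0), time(2, 0), time(8, 0)]
conflicts = conflicting_snapshots(local_snapshots, time(22, 0), time(3, 0))
```

Any time that shows up in `conflicts` is a candidate to move outside the backup window.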

-- Pete

3 REPLIES

sswain123

Thanks for the info, Pete. I was speaking with Adai, and he suggested going with D21, where this timeout has been relaxed.

Any chance you have a master list somewhere within NetApp of all the hidden DFM options such as "dpScheduledJobExpiration"?  Would love to get my hands on that... 🙂

Thanks again!

Scott

jerome_barrelet

Here is the KB article on how to change the dpScheduledJobExpiration option:

https://kb.netapp.com/support/index?page=content&id=3011240

Jerome
