Active IQ Unified Manager Discussions
Hi folks,
I'm currently trying to set up a customer demo that involves PM-managed backups.
Caveat: the source volume contains 996 qtrees. The dataset never reaches conformance, as PM fails to create more than 253 relationships before the source volume runs out of snapshots.
Any idea how to protect volumes with >250 Qtrees?
Also, all qtrees from a single volume should be SnapVaulted to the same secondary volume, hence I have to set the following option accordingly:
pmMaxSvRelsPerSecondaryVol 1000
I know it's far beyond the default 50, but should I expect any negative impact?
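For reference, I'm changing it on the DFM server with the usual dfm options syntax (just a sketch - whether the option accepts a value this high may depend on the DFM version):
dfm options list pmMaxSvRelsPerSecondaryVol
dfm options set pmMaxSvRelsPerSecondaryVol=1000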
regards, Niels
Solved! See The Solution
I'm successfully doing this with about 200 qtrees. I don't know why PM is trying to create so many snaps on the primary. What you may need to do is build the dataset gradually by adding a smaller number of qtrees at a time, waiting for conformance to complete before adding more.
Cheers,
Richard
Hi Richard,
thanks for that suggestion.
Although adding qtrees one at a time might help, it's not very convenient.
The whole point of using PM for relationship management is *not* having to care about individual resources.
The idea is to add whole volumes, or even the containing aggregate, to the dataset and let PM do its magic.
Otherwise I'd have to regularly check whether new qtrees have been created that are not being protected, and add them to the dataset manually. That's not how I expect PM to work.
regards, Niels
I agree with what you said. I am not sure if you can just add a volume into PM and have it watch for new qtrees though - having said that it's not something I've tested and it may work. Adai - can you comment on this?
Hi
In fact, you can even add an entire filer; this is called indirect referencing. At the end of the day, though, the relationships are created at the qtree or volume level, depending on the replication technology. When an entire filer is added to the primary node of a dataset, PM knows all the volumes and their containing qtrees on that filer. Once you commit your dataset, PM kicks off relationship creation for each of them according to the technology (VSM/QSM/SV). PM takes its data from the DFM database, which discovers new volumes and qtrees once every 15 minutes by default. The conformance run on the dataset (once every hour by default) checks the primary members, i.e. the qtrees and volumes (irrespective of whether the direct member of the dataset is a volume, aggregate, or filer), and checks the secondary to see whether there is a corresponding relationship; if not, it kicks off relationship-creation jobs. This is one of the main jobs of the conformance engine.
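For illustration, adding a whole volume or even the whole filer as the primary member from the DFM CLI would look roughly like this (a sketch only - "mydataset" is a made-up dataset name, the node name is taken from the examples later in this thread, and the -N node-name syntax should be verified against your DFM version):
dfpm dataset add -N "Primary data" mydataset fas-sim-1:/OneThousand
dfpm dataset add -N "Primary data" mydataset fas-sim-1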
Regards
adai
Excellent - thank you for clearing that up. As a side note, do you have an ETA when DFM 5.0.1 will be released?
Just around the corner - definitely before the end of this month, but I can't spell out a definitive date.
Regards
adai
If the dataset contains a volume, PM will pick up any newly created qtree automatically during the next conformance run and create the destination qtree and the relationship automagically. That's the whole beauty of PM.
If you add a whole aggregate or even a controller, this happens for all underlying volumes. No need to hassle with individual qtrees/volumes.
regards, Niels
Right now I have each qtree listed in the dataset. If I remove all these and specify just the source volume instead, what exactly will happen??
Hi Niels,
Yes, you can do it. For each qtree, SnapVault creates a base snapshot on the primary; on the first update, all of them are coalesced into one base snapshot. Below is an example.
fas-sim-1> qtree status OneThousand
Volume Tree Style Oplocks Status
-------- -------- ----- -------- ---------
OneThousand unix enabled normal
OneThousand one unix enabled normal
OneThousand three unix enabled normal
OneThousand two unix enabled normal
fas-sim-1>
After SnapVault Start/Create Relationship job
fas-sim-1> snap list OneThousand
Volume OneThousand
working...
%/used %/total date name
---------- ---------- ------------ --------
21% (21%) 0% ( 0%) Apr 15 11:31 fas-sim-2(0099931872)_OneThousand_backup_one-src.0 (snapvault)
36% (23%) 0% ( 0%) Apr 15 11:31 fas-sim-2(0099931872)_OneThousand_backup_OneThousand_fas-sim-1_OneThousand-src.0 (snapvault)
46% (23%) 0% ( 0%) Apr 15 11:31 fas-sim-2(0099931872)_OneThousand_backup_two-src.0 (snapvault)
53% (21%) 0% ( 0%) Apr 15 11:31 fas-sim-2(0099931872)_OneThousand_backup_three-src.0 (snapvault)
fas-sim-1>
After SnapVault Update/Protect Now
fas-sim-1> snap list OneThousand
Volume OneThousand
working...
%/used %/total date name
---------- ---------- ------------ --------
27% (27%) 0% ( 0%) Apr 15 11:39 dfpm_base(OneThousand.436)conn1.0 (snapvault,acs)   <<< SV base snapshot with dataset name & ID
39% (21%) 0% ( 0%) Apr 15 11:38 2012-04-16 12:40:54 daily_fas-sim-1_OneThousand.-.one.three.two   <<< backup snapshot created by Protect Now
fas-sim-1>
As the maximum number of snapshots per volume is 255, after creating about 255 qtree SnapVault relationships the dataset will become non-conformant, with an error saying that no snapshots are available.
Now run a Protect Now from Protection Manager; all of these per-qtree snapshots will be coalesced into one. The dataset will still show its conformance status as non-conformant. Click on it and select Conform Now.
PM will now create relationships for the next 253 qtrees (as one snapshot is already used by dfpm_base and another by PM's backup snapshot). Once this is done, it will again fail due to the unavailability of snapshots.
Run Protect Now again, and keep repeating this until all 1000 qtrees are snapvaulted.
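After each Protect Now you can verify that the coalescing actually happened by re-checking the primary with the same commands as above:
fas-sim-1> snap list OneThousand
fas-sim-1> snapvault status
Once coalesced, the per-qtree ...-src.0 snapshots should be gone and only the dfpm_base snapshot and PM's backup snapshot should remain on the primary volume.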
The downside is that the maximum number of concurrent SnapVault streams per controller is limited and varies with the following:
ONTAP Version
FAS Model
NearStore License being enabled or not.
The regular scheduled updates of this volume will consume all SV streams until they finish, which can lengthen the backup window and delay snapshot creation on the secondary, as all 1000 qtrees need to be snapvaulted before an SV snapshot can be created on the destination. This is the only downside I can think of.
The default limit of 50 was chosen mainly for QSM, as each qtree in a QSM relationship needs a base snapshot, leaving only the remaining 205 for long-term retention, since the maximum number of snapshots per volume is only 255.
Also remember that the option you are changing is a global option and applies to all datasets creating SV relationships.
Regards
adai
Thanks Adai. That sounds as if it's at least doable, although I'd expect PM to handle that on its own.
I'll go ahead with testing and rate the answer accordingly once it's finished (which could take a while, as I'm using a FAS270, which is capable of running just seven SV transfers at a time...).
regards, Niels
Hi Niels,
I have done it many times with other customers who had more than 255 qtrees - 300+, though, not 1000.
Regards
adai
Hi Adai,
do you have a complete procedure handy you could forward to me?
I have tried several times now, but after the initial "Create Relationship" job terminates because 255 snapshots have been created, I've never gotten PM to pick up again.
I even created five manual snapshots as a buffer on both source and destination, which I deleted before taking the on-demand backup; otherwise that step would fail right away, as there would be no snapshot slot available for the consolidated SV base snapshot. The per-qtree snapshots do in fact get deleted on the source (yay!), but not on the destination (d'oh!) - so my 255-snapshot problem remains.
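For anyone trying to reproduce this: the buffer snapshots were just plain manual snapshots, created and removed with the standard commands (the snapshot name is made up and the prompts are generic):
source> snap create nr_1000qt_src buffer_1
dest>   snap create nr_1000qt_dst buffer_1
source> snap delete nr_1000qt_src buffer_1
dest>   snap delete nr_1000qt_dst buffer_1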
Having PM take a ~10-minute break every time I try to do anything with this 1000-qtree dataset does not help either, but I'll have to live with it...
regards, Niels
Ignore that. I think I just figured out that I cannot replicate more than ~250 qtrees into a single secondary volume without running out of snapshots on the destination.
Thus I have to change the option "pmMaxSvRelsPerSecondaryVol" to 250 and have PM create at least four secondary volumes to get the job done.
If I'm correct, this option should not even allow me to set it to a value higher than ~250, as that would never work.
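If that holds, the change is the same dfm options call as before, just with the lower value (again only a sketch):
dfm options set pmMaxSvRelsPerSecondaryVol=250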
regards, Niels
Hi Niels,
I tried this: I created a dataset with a Backup policy.
C:\>dfpm dataset list -x largeQtree
Id: 362
Name: largeQtree
Protection Policy: Back up
Application Policy:
Description:
Owner:
Contact:
Volume Qtree Name Prefix:
Snapshot Name Format:
Primary Volume Name Format:
Secondary Volume Name Format:
Secondary Qtree Name Format:
DR Capable: No
Requires Non Disruptive Restore: No
Node details:
Node Name: Primary data
Resource Pools: priRp
Provisioning Policy: thinProvNas
Time Zone:
DR Capable: No
vFiler:
Node Name: Backup
Resource Pools: secRp
Provisioning Policy:
Time Zone:
DR Capable: No
vFiler:
C:\>dfpm dataset list -m largeQtree
Id Node Name Dataset Id Dataset Name Member Type Name
---------- -------------------- ---------- -------------------- -------------------------------------------------- -------------------------------------------------------
363 Primary data 362 largeQtree volume fas-sim-1:/largeQtree
371 Backup 362 largeQtree volume fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree
C:\>dfpm dataset list -R largeQtree
Id Name Protection Policy Provisioning Policy Relationship Id State Status Hours Source Destination
---------- --------------------------- --------------------------- ------------------- --------------- ------------ ------- ----- ---------------------------- ----------------------------
362 largeQtree Back up thinProvNas 375 snapvaulted idle 0.1 fas-sim-1:/largeQtree/two fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree/two
362 largeQtree Back up thinProvNas 377 snapvaulted idle 0.1 fas-sim-1:/largeQtree/four fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree/four
362 largeQtree Back up thinProvNas 379 snapvaulted idle 0.1 fas-sim-1:/largeQtree/one fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree/one
362 largeQtree Back up thinProvNas 381 snapvaulted idle 0.1 fas-sim-1:/largeQtree/three fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree/three
362 largeQtree Back up thinProvNas 383 snapvaulted idle 0.1 fas-sim-1:/largeQtree/- fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree/largeQtree_fas-sim-1_largeQtree
C:\>
snap list on the source volume after the baseline:
fas-sim-1> snap list largeQtree
Volume largeQtree
working...
%/used %/total date name
---------- ---------- ------------ --------
20% (20%) 0% ( 0%) Apr 19 05:45 fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree_largeQtree_fas-sim-1_largeQtree-src.0 (snapvault)
35% (22%) 0% ( 0%) Apr 19 05:45 fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree_three-src.0 (snapvault)
45% (22%) 0% ( 0%) Apr 19 05:44 fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree_four-src.0 (snapvault)
52% (22%) 0% ( 0%) Apr 19 05:44 fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree_one-src.0 (snapvault)
58% (20%) 0% ( 0%) Apr 19 05:44 fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree_two-src.0 (snapvault)
fas-sim-1>
snap list on the destination volume:
fas-sim-2> snap list largeQtree_backup_fasxsimx1_largeQtree
Volume largeQtree_backup_fasxsimx1_largeQtree
working...
%/used %/total date name
---------- ---------- ------------ --------
19% (19%) 0% ( 0%) Apr 19 05:46 fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree-base.2 (busy,snapvault)
fas-sim-2>
Snapvault status.
fas-sim-2> snapvault status
Snapvault is ON.
Source Destination State Lag Status
fas-sim-1:/vol/largeQtree/four fas-sim-2:/vol/largeQtree_backup_fasxsimx1_largeQtree/four Snapvaulted 00:03:15 Idle
fas-sim-1:/vol/largeQtree/- fas-sim-2:/vol/largeQtree_backup_fasxsimx1_largeQtree/largeQtree_fas-sim-1_largeQtree Snapvaulted 00:02:23 Idle
fas-sim-1:/vol/largeQtree/one fas-sim-2:/vol/largeQtree_backup_fasxsimx1_largeQtree/one Snapvaulted 00:03:15 Idle
fas-sim-1:/vol/largeQtree/three fas-sim-2:/vol/largeQtree_backup_fasxsimx1_largeQtree/three Snapvaulted 00:03:14 Idle
fas-sim-1:/vol/largeQtree/two fas-sim-2:/vol/largeQtree_backup_fasxsimx1_largeQtree/two Snapvaulted 00:03:16 Idle
fas-sim-2>
Now I did a Protect Now:
fas-sim-1> snap list largeQtree
Volume largeQtree
working...
%/used %/total date name
---------- ---------- ------------ --------
26% (26%) 0% ( 0%) Apr 19 05:58 dfpm_base(largeQtree.362)conn1.0 (snapvault,acs)
38% (20%) 0% ( 0%) Apr 19 05:57 2012-04-20_0022+0530_daily_largeQtree_fas-sim-1_largeQtree_.-.four.one.three.two
fas-sim-1>
fas-sim-2> snap list largeQtree_backup_fasxsimx1_largeQtree
Volume largeQtree_backup_fasxsimx1_largeQtree
working...
%/used %/total date name
---------- ---------- ------------ --------
18% (18%) 0% ( 0%) Apr 19 05:59 fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree-base.0 (busy,snapvault)
fas-sim-2>
So I am trying to understand what you are doing. Are you using QSM instead of SV? Only in that case does each qtree require one base snapshot on the primary.
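A quick way to double-check which technology is in use (a rough check on 7-Mode): QSM relationships show up in snapmirror status on the destination, while SV relationships show up in snapvault status, and the SV softlock snapshots carry the (snapvault) tag in snap list, e.g.:
fas-sim-2> snapmirror status
fas-sim-2> snapvault status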
Regards
adai
I don't understand why having more than 250 qtrees would cause too many snapshots on either side, at least after the initial baseline - could you explain?
Thanks,
Richard
I still have to do some more testing, but in my environment it appears to be as follows:
- PM creates a SnapShot for every qtree on primary and secondary for initialization
- Volumes on both sides run out of SnapShots once the 255th relationship is initiated
- PM fails to initialize any more relationships, the job is "partially failed" and the dataset non-conformant
Now I perform a "backup now" as suggested by Adai.
- PM fails to create the consolidated SnapVault SnapShot to coalesce all the previously created SnapShots per qtree because there are already 255 SnapShots present - catch 22.
By creating manual SnapShots before the initialization starts and deleting them before the "backup now", I could at least get PM to coalesce the SV SnapShots on the primary, but it doesn't do so on the secondary; thus it runs out of SnapShots again right away.
I assume PM is not performing the SnapShot coalescing on the secondary because it has not yet finished creating all the relationships.
Testing is taking its time, as every action in NMC with this dataset takes ~10 minutes due to the ~1000 qtrees.
I get these nasty "...did not respond in 60 seconds" screens over and over.
regards, Niels
Hi Niels,
I suspect you are using QSM, as the Backup policy can do QSM as well, and only in the case of QSM does each qtree require a base snapshot on the primary (to support resync and failover). If you could share the snapvault status and snap list output for the source and destination volumes, it would help.
Regards
adai
Hi Adai,
it's definitely SV, not QSM.
Here are the details and the output.
For better readability of this post I added most of it as attachments.
1. Created dataset and added volumes with 996 individual Qtrees.
Conformance results can be found in file "conformance_results_after_adding_source_volume.txt"
2. Relationships start to be created.
SnapShots on Secondary seem to be rolling over correctly.
I'm continuously seeing SnapShots like:
ernie(0118042218)_nr_1000qt_dst-base.10 (busy,snapvault)
ernie(0118042218)_nr_1000qt_dst-cleanup.0 (busy,snapvault)
and individual SnapShots for each running initialization:
ernie(0118042218)_nr_1000qt_dst_Marcel-src.0 (snapvault)
ernie(0118042218)_nr_1000qt_dst_Clemens-src.0 (snapvault)
ernie(0118042218)_nr_1000qt_dst_Ryan-src.0 (snapvault)
ernie(0118042218)_nr_1000qt_dst_Malik-src.0 (snapvault)
ernie(0118042218)_nr_1000qt_dst_Luan-src.0 (snapvault)
ernie(0118042218)_nr_1000qt_dst_Domenic-src.0 (snapvault)
ernie(0118042218)_nr_1000qt_dst_Milan-src.0 (snapvault)
ernie(0118042218)_nr_1000qt_dst_Ferdinand-src.0 (snapvault)
ernie(0118042218)_nr_1000qt_dst_Fritz-src.0 (snapvault)
ernie(0118042218)_nr_1000qt_dst_Lion-src.0 (snapvault)
...
3. SnapShots on primary pile up - see file "snapshots_primary.txt"
4. Job fails because primary volume runs out of SnapShots.
Conformance status: "Nonconformant"
See file "job_details"
5. SnapShots on the secondary are not cleaned up after the job failed; in fact, it seems that new SnapShots are already being created for updating the relationships.
See file "snapshots_secondary.txt"
6. Conformance Results for Dataset after job has failed - see file "conformance_results_after_failed.txt"
7. Preview Conformance - see file "conformance_results_after_preview.txt"
The Conformance run will not be started right now.
8. Instead, a "Protect Now" is performed as suggested earlier to coalesce the SnapShots.
This fails right away, as PM first tries to perform a "local backup", which I have not yet been able to prevent, even though scheduling and retention for local backups are disabled or set to "0":
C:\WINDOWS\system32>ssh zzlnxdfm dfpm job details 166746
Job Id: 166746
Job State: completed
Job Description: Still initializing
Job Type: on_demand_backup
Job Status: failure
Bytes Transferred: 0
Dataset Name: NR_Catalog_Demo_1000QTrees
Dataset Id: 64725
Object Name: NR_Catalog_Demo_1000QTrees
Object Id: 64725
Policy Name: _NR Back up
Policy Id: 59632
Started Timestamp: 24 Apr 2012 13:59:48
Abort Requested Timestamp:
Completed Timestamp: 24 Apr 2012 13:59:55
Submitted By: niels
Job progress messages:
Event Id: 3701181
Event Status: normal
Event Type: job-start
Job Id: 166746
Timestamp: 24 Apr 2012 13:59:48
Message:
Error Message:
Event Id: 3701182
Event Status: normal
Event Type: job-progress
Job Id: 166746
Timestamp: 24 Apr 2012 13:59:53
Message: Using naming format set in the dataset NR_Catalog_Demo_1000QTrees to generate the snapshot name.
Error Message:
Event Id: 3701183
Event Status: normal
Event Type: job-progress
Job Id: 166746
Timestamp: 24 Apr 2012 13:59:53
Message: Using naming format %T_%R to create the snapshot name for dataset NR_Catalog_Demo_1000QTrees
Error Message:
Event Id: 3701184
Event Status: error
Event Type: job-progress
Job Id: 166746
Timestamp: 24 Apr 2012 13:59:53
Message:
Error Message: NR_Catalog_Demo_1000QTrees: Could not create snapshot for volume 'vf-nr-96:/nr_1000qt_src' (63722). Reason: No more snapshots available
Event Id: 3701185
Event Status: error
Event Type: snapshot-create
Job Id: 166746
Timestamp: 24 Apr 2012 13:59:53
Message: Failed to create snapshot.
Error Message:
Volume Id: 63722
Volume Name: vf-nr-96:/nr_1000qt_src
Snapshot Name: 2012-04-24_1359+0200_daily
Event Id: 3701186
Event Status: error
Event Type: job-progress
Job Id: 166746
Timestamp: 24 Apr 2012 13:59:55
Message:
Error Message: NR_Catalog_Demo_1000QTrees: Failed to create a local backup.
Event Id: 3701187
Event Status: error
Event Type: job-end
Job Id: 166746
Timestamp: 24 Apr 2012 13:59:55
Message:
Error Message:
9. SnapVault status output from source and destination - see files "snapvault_status_source.txt" and "snapvault_status_destination.txt"
Output truncated to only show relationships relevant to this problem.
I really have no clue how to solve this...
regards, Niels