Active IQ Unified Manager Discussions

How to protect a volume with ~1000 Qtrees with PM?

niels
10,355 Views

Hi folks,

I'm currently trying to set up a customer demo that involves PM-managed backups.

Caveat: The source volume contains 996 Qtrees. The dataset never reaches conformance: PM fails to create more than 253 relationships because the source volume runs out of snapshots.

Any idea how to protect volumes with >250 Qtrees?

Also, all Qtrees from a single volume should be SV'ed to the same secondary volume, hence I have to set the following option accordingly:

pmMaxSvRelsPerSecondaryVol   1000

I know it's far beyond the default 50, but should I expect any negative impact?
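
For reference, I'd expect to check and change this on the DFM server roughly as follows (just a sketch; correct me if the syntax differs in your DFM version):

# show the current value (default is 50)

dfm options list pmMaxSvRelsPerSecondaryVol

# raise the per-secondary-volume SnapVault relationship limit

dfm options set pmMaxSvRelsPerSecondaryVol=1000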

regards, Niels

17 REPLIES

rmharwood
10,285 Views

I'm successfully doing this with about 200 qtrees. I don't know why PM is trying to create so many snaps on the primary. What you may need to do is build the dataset gradually by adding a smaller number of qtrees at a time, waiting for conformance to complete before adding more.

Cheers,

Richard

niels
10,285 Views

Hi Richard,

thanks for that suggestion.

Although adding qtrees in smaller batches might be helpful, it's not very convenient.

The whole point of using PM for relationship management is *not* having to care about individual resources.

The idea is to add whole volumes, or even the containing aggregate, to the dataset and let PM do its magic.

Otherwise I'd have to regularly check whether new Qtrees have been created that are not being protected and add them to the dataset manually. That's not how I expect PM to work.

regards, Niels

rmharwood
10,285 Views

I agree with what you said. I'm not sure you can just add a volume into PM and have it watch for new qtrees, though - that said, it's not something I've tested and it may work. Adai, can you comment on this?

adaikkap
10,285 Views

Hi

     In fact you can even add an entire filer; this is called indirect referencing. At the end of the day, though, the relationships are created at the qtree or volume level, depending on the replication technology. When an entire filer is added to the primary node of a dataset, PM knows all the volumes and their containing qtrees on that filer. Once you commit your dataset, PM kicks off relationship creation for each of them according to the technology (VSM/QSM/SV). PM takes its data from the DFM database, which by default discovers new volumes and qtrees once every 15 minutes. The conformance run on the dataset, once every hour by default, checks the primary members, i.e. its qtrees and volumes (irrespective of whether the direct member of the dataset is a volume, aggregate, or filer), and checks the secondary to see whether a corresponding relationship exists; if not, it kicks off relationship-creation jobs. This is one of the main jobs of the conformance engine.
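
For example, a whole volume or even an entire storage system can be added as a primary member from the dfpm CLI, roughly like this (a sketch with illustrative names; exact syntax may vary with the DFM version):

# add a complete volume as a primary member; PM resolves its qtrees itself

dfpm dataset add mydataset fas-sim-1:/OneThousand

# or reference the whole filer (indirect referencing)

dfpm dataset add mydataset fas-sim-1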

Regards

adai

rmharwood
10,285 Views

Excellent - thank you for clearing that up. As a side note, do you have an ETA for when DFM 5.0.1 will be released?

adaikkap
9,732 Views

Just around the corner. Definitely before the end of this month, but I can't spell out a definitive date.

Regards

adai

niels
10,285 Views

If the Dataset contains a volume, PM will pick up any newly created qtree automatically during the next conformance run and create the destination qtree and the relationship automagically. That's the whole beauty of PM.

If you add a whole aggregate or even a controller, this happens for all underlying volumes. No need to hassle with individual qtrees/volumes.

regards, Niels

rmharwood
9,732 Views

Right now I have each qtree listed in the dataset. If I remove all these and specify just the source volume instead, what exactly will happen?

adaikkap
10,350 Views

Hi Niels,

     Yes, you can do it. For each qtree on the primary, SnapVault creates a base snapshot; on the first update, all of them are coalesced into one base snapshot. Below is an example.

fas-sim-1> qtree status OneThousand

Volume   Tree     Style Oplocks  Status

-------- -------- ----- -------- ---------

OneThousand          unix  enabled  normal

OneThousand one      unix  enabled  normal

OneThousand three    unix  enabled  normal

OneThousand two      unix  enabled  normal

fas-sim-1>

After SnapVault Start/Create Relationship job

fas-sim-1> snap list OneThousand

Volume OneThousand

working...

  %/used       %/total  date          name

----------  ----------  ------------  --------

21% (21%)    0% ( 0%)  Apr 15 11:31  fas-sim-2(0099931872)_OneThousand_backup_one-src.0 (snapvault)

36% (23%)    0% ( 0%)  Apr 15 11:31  fas-sim-2(0099931872)_OneThousand_backup_OneThousand_fas-sim-1_OneThousand-src.0 (snapvault)

46% (23%)    0% ( 0%)  Apr 15 11:31  fas-sim-2(0099931872)_OneThousand_backup_two-src.0 (snapvault)

53% (21%)    0% ( 0%)  Apr 15 11:31  fas-sim-2(0099931872)_OneThousand_backup_three-src.0 (snapvault)

fas-sim-1>

After SnapVault Update/Protect Now

fas-sim-1> snap list OneThousand

Volume OneThousand

working...

  %/used       %/total  date          name

----------  ----------  ------------  --------

27% (27%)    0% ( 0%)  Apr 15 11:39  dfpm_base(OneThousand.436)conn1.0 (snapvault,acs)   <<< SV base snapshot with dataset name & id

39% (21%)    0% ( 0%)  Apr 15 11:38  2012-04-16 12:40:54 daily_fas-sim-1_OneThousand.-.one.three.two   <<< backup snapshot created by Protect Now

fas-sim-1>

As the maximum number of snapshots per volume is 255, after creating 255 qtree SnapVault relationships the dataset will become non-conformant, with an error saying that no snapshot is available.

Now run a Protect Now from Protection Manager; all these 255 will be coalesced into one. The dataset will still show its conformance status as non-conformant. Click on it and select Conform Now.

PM will now create relationships for the next 253 qtrees (as one snapshot is already used by dfpm_base and another by PM's backup snapshot). Once this is done, it will again fail due to the unavailability of snapshots.

Run Protect Now again. Keep repeating this until all 1000 qtrees are snapvaulted.
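
Between rounds you can verify the progress from the CLI, for example (a sketch reusing the names from the example above; substitute your own dataset and volume names):

# on the primary: after each Protect Now the per-qtree -src snapshots should be coalesced into the single dfpm_base snapshot

fas-sim-1> snap list OneThousand

# on the DFM server: check how many relationships the dataset has so far

dfpm dataset list -R OneThousand

# and inspect the latest conformance/backup job if a round fails

dfpm job details <job-id>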

The downside is that the maximum number of concurrent SV streams per controller is limited and varies with the following:

ONTAP Version

FAS Model

NearStore License being enabled or not.


The regular scheduled updates of this volume will consume all SV threads until they are finished, which can increase the backup window and delay snapshot creation on the secondary, as all 1000 qtrees need to be snapvaulted before an SV snapshot can be created on the destination. This is the only downside I can think of.

The default limit of 50 was chosen mainly for QSM, as each qtree in a QSM relationship needs its own base snapshot, leaving only the remaining 205 snapshots available for long-term retention (the maximum number of snapshots per volume being 255).

Also remember that the option you are changing is a global option and applies to all datasets creating SV relationships.

Regards

adai

niels
10,284 Views

Thanks Adai. That sounds as if it's at least doable, although I'd expect PM to handle that on its own.

I'll go ahead with testing and rate the answer accordingly once it's finished (which could take a while, as I'm using a FAS270, which is capable of running just seven concurrent SV transfers at a time...)

regards, Niels

adaikkap
10,284 Views

Hi Niels,

     I have done it many times with other customers who had more than 255, like 300+, though not 1000.

Regards

adai

niels
9,731 Views

Hi Adai,

do you have a complete procedure handy you could forward to me?

I've tried several times now, but after the initial "Create Relationship" job terminates due to 255 snapshots having been created, I've never gotten PM to pick up again.

I even created five manual snapshots as a buffer on source and destination, which I deleted before taking the On-Demand Backup. Otherwise that step would fail right away, as there is no snapshot available to create the consolidated SV base snapshot. The initial snapshots do in fact get deleted on the source (yeah!), but not on the destination (d'oh!), thus my 255-snapshot problem remains.
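
For completeness, the buffer trick looks roughly like this on the controllers (a sketch; "buffer1" and the volume names are placeholders, and I created five such snapshots per side):

# before the baseline: occupy a few snapshot slots on source and destination

primary> snap create <source_vol> buffer1

secondary> snap create <destination_vol> buffer1

# right before the On-Demand Backup: free the slots again

primary> snap delete <source_vol> buffer1

secondary> snap delete <destination_vol> buffer1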

Having PM taking a ~10 minute break every time I try to do anything with this 1000Qtree dataset does not help either, but I'll have to live with it...

regards, Niels

Ignore the above. I think I just figured out that I cannot replicate more than ~250 qtrees into a single secondary volume without running out of snapshots on the destination.

Thus I have to change the option "pmMaxSvRelsPerSecondaryVol" to 250 and have PM create at least four secondary volumes to get the job done.

If I'm correct, then this option should not even allow me to set it to a value higher than ~250, as that would never work.
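
If that turns out to be the case, the change itself would simply be (a sketch, same global option as mentioned in my first post):

# cap SnapVault relationships per secondary volume so the destination cannot run out of snapshots

dfm options set pmMaxSvRelsPerSecondaryVol=250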

regards, Niels

adaikkap
9,732 Views

Hi Niels,

     I tried this: created a dataset with the Backup policy.
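
For reference, a dataset can also be put together from the dfpm CLI, roughly as follows (just a sketch, not necessarily how this particular one was built; attaching the protection and provisioning policies is left out, and exact flags depend on the DFM version):

# create an empty dataset and add the source volume as its primary member

dfpm dataset create largeQtree

dfpm dataset add largeQtree fas-sim-1:/largeQtree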

C:\>dfpm dataset list -x largeQtree

Id:                              362

Name:                            largeQtree

Protection Policy:               Back up

Application Policy:

Description:

Owner:

Contact:

Volume Qtree Name Prefix:

Snapshot Name Format:

Primary Volume Name Format:

Secondary Volume Name Format:

Secondary Qtree Name Format:

DR Capable:                      No

Requires Non Disruptive Restore: No

Node details:

   Node Name:           Primary data

   Resource Pools:      priRp

   Provisioning Policy: thinProvNas

   Time Zone:

   DR Capable:          No

   vFiler:

   Node Name:           Backup

   Resource Pools:      secRp

   Provisioning Policy:

   Time Zone:

   DR Capable:          No

   vFiler:

C:\>dfpm dataset list -m largeQtree

Id         Node Name            Dataset Id Dataset Name         Member Type                                        Name

---------- -------------------- ---------- -------------------- -------------------------------------------------- -------------------------------------------------------

       363 Primary data                362 largeQtree           volume                                             fas-sim-1:/largeQtree

       371 Backup                      362 largeQtree           volume                                             fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree

C:\>dfpm dataset list -R largeQtree

Id         Name                        Protection Policy           Provisioning Policy Relationship Id State        Status  Hours Source                       Destination

---------- --------------------------- --------------------------- ------------------- --------------- ------------ ------- ----- ---------------------------- ----------------------------

       362 largeQtree                  Back up                     thinProvNas                     375 snapvaulted  idle    0.1   fas-sim-1:/largeQtree/two    fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree/two

       362 largeQtree                  Back up                     thinProvNas                     377 snapvaulted  idle    0.1   fas-sim-1:/largeQtree/four   fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree/four

       362 largeQtree                  Back up                     thinProvNas                     379 snapvaulted  idle    0.1   fas-sim-1:/largeQtree/one    fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree/one

       362 largeQtree                  Back up                     thinProvNas                     381 snapvaulted  idle    0.1   fas-sim-1:/largeQtree/three  fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree/three

       362 largeQtree                  Back up                     thinProvNas                     383 snapvaulted  idle    0.1   fas-sim-1:/largeQtree/-      fas-sim-2:/largeQtree_backup_fasxsimx1_largeQtree/largeQtree_fas-sim-1_largeQtree

C:\>

snap list on the source volume after the baseline.

fas-sim-1> snap list largeQtree

Volume largeQtree

working...

  %/used       %/total  date          name

----------  ----------  ------------  --------

20% (20%)    0% ( 0%)  Apr 19 05:45  fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree_largeQtree_fas-sim-1_largeQtree-src.0 (snapvault)

35% (22%)    0% ( 0%)  Apr 19 05:45  fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree_three-src.0 (snapvault)

45% (22%)    0% ( 0%)  Apr 19 05:44  fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree_four-src.0 (snapvault)

52% (22%)    0% ( 0%)  Apr 19 05:44  fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree_one-src.0 (snapvault)

58% (20%)    0% ( 0%)  Apr 19 05:44  fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree_two-src.0 (snapvault)

fas-sim-1>

snap list on the destination volume.

fas-sim-2> snap list largeQtree_backup_fasxsimx1_largeQtree

Volume largeQtree_backup_fasxsimx1_largeQtree

working...

  %/used       %/total  date          name

----------  ----------  ------------  --------

19% (19%)    0% ( 0%)  Apr 19 05:46  fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree-base.2 (busy,snapvault)

fas-sim-2>

Snapvault status.

fas-sim-2> snapvault status

Snapvault is ON.

Source                                         Destination                                                                               State          Lag        Status

fas-sim-1:/vol/largeQtree/four                 fas-sim-2:/vol/largeQtree_backup_fasxsimx1_largeQtree/four                                Snapvaulted    00:03:15   Idle

fas-sim-1:/vol/largeQtree/-                fas-sim-2:/vol/largeQtree_backup_fasxsimx1_largeQtree/largeQtree_fas-sim-1_largeQtree     Snapvaulted    00:02:23   Idle

fas-sim-1:/vol/largeQtree/one                  fas-sim-2:/vol/largeQtree_backup_fasxsimx1_largeQtree/one                                 Snapvaulted    00:03:15   Idle

fas-sim-1:/vol/largeQtree/three                fas-sim-2:/vol/largeQtree_backup_fasxsimx1_largeQtree/three                               Snapvaulted    00:03:14   Idle

fas-sim-1:/vol/largeQtree/two                  fas-sim-2:/vol/largeQtree_backup_fasxsimx1_largeQtree/two                                 Snapvaulted    00:03:16   Idle

fas-sim-2>

Now I did a Protect Now.

fas-sim-1> snap list largeQtree

Volume largeQtree

working...

  %/used       %/total  date          name

----------  ----------  ------------  --------

26% (26%)    0% ( 0%)  Apr 19 05:58  dfpm_base(largeQtree.362)conn1.0 (snapvault,acs)

38% (20%)    0% ( 0%)  Apr 19 05:57  2012-04-20_0022+0530_daily_largeQtree_fas-sim-1_largeQtree_.-.four.one.three.two

fas-sim-1>

fas-sim-2> snap list largeQtree_backup_fasxsimx1_largeQtree

Volume largeQtree_backup_fasxsimx1_largeQtree

working...

  %/used       %/total  date          name

----------  ----------  ------------  --------

18% (18%)    0% ( 0%)  Apr 19 05:59  fas-sim-2(0099931872)_largeQtree_backup_fasxsimx1_largeQtree-base.0 (busy,snapvault)

fas-sim-2>

So I am trying to understand what you are doing. Are you using QSM instead of SV? Only in that case does each qtree require one base snapshot.

Regards

adai

rmharwood
9,731 Views

I don't understand why having more than 250 qtrees would cause too many snapshots on either side, at least after the initial baseline - could you explain?

Thanks,

Richard

niels
9,731 Views

I still have to do some more testing, but in my environment it appears to be as follows:

- PM creates a SnapShot for every qtree on primary and secondary for initialization

- Volumes on both sides run out of SnapShots once the 255th relationship is initiated

- PM fails to initialize any more relationships, the job is "partially failed" and the dataset non-conformant

Now I perform a "backup now" as suggested by Adai.

- PM fails to create the consolidated SnapVault SnapShot to coalesce all the previously created SnapShots per qtree because there are already 255 SnapShots present - catch-22.

By creating manual SnapShots before the initialization starts and deleting them before the "backup now", I could at least get PM to coalesce the SV SnapShots on the primary, but it doesn't do so on the secondary, so it runs out of SnapShots again right away.

I assume PM is not performing the SnapShot coalescing on the secondary because it has not finished creating all relationships yet.

Testing is taking its time, as every action in NMC with this dataset takes ~10 minutes due to the ~1000 qtrees.

I get these nasty "...did not respond in 60 seconds" screens over and over.

regards, Niels

adaikkap
7,595 Views

Hi Niels,

     I suspect you are using QSM, as the Backup policy can do QSM as well, and only in the case of QSM does each qtree require its own base snapshot on the primary to support resync and failover. If you could share the snapvault status and snap list output for the source and destination volumes, it would help.
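
Something along these lines would do (the same commands as in the earlier examples; the volume names are placeholders for your source and destination volumes):

# on the source controller

snapvault status

snap list <source_volume>

# on the destination controller

snapvault status

snap list <destination_volume>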

Regards

adai

niels
7,595 Views

Hi Adai,

it's definitely SV, not QSM.

Here are the details and the output.

For better readability of this post I added most of it as attachments.

1. Created the dataset and added the volume with its 996 individual Qtrees.

Conformance results can be found in file "conformance_results_after_adding_source_volume.txt"

2. Relationships start to be created.

SnapShots on Secondary seem to be rolling over correctly.

I'm continuously seeing SnapShots like:

ernie(0118042218)_nr_1000qt_dst-base.10 (busy,snapvault)

ernie(0118042218)_nr_1000qt_dst-cleanup.0 (busy,snapvault)

and individual SnapShots for each running initialization:

ernie(0118042218)_nr_1000qt_dst_Marcel-src.0 (snapvault)

ernie(0118042218)_nr_1000qt_dst_Clemens-src.0 (snapvault)

ernie(0118042218)_nr_1000qt_dst_Ryan-src.0 (snapvault)

ernie(0118042218)_nr_1000qt_dst_Malik-src.0 (snapvault)

ernie(0118042218)_nr_1000qt_dst_Luan-src.0 (snapvault)

ernie(0118042218)_nr_1000qt_dst_Domenic-src.0 (snapvault)

ernie(0118042218)_nr_1000qt_dst_Milan-src.0 (snapvault)

ernie(0118042218)_nr_1000qt_dst_Ferdinand-src.0 (snapvault)

ernie(0118042218)_nr_1000qt_dst_Fritz-src.0 (snapvault)

ernie(0118042218)_nr_1000qt_dst_Lion-src.0 (snapvault)

...

3. SnapShots on primary pile up - see file "snapshots_primary.txt"

4. Job fails because primary volume runs out of SnapShots.

Conformance status: "Nonconformant"

See file "job_details"

5. SnapShots on the Secondary are not cleaned up after the job failed; in fact, it seems that new SnapShots are already being created for updating the relationships.

See file "snapshots_secondary.txt"

6. Conformance Results for Dataset after job has failed - see file "conformance_results_after_failed.txt"

7. Preview Conformance - see file "conformance_results_after_preview.txt"

The Conformance run will not be started right now.

8. Instead, a "Protect Now" is performed as suggested earlier to coalesce the SnapShots.

This fails right away, as PM initially tries to perform a "local backup", which I have not yet been able to prevent, even though scheduling and retention for the local backups are disabled or set to "0":

C:\WINDOWS\system32>ssh zzlnxdfm dfpm job details 166746

Job Id:                    166746

Job State:                 completed

Job Description:           Still initializing

Job Type:                  on_demand_backup

Job Status:                failure

Bytes Transferred:         0

Dataset Name: NR_Catalog_Demo_1000QTrees

Dataset Id:                64725

Object Name: NR_Catalog_Demo_1000QTrees

Object Id:                 64725

Policy Name:               _NR Back up

Policy Id:                 59632

Started Timestamp:         24 Apr 2012 13:59:48

Abort Requested Timestamp:

Completed Timestamp:       24 Apr 2012 13:59:55

Submitted By:              niels

Job progress messages:

Event Id:      3701181

Event Status:  normal

Event Type:    job-start

Job Id:        166746

Timestamp:     24 Apr 2012 13:59:48

Message:

Error Message:

Event Id:      3701182

Event Status:  normal

Event Type:    job-progress

Job Id:        166746

Timestamp:     24 Apr 2012 13:59:53

Message:       Using naming format set in the dataset NR_Catalog_Demo_1000QTrees to generate the snapshot name.

Error Message:

Event Id:      3701183

Event Status:  normal

Event Type:    job-progress

Job Id:        166746

Timestamp:     24 Apr 2012 13:59:53

Message:       Using naming format %T_%R to create the snapshot name for dataset NR_Catalog_Demo_1000QTrees

Error Message:

Event Id:      3701184

Event Status:  error

Event Type:    job-progress

Job Id:        166746

Timestamp:     24 Apr 2012 13:59:53

Message:

Error Message: NR_Catalog_Demo_1000QTrees: Could not create snapshot for volume 'vf-nr-96:/nr_1000qt_src' (63722). Reason: No more snapshots available

Event Id:      3701185

Event Status:  error

Event Type:    snapshot-create

Job Id:        166746

Timestamp:     24 Apr 2012 13:59:53

Message:       Failed to create snapshot.

Error Message:

Volume Id:     63722

Volume Name:   vf-nr-96:/nr_1000qt_src

Snapshot Name: 2012-04-24_1359+0200_daily

Event Id:      3701186

Event Status:  error

Event Type:    job-progress

Job Id:        166746

Timestamp:     24 Apr 2012 13:59:55

Message:

Error Message: NR_Catalog_Demo_1000QTrees: Failed to create a local backup.

Event Id:      3701187

Event Status:  error

Event Type:    job-end

Job Id:        166746

Timestamp:     24 Apr 2012 13:59:55

Message:

Error Message:

9. SnapVault status output from source and destination - see files "snapvault_status_source.txt" and "snapvault_status_destination.txt"

Output truncated to only show relationships relevant to this problem.

I really have no clue how to solve this...

regards, Niels
