ONTAP Discussions

OnCommand Datasets and data lag

michaeldparker
8,509 Views

Using Host 1.0, I created a dataset with a protection policy that mirrors to a 2nd array and then performs a backup once the mirror is complete.  When I created the dataset, OnCommand automatically created the relationships to make the dataset conformant.

The volume has 3 qtrees inside, with a LUN used as a VMware datastore inside each qtree.  Using the host plugin, I created the dataset so that the backup backs up each datastore.  Everything works, but OnCommand automatically created 4 relationships between the mirror and the backup, like this:

ATLFAS01.acuitylightinggroup.com:/vol/VMDS_TEST1_mirror_CDCFAS01_vmds_test1/-   ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1  Snapvaulted    32:40:30   Idle

ATLFAS01.acuitylightinggroup.com:/vol/VMDS_TEST1_mirror_CDCFAS01_vmds_test1/q1  ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/q1                              Snapvaulted    11:14:58   Idle

ATLFAS01.acuitylightinggroup.com:/vol/VMDS_TEST1_mirror_CDCFAS01_vmds_test1/q2  ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/q2                              Snapvaulted    11:14:58   Idle

ATLFAS01.acuitylightinggroup.com:/vol/VMDS_TEST1_mirror_CDCFAS01_vmds_test1/q3  ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/q3                              Snapvaulted    11:14:58   Idle

When the backup occurs, all of the relationships get updated except the first one listed above.  As a result, I continually get errors about data lag.  How do I get OnCommand to automatically update that relationship as well?

Thanks

Michael


18 REPLIES

michaeldparker
8,364 Views

As an update to this, since the first relationship is not updating, I decided I'd see what would happen if I tore down that relationship.  I did that, but as soon as I kicked off a backup, the dataset went into a conforming state and a job was kicked off to create the relationship again.  So, the relationship is there, but it doesn't get updated by the backup process.

Thoughts?

Thanks

adaikkap
8,364 Views

Hi Michael,

     We update a relationship only if there is a backup version that contains the source qtree. In this case, the root qtree probably does not have any VMware objects, so it is not part of any backup versions. That's the reason it is not getting updated.

The second behaviour, conformance re-creating the relationship, is also expected. What I would suggest is relinquishing the relationship from the dataset using the following CLI:

[root@lnx~]# dfpm dataset relinquish help

NAME
    relinquish -- mark a relationship as external

SYNOPSIS
    dfpm dataset relinquish { [ <destination-volume-name-or-id> ] |
        [ <destination-qtree-name-or-id> ] }

DESCRIPTION
    The relationship will be marked as external. Source and
    destination objects are left unchanged.

[root@lnx~]#

After that, to prevent conformance from creating the relationship again, go to NMC and ignore the particular source qtree.

Regards

adai

michaeldparker
8,364 Views

Thanks for the reply!  What you are saying makes absolute sense, but I cannot figure out how to use the relinquish command.  Can you help me out with that?  And once I have done that and set it to ignored, can I then just do a snapvault stop to delete that relationship?

Thanks

Michael

adaikkap
8,364 Views

Hi Michael,

     dfpm dataset relinquish ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1


And once I have done that and set it to ignored, can I then just do a snapvault stop to delete that relationship?

Yes.
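
For reference, the cleanup would look roughly like this on the secondary (run it after the relinquish and the ignore; note that snapvault stop also deletes the destination qtree on the secondary, and -f just skips the confirmation prompt):

ATLFAS02> snapvault stop -f /vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1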

Regards

adai

michaeldparker
8,364 Views

Ok, I thought it'd be something like that, but when I enter that command, I get: Error: Could not find volume or qtree 'ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1'. Reason: There is no host, aggregate, volume, qtree, resource group, or dataset named ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1.

Thanks

michaeldparker
8,364 Views

Hi,

Any further thoughts on this by chance?  I've been playing with the dfpm dataset relinquish command and everything I do yields the same error stating that there is no host, aggregate, volume, .....

Thanks for the help.

Michael

arunchak
8,364 Views

Hi,

To relinquish a relationship you can give the destination volume name or its ID (the same applies to a qtree).

For example:

I have the following relationship as seen on the filer (snapmirror status):

sin.rtp.netapp.com:JPMC_DS03                                               seawasp:JPMC_DS03_mirror_sin_JPMC_DS03                    Snapmirrored   -22:12:55  Idle

suggesting that the destination volume should be JPMC_DS03_mirror_sin_JPMC_DS03.

Now from dfm:

[root@grafspree ~]# dfm volume list | grep JPMC_DS03_mirror_sin_JPMC_DS03

10565 seawasp:/JPMC_DS03_mirror_sin_JPMC_DS03 Flexible     32_bit     No

Relinquishing the relationship:

[root@grafspree ~]# dfpm dataset relinquish 10565

Relinquished relationship (10568) with destination JPMC_DS03_mirror_sin_JPMC_DS03 (10565).

So please check whether ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1 is listed in DFM by using the dfm volume list command (or the dfm qtree list command).

If it is listed, also try to relinquish the relationship using the volume ID as shown above.
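
For example, something along these lines (illustrative only; substitute whatever qtree ID the list command returns in your environment):

[root@grafspree ~]# dfm qtree list | grep -i VMDS_TEST1_backup
[root@grafspree ~]# dfpm dataset relinquish <qtree-id>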

Thanks,

  Arun.

michaeldparker
8,377 Views

Thanks for the reply. So, as you have noted, this is the relationship that is causing me grief:

ATLFAS01.acuitylightinggroup.com:/vol/VMDS_TEST1_mirror_CDCFAS01_vmds_test1/- ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1 Snapvaulted 02:51:40 Idle

My mgmt. server is on Windows, but I have installed some Linux tools, so when I do this command: dfm volume list | grep -i vmds_test

I get this back:

23729 ATLFAS01:/VMDS_TEST1_mirror_CDCFAS01_vmds_test1 Flexible 64_bit No

23766 ATLFAS02:/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1 Flexible 64_bit No

23632 CDCFAS01:/vmds_test1 Flexible 64_bit No

As you have said, I want to relinquish the qtree so I would assume I need to actually see this:

ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1

If I try this exact command:

dfpm dataset relinquish ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1

then I receive this:

Error: Could not find volume or qtree 'ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1'. Reason: There is no host, aggregate, volume, qtree, resource group, or dataset named ATLFAS02:/vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1.

If instead I do this: dfpm dataset relinquish 23766 (where 23766 is from the dfm volume list from above)

Then I at least see this error:

Error: Could not find relationship information: Multiple managed relationships with destination '23766' found.

Is there a method to pull back the ID of this particular qtree that I want to relinquish? I have not been able to figure that part out yet.

Thanks for the help.

Michael

michaeldparker
8,377 Views

OK, I feel dumb now.  I was just able to relinquish the relationship.  I had seen the dfm volume list command and played with it earlier today, thinking perhaps I could use the ID to relinquish.  I did not see the dfm qtree list command until just now.  I had somehow overlooked it.

So, I did dfm qtree list |grep -i vmds_test

This command did pull back the qtree I needed and the ID.  Using the ID, I have relinquished the relationship successfully.  Let me see if I can finish up the rest of what I need now.

Whew .... what a pain.  Thanks for your help.

So, the question I still have is why the OnCommand Host automatically created the relationship if I don't need it.  Is there a way to fix that?  I'll be setting up additional datasets, and now that I think I have it all working, it should be fine .... but I'd rather not have to mess with the relinquish, ignore, etc. if I don't have to.

Thanks

Michael

michaeldparker
7,571 Views

Next hurdle .. and maybe I'm just not waiting long enough.  My dataset is now nonconformant because it wants to re-create the relinquished relationship.  So, I assume that is where the ignore comes into play.  When I go to unprotected data, I am not seeing the /VMDS_TEST1_backup_CDCFAS01_vmds_test1_1 volume, and I assume I'm looking for a qtree that states non-qtree data.  I don't see any of that.  I'll check back in half an hour or so to see if it takes time to discover.  It has already been close to half an hour now.

Thanks

Michael

michaeldparker
7,571 Views

At about an hour, I still don't see the qtree.  I would think that is plenty of time.  Another thing I just noticed is that no matter what I highlight on the unprotected data qtree screen, my ignore button remains greyed out.  So, my thought is that even if it ever does show up, I won't be able to click on ignore.  Another thing I have noticed is that my dataset now only shows 3 backup relationships, but it still shows nonconformant because it wants to re-create the problem relationship.

Thanks

Michael

arunchak
7,571 Views

Hi,

  From one of the BURTs I could extract this information:

"The lag threshold can be triggered for a many reasons, including (but not

limited to):

Not having any valid secondary storage, so the relationships can't be created

Relationship creation failure

Access control problems (secondary can't talk to primary)

network issues

primary/secondary storage is down

"Protect now" will try to trigger an immediate backup/mirror job, which you

can then track in the jobs page - it should tell you why it fails."

But from the above posts, I can make out that your protection is happening properly without any errors, right?

>>I continually get errors about datalag

Can you post the exact error you are getting? Or does it just show a lag error?

Also check under Notifications -> Events and let us know if any event was generated for the lag error, along with the details.

michaeldparker
7,571 Views

Hi,

Thanks once again for your help with this.  It is taking a lot more effort than I anticipated, but I really appreciate you sticking with it to help me!  Right now the dataset is all messed up, so I cannot show you the errors I was getting, but I'll see if I can explain it all clearly.  Starting from the top, using System Manager, I created 1 volume with 3 qtrees.  Inside each qtree is a LUN, which is presented to my ESX servers as a datastore.  For simplicity, we'll call it tstvol, with q1, q2, q3, lun1, lun2, lun3.  Lun1 is inside q1 and so on.  Lun1 is called ds1 in VMware, lun2 is ds2, and lun3 is ds3.  Now in OnCommand, I set up a storage service that said I wanted to protect the data with a mirror and then a backup.  In the VMware host plugin, I created a dataset called vmds_test1 and assigned it the storage service.  For the data to back up, I told it to back up datastores ds1, ds2, and ds3.  This created my dataset with a mirror from my source to my destination.  Then, off my mirror destination, the dataset created 4 vault relationships.  The relationships were for /vol/tstvol/-, /vol/tstvol/q1, /vol/tstvol/q2, and /vol/tstvol/q3.

On a nightly schedule, the backup was 100% complete with no errors every night.  After about 2 days, I started seeing that there was a data lag error.  When I expanded the error, it showed that /vol/tstvol/- had not updated since I created the dataset.  I can drop to the command line and manually update the relationship.  This will cause the data lag error to go away until it reaches the threshold again.
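
The manual update I'm running is roughly the following, on the backup secondary ATLFAS02 (using the real destination qtree path from my first post rather than the simplified tstvol names):

ATLFAS02> snapvault update /vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1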

So, on your suggestion, I finally was able to relinquish the /vol/tstvol/- relationship.  I went into unprotected data to mark it as ignored.  As of this morning, it is still not showing there.  My dataset now shows nonconformant because it wants to re-create the /vol/tstvol/- relationship.  If I let it, it will re-create it and update it fine, but in two days I'll be back to data lag because q1, q2, and q3 will update but not /vol/tstvol/-.

I know I just wrote a lot, but I hope it is clear and this enables you to tell me what to do next, because I'm at a loss.  I'm trying to get this working properly so that I can set up additional backup relationships in the same manner to back up our entire VM environment.

Thanks

Michael

arunchak
7,571 Views

Hi,

I apologize for the delay in response. I had to create the setup from scratch to reproduce this issue. Thanks a lot for the detailed explanation.

I set the lag error threshold to 2 hours and scheduled the updates and backups every hour. I have attached a screenshot of the lag error on the relationship.

The issue is that there is no transferable non-qtree data in /vol_test_1/-, and hence once the threshold is hit you get the lag error.

Mirror relationship:

sin:vol_test           croc:vol_test_1                               Source         00:04:07   Idle

In the volume vol_test, I created a dummy LUN, attached it as a datastore, created a dummy VM, added the datastore to the same dataset, and performed an on-demand backup. The lag error went away.

Further, you cannot add an empty datastore (without any VMs) to the dataset; the backup would fail in that case. I had raised BURT 504037 for that. So it is a must to have a dummy VM.

I need to leave the schedule running for a while, and tomorrow I will probably check whether I hit the lag again.

Thanks for your patience. I will analyze further and get back to you; please hang in till then.

Thanks,

  Arun

michaeldparker
7,571 Views

Hi Arun,

No problem on the delay.  I really appreciate your help in resolving the error.  I am glad that you were able to understand the problem and duplicate it.

I had not thought about creating a small LUN and dummy VM in the root of the volume.  I have been able to make the error go away by merely logging into the controller and manually updating the non-qtree relationship.  I had thought that I could schedule a cron job or something of that nature to update this one relationship on a regular basis.  Do you see any harm in doing that until a permanent solution or bug fix is implemented?  I would think that ultimately, the longer-term goal would be for OnCommand either to not create the relationship, since it is not required, or to update the relationship at the regular remote backup intervals.  I realize that the OnCommand Host has been pulled from downloads due to a problem with vSphere 5.  Once the appropriate fixes are implemented for vSphere 5 compatibility and the Host program is available for download again, do you anticipate that one of the two solutions I propose would be implemented?  Lastly, is there a more appropriate way I should be creating and presenting my LUNs as datastores to VMware?  We put the 3 LUNs inside qtrees so that we can gain some dedupe efficiencies, and we were told that NetApp recommended the use of qtrees.  If this is not the proper method to do things, I'd appreciate some feedback.

Thanks

Michael

arunchak
6,142 Views

Hi Michael,

      You can create a dummy LUN and a dummy VM in the root volume and attach that dummy datastore to your dataset; the relationship should then keep getting updated whenever the scheduled backup of the dataset runs, which avoids the lag.
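
A rough sketch of the LUN side on the source controller (the LUN name and igroup here are only examples; map it to whatever igroup your ESX hosts already use, then format it as a datastore and create the dummy VM from vCenter):

CDCFAS01> lun create -s 1g -t vmware /vol/vmds_test1/dummy_ds.lun
CDCFAS01> lun map /vol/vmds_test1/dummy_ds.lun esx_igroup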

 

      I don't see any harm in your method of creating a cron job to update the relationship manually either.

   AFAIK, there should be no problem in having LUNs inside qtrees.

Let us know if you need any further information. We would be pleased to assist you.

Thanks,

  Arun

michaeldparker
6,087 Views

Yes, that is exactly what I did.  I created a cron job to keep that relationship updated.  Works great now.
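
In case it helps anyone else, the job is just a scheduled call of the manual update. A cron-style version would look roughly like this (assumes SSH is enabled on the secondary and key-based login is set up from the host running the job; the 1:00 AM schedule is only an example):

# update the non-qtree vault relationship nightly
0 1 * * * ssh root@ATLFAS02 snapvault update /vol/VMDS_TEST1_backup_CDCFAS01_vmds_test1_1/VMDS_TEST1_CDCFAS01_vmds_test1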

adaikkap
6,087 Views

Hi Michael,

     Once a dummy LUN and VM are created, there is no need for a cron job; OnCommand will take care of updating this relationship.

regards

adai
