Active IQ Unified Manager Discussions
Hello, we recently upgraded our Protection Manager from DFM 4.0.2 to OnCommand 5.2. We installed a fresh OS, made a backup of the old DFM database, and restored it to the new OnCommand server. From what I can tell, everything seems to be working fine. However, there is one dataset that takes snapshot backups to another NetApp that is failing. A similar dataset with the same relationship but different volumes succeeds. I do not see many details in the logs about the failure, except that there was an error.
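(For context, the backup and restore were done with the standard dfm backup CLI, something like the following; the backup name is just an example:)
old-server> dfm backup create dfm402_migration
new-server> dfm backup restore dfm402_migration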
Here is a sample email of the error:
=======================================================================
An Error event at 15 Jun 17:39 PDT on Qtree nofs3_nofs3_users on Volume nofs3_backup on Storage System nofs0.ltx-credence.com:
SnapMirror Update: Failed.
Click below to see the details of this event.
http://dfm.milpitas.credence.com:8080/start.html#st=1&data=(eventID=140750)
*** Event details follow.***
General Information
-------------------
DataFabric Manager server Serial Number: 1-50-124130
Alarm Identifier: 2
Event Fields
-------------
Event Identifier: 140750
Event Name: SnapMirror Update: Failed
Event Description: SnapMirror Update
Event Severity: Error
Event Timestamp: 15 Jun 17:39
Source of Event
---------------
Source Identifier: 2930
Source Name: nofs0:/nofs3_backup/nofs3_nofs3_users
Source Type: Qtree
Name of the host: nofs0.ltx-credence.com
Type of the host: Storage System
Host identifier: 2858
Event Arguments
---------------
datasetId: 2915
backupJobId: 24050
jobId: 24050
--NetApp DataFabric Manager
=======================================================================
Here is a list of the snapshots in one of the destination volumes for the dataset on the backup NetApp.
Any suggestions would be greatly appreciated.
Thanks,
- Marc
Hi Marc,
I don't see any correlation between this failure and the upgrade. Can you also paste the output of the job detail CLI for this job ID?
dfpm job detail 24050
The error is basically coming from the storage systems; Protection Manager is only relaying it back.
Regards
adai
Hello Adai,
The output of dfpm job detail 24050 is over 2000 lines.
If I grep for 'error', there are many error messages; these are the only lines that had anything after them.
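(The command was something like this, run on the DFM server:)
dfpm job detail 24050 | grep -i error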
Error Message:
Event Status: error
Error Message: nofs0.ltx-credence.com: replication destination hard link create failed
Error Message: LRD DIROPS
Event Status: error
Error Message: SnapMirror transfer failed.
Error Message:
I'm not sure if that will be helpful to you.
Thanks!
- Marc
Hi Marc,
Please redirect the output of the job to a file and upload it as an attachment.
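(Something along these lines should capture the whole thing; the file name is just an example:)
dfpm job detail 24050 > dfpm_job_24050.txt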
Regards
adai
Also, can you look through the SnapMirror logs on the target controller?
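(On a 7-Mode controller the SnapMirror log can usually be read from the console with something like:)
nofs0> rdfile /etc/log/snapmirror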
Hi Paul,
Here is the error message from the snapmirror log at the time of the dataset error.
slk Tue Jun 18 20:35:23 EDT state.qtree_softlock.nofs3_backup.0003ca71.028.nofs3_nofs3_users.src-qt.nofs0:/vol/nofs3_backup/nofs3_nofs3_users.00000000.-01.0 Softlock_delete (Transfer)
dst Tue Jun 18 20:35:24 EDT nofs3.ltx-credence.com:/vol/users/- nofs0:/vol/nofs3_backup/nofs3_nofs3_users Rollback_failed (replication destination hard link create failed)
dst Tue Jun 18 20:35:24 EDT nofs3.ltx-credence.com:/vol/users/- nofs0:/vol/nofs3_backup/nofs3_nofs3_users Abort (replication destination hard link create failed)
dst Tue Jun 18 20:55:25 EDT nofs3.ltx-credence.com:/vol/users2/- nofs0:/vol/nofs3_backup_2/nofs3_nofs3_users2 End (905892 KB)
dst Tue Jun 18 21:21:11 EDT nofs3.ltx-credence.com:/vol/users1/- nofs0:/vol/nofs3_backup_1/nofs3_nofs3_users1 End (2487300 KB)
Thanks,
- Marc
Hi Marc,
Sorry for the delay. I did some internal searching and found a similar bug. Here is the link:
http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=624459
Though the bug report mentions SnapVault, I suspect it is related, because Qtree SnapMirror and SnapVault use the same replication engine as far as I know.
What version of ONTAP are you running? I also suggest you open a support case with NetApp for the same. This is purely an ONTAP error message and has nothing to do with Protection Manager.
Regards
adai
Hi Adai!
Thanks for the feedback and for pointing me to the bug. We are running ONTAP 7.3.6, so this is probably it.
I ran this command to locate all the hard links in the volume: find . -type f -links +1 | xargs ls -i. I redirected the output to a file and sorted it, and found over 400,000 hard links referencing a few files in someone's home directory.
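(For anyone hitting the same thing, the full pipeline was roughly this, run from an NFS client with the source volume mounted; the mount point is just an example. Using -exec instead of xargs also keeps file names with spaces intact:)
cd /mnt/users
find . -type f -links +1 -exec ls -i {} + | sort -n > hardlinks.txt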
I'll report back tomorrow after the job runs to see if it's successful.
Thanks!
- Marc
Hi Marc,
Good to know it helped. Let me know how the next update job goes.
Regards
adai
Hello Adai,
After removing the hard links I get the same error message: "replication destination hard link create failed". Do I need to re-initialize the SnapMirror relationship, or do something else to get it going again?
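For example (I'm guessing at the syntax here), would a manual resync of the failing relationship from the log above be the right move, something like:
nofs0> snapmirror resync -S nofs3:/vol/users/- nofs0:/vol/nofs3_backup/nofs3_nofs3_users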
Thanks,
- Marc
Hi Marc,
This is more of an ONTAP issue. I suggest you open a case against this bug, and support should be able to help you. Sorry that I couldn't help you on this.
Regards
adai
Hi Marc,
I just got confirmation from our folks internally that bug 624459 affects Qtree SnapMirror as well. Please open a case with NetApp and reference this bug to them.
Also, to find the problematic file, follow the public report for bug 624459 at the link I gave in my previous reply.
Regards
adai
Hello Adai,
Sorry for the late update. After removing the hard links, I ended up having to remove the volume from the dataset, create a new dataset, and place the volume in the newly created dataset. It has been working again since.
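For anyone searching later, the equivalent dfpm CLI steps would be roughly the following (the dataset and volume names here are made up):
dfpm dataset remove users_backup nofs3:/users
dfpm dataset create users_backup_new
dfpm dataset add users_backup_new nofs3:/users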
Thanks,
- Marc
Hello Adai,
After correcting the hard-link limitation error numerous times by deleting the hard links in the volumes, I was hoping there was a better way to recover from this error. Currently I remove the hard links and then have to delete the dataset and create a new dataset job, which starts the backups over from scratch. Is there another method I can try to get the job working again without deleting the dataset, while still using the same backup volume? Some kind of refresh? The hard links are removed, but unless I remove the dataset it still comes up with the hard-link error.
Thank you for your time.
- Marc