dataset error after upgrade

Hello, we recently upgraded our Protection Manager from DFM 4.0.2 to OnCommand 5.2.  We installed a fresh OS, made a backup of the old DFM, and restored it to the new OnCommand server.  From what I can tell, everything seems to be working fine.  However, one dataset that takes snapshot backups to another NetApp is failing.  A similar dataset with the same relationship but different volumes succeeds.  I do not see much detail in the logs about the failure, only that there was an error.

Here is a sample email of the error:

=======================================================================

An Error event at 15 Jun 17:39 PDT on Qtree nofs3_nofs3_users on Volume nofs3_backup on Storage System nofs0.ltx-credence.com:

SnapMirror Update: Failed.

Click below to see the details of this event.

http://dfm.milpitas.credence.com:8080/start.html#st=1&data=(eventID=140750)

*** Event details follow.***

General Information

-------------------

DataFabric Manager server Serial Number: 1-50-124130

Alarm Identifier: 2

Event Fields

-------------

Event Identifier: 140750

Event Name: SnapMirror Update: Failed

Event Description: SnapMirror Update

Event Severity: Error

Event Timestamp: 15 Jun 17:39

Source of Event

---------------

Source Identifier: 2930

Source Name: nofs0:/nofs3_backup/nofs3_nofs3_users

Source Type: Qtree

Name of the host: nofs0.ltx-credence.com

Type of the host: Storage System

Host identifier: 2858

Event Arguments

---------------

datasetId: 2915

backupJobId: 24050

jobId: 24050

--NetApp DataFabric Manager

=======================================================================

Here is a list of the snapshots in one of the destination volumes for the dataset on the backup NetApp:

  

Name                                                                Date          Used      Total     Status
2012-07-30 00:13:40 monthly_nofs0_nofs3_backup.-.nofs3_nofs3_users  Jul 29 20:13  123.7 GB  263.6 GB  normal
2012-10-01 00:28:16 monthly_nofs0_nofs3_backup.-.nofs3_nofs3_users  Sep 30 20:28  26.99 GB  139.9 GB  normal
2012-12-31 02:22:39 monthly_nofs0_nofs3_backup.-.nofs3_nofs3_users  Dec 30 21:22  24.16 GB  113 GB    normal
2013-02-25 02:36:19 monthly_nofs0_nofs3_backup.-.nofs3_nofs3_users  Feb 24 21:36  22.01 GB  88.8 GB   normal
2013-04-01 02:37:02 monthly_nofs0_nofs3_backup.-.nofs3_nofs3_users  Mar 31 22:37  29.04 GB  66.79 GB  normal
2013-05-13 02:40:20 weekly_nofs0_nofs3_backup.-.nofs3_nofs3_users   May 12 22:40  15.22 GB  37.75 GB  normal
2013-05-20 03:00:57 weekly_nofs0_nofs3_backup.-.nofs3_nofs3_users   May 19 23:00  5.924 GB  22.52 GB  normal
2013-05-27 03:13:22 weekly_nofs0_nofs3_backup.-.nofs3_nofs3_users   May 26 23:13  7.341 GB  16.6 GB   normal
2013-06-03 03:06:24 weekly_nofs0_nofs3_backup.-.nofs3_nofs3_users   Jun 02 23:06  4.364 GB  9.259 GB  normal
2013-06-10 02:53:14 weekly_nofs0_nofs3_backup.-.nofs3_nofs3_users   Jun 09 22:53  2.597 GB  4.895 GB  normal
nofs0(0118060768)_nofs3_backup_nofs3_nofs3_users-dst.838            Jun 12 22:05  2.063 GB  2.298 GB  busy,snapmirror - Busy
nofs0(0118060768)_nofs3_backup_nofs3_nofs3_users-src.0              Jun 14 20:16  240.3 MB  240.3 MB  normal

Any suggestions would be greatly appreciated.

Thanks,

- Marc

Re: dataset error after upgrade

Hi Marc,

     I don't see any correlation between this failure and the upgrade. Can you also paste the output of the job detail CLI for this job ID?

dfpm job detail 24050

The error is basically coming from the storage systems; Protection Manager is only relaying it.

Regards

adai

Re: dataset error after upgrade

Hello Adai,

The output of dfpm job detail 24050 is over 2000 lines.

If I grep for 'error' there are many error messages.  These are the only lines that had something after them.

Error Message:    
Event Status:      error
Error Message:     nofs0.ltx-credence.com: replication destination hard link create failed
Error Message:     LRD DIROPS
Event Status:      error
Error Message:     SnapMirror transfer failed.
Error Message:    

I'm not sure if that will be helpful to you.

Thanks!

- Marc

Re: dataset error after upgrade

Hi Marc,

          Pls redirect the output of the job and upload it as attachment.
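
A minimal sketch of how the job output could be saved to a file and narrowed down to the error lines that actually carry a message. The file name job_24050.txt and the function name nonempty_errors are only illustrative, and the filter assumes the "Error Message:" label format seen in the excerpt earlier in this thread.

```shell
#!/bin/sh
# Print only the "Error Message:" lines that have text after the label.
# $1 is a file holding saved job detail output (file name is hypothetical).
nonempty_errors() {
    grep 'Error Message:' "$1" | grep -v 'Error Message:[[:space:]]*$'
}

# Usage on the DFM server (left as comments so the sketch is safe to source):
#   dfpm job detail 24050 > job_24050.txt
#   nonempty_errors job_24050.txt
```

The saved file can then be attached in full, while the filtered lines give a quick summary.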

Regards

adai

Re: dataset error after upgrade

Also, can you look thru the snapmirror logs on the target controller?

Re: dataset error after upgrade

Hello Adai,

Attached is the job detail.


Thanks for your time.

- Marc

Re: dataset error after upgrade

Hi Paul,

Here is the error message from the snapmirror log at the time of the dataset error.

slk Tue Jun 18 20:35:23 EDT state.qtree_softlock.nofs3_backup.0003ca71.028.nofs3_nofs3_users.src-qt.nofs0:/vol/nofs3_backup/nofs3_nofs3_users.00000000.-01.0 Softlock_delete (Transfer)
dst Tue Jun 18 20:35:24 EDT nofs3.ltx-credence.com:/vol/users/- nofs0:/vol/nofs3_backup/nofs3_nofs3_users Rollback_failed (replication destination hard link create failed)
dst Tue Jun 18 20:35:24 EDT nofs3.ltx-credence.com:/vol/users/- nofs0:/vol/nofs3_backup/nofs3_nofs3_users Abort (replication destination hard link create failed)

dst Tue Jun 18 20:55:25 EDT nofs3.ltx-credence.com:/vol/users2/- nofs0:/vol/nofs3_backup_2/nofs3_nofs3_users2 End (905892 KB)
dst Tue Jun 18 21:21:11 EDT nofs3.ltx-credence.com:/vol/users1/- nofs0:/vol/nofs3_backup_1/nofs3_nofs3_users1 End (2487300 KB)
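
For anyone scanning a longer log, here is a rough sketch of pulling out only the failed transfers. It assumes the SnapMirror log has been copied to a local file and that the line layout matches the excerpt above; the function name failed_transfers and the optional destination filter are illustrative, not part of any NetApp tool.

```shell
#!/bin/sh
# List destination-side entries that ended in Abort or Rollback_failed.
# $1: path to a local copy of the SnapMirror log (assumed location).
# $2: optional substring to match, e.g. a destination qtree path.
failed_transfers() {
    awk -v dst="${2:-}" '
        $1 == "dst" && /(Abort|Rollback_failed) \(/ &&
        (dst == "" || index($0, dst)) { print }
    ' "$1"
}

# Example (file name is hypothetical):
#   failed_transfers snapmirror.log nofs0:/vol/nofs3_backup/nofs3_nofs3_users
```

Filtering on the destination path keeps the successful transfers for the other volumes out of the way.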

Thanks,

- Marc

Re: dataset error after upgrade

Hi Marc,

          Sorry for the delay. I did some internal searching and found a similar bug.  Here is the link:

http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=624459

Though it mentions SnapVault, I suspect it applies to you, because Qtree SnapMirror and SnapVault use the same replication engine as far as I know.

What version of ONTAP are you running?  I also suggest you open a support case with NetApp for this. This is a pure ONTAP error message and has nothing to do with Protection Manager.

Regards

adai

Re: dataset error after upgrade

Hi Marc,

    I just got confirmation from our folks internally that bug 624459 affects qtree SnapMirror as well. Pls open a case with NetApp and reference this bug.

Also, to find the problematic file, follow the public report for bug 624459 at the link I gave in my previous reply.

Regards

adai

Re: dataset error after upgrade

Hi Adai!

Thanks for the feedback and for pointing me to the bug.  We are running ONTAP 7.3.6, so this is probably it.

I ran this command to locate all the hard links in the volume: 'find . -type f -links +1 | xargs ls -i'. I redirected the output to a file and sorted it.  I found over 400,000 hard links referencing a few files in someone's home directory.

I'll report back tomorrow after the job runs to see if it's successful.
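
As a rough sketch, the sorting and counting step can be folded into one pipeline that reports how many path names share each inode. This variant uses find's -exec ... + instead of xargs so that names containing spaces are passed through intact; the function name hardlink_counts is illustrative, and it assumes the tree sits on a single file system and that no file name contains a newline.

```shell
#!/bin/sh
# Count how many path names under a directory share each inode.
# Output: "<count> <inode>" lines, highest count first.
# Assumes one file system (inode numbers unique) and no newlines in names.
hardlink_counts() {
    find "$1" -type f -links +1 -exec ls -i {} + |
        awk '{ count[$1]++ } END { for (inode in count) print count[inode], inode }' |
        sort -rn
}

# Example, run from the top of the mounted volume:
#   hardlink_counts . | head
```

The inodes at the top of the list are the heavily linked files; passing one of those inode numbers to find's -inum test lists the paths that point at it.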

Thanks!

- Marc