2012-05-12 09:37 PM
I had a similar, but not identical, problem. The SnapVault lag had grown to 26 hours. I also saw port errors and thought the issue was network- or DNS-related, because we had previously changed some IPs. Those errors turned out to be unrelated. In the end the problem was time zones: the source filer was set to EST and the destination filer to GMT. After the time-zone mismatch was fixed, no more problems. Hope that helps.
2012-07-23 08:14 AM
I just passed by this thread by accident.
I have seen different causes for lag (note that I am replicating LUNs across data centers, but that shouldn't make a difference).
It seems quite obvious that there is no connectivity issue.
This log line actually says it all!
Tue Apr 24 21:26:05 EDT syrna01: replication.dst.noSrcSnap:error: SnapVault: destination transfer from landoversvr.obg.corp:\Data to syrna01:/vol/winbackups2/landoversvr : base Snapshot copy for transfer no longer exists on the source.
You might have to read it twice.
What the filer does is create a snapshot on the source filer (the primary) and, once that is done, copy over all the data as it was at the moment of that snapshot.
This snapshot becomes a common baseline that both the primary and the secondary share, and from which they can compute incremental "diffs" (each update is really just another snapshot whose changes are copied over).
Without such a common starting point, consistency between the two sides would never be possible.
In your case, the base snapshot on the source was deleted (you can check with snap list <vol>).
The base snapshot should be marked with "snapvault". If you can't find any snapshot marked this way, you no longer have your base snapshot.
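To illustrate, a check on the primary might look like the following 7-Mode session (the volume name is taken from the error message above; snapshot names, dates, and percentages are purely illustrative):

```
# On the primary filer: list snapshots for the source volume.
# A surviving SnapVault base snapshot is flagged "(snapvault)".
pri> snap list Data
  %/used       %/total  date          name
  ----------  ---------  ------------  --------
   1% ( 1%)    0% ( 0%)  Apr 24 21:00  sv_base.0 (snapvault)
   2% ( 1%)    1% ( 0%)  Apr 24 20:00  hourly.0
```

If no snapshot in the list carries the "(snapvault)" tag, the baseline is gone and incremental updates can no longer be computed.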
Now that we know this, the question arises: why was the base snapshot deleted?
This can be:
- user initiated (a user with insufficient knowledge, or simply an accident)
- volume space management
NetApp has several features to help a volume keep enough free space, such as snapshot reserves and volume autogrow.
If for some reason the volume can't autogrow, then, depending on your settings, the filer will start deleting snapshots to free up space.
In the end this can delete ALL snapshots on that volume (I have seen this happen a few times).
You can resolve this by restarting the sync. I don't think there is another way.
You can prevent this from happening in the future by using snapshot reserves, volume autogrow, and so on.
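As a sketch, the space-management settings mentioned above could be configured on 7-Mode roughly like this (the volume name, sizes, and percentages here are example values, not recommendations):

```
# Reserve 20% of the volume for snapshot data:
pri> snap reserve Data 20

# Allow the volume to grow automatically, up to a maximum size,
# in fixed increments (both values are examples):
pri> vol autosize Data -m 600g -i 10g on

# When space runs low, prefer growing the volume over deleting
# snapshots, so the SnapVault base snapshot survives:
pri> vol options Data try_first volume_grow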
I hope this helps.
I am aware that this is a rather old thread, but it may help you understand what went wrong, and it might help others as well.
2012-11-06 05:27 AM
This is exactly what I am seeing: the volume ran out of space and the filer deleted the base snapshot. You say you can resolve this by restarting the sync, but how do you do that? I'm not too familiar with SnapVault.
Thanks in advance
2012-11-06 05:38 AM
By "restarting" I mean breaking the old SnapVault relationship (snapvault stop ...) and creating it again (snapvault start -S ...).
This creates a new snapshot on the primary filer and then copies all the data over to your secondary.
From the moment the data has been copied over, the SnapVault relationship works again as scheduled.
Make sure you have enough free space in the volume: if you just stop SnapVault on the secondary, all snapshots are kept.
So you either delete those snapshots or you need enough space in the volume.
Before starting the SnapVault again, make sure all space has been freed up on the volume, using df -Vh <volume_name>.
Freeing the space might take a long time; I am not sure whether this is related to sis (deduplication) or not.
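Putting the steps above together, the rebaseline might look roughly like this on 7-Mode (source and destination paths are taken from the error message earlier in the thread; adjust them for your environment):

```
# 1. On the secondary: break the old relationship for that qtree.
#    Note: this removes the active destination qtree; snapshots
#    already taken on the destination volume remain.
sec> snapvault stop /vol/winbackups2/landoversvr

# 2. Check there is enough free space for a fresh full baseline
#    (delete old destination snapshots first if needed):
sec> df -h winbackups2

# 3. On the secondary: re-create the relationship. This triggers a
#    new baseline snapshot on the primary and a full data transfer.
sec> snapvault start -S landoversvr.obg.corp:\Data /vol/winbackups2/landoversvr
```

Once the baseline transfer completes, scheduled incremental updates should resume on their own.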