It appears "volume move" will cause massive data loss on large volumes
3 weeks ago
We have been using "volume move" to shuffle volumes around our cluster, having been told that it is a great tool and will cut over cleanly.
Well, I've found out that it isn't so clean. When "volume move start" is executed, it appears to create a snapshot for the copy. It then copies data from that snapshot, including snapshots that existed before the "volume move start" snapshot. Once it is ready to trigger the cutover, I believe it updates from that original snapshot and cuts over the volume, but does not carry over files created or modified after that snapshot was taken.
This has been wreaking havoc with our moves, with users telling us their data has reverted to a prior time; I now believe the users are correct.
- Unfortunately, I could not find any documentation or TRs that address the issue, so I must assume it is an issue with the volume move command.
- One caveat: we did not have SnapMirror licensed on the entire cluster. Perhaps that would prevent "volume move" from updating; however, there should have been a note about that in the documentation.
If anyone at NetApp can address this, that would be great. I'd like to know if "volume move" can be used in future, or if I need to go back to good old Volume SnapMirror.
20 REPLIES
If you're seeing this behavior I would open a ticket; that's not at all how vol move is supposed to work. I have moved hundreds of volumes, including a lot of NFS and LUN-based VMware datastores, with zero issues.
Give these a read over:
Are you seeing any errors? What happens after the move? Are you having to restore?
Sorry, your links come up with unauthorized access.
No errors at all; the automatic trigger starts and completes.
However, viewing the snapshots, the volume move snapshot is created at the time "volume move start" is executed, and it does not change. With SnapMirror, I find that all snapshots past the creation are locked, which tells me "vol move" is not using volume SnapMirror.
That's weird, so am I now... sorry about that.
Give this a shot, it's the parent topic link.
A SnapMirror (asynchronous) will just copy over what's in the snapshot it created and call it a day.
A vol move is a little more complex. The cutover part is similar to how a VM cuts over: it copies over what it can, then stuns for a short moment and copies off the remainder of the blocks.
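To make that contrast concrete, here is a toy Python sketch. It is only a model of the behavior as described in this reply, not ONTAP's actual implementation; the function names, the dict-of-blocks representation, and the idea of writes arriving in "batches" are all assumptions made for illustration.

```python
def snapshot_copy(source, write_batches):
    """One-shot copy: the destination only ever sees the starting snapshot."""
    snapshot = dict(source)            # point-in-time image taken at the start
    for batch in write_batches:
        source.update(batch)           # clients keep writing to the source
    return snapshot                    # later writes never reach the destination


def iterative_move(source, write_batches):
    """Iterative copy: catch-up passes, then a brief pause for the final delta."""
    dest = dict(source)                # baseline transfer of the starting image
    dirty = set()                      # blocks changed since the last pass
    for batch in write_batches:
        # Catch-up pass: copy whatever changed during the previous pass.
        for block in dirty:
            dest[block] = source[block]
        dirty.clear()
        # Writes that land on the source while this pass is running.
        source.update(batch)
        dirty.update(batch)
    # Cutover: writers are paused, so the remaining delta can be
    # copied without falling behind again.
    for block in dirty:
        dest[block] = source[block]
    return dest
```

In this model, `snapshot_copy` returns the volume as it looked at the start, while `iterative_move` returns a destination identical to the live source at cutover. If a real move ends up looking like the first function, that is the symptom worth reporting.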
Well, DataMotion, sounds like VMware. But beyond that, it still does not talk about snapshots and snapmirror requirements.
There should be a HUGE banner in front of this command saying that, apparently, based on the link you sent, it will not work with NFS/CIFS shares.
- Actually, it works, with CIFS/NFS active
- It does not fail.
- It does not copy anything after the initial "volume move" snapshot is created
SnapMirror has nothing to do with vol move. They might use some similar mechanisms under the hood, but they are separate.
Found more modern versions of the docs specifically on ONTAP (clustered):
But I assure you that moving volumes around on clustered ONTAP is non-disruptive.
My P O I N T exactly.
If I am moving a 100 TB volume and deferring cutover, and it waits for a week, I would expect multiple TBs of changes. If the process is not using SnapMirror or snapshots, how can it bring the volume up to date with the latest base image?
I am left to infer that it does not, hence the massive data loss our scientists are seeing when using "vol move".
Unless I see something from Engineering stating otherwise, I am going back to SnapMirror.
I realize this hasn't been asked yet: what version of ONTAP are you on? Vol moves on 7-Mode and clustered ONTAP are (slightly) different.
Also, if you are having data loss, please open a P1 ticket with support.
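If it helps build the case for support, one way to gather evidence is to record a manifest of file modification times from a client mount before the cutover and compare it afterwards. The following is a minimal sketch, not a NetApp tool: the function names are hypothetical, and skipping a `.snapshot` directory is an assumption about how the volume appears on the client.

```python
import os


def build_manifest(root):
    """Map each file's path (relative to root) to its (mtime, size)."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: snapshot copies live under ".snapshot"; do not descend.
        dirnames[:] = [d for d in dirnames if d != ".snapshot"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue               # file vanished or is unreadable; skip it
            manifest[os.path.relpath(path, root)] = (st.st_mtime, st.st_size)
    return manifest


def find_regressions(before, after):
    """List files that disappeared or whose mtime went backwards."""
    problems = []
    for path, (mtime, _size) in sorted(before.items()):
        if path not in after:
            problems.append((path, "missing after move"))
        elif after[path][0] < mtime:
            problems.append((path, "older mtime after move"))
    return problems
```

Run `build_manifest` shortly before the cutover and again afterwards, then pass both results to `find_regressions`. Files that users deleted in between will also show up as missing, so treat the output as a list of things to check rather than proof of loss.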
You should be seeing zero issues or loss with vol moves; CDOT/ONTAP vol moves are all non-disruptive. I would open a P1 to investigate further at this point.