We have been using "volume move" to shuffle volumes around our cluster, having been told that it is a great tool and will cut over cleanly.
Well, I've found out that it isn't so clean. When "volume move start" is executed, it appears to create a snapshot for the copy. It then proceeds to copy data from that snapshot, including any snapshots that existed prior to the "volume move start" snapshot. Once it is ready to trigger the cutover, it updates, I believe, from the origination snapshot, cuts over the volume, but does not update files created or modified after the origination snapshot.
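To illustrate, this is roughly the sequence we ran and then watched (a sketch; the vserver, volume, and aggregate names are placeholders):

    ::> volume move start -vserver vs1 -volume projdata -destination-aggregate aggr_new
    ::> volume move show -vserver vs1 -volume projdata
    # while the move runs, watch the snapshot it creates:
    ::> volume snapshot show -vserver vs1 -volume projdata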
This has been wreaking havoc with our moves: users tell us their data has reverted to an earlier point in time, and I now believe they are correct.
Unfortunately, I could not find any documentation or TRs that address this behavior, so I must assume it is an issue with the volume move command.
One caveat: we did not have SnapMirror licensed on the entire cluster. Perhaps that would prevent "volume move" from performing the final update; if so, there should be a note in the documentation.
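For what it's worth, checking where SnapMirror is licensed is simple; a minimal sketch (whether a missing license actually changes vol move's update behavior is exactly what I'd like NetApp to confirm):

    ::> system license show
    # look for a SnapMirror entry covering every node in the cluster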
If anyone at NetApp can address this, that would be great. I'd like to know if "volume move" can be used in the future, or if I need to go back to good old Volume SnapMirror.
If you're seeing this behavior I would open a ticket; that's not at all how vol move is supposed to work. I have moved hundreds of volumes, including many NFS- and LUN-based VMware datastores, with zero issues.
Sorry, your links come back with an "unauthorized access" error.
No errors at all; the automatic trigger starts and completes.
However, viewing the snapshots, the volume move snapshot is created at the time "volume move start" is executed, and it does not change. With SnapMirror, I find that all snapshots from the baseline onward are locked; that is not happening here, which tells me "vol move" is not using volume SnapMirror.
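For reference, this is how I checked for locks (a sketch; names are placeholders, and exact field names may vary by ONTAP version):

    ::> volume snapshot show -vserver vs1 -volume projdata -fields create-time,busy,owners
    # with an active SnapMirror relationship I'd expect busy/owned snapshots; here I see none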
If I am moving a 100 TB volume and deferring cutover, and the move waits for a week, I would expect multiple TB of changes. If the process is not using SnapMirror or snapshots, how can it bring the destination up to the latest state of the volume?
I am left to infer that it does not, hence the massive data loss our scientists are seeing when using "vol move".
Unless I see something from Engineering stating otherwise, I am reverting to SnapMirror.
"Once it is ready to trigger the cutover, it updates, I believe, from the origination snapshot, cuts-over the volume, but does not update files created or modified after the origination snapshot." --> this is not how it works.
Vol moves will sync up the source and destination for cutover. We don't use the original snapshot, because then we'd have data loss, as you mentioned.
What could be happening here is that the cutover is slow or that the vol moves aren't actually complete. As mentioned, a support case is your best bet to get to the bottom of what's happening here.
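To see where a move actually stands before trusting the cutover, something like this helps (a rough example; adjust the names to your environment):

    ::> volume move show -vserver vs1 -volume projdata -fields state,percent-complete,cutover-action,cutover-attempts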
Once you've resolved the issue, if you could post back here with what the fix/root cause was, that would be useful. 🙂
My question would be: is the automated cutover failing, and are you performing a manual cutover? I think others would be interested in the ultimate underlying cause and resolution of this case if you're able to share. Thanks
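For reference, the deferred-cutover pattern I'm asking about looks roughly like this (names are placeholders):

    ::> volume move start -vserver vs1 -volume projdata -destination-aggregate aggr_new -cutover-action wait
    # ... later, once the transfer is caught up and you're ready:
    ::> volume move trigger-cutover -vserver vs1 -volume projdata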
So, several issues came to light through this exercise.
Our last vol move of a 100 TB volume, on a tight aggregate, appeared to work; the volume was copied to a new aggregate and transferred, and no errors were reported.
However, immediately on Monday after the (auto-triggered) cutover, users started reporting that their files had reverted to an earlier version; the file dates matched the date of the original "volume move start" operation.
There were several obstacles to troubleshooting this:
This is a restricted site, so no AutoSupports go to NetApp.
The server that emailed out our weekly AutoSupports had been turned down (its replacement is not up yet).
The log limit on the array's mroot had already removed the original logs covering the trigger operation.
We do not have a syslog server to send "ALL" logs to; I am not sure that can even be done.
Volume move does not add the source volume to the "volume recovery-queue", so I cannot undelete the original source (see the sketch after this list).
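On the recovery-queue point, this is what I checked (a sketch; the recovery queue requires advanced privilege, and only deleted volumes show up there, not the sources of completed moves):

    ::> set -privilege advanced
    ::*> volume recovery-queue show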
I ran a test on a small volume populated with 0-byte files in nested directories. I watched the snapshots and updates "volume move" made, and they worked fine. The only difference between my troublesome moves and the test was that the trouble occurred on volumes with dedicated aggregates that had minimal free space (don't ask why...). The same was true of the other two large volumes that had problems.
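Roughly what the test looked like (a sketch; names and sizes are placeholders):

    ::> volume create -vserver vs1 -volume movetest -aggregate aggr_src -size 10g -junction-path /movetest
    # ... populate with nested directories of 0-byte files from a client ...
    ::> volume move start -vserver vs1 -volume movetest -destination-aggregate aggr_dst -cutover-action wait
    # make a few changes from the client while the move runs, then watch:
    ::> volume snapshot show -vserver vs1 -volume movetest
    ::> volume move trigger-cutover -vserver vs1 -volume movetest
    # after cutover, confirm the post-start changes are present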
All of the smaller volumes which I had moved appear to have worked okay.
So my conclusion is that, for one reason or another, this happened because of the lack of free space, but of course I can't prove it.
I would like to request the following from NetApp:
Please add an option to "volume move" that keeps the original volume around in case there is an issue. (I should mention that in my case the backup was on a smaller model array, which has a smaller maximum volume size, so restoring from backup was not an option.)
If anyone knows how to set up a network syslog-type server that can keep all of the ONTAP logs, please let me know.
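In the meantime, the one built-in mechanism I'm aware of is ONTAP's cluster log forwarding; a minimal sketch, assuming ONTAP 9 (the destination IP is a placeholder, and I don't know whether it captures "ALL" logs):

    ::> cluster log-forwarding create -destination 192.0.2.50 -port 514 -protocol udp-unencrypted
    ::> cluster log-forwarding show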
In the final analysis, I had to ask that the case be closed, because I could not provide logs proving or disproving the users' reports. I believe that volume move will work correctly so long as there is enough free space to do what it needs. Of course, this is all conjecture on my part, and I apologize if it is wrong. In the meantime, I've had to revert to SnapMirror, which appears to be a little slower than vol move.
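For completeness, the SnapMirror workflow I've fallen back to looks roughly like this (a sketch; names are placeholders, and the final unmount/rename/remount steps are abbreviated):

    ::> volume create -vserver vs1 -volume projdata_new -aggregate aggr_new -size 100t -type DP
    ::> snapmirror create -source-path vs1:projdata -destination-path vs1:projdata_new -type DP
    ::> snapmirror initialize -destination-path vs1:projdata_new
    # incremental updates while users keep working:
    ::> snapmirror update -destination-path vs1:projdata_new
    # at the cutover window, after stopping client access:
    ::> snapmirror quiesce -destination-path vs1:projdata_new
    ::> snapmirror break -destination-path vs1:projdata_new
    # then delete the relationship, unmount/rename the old volume, and mount the new one in its place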