ONTAP Discussions

Using "volume move" on cDOT8.3 to move large NFS volumes to another node non-disruptively

enttoobad

Hi, does anyone have experience moving large volumes, e.g. 2TB, to another node?  These volumes are NFSv3 datastores for VMware 5.5 and there are lots of VMs running on them.  I'm hoping I can just migrate the volumes without any noticeable impact on the VMs.

 

If anyone has experience, good or bad, please share.  I know this is possible but have no real-life experience with it yet, as we only just transitioned to cDOT.  We run 8.3, and it's a 4-node cluster of FAS3250s.

 

Thanks.

1 ACCEPTED SOLUTION

bobshouseofcards

I've found volume moves work very well and are non-disruptive for the most part.  I have both NAS and SAN protocols in play and regularly move volumes of size: 8TB NFS ESX datastores, 25TB general file volumes accessed via CIFS and NFS, and 8TB+ volumes with LUNs for database servers.
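
For reference, the whole thing is driven from the cluster shell.  A minimal sketch, with made-up cluster, vserver, volume, and aggregate names, looks like this:

cluster1::> volume move start -vserver vs1 -volume nfs_ds01 -destination-aggregate aggr1_node03
cluster1::> volume move show -vserver vs1 -volume nfs_ds01

The first command kicks off the background copy; the second shows the move state and rough progress while it runs.  Clients keep hitting the original copy until the final cutover, which is why the operation is non-disruptive.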

 

The volume move engine does put additional load on the system and disks, so it can affect performance during the move.  When possible, it is best to run moves when there is minimal other load on the target volume/system.  But given certain sizes, sometimes you have to let a move run for an extended time.  I've watched a 40TB volume move take 12 days given the other load on the total system.
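
If you're nervous about the cutover itself, the start command also takes cutover options.  I'm quoting the flags from memory, so check the man page on your 8.3 build, but it's roughly:

cluster1::> volume move start -vserver vs1 -volume nfs_ds01 -destination-aggregate aggr1_node03 -cutover-window 45 -cutover-action wait
cluster1::> volume move trigger-cutover -vserver vs1 -volume nfs_ds01

With -cutover-action wait the baseline copy runs whenever you start it, but the actual cutover doesn't happen until you trigger it, so you can do the switchover during a quiet window.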

 

Volume moves are limited per node in two ways.  First, moves run at lower priority, so a single move running at full speed can't starve other loads on the same aggregates or nodes.  Second, each node only gets so many volume move endpoint slots; both the source and the target count as a slot, so a move within a single node counts as two slots against that node.  Thus you can queue up a number of moves if you need to and they will process in a measured fashion, as in the sketch below.
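
Queuing is as simple as issuing the starts back to back and letting the cluster pace them (again, the names are made up and the field names are from memory):

cluster1::> volume move start -vserver vs1 -volume nfs_ds01 -destination-aggregate aggr1_node02
cluster1::> volume move start -vserver vs1 -volume nfs_ds02 -destination-aggregate aggr1_node02
cluster1::> volume move start -vserver vs1 -volume nfs_ds03 -destination-aggregate aggr1_node02
cluster1::> volume move show -fields state,percent-complete

Moves beyond the per-node slot limit just sit queued until a slot frees up; volume move show tells you which are queued versus actively replicating.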

 

As a standard data management mechanism, I have in our "NAS" cluster two nodes that are basically holding points for archive-style data - loaded up with capacity MSATA, but it's a small node pair.  Other nodes in the cluster store the active data on larger controllers and NL-SAS capacity disks.  Part of our standard operations is to move data back and forth between the two logical storage tiers within the cluster based on level of activity.  Typical weeks see around 20TB of data shifted this way (across multiple volumes).  It's also common to rebalance volumes onto different aggregates within a tier as size or I/O patterns change.  All the moves take place against a background of around 1500 volumes total on this four-node cluster with space efficiency, full SnapMirror replication, and lots of user access.
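
The rebalancing itself is nothing fancy.  A rough sketch, with example aggregate and volume names and field names from memory:

cluster1::> storage aggregate show -fields size,availsize
cluster1::> volume show -aggregate aggr_sata_node01 -fields size,used
cluster1::> volume move start -vserver vs1 -volume archive_vol07 -destination-aggregate aggr_sata_node02

In other words: see which aggregates are filling up or getting hot, pick a candidate volume, and move it to a quieter aggregate on the same tier.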

 

The one issue I've encountered is due to volume "container" size.  In our NAS cluster we have node types with different max volume sizes.  What is not apparent is that WAFL has an underlying "container" mechanism that impacts the size.  As a volume grows, WAFL increases the size of the logical container.  The logical container size never shrinks (thin provisioning notwithstanding) even if the actual data does.  The logical container is more a function of metadata and internal structures needed, and can reflect things like a max-files count that was manually increased beyond the standard.

The kicker is that while we only had 26TB of real data (which should fit anywhere), the source volume had grown to a 100TB-capable container (the max on the node in question).  Likely this was due to the actual user data being larger at some previous point and then shrinking back down.  Attempting to move that volume to a node with a 70TB volume limit didn't fail exactly, it just didn't go.  The volume move stalled without doing anything.  It would show an error if you displayed all the data, but it sat in the queue doing nothing.  It took a query against diagnostic-mode APIs to pull the container size of the volume and confirm the issue.  The only way to move that volume was the old-fashioned manual way - Robocopy.  Thankfully ODX-enabled access allowed the data to move in a few days without killing the network.
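
If you ever hit a stall like that, the regular commands at least surface that something is wrong, even if they don't name the container issue; I don't recall the exact diag-level field for the container size off-hand.  A sketch of where to look (names are placeholders):

cluster1::> volume move show -vserver vs1 -volume big_vol -instance
cluster1::> set -privilege diagnostic
cluster1::*> volume show -vserver vs1 -volume big_vol -instance

The -instance views dump every field for the move job and the volume, and the diag privilege level exposes additional internals, which is where the error detail showed up for us.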

 

Given your homogeneous node cluster, and unless your datastores are under significant steady load, volume moves in cDOT will be just fine.

 

Hope this helps.

 

