Is there a best practice for CIFS volume size considering backups, Snapshots, SnapMirror, etc.?

MAHESH1111111 · ‎2018-02-15

I understand that the volume size limits are huge in the latest versions of ONTAP (100TB), but from a realistic standpoint, considering external factors like backups, Snapshots, SnapMirror etc. that are affected by bigger volumes, what is the recommended range of volume sizes for CIFS shares that will hold millions of files belonging to hundreds of business teams?

If the volumes are huge, for example greater than 10TB, and we are doing daily incremental and weekly full backups, that will impact backup scheduling, and restores will take longer. Also, if SnapMirror is running and for some reason we have to perform the initial sync all over again, we have 10TB of data to sync, which can take months given limited WAN bandwidth and the need to avoid impacting production replications/transfers during business hours.
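To put a rough number on the re-sync concern, here is a minimal sketch of the arithmetic in Python. The link speed, the nightly transfer window, and the efficiency factor are all assumptions for illustration; none of them come from this thread, so plug in your own values.

```python
def transfer_days(data_tb: float, link_mbps: float,
                  hours_per_day: float, efficiency: float = 0.8) -> float:
    """Days needed to move data_tb (decimal terabytes) over a link of
    link_mbps megabits per second, when the link may only be used for
    hours_per_day hours each day (e.g. outside business hours).

    efficiency is an assumed fraction of the raw link rate that is
    actually achieved after protocol overhead and contention.
    """
    data_bits = data_tb * 1e12 * 8              # terabytes -> bits
    usable_bps = link_mbps * 1e6 * efficiency   # achievable bits per second
    seconds = data_bits / usable_bps
    return seconds / (hours_per_day * 3600)


# Hypothetical case: 10 TB baseline, 20 Mbit/s WAN link, 12 off-hours per day.
print(f"{transfer_days(10, 20, 12):.1f} days")
```

With those assumed values the baseline comes out at a bit under four months, which is the kind of figure that makes smaller volumes attractive.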

na-kennedy · ‎2018-02-15

It is going to depend on your circumstances, and from what you have said, you are already thinking about some of the variables that differ from use case to use case. Be aware that you won't get away from the issues every filesystem has when dealing with millions of small files. So while you can have a single 50TB FlexVol with 100 million files, if your use case means you need to crawl that filesystem regularly, or you have a smaller, less reliable replication pipe, it might not be a good idea.

Just today I am breaking up a large 40TB FlexVol with tens of millions of files for exactly that reason: backup time constraints.

MAHESH1111111 · ‎2018-02-16

Yeah, good thoughts! What tool are you planning to use to break that big volume (is it CIFS or NFS?) into smaller volumes and copy the data?

Our requirement is that the CIFS volume is touched often by hundreds of users throughout the day for their day-to-day business.

na-kennedy · ‎2018-02-16

It's CIFS and I am using NetApp's XCP tool. It's free (you have to renew the free license every 3 months), and in testing I found it is significantly faster than robocopy.

MAHESH1111111 · ‎2018-02-16

Okay, what a coincidence!! I started testing XCP yesterday. It looks like you have experience with XCP, so do you have any cheat-sheet examples for the copy, sync, and verify commands that you can share? I'm trying to test some scenarios and any help would be appreciated.

na-kennedy · ‎2018-02-16

I run xcp on a dedicated Win2016 VM.

That VM has 16 vcpus (4 sockets, 4 cores), 128GB RAM.

My original test case was copying across a WAN, so I needed to use a lot of threads to get the most out of the WAN given the latency.

For me, 24 threads was the sweet spot. After a 74-hour job (40 million files/24TB) I found the server RAM was about 50% consumed, and if I used more threads than that it would fall over.

So you will have to play with that in your testing to find your sweet spot for the environment you are in.
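For a rough sense of what that 74-hour job works out to, the sketch below turns the figures quoted above (40 million files, 24TB, 74 hours) into average rates. It assumes decimal terabytes and treats the quoted numbers as exact, so take the result as a ballpark only; your rates will depend on file sizes, latency, and thread count.

```python
def average_rates(files: int, data_tb: float, hours: float):
    """Return (megabytes per second, files per second) averaged over a job.

    data_tb is taken as decimal terabytes (1 TB = 1e12 bytes).
    """
    seconds = hours * 3600
    mb_per_s = data_tb * 1e12 / 1e6 / seconds   # bytes -> MB, per second
    files_per_s = files / seconds
    return mb_per_s, files_per_s


mb_s, files_s = average_rates(40_000_000, 24, 74)
print(f"{mb_s:.0f} MB/s, {files_s:.0f} files/s")
```

Dividing your own data size or file count by rates measured in a test run gives a first estimate for the initial copy window.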

Copy:

(initial copy)

xcp.exe copy -parallel 24 "\\source\share" "\\destination\share" >logfile.txt

Sync:

(initial sync)

xcp.exe sync -parallel 24 -acl "\\source\share" "\\destination\share" >logfile.txt

(day-to-day sync - this is faster)

xcp.exe sync -parallel 24 -nodata -noatime "\\source\share" "\\destination\share" >logfile.txt

(final sync before cut-over when users have been cutoff from the source)

xcp.exe sync -parallel 24 -acl "\\source\share" "\\destination\share" >logfile.txt

Note: the first and last sync commands check ACLs and are slower, so your cutover time needs to account for that.
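If you end up scripting these phases, one way to keep the flag sets straight is a small helper that builds the argument list per phase. This is only a sketch: the phase names and the function are made up for illustration, and it uses only the flags shown in this thread. The -fallback-user and -fallback-group options are included as optional because some XCP versions report that -acl cannot be used without them; what values to pass is environment-specific.

```python
from typing import List, Optional


def xcp_command(phase: str, source: str, destination: str,
                parallel: int = 24,
                fallback_user: Optional[str] = None,
                fallback_group: Optional[str] = None) -> List[str]:
    """Build the xcp.exe argument list for one migration phase.

    phase is one of the hypothetical names "copy", "initial-sync",
    "daily-sync" or "final-sync", mirroring the commands above.
    """
    if phase == "copy":
        args = ["xcp.exe", "copy", "-parallel", str(parallel)]
    elif phase in ("initial-sync", "final-sync"):
        # ACL-checking syncs: slower, used for the first and last pass.
        args = ["xcp.exe", "sync", "-parallel", str(parallel), "-acl"]
        if fallback_user and fallback_group:
            args += ["-fallback-user", fallback_user,
                     "-fallback-group", fallback_group]
    elif phase == "daily-sync":
        # Faster day-to-day pass with the flags from the example above.
        args = ["xcp.exe", "sync", "-parallel", str(parallel),
                "-nodata", "-noatime"]
    else:
        raise ValueError(f"unknown phase: {phase}")
    return args + [source, destination]


# Print the command line instead of running it, so it can be reviewed first.
cmd = xcp_command("daily-sync", r"\\source\share", r"\\destination\share")
print(" ".join(cmd))
```

The list form can be passed to subprocess.run with stdout redirected to a log file, which replaces the >logfile.txt redirection used on the command line.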

There are many more commands, and the user guide is very to-the-point, so I strongly recommend referring to it. I also can't stress enough that your environment will have different variables from mine, so you should test to see what works for you.

MAHESH1111111 · ‎2018-02-16

Again, thank you very much. I started working with your commands, and when I tried the "sync" command with the "-acl" flag, it threw the error below:

"-acl option cannot be used without -fallback-user and -fallback-group"

-fallback-user FALLBACK_USER	A user on the target machine to receive the permissions of local (nondomain) source machine users (example: domain\administrator).
-fallback-group FALLBACK_GROUP	A group on the target machine to receive the permissions of local (nondomain) source machine groups (example: domain\administrators).

I read the documentation about the fallback user and group, but it is hard to understand what they actually mean and what the benefit is.

In my case I’d like to migrate sub-folders from a 7-Mode NetApp (source) to a C-mode NetApp (target), and I want to ensure all the permissions are copied to the target exactly as they are on the source. So, in this migration scenario, what should the values for the fallback arguments be? I tried using just “-acl” without the above arguments, but XCP does not accept that.