I run xcp on a dedicated Win2016 VM.
That VM has 16 vCPUs (4 sockets x 4 cores) and 128 GB RAM.
My original test case was a copy across a WAN, so I needed a lot of threads to get the most out of the link given the latency.
For me, 24 threads was the sweet spot. After a 74-hour job (40 million files / 24 TB) I found the server RAM was about 50% consumed, and if I used more threads than that it would fall over.
So you will have to experiment in your own testing to find the sweet spot for the environment you are in.
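To put the job figures in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes decimal units (1 TB = 10^12 bytes) and a steady rate over the full 74 hours, neither of which is stated exactly above, so treat the result as a ballpark only. The `job_averages` helper is just an illustrative name.

```python
# Rough averages for the 74-hour job described above.
# Assumptions (not measured): decimal units (1 TB = 10**12 bytes) and a
# steady transfer rate for the whole run.

def job_averages(total_bytes, total_files, hours, threads):
    """Return (MB/s, files/s, files/s per thread) averaged over the job."""
    seconds = hours * 3600
    mb_per_s = total_bytes / seconds / 1e6
    files_per_s = total_files / seconds
    return mb_per_s, files_per_s, files_per_s / threads

mb_s, files_s, per_thread = job_averages(24e12, 40_000_000, 74, 24)
print(f"{mb_s:.0f} MB/s, {files_s:.0f} files/s, {per_thread:.1f} files/s per thread")
```

Under those assumptions the job averaged roughly 90 MB/s and about 150 files per second. Running the same sums on a short test copy in your own environment gives you a baseline to compare thread counts against.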
Copy:
(initial copy)
xcp.exe copy -parallel 24 "\\source\share" "\\destination\share" >logfile.txt
Sync:
(initial sync)
xcp.exe sync -parallel 24 -acl "\\source\share" "\\destination\share" >logfile.txt
(day-to-day sync - this is faster)
xcp.exe sync -parallel 24 -nodata -noatime "\\source\share" "\\destination\share" >logfile.txt
(final sync before cut-over when users have been cutoff from the source)
xcp.exe sync -parallel 24 -acl "\\source\share" "\\destination\share" >logfile.txt
Note: the first and last sync commands check ACLs (-acl) and are slower, so your cutover window needs to account for that.
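If you script the phases, a Python sketch along these lines can keep the flags consistent between runs. It only assembles the same command lines shown above. The phase names, the `build_xcp_command` and `run_phase` helpers, and the per-phase log file names are made up for illustration and are not part of xcp.

```python
import subprocess

# Flags per migration phase, copied from the commands above.
# The phase names themselves are hypothetical labels, not xcp options.
PHASES = {
    "copy": ("copy", []),
    "initial-sync": ("sync", ["-acl"]),
    "daily-sync": ("sync", ["-nodata", "-noatime"]),
    "final-sync": ("sync", ["-acl"]),
}

def build_xcp_command(phase, source, destination, threads=24):
    """Return the xcp argument list for one phase (hypothetical helper)."""
    subcommand, extra_flags = PHASES[phase]
    return ["xcp.exe", subcommand, "-parallel", str(threads),
            *extra_flags, source, destination]

def run_phase(phase, source, destination, threads=24):
    """Run one phase and write xcp's standard output to <phase>.txt."""
    cmd = build_xcp_command(phase, source, destination, threads)
    with open(f"{phase}.txt", "w") as log:
        return subprocess.run(cmd, stdout=log).returncode

if __name__ == "__main__":
    # Print the commands for review rather than running them.
    for phase in PHASES:
        print(subprocess.list2cmdline(
            build_xcp_command(phase, r"\\source\share", r"\\destination\share")))
```

Writing one log per phase is a design choice in this sketch: redirecting every run to the same logfile.txt with `>` overwrites the previous run's output.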
There are many more commands, and the user guide is very to-the-point, so I strongly recommend referring to it. I also can't stress enough that your environment will have different variables to mine, so you should test to see what works for you.