2014-05-16 07:43 AM
I have searched a good amount on the internet and I have yet to see a best practice for the maximum size of a Hyper-V CSV.
Is there a best practice for sizing your CSVs?
2014-05-22 05:41 AM
It depends on a number of factors.
How many VMs are you running? How many hosts in your cluster? Will you require backups? How much total space will your VMs need?
I assume you have a dual-controller NetApp SAN?
2014-05-22 09:09 AM
We are currently running about 70 VMs on a 4-node 2012 Hyper-V cluster. Right now the CSV they sit on is about 10 TB. However, we are now looking into splitting the guest VMs up by OS into separate volumes, and also creating separate volumes for each role: putting all the SQL data drives into one volume, all the SQL T-logs into another, and so on. We've heard that is how to get the best de-dupe out of the system.
As stated in this document: http://www.netapp.com/us/communities/tech-ontap/avanade-hyperv.aspx
That all said, this should reduce the size of the CSVs we are currently presenting to the Hyper-V hosts.
2014-05-22 03:18 PM
There's not much to find out there because it's part art and part science.
The maximum CSV size would depend on the maximum LUN size you can create on your NetApp (my system is capped at 16TB).
But that doesn't mean you should just because you can.
How many controllers, single or dual? 7-Mode or Cluster-Mode? Are these production or DEV/QA VMs?
Assuming your SAN is a dual-controller system running 7-Mode, and assuming the load on it is already balanced, I would do a minimum of two 5TB CSVs or a maximum of four 2.5TB CSVs:
1 6TB Volume with one 5TB LUN on controller A (the extra 1TB is for snapshots, if you are using them)
1 6TB Volume with one 5TB LUN on controller B (same as above)
Make sure to thin provision both the Volume and the LUN
The idea is to spread the load across 2 controllers to better utilize your SAN. If you start to grow, say to 110 VMs total, you would just expand the Volume and LUN to satisfy requirements.
To get the absolute best de-dupe, you would leave it as is, on that single 10TB CSV. But to better utilize your SAN, spread the load across both controllers.
For the four 2.5TB configuration:
I would do:
1 6TB Volume containing two 2.5TB LUNs on controller A
1 6TB Volume containing two 2.5TB LUNs on controller B.
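The volume sizing above can be sketched with some simple arithmetic. The ~20% snapshot reserve here is my own assumption, matching the 1TB of slack left on a 5TB LUN; adjust it to whatever reserve your snapshot schedule actually needs.

```python
# Sketch of the volume sizing above: each NetApp volume is sized to hold
# its LUN(s) plus headroom for snapshots. The 20% snapshot reserve is an
# assumption, matching the 1TB of slack on a 5TB LUN.

def volume_size_tb(lun_sizes_tb, snapshot_reserve=0.20):
    """Volume must hold all of its LUNs plus a snapshot reserve."""
    return sum(lun_sizes_tb) * (1 + snapshot_reserve)

# Two-CSV layout: one 5TB LUN per controller
print(volume_size_tb([5.0]))        # 6.0 TB volume on each controller

# Four-CSV layout: two 2.5TB LUNs per controller
print(volume_size_tb([2.5, 2.5]))   # 6.0 TB volume on each controller
```

Either way each controller carries one 6TB volume, which is why both layouts balance the SAN equally; the four-LUN layout just splits the metadata I/O across more CSVs.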
CSV 2.0, while improved in Windows Server 2012, still relies on metadata updates (redirected I/O). This happens when, say, VM1 hosted on HOST2 has its VHD(X) files on a CSV owned by HOST1. You can minimize/eliminate that by:
Making sure all VMs on HOST1 have their VHDs stored on CSV1, owned by HOST1 (you can see the CSV owner in Failover Cluster Manager > Storage > Disks)
Making sure all VMs on HOST2 have their VHDs stored on CSV2, owned by HOST2.
and so on.....
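The alignment rule above can be sketched as a simple check. The host, VM, and CSV names here are hypothetical; in a real cluster you would pull this data from Failover Cluster Manager or the FailoverClusters PowerShell module rather than hard-coding it.

```python
# Sketch of the VM/CSV owner-alignment rule above. All names are
# illustrative; real values come from your cluster inventory.

csv_owner = {"CSV1": "HOST1", "CSV2": "HOST2"}   # CSV -> owning host
vm_host   = {"VM1": "HOST1", "VM2": "HOST2"}     # VM  -> host it runs on
vm_csv    = {"VM1": "CSV1",  "VM2": "CSV2"}      # VM  -> CSV holding its VHDs

def misaligned_vms(vm_host, vm_csv, csv_owner):
    """VMs whose VHDs sit on a CSV owned by a different host (redirected I/O)."""
    return [vm for vm, host in vm_host.items()
            if csv_owner[vm_csv[vm]] != host]

print(misaligned_vms(vm_host, vm_csv, csv_owner))  # [] -> no redirected I/O

# Moving VM2's storage onto CSV1 (owned by HOST1) would break the rule:
vm_csv["VM2"] = "CSV1"
print(misaligned_vms(vm_host, vm_csv, csv_owner))  # ['VM2']
```

Any VM the check flags is doing its metadata updates over the cluster network instead of locally, so either move its storage or move the VM.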
Also, make sure you set your de-dupe jobs to run once a day. I find that running them once every 2 to 5 days makes the de-dupe process run longer because it has to de-dupe more change.
If you run it once a day, it's only de-duping data that has changed in the last 24 hours.
Hope that helps.
2014-05-23 07:32 AM
Thank you for your reply Ed.
We are running in 7-Mode; however, each head has a different type of storage: head 1 has SAS, head 2 has SATA. So unfortunately we are unable to spread the load between heads.
2014-05-23 07:37 AM
In that case, 2 CSVs minimum (to cut down on metadata I/O), with two 5TB LUNs.
And of course, make sure CSV1 is owned by HOST1 and CSV2 is owned by HOST2. In other words, make sure no single host owns both CSVs.