VMware Solutions Discussions

Understanding Deduplication & Thin Provisioning


We have just configured an FAS2552 (2-node) in clustered mode, using NFS.


It is presented to three ESX 5.1 hosts, all with the NFS VAAI plugin and VSC installed.


Everything is set up, and we are currently evaluating space savings before we make the storage live. A few questions have come up that I was hoping to get help with.


In vSphere we have a Windows 7 template with an 80GB disk, of which only 10GB is actually in use. The disk is thick provisioned, eager zeroed. We have also run sDelete on the template.

On the NetApp side we have a 600GB NFS volume, thin provisioned, with dedupe enabled.


The bits I am struggling to understand:


1. Thin Provisioning

If I create 5 VMs from the template (400GB allocated in total, of which the OS uses 50GB), the volume shows (before dedupe) that 400GB of the 600GB has been used. Am I right to assume the volume isn't thin provisioning because we are thick provisioning in vSphere? If I look at the aggregate, I can see the increase is minimal, which suggests thin provisioning is working.
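The arithmetic behind this question fits in a few lines. This is a simplified model, assuming that an eager-zeroed thick VMDK reserves its full size inside the volume while a thin VMDK only consumes what the guest has written; snapshot and metadata overhead are ignored, and `volume_used_gb` is a hypothetical helper.

```python
# Figures taken from the post: 5 VMs cloned from an 80GB template,
# with roughly 10GB actually written by the guest OS in each.
VM_COUNT = 5
DISK_GB = 80
WRITTEN_GB = 10


def volume_used_gb(thick: bool) -> int:
    """Space consumed inside the volume before dedupe (simplified model).

    Assumption: a thick VMDK reserves its full provisioned size in the
    volume, while a thin VMDK only consumes the blocks the guest wrote.
    """
    per_vm = DISK_GB if thick else WRITTEN_GB
    return VM_COUNT * per_vm


print(volume_used_gb(thick=True))   # 400
print(volume_used_gb(thick=False))  # 50
```

Under this model the 400GB the volume reports is simply the provisioned size of five thick disks, while the aggregate only grows by what the thin-provisioned volume has actually had to back.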


I realise we are thick provisioning at the VMware level, but I was told this would make no difference and the array would handle the thin provisioning.

If this is all correct, is there a way I can report on thin-provisioned values at the volume level? Unified Manager? Also, when a VM is deleted from a volume, do I have to reclaim the space manually, or is there a process that does this automatically?


2. Deduplication

As above: 5 VMs, all deployed from the same template, 32-bit Windows 7 (400GB allocated in total, of which the OS uses 50GB).

Dedupe has been enabled and run, and it saves me about 105GB. Actual used space on the volume is then about 300GB. So according to the OnCommand System Manager stats, the 5 VMs are using about 60GB each after dedupe. Surely this isn't right when they are identical VMs?


That is 300GB used in total, when at the OS level only 10GB per VM is used (50GB across the 5 VMs).
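Spelling out the figures quoted above shows where the "60GB each" comes from and how large the gap is. A minimal sketch using only the numbers in the post; the even per-VM split is an assumption, since System Manager reports the volume as a whole.

```python
# Figures quoted in the post (GB); the per-VM split assumes the savings
# are spread evenly across the five clones.
VM_COUNT = 5
allocated = VM_COUNT * 80      # 400GB provisioned as eager-zeroed thick
saved = 105                    # savings reported after the dedupe run
in_guest = VM_COUNT * 10       # data the guests actually hold

used = allocated - saved       # 295GB, shown as roughly 300GB
per_vm = used / VM_COUNT       # 59GB, the "about 60GB each" figure
unexplained = used - in_guest  # space not accounted for by guest data

print(used, per_vm, unexplained)  # 295 59.0 245
```

Roughly 245GB of the used space is not explained by data inside the guests, which is the discrepancy the question is about.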


If I fill one of the hard disks with data (roughly 60GB), then delete that data and rerun dedupe, it weirdly shows an extra 30 to 40GB of available space. So writing random data and then deduping frees up more space than deduping 5 identical VMs.


I have run a space reclaim and I have run sDelete on the template VM, all to try to zero out any unused space.


If 70GB of an 80GB disk (at the VM level) is unwritten space, I would have thought that at least that would dedupe. Some of the whitepapers suggest it should:


"Operating system VMDKs deduplicate extremely well because the binary files, patches, and drivers are highly redundant between virtual machines (VMs). Maximum savings can be achieved by keeping these in the same volume. These VMDKs typically do not benefit from compression over what deduplication can already achieve. Further, since compressed blocks bypass the Flash Cache card, compressing the operating system VMDK can negatively impact the performance during a boot storm. For these reasons NetApp does not recommend adding compression to an operating system VMDK.


NetApp includes a performance enhancement referred to as intelligent cache. Although it is applicable to many different environments, intelligent caching is particularly applicable to VM environments, where multiple blocks are set to zero as a result of system initialization. These zero blocks are all recognized as duplicates and are deduplicated very efficiently. The warm cache extension enhancement provides increased sequential read performance for such environments, where there are very large amounts of deduplicated blocks"
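To see why zeroed space is expected to deduplicate so well, here is a toy model of block-level deduplication: each fixed-size block is fingerprinted, identical blocks are stored once, and every logical block becomes a pointer to the stored copy. This is a hypothetical illustration, not ONTAP's actual implementation; the 4KB block size and the 80-block "disk" (standing in for 80GB, with 10 blocks of OS data) are assumptions chosen to mirror the post.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for the illustration


def dedupe(blocks: list[bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Store each distinct block once; each logical block becomes a pointer.

    The pointer here is simply the block's SHA-256 fingerprint.
    """
    store: dict[str, bytes] = {}
    pointers: list[str] = []
    for block in blocks:
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)  # keep only the first copy of each block
        pointers.append(fp)
    return store, pointers


zero = bytes(BLOCK_SIZE)
# 10 distinct "OS" blocks, identical across clones
os_blocks = [bytes([i]) * BLOCK_SIZE for i in range(1, 11)]
one_vm = os_blocks + [zero] * 70  # mostly zeroed disk, like the template
five_vms = one_vm * 5             # five identical clones

store, pointers = dedupe(five_vms)
print(len(pointers))  # 400 logical blocks
print(len(store))     # 11 stored blocks: 10 OS blocks + one shared zero block
```

In this toy model the 70 zero blocks per VM collapse into a single stored block, which is the behaviour the whitepaper describes and the poster expected. Whether an eager-zeroed thick VMDK on NFS behaves this way in practice is exactly what the thread is trying to establish, so treat the sketch as an illustration of the idea, not a prediction.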


Again, am I missing something? This array was bought for VDI, and at the moment I'm not seeing any real deduplication savings, as each VM is using around 60GB. Any advice would be appreciated.







A couple of things that might help:

  • If you are thick provisioning the VMDKs, they will consume all of the space you've provisioned to them within the volume; this is one of the features of the NFS VAAI plugin.
  • If you have deduplication turned on, each cloned VM will just use pointers to the shared data blocks on disk, freeing space within the volume after deduplication runs.


Typically, you would combine thin-provisioned VMDKs with thin-provisioned NetApp volumes. This gives you the best space savings and has been shown not to cause any significant slowdowns (many storage vendors have published tests showing this over the last couple of years), so there is typically no benefit in moving to eager-zeroed thick VMDKs for your VMs.


If you're not seeing large deduplication savings in the volume, and all of the VMDKs are within the same volume, I would encourage you to open a support case for further investigation. In the scenario you've described, you should be seeing only about 70GB "on disk" (in the volume), plus a little overhead, once deduplication has finished running.


Thank you for the response.


I did some further tests yesterday and found that thin on thin did provide decent savings: 81%. That makes me think the issue is that, with thick on thin, it somehow cannot dedupe the zeroed blocks.


We have always provisioned this way, so maybe it's habit now. Like you say, I have also read a few things saying there is little gain now, other than the management headache of having to be careful not to overcommit on the volume and the aggregate.
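The overcommit risk mentioned here can be tracked with a simple ratio of provisioned capacity to physical capacity. A minimal sketch: `overcommit_ratio` is a hypothetical helper, the 10-VM case is a made-up scenario, and the same check could be applied to volume sizes against the aggregate.

```python
def overcommit_ratio(provisioned_gb: list[float], capacity_gb: float) -> float:
    """Ratio of space promised to VMs versus space that physically exists.

    Above 1.0 the container is overcommitted: it only stays healthy while
    the data actually written (after dedupe) stays below capacity_gb.
    """
    return sum(provisioned_gb) / capacity_gb


VOLUME_GB = 600
today = [80] * 5   # the five 80GB VMs from the test
later = [80] * 10  # hypothetical: double the number of VMs

print(round(overcommit_ratio(today, VOLUME_GB), 2))  # 0.67
print(round(overcommit_ratio(later, VOLUME_GB), 2))  # 1.33
```

A ratio above 1.0 is not a problem in itself with thin on thin; it just means free space in the volume and aggregate has to be monitored rather than assumed.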


I will raise another support case too while this is at the setup stage. I have raised one already, but I didn't really get a clear confirmation either way. Thanks again.