Understanding DeDuplication and Thin Provisioning

I posted this yesterday, but it may be better suited to this section. Can anyone offer any advice?


We have just configured a FAS2552 (2-node) in clustered mode, using NFS, running ONTAP 8.2.2.


It is presented to 3 ESX 5.1 hosts, all with the NFS VAAI plugin and VSC installed.


It is all set up, and we are currently trying to evaluate space savings etc. before we make the storage live. A few questions have come up that I was hoping for help with.


In vSphere we have a Win 7 template (80GB disk, of which only 10GB is actually in use). The disk is thick provisioned, eager zeroed. The template's free space has also been zeroed with SDelete.

The NetApp has a 600GB NFS volume, thin provisioned, with dedupe enabled.


The bits I am struggling to understand:


1. Thin Provisioning

If I create 5 VMs from the template (400GB allocated in total, of which the OS is using 50GB) and look at the volume, it shows (before dedupe) that 400GB of the 600GB volume has been used. Am I safe to assume the volume doesn't show any thin provisioning savings because we are thick provisioning in vSphere? If I look at the aggregate, I can see the increase is minimal, which suggests thin provisioning is working.


I realise we are thick provisioning at the VMware level, but I was told this would make no difference and the array would handle the thin provisioning.

If this is all correct, is there a way I can report on thin-provisioned values at the volume level? Unified Manager? Also, when a VM is deleted from a volume, do I have to reclaim the space manually, or is there a process that will do this automatically?


2. Deduplication

As above: 5 VMs, all deployed from the same template, 32-bit Windows 7 (400GB allocated in total, of which the OS is using 50GB).

Dedupe was enabled and run, and saved me about 105GB. Actual used space on the volume is then 300GB. So according to the OnCommand System Manager stats, the 5 VMs are using 60GB each after dedupe. Surely this isn't right when they are identical VMs?


300GB used in total, when at the OS level only 10GB is used per VM (50GB across the 5 VMs).
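
To lay out the arithmetic behind those figures, here is a quick sketch in Python. It uses only the numbers quoted above; the variable names are just illustrative.

```python
# Figures quoted in this post, all in GB. Variable names are illustrative only.
vm_count = 5
allocated_per_vm = 80        # thick-provisioned VMDK size
in_guest_used_per_vm = 10    # space the OS actually uses
reported_savings = 105       # dedupe savings shown by System Manager
reported_used_after = 300    # used space on the volume after dedupe

allocated_total = vm_count * allocated_per_vm                # 5 * 80
in_guest_total = vm_count * in_guest_used_per_vm             # 5 * 10
expected_used_after = allocated_total - reported_savings     # close to the 300 reported
per_vm_after = reported_used_after / vm_count                # used space per VM
savings_pct = 100 * reported_savings / allocated_total       # saving vs. allocated
overhead_vs_guest = reported_used_after - in_guest_total     # reported minus in-guest use

print(f"allocated: {allocated_total} GB, in-guest: {in_guest_total} GB")
print(f"after dedupe: {expected_used_after} GB expected, {reported_used_after} GB reported")
print(f"per VM: {per_vm_after} GB, savings: {savings_pct}%")
print(f"reported usage above in-guest usage: {overhead_vs_guest} GB")
```

So the reported saving is roughly a quarter of the allocated space, and the volume still reports 250GB more than the guests actually use.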


If I fill one of the hard disks with data (roughly 60GB), then delete that data and rerun dedupe, it weirdly shows an extra 30-40GB of available space. So writing random data and then deduping frees up more space than deduping 5 identical VMs.


I have run a space reclaim and used SDelete on the template VM, all to try to zero out any unused space.


If 70GB of an 80GB disk (at the VM level) is unwritten space, then I thought that would at least dedupe. Some of the whitepapers suggest it should:


"Operating system VMDKs deduplicate extremely well because the binary files, patches, and drivers are highly redundant between virtual machines (VMs). Maximum savings can be achieved by keeping these in the same volume. These VMDKs typically do not benefit from compression over what deduplication can already achieve. Further, since compressed blocks bypass the Flash Cache card, compressing the operating system VMDK can negatively impact the performance during a boot storm. For these reasons NetApp does not recommend adding compression to an operating system VMDK


NetApp includes a performance enhancement referred to as intelligent cache. Although it is applicable to many different environments, intelligent caching is particularly applicable to VM environments, where multiple blocks are set to zero as a result of system initialization. These zero blocks are all recognized as duplicates and are deduplicated very efficiently. The warm cache extension enhancement provides increased sequential read performance for such environments, where there are very large amounts of deduplicated blocks"


Again, am I missing something? This array was bought for VDI, and at the moment I'm not seeing any real deduplication savings, as each VM is using around 60GB. Any advice would be appreciated.





Re: Understanding DeDuplication and Thin Provisioning

Did you start the dedup process using sis start -s? This rescans the whole volume and should give you a better dedup ratio.

Re: Understanding DeDuplication and Thin Provisioning

Hi, I actually ran a full scan via OnCommand System Manager, and then ran the normal scans each time I added VMs or made changes. I have also had the volume for a few days now, so it will have been deduping according to schedule.


Does sis do anything differently, or does System Manager just invoke the same commands?

Re: Understanding DeDuplication and Thin Provisioning

Well, I think the System Manager command does the same as sis start -s. I once tried a similar thing and got a dedup ratio of over 98% (200 Linux VMs, all created from the same template on an NFS datastore), so it is possible and dedup works really well. Sorry, but I can't say why it's not working in your environment.

Re: Understanding DeDuplication and Thin Provisioning

That's impressive, nowhere near what I'm getting!


It's as though it barely dedupes the Windows OS. Out of interest, how were the disks provisioned at the host end? Thin or thick?

Re: Understanding DeDuplication and Thin Provisioning

Thin volume and thin VM disks, and I think I also activated compression (not inline) for these tests. On our biggest production datastore, with lots of Win2k3, Win2k8 and Linux machines, we still see a 40% dedup saving, which seems normal to me. Looking at the other datastores I manage, a ratio of 30-50% shouldn't be a problem, always depending on the content. Dedup is perfect for VDI and should give you much higher rates, whether it's Linux or Windows. In the end it's all about identical 4k blocks.
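
As a rough illustration of the "identical 4k blocks" point, here is a toy Python sketch. It is not how ONTAP implements dedupe internally; the function names and the fake disk layout (10 blocks of OS-like data plus 70 zeroed blocks, cloned 5 times) are made up for the example. It simply hashes fixed 4 KiB blocks and counts how many are unique.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KiB blocks, as mentioned above


def dedupe_stats(data: bytes, block_size: int = BLOCK_SIZE):
    """Return (total_blocks, unique_blocks) for fixed-size blocks of data."""
    total = 0
    seen = set()
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        seen.add(hashlib.sha256(block).digest())  # fingerprint of the block
        total += 1
    return total, len(seen)


def savings_percent(total: int, unique: int) -> float:
    """Percentage of blocks that would not need to be stored twice."""
    return 0.0 if total == 0 else 100.0 * (total - unique) / total


# Hypothetical "template": 10 distinct non-zero blocks, then 70 zeroed blocks.
os_blocks = b"".join(bytes([i]) * BLOCK_SIZE for i in range(1, 11))
zero_blocks = bytes(BLOCK_SIZE * 70)
template = os_blocks + zero_blocks

# Five identical VMs deployed from that template.
five_clones = template * 5

total, unique = dedupe_stats(five_clones)
print(total, unique, savings_percent(total, unique))
```

In this toy model every zeroed block collapses to a single stored block and every clone shares the template's blocks, which is why identical, block-aligned VMs should in principle dedupe very well.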

Re: Understanding DeDuplication and Thin Provisioning

Thanks for the feedback. I wondered whether the issue was us using thick eager zeroed disks in VMware, though I was under the impression it shouldn't matter as the array is thin provisioned.


I found it strange that the more random data I wrote to the VMs, the more available space it presented back. You would think savings would be at their peak when there are multiple identical machines.