VMware Solutions Discussions

VDI and dedup


Assume you have 200 virtual desktops. In theory, with dedup I only need the disk space of one. Cool. But what about the read cache or the PAM cards? Do they know that those blocks are the same? Say all 200 clients need the same block: do I need 200 I/Os for that one block on disk (and end up with a disk queue), or does the filer know that all the different requests need the same block, which is already in the cache?







Abhinav's reference to the blog post is excellent. There is also another place to look for detail on our intelligent cache: a section of TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide".

I often bring this up because our intelligent cache feature works not only with the PAM but also with the standard native cache in the array. PAM simply increases the amount of cache available for these operations. The document is worth reviewing because it explains which VMware-specific data is a good candidate for deduplication and which is not.
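To make the original question concrete, here is a toy Python model of a cache that is keyed by the shared physical block rather than by each client's logical block. It is only a sketch of the idea and not how Data ONTAP or the PAM is implemented; the class name `DedupAwareCache`, the block map, and the block number 7 are all made up for illustration.

```python
class DedupAwareCache:
    """Toy read cache keyed by physical block, not by (VM, logical block)."""

    def __init__(self, block_map, disk):
        self.block_map = block_map  # (vm, logical block) -> physical block (hypothetical)
        self.disk = disk            # physical block -> data
        self.cache = {}             # physical block -> cached data
        self.disk_reads = 0         # counts reads that had to go to disk

    def read(self, vm, logical_block):
        physical = self.block_map[(vm, logical_block)]
        if physical not in self.cache:
            # Only the first request for a shared block reaches the disk.
            self.disk_reads += 1
            self.cache[physical] = self.disk[physical]
        return self.cache[physical]


# 200 desktops whose logical block 0 was deduplicated to one physical block.
disk = {7: b"shared OS block"}
block_map = {(f"desktop-{i}", 0): 7 for i in range(200)}
cache = DedupAwareCache(block_map, disk)

results = [cache.read(f"desktop-{i}", 0) for i in range(200)]
print(cache.disk_reads)  # 1
```

In this model the 200 requests cost one disk read, because the 199 later requests resolve to a physical block that is already cached. A larger cache (which is what PAM adds) would only change how many distinct physical blocks stay resident.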




VMware environments deduplicate extremely well. However, while working out the VMDK and data store layouts, keep the following points in mind:

Operating system VMDKs deduplicate extremely well because the binary files, patches, and drivers are highly redundant between virtual machines (VMs). Maximum savings can be achieved by keeping these in the same volume.

Application binary VMDKs deduplicate to varying degrees. Duplicate applications deduplicate very well; applications from the same vendor commonly have similar libraries installed and deduplicate somewhat successfully; and applications written by different vendors don't deduplicate at all.

Application data sets when deduplicated have varying levels of space savings and performance impact based on application and intended use. Careful consideration is needed, just as with nonvirtualized environments, before deciding to keep the application data in a deduplicated volume.

Transient and temporary data such as VM swap files, pagefiles, and user and system temp directories do not deduplicate well and potentially add significant performance pressure when deduplicated. Therefore NetApp recommends keeping this data on a separate VMDK and volume that are not deduplicated.
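The four points above can be summarized as a small lookup table. This is just my reading of the guidance, written as a Python sketch; the names `DataKind`, `Placement`, and `placement_for` are hypothetical and do not correspond to any NetApp or VMware tool. Treating application binaries as "deduplicate" is an assumption, since the text only says that savings vary.

```python
from enum import Enum
from typing import NamedTuple, Optional


class DataKind(Enum):
    OS = "os"
    APP_BINARY = "app_binary"
    APP_DATA = "app_data"
    TRANSIENT = "transient"


class Placement(NamedTuple):
    deduplicate: Optional[bool]  # None means "decide per application"
    note: str


# Hypothetical summary of the layout guidance quoted above.
GUIDANCE = {
    DataKind.OS: Placement(True, "keep OS VMDKs together in the same deduplicated volume"),
    DataKind.APP_BINARY: Placement(True, "savings vary; best for duplicate or same-vendor applications"),
    DataKind.APP_DATA: Placement(None, "weigh space savings against performance impact per application"),
    DataKind.TRANSIENT: Placement(False, "separate VMDK and volume that are not deduplicated"),
}


def placement_for(kind: DataKind) -> Placement:
    """Return the suggested placement for one kind of VM data."""
    return GUIDANCE[kind]


for kind in DataKind:
    p = placement_for(kind)
    print(f"{kind.value}: dedup={p.deduplicate} ({p.note})")
```

The `None` value for application data is deliberate: the guidance does not give a yes or no answer there, so the sketch leaves that decision open instead of guessing.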

Data ONTAP 7.3.1 includes a performance enhancement referred to as warm cache extension for zero blocks. This is particularly applicable to VM environments, where multiple blocks are set to zero as a result of system initialization. These zero blocks are all recognized as duplicates and are deduplicated very efficiently. The warm cache extension enhancement provides increased sequential read performance for such environments, where there will be very large numbers of deduplicated blocks. Examples of sequential read applications that will benefit from this performance enhancement include NDMP, SnapVault, some NFS-based applications, and dump. This performance enhancement is also beneficial to the boot-up processes in VDI environments.