ONTAP Discussions

VMWare volume, 16% space savings, should it be higher?

HendersonD
16,553 Views

I have a 2TB volume on a FAS3020 running OnTap 7.3.2. This volume is shared via NFS and added as storage for my ESXi 4.1 hosts. I have 29 VMs stored on this NFS share, a mixture of Win2003 and Win2008 servers. I have had dedup enabled on this volume for several months. At the command line of the filer I ran:

df -s /vol/VMWare

It says there is a 16% space savings. This seems low. I thought I had read other accounts, as well as Netapp marketing material, claiming space savings for VMWare in the 40-50% range. Any ideas?

I do take snapshots on this volume using SMVI but, as I mentioned, I have had dedup turned on for months and only keep a few weeks' worth of snapshots.

48 REPLIES

HendersonD
10,964 Views

filer2> df -h /vol/VMWare
Filesystem               total       used      avail capacity  Mounted on
/vol/VMWare/            2048GB     1068GB      979GB      52%  /vol/VMWare/
/vol/VMWare/.snapshot        0KB      773GB        0KB     ---%  /vol/VMWare/.snapshot

filer2*> df -sh /vol/VMWare
Filesystem                used      saved       %saved
/vol/VMWare/            1068GB      244GB          19%

I took the output of snap delta, pulled it into Excel, cleaned it up to make it more readable, and exported it as a PDF. The PDF is attached.

datamanaged
16,930 Views

Hi again, HendersonD,

I'm not sure if my math is 100% correct here, but it's not far off.

Total volume usage - Snapshot Usage = Live Data

Dedupe Savings / (Live Data + Dedupe Savings) = Dedupe ratio excluding snapshots

For you:

1068 - 773 = 295

244 / (295 + 244) = 45.2% saved

The 19% you're seeing measures the 244GB of savings against the 539GB of live data plus savings, with the 773GB of snapshots added on top as overhead (and for some reason it treats the snapshots as undeduped, when in reality they probably *are* deduped). If we factor snapshots into the above equation, 244 / (295 + 244 + 773), we get 19% after rounding.
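
If it helps, here's the same arithmetic as a small Python sketch (figures in GB, taken straight from the df output you posted; nothing ONTAP-specific, just the two ways of measuring the same 244GB of savings):

vol_used, snap_used, saved = 1068.0, 773.0, 244.0   # GB, from "df -h" and "df -sh" on /vol/VMWare

live = vol_used - snap_used                      # 295 GB of live (non-snapshot) data
excl_snap = saved / (live + saved)               # ~0.45 -> savings measured against live data only
incl_snap = saved / (live + saved + snap_used)   # ~0.19 -> savings the way df -s reports it

print("excluding snapshots: {:.0%}, including snapshots: {:.0%}".format(excl_snap, incl_snap))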

We actually just noticed this with our DFM Provisioning Manager: whereas the CLI counts snapshot usage as ordinary used storage when reporting dedupe savings, DFM seems to exclude snapshots from its computations. Our vSphere volume shows ~42% saved in the CLI; in DFM it's about 52% (we have a very short backup schedule).

If you really want to verify it, delete your snapshots (probably not a good idea, though). Alternatively, just try deleting a number of snapshots off the tail end (such as all of November, if you don't need them). This might give you an idea of how the dedupe percentage will fluctuate.

Long story short, it's a bit of bad math in df -s in combination with your snapshots.

I hope we've answered the conundrum at this point. If you still have problems or questions, let us know.


Best Regards,

Adam S.

HendersonD
10,966 Views

So it appears that my dedup savings are really 42% and not the 19% shown using df -s due to the snapshots really not being counted correctly. Is that correct?

So if I were to delete all of my snapshots (like you said, not a good idea), would I see the output of the df -s command as 42%?

Other than just realizing that df -s is really not calculating correctly, is there any hope of this being fixed? I am running OnTap 7.3.2. Is there a possibility that this has been fixed in a newer version of OnTap?

I just deleted all of my November snapshots and then ran df -s on my VMWare volume. It is now 20%; I expected a bigger jump, especially if it is true that deleting all snapshots would result in df -s showing 42%. Do I have to wait until dedup runs again this evening to realize bigger savings?

I just deduped this volume again and now df -s shows a 23% savings.

radek_kubka
10,964 Views
Ran dedupe, got savings of around 90%. I took a snapshot and then deleted the files.

I am talking about existing snapshots & how they are affected by dedupe:

1) snapshot(s) taken

2) dedupe run showing x MB / GB of savings (e.g. via df -s)

3) snapshot(s) grow by exactly the same value as dedupe 'savings'

4) when snapshot(s) eventually expire, savings become real

HendersonD
10,961 Views

That still leaves me with three unanswered questions:

  1. So it appears that my dedup savings are really 42% and not the 19% shown using df -s, due to the snapshots not being counted right. Is that correct?
  2. So if I were to delete all of my snapshots, would I see the output of the df -s command as 42%?
  3. Other than just realizing that df -s is really not calculating correctly, is there any hope of this being fixed? I am running OnTap 7.3.2. Is there a possibility that this has been fixed in a newer version of OnTap?

I also have two CIFS shares that are showing dedup savings of 12% and 31%, but my guess is that these figures are incorrect as well, since these volumes also contain snapshots. Netapp must have many customers running dedup who are using the df -s command and getting savings percentages that are incorrect.

datamanaged
10,961 Views

Hi Hendersond,

I'll go ahead and answer these inline:

>>1. So it appears that my dedup savings are really 42% and not the 19% shown using df -s due to the snapshots really not being counted right. Is that correct?

Yes, the math I provided earlier shows that to get 19% you have to count the snapshots as undeduped data. If you trust the math I provided on excluding that data, then your *live* data set should have a dedupe of about 42%.

>>2. So if I were to delete all of my snapshots, would I see the output of the df -s command as 42%?
Yes, I believe you would get pretty close to that number. Simply deleting a bit off the tail end of your snapshot list (think all of Nov) should bump the percentage up a bit.

>>3. Other than just realizing that df -s is really not calculating correctly, is there any hope of this being fixed? I am running OnTap 7.3.2. Is there a possibility that this has been fixed in a newer version of OnTap?

It's not that df -s is calculating incorrectly. It is. It's just misleading, therefore I doubt there will be any change. The fact that it doesn't consider snapshot data as deduped isn't really a glitch. After thinking about it for my own environment... I would rather df -s include snapshots and throw off the percentage; this way I have a way of tracking just how my snapshots are affecting my deduped data.

Does that answer your questions?

Best Regards,
Adam S.

HendersonD
10,961 Views

My understanding, please correct me if I am wrong, is that the snapshots should have dedup data in them. Shouldn't df -s count snapshot data as well and give a more accurate estimate of dedup savings? When I look at the press releases, documentation, and blogs about Netapp dedup, it is common to quote figures of 40% or more for dedup savings on VMWare volumes. I would guess that nearly everybody has snapshots on their VMWare volumes, so this 40% figure cannot really be verified unless someone does the kind of calculations you have. I would think that snapshots should be counted in to give a true reflection of savings. I can't be the only customer who has read the documentation and listened to the marketing of dedup only to be surprised that the percentages quoted are not what I am seeing using df -s.

Another thought: perhaps a switch for df -s that takes snapshots into account. With this, the df command could be used to generate both percentages.
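
To make that concrete, a rough wrapper along these lines would do it (just a sketch; it assumes the 7-mode df output format shown earlier in this thread and that the values come back in GB):

def parse_gb(token):
    # crude: strip the unit suffix and assume the value is in GB
    return float(token.rstrip("GBKMT"))

def both_savings(df_h_text, df_sh_text):
    # df_h_text = output of "df -h <vol>", df_sh_text = output of "df -sh <vol>"
    rows = [line.split() for line in df_h_text.splitlines() if line.startswith("/vol/")]
    vol_used  = parse_gb(rows[0][2])    # "used" column of the volume row
    snap_used = parse_gb(rows[1][2])    # "used" column of the .snapshot row
    s_rows = [line.split() for line in df_sh_text.splitlines() if line.startswith("/vol/")]
    saved = parse_gb(s_rows[0][2])      # "saved" column of df -sh
    live = vol_used - snap_used
    return saved / (live + saved), saved / (live + saved + snap_used)

Fed the output I posted above, it comes back with roughly 45% and 19%.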

HendersonD
10,961 Views

Just an update. I opened a case with Netapp about this on January 1st, using the attached pdf to describe the issue. Thanks to all, especially datamanaged, for helping to define the parameters of this conundrum. So far the support engineer I have been working with has not been able to come up with the method that Netapp uses to calculate space savings. The Netapp documentation/blogs/case studies say that dedup on VMWare volumes should be in the 50 to 70% range, but the only tool that Netapp provides to measure space savings is df -s, which in my case shows a 20% space savings. I will let everyone know the outcome of the case.

Dave

HendersonD
11,031 Views

Netapp is now telling me that the problem is that the Windows pagefile has not been moved to a separate disk on a separate volume. I do use a separate disk for my VMWare swap files for all of my VMs. I do not put the Windows pagefile on a separate disk; it is in its default location on the C drive. They are telling me a big chunk of the benefit of dedup is being lost in my configuration.

How many of you create a separate Netapp volume, add this to each of your ESX hosts, add a disk in all of your Windows VMs that uses this volume to store its files, and then move the Windows pagefile to this volume?

evilensky
10,842 Views

Always, always, always separate the data. Our "c:\windows" drives are 90+% savings; our "data" drives are a separate disk in the VM on separate datastores, and the same goes for pagefiles. Our pagefile dedupe rates are low, but we make every effort to size VMs "correctly" so that even at highest load, paging is minimal or nonexistent. For practical purposes this is a tiny fraction of our aggregate capacity, and we believe it leads to better performance. RAM is cheap nowadays, but not everyone is fortunate enough to share that world view.

Even when the "data drive volumes" dedupe in the 20-40% range, we look at that as "free" usable space.

On mixed-data volumes, where the machines have the pagefile, C:\windows, and system data on a single vmdk file, we can achieve dedupe of 55-60% simply because of the sheer number of small VMs mixed up in such volumes, so even this number isn't too disappointing.

This is a good practice, a best practice, the correct practice certainly from a replication/snapshotting point of view, as well as dedupe.

YMMV...


evilensky
10,842 Views

And this has been helpful for us: by separating the data and avoiding paging, we literally cut the "marginal cost" of storage for deploying a single virtual machine to just about zero. We publicized this and our customers are happy; we compute the costs associated with VM deployment based solely on the ESX host now, plus a separate charge for "Data."

[Our system is set up to encourage customers to move the "data" portion of their workloads to our 2nd tier, iSCSI platform, which is attached directly using in-guest initiators and has no dependencies on the system size of their VM.]

HendersonD
9,975 Views

When you say "avoiding paging" do you mean sizing memory correctly so Windows does not page to disk, or do you mean putting the pagefile on a different disk? It seems that sizing memory correctly so the pagefile is not read or written would not necessitate moving it to a separate disk. I know in the old days people used to move it to a separate disk in an effort to increase performance. This is no longer necessary. I have sized all of my VMs' memory so I avoid paging to disk.

The other assumption I am making here is that since I am not paging in my Windows VMs, the pagefile is not the cause of the low dedup savings.

evilensky
9,975 Views

I mean setting memory sizes correctly to be at or above your customers' needs.

This does not obviate the good practice of moving pagefile.sys itself to a different disk. One can't predict what one's customers might do, or what memory-hungry or leaky programs they might run. But in general we overcommit memory at the ESX host level and watch for hot guests that approach their memory allocations in a way that puts the host in danger of ESX swapping or ballooning.

It's all about knowing your environment's peaks, valleys, and averages. We found that as long as everyone doesn't peak at once, we can get away with all kinds of over-provisioning... we are just the type of environment where resources sit idle most of the time; that's not unique to education, but it may not hold for everyone...

Your pagefile.sys may not be 0 bytes just because you are not actively paging...look into that.

But unless we are talking many tens of gigabytes' worth of paging, you have some other issue or misconfiguration in your environment... your issue appears to be fairly unique...

Can you do a test by deploying some additional datastores the "correct" way with some "brand new" virtual machines and see what dedupe says for this?

You did run sis start -s in order to dedupe the existing data in the volume, and not just turn "sis on" for new data?

HendersonD
9,973 Views

Many of my Windows VMs cannot use a separate "Data" disk since the application that gets installed insists on installing on the C drive. Still, the majority of the space chewed up in a Windows VM is the Windows OS, not the data stored by the application. This of course is not true for applications like SQL or Exchange, but those are always set up with a separate LUN or disk for databases and logs.

If I have all of my VMs' memory sized correctly and I do not use the Windows pagefile, why go through the effort of relocating it to a separate disk? In that case, how much can it be impacting dedup percentages? The df -s command is showing my VMWare volume space savings at 21%. All the Netapp documentation/blogs/case studies state dedup savings for a VMWare volume should be 50-70%. It seems that one of three things is true:

1. Having my Windows pagefile on the C drive significantly drives down the dedup savings. By moving it to a separate volume for my 30 Windows VMs, my dedup percentage will shoot up closer to or even above 50%

2. I am truly getting 21% space savings in which case the Netapp documentation/blogs/case studies are incorrect

3. The df -s command does not really measure space savings correctly. In this case Netapp should release a tool that customers can run that correctly calculates and displays space savings

If number 1 is true I will do the necessary work but at this point I am still skeptical it is the culprit.

HendersonD
9,973 Views

After working with Netapp support, I was convinced that the pagefile might be the culprit for the low dedup savings. I have moved the pagefiles for all 30 of my Windows VMs to a separate disk that is not part of the snapshots. By mid-February I will know if this helped, since I need to wait for the snapshots that contain pagefile data to age off. I will report back at that time. If you are interested in the details of what I did, read on.

  1. Logged into about 8-10 of my Windows VMs to see how large the pagefile was. They ranged in size from 1.5 to 4GB. This was necessary so I could correctly size the disk I would be adding to each VM to hold the pagefile. I decided that a 10GB disk would be sufficient.
  2. Created a 200GB flexible volume on SATA disks called Pagefiles. Of course I did not enable dedup on this volume
  3. Turned off snapshots on this volume
  4. Made an NFS share based on this volume
  5. Added the share to my four ESXi hosts calling the datastore Pagefiles
  6. Downed each of my Windows VMs
  7. Added a 10GB disk to this VM:
    1. Choose Pagefiles as the datastore
    2. Since this is an NFS datastore, the checkbox "Allocate and commit space on demand (Thin Provisioning)" is automatically checked
    3. Choose "Independent-Persistent" as the disk mode. This will keep VMWare from making a snapshot of this disk when SMVI runs against this VM
  8. Started up the VM, logged in, went into Disk Management, and created a new volume, giving it drive letter P and naming it Pagefile
  9. Placed the pagefile on drive P, letting the system manage the size, and set drive C to have no paging file (a registry-based way to script this step is sketched after this list)
  10. Restarted the machine
  11. Logged back in and, by showing hidden and system-protected files, verified that pagefile.sys is now in the root of the P drive and no longer in the root of the C drive
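
For anyone who wants to script step 9 across a lot of VMs instead of clicking through System Properties on each one, the pagefile configuration lives in the PagingFiles registry value. A hypothetical sketch, assuming Python 3 inside the guest run as an administrator (listing only P: removes the pagefile from C:, the "0 0" suffix asks Windows to manage the size, and a reboot is still required):

import winreg   # Python 3 on Windows

KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY, 0, winreg.KEY_SET_VALUE) as k:
    # REG_MULTI_SZ, one entry per pagefile: "<path> <initialMB> <maximumMB>"
    winreg.SetValueEx(k, "PagingFiles", 0, winreg.REG_MULTI_SZ, [r"P:\pagefile.sys 0 0"])

print("PagingFiles updated; reboot the VM for the change to take effect.")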

HendersonD
9,973 Views

Time to report back. I set my retention for snapshots to two weeks so I could quickly see the results of moving the pagefiles to a volume that is not part of the snapshots. Unfortunately, this did not help much. Before moving the pagefiles my dedup savings was 31%, afterwards 32%. This is with just two weeks' worth of snapshots. This post started with me keeping 5 weeks' worth of snapshots. My guess is that if I increased the retention to 5 weeks and waited until I had 5 weeks' worth of snapshots, my dedup savings would once again be in the low 20% range.

After conferring with my Netapp engineer who is working this case, he suggested two things:

1. Rerun the dedup scan to make sure that all existing data was being deduped; this did not help

2. Decrease the size of my VMWare volume. For the FAS3020 the largest volume that can be deduped is 2TB, which is where mine was. I decreased it to 1.8TB, ran the dedup scan again, and nothing changed.

I then decided to take a very close look at all of my VMs; see the attached pdf. I first determined that Windows Server 2003 uses 3,187MB for the operating system while Server 2008 uses 8,191MB. For each of my VMs I wanted to find out how much space was being used for data other than the operating system; the column labeled Data MB is the answer. Out of my total 472GB consumed, about 36% is used for the operating systems and about 64% is used for whatever application and data is stored on the VM.

The VMs that have the most data beyond the OS are highlighted in yellow. When I look at the ones in yellow, several make sense. For example, Web2 is my current IIS server; I store 30 websites on this VM, so there is about 34GB of non-OS data. Our accounting server is called Wincap, and its D: drive is where the actual data is stored.

I have mixed feelings about this. On one hand, having nearly 2/3 of the data on the VMWare volume not related to the operating system could explain my low dedup savings. On the other hand, I cannot imagine we are unique in this regard. In other words, having some application data as part of the VM is common so this still may not explain my low dedup savings.

Thoughts?

evilensky
9,973 Views

hendersond wrote:

I have mixed feelings about this. On one hand, having nearly 2/3 of the data on the VMWare volume not related to the operating system could explain my low dedup savings. On the other hand, I cannot imagine we are unique in this regard. In other words, having some application data as part of the VM is common so this still may not explain my low dedup savings.

Thoughts?

NetApp dedupe is block level. Your application data is divided into 4K WAFL blocks, and unless a block is 100% identical to another block, it will not dedupe.

For example, on our Oracle database we are only able to achieve 8% space savings from dedupe, as each of our Oracle blocks is a relatively unique data structure. However, this same Oracle data file can be gzipped to less than 1/4 of its original size, because the data inside the Oracle structures themselves is fairly redundant in our environment.

That is just one example.

So, from Netapp dedupe's perspective, 2/3 of your VMWare volume is largely unique data which cannot be deduplicated.
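
If you want a rough feel for how dedupe-friendly a particular data set is at the 4K block level, a little sketch like this gives a ballpark: hash every 4KB chunk of a file (a flat VMDK, a database export, whatever) and count the duplicates. It only approximates WAFL behaviour -- alignment, zero blocks, and duplicates across files all matter too -- but it shows quickly whether your application data has any block-level redundancy at all:

import hashlib, sys
from collections import Counter

BLOCK = 4096   # WAFL block size

def dedupe_estimate(path):
    counts = Counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            counts[hashlib.md5(chunk).digest()] += 1
    total = sum(counts.values())
    duplicates = total - len(counts)
    return total, duplicates

if __name__ == "__main__":
    total, dupes = dedupe_estimate(sys.argv[1])
    print("{} blocks, ~{:.0%} duplicate within this file".format(total, float(dupes) / total))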

I think it's great that you were finally able to track down the source of the problem. Now you are aware of the limitations of dedupe (as implemented by Netapp) and can get excited about compression as introduced in 8.0.1. This will give you further reason to separate your virtual machine data into different volumes per now-common best practices.

HendersonD
9,973 Views

There are three problems here though:

1. Many applications when installed on a server refuse to install on anything but the C: drive. The data saved by the application quite often cannot be put on anything but the C: drive. Yes, this is bad programming but a reality in many cases.

2. Even if an application on a server and the data from that application can be put on a different drive, what are the advantages? If the C: drive could be reserved for strictly the OS then the dedup on the volume holding all of this would be excellent. The dedup on the volume holding the application data would be lousy. How is that any better than having the application data on the same volume as the OS and getting so-so dedup? It seems like the overall space savings either way is about the same. The only advantage I can think of is if the application generates a large volume of data and the C: drive was sized incorrectly, there is the possibility of filling the drive which could crash the server. With proper alarms set in VMWare this should never happen.

3. All the Netapp documentation, case studies, and blog posts state the dedup savings on a VMWare volume is 50-70%. They never mention that the only way to achieve this savings is to put all pagefiles and application data on separate volumes. My guess is these figures were not drawn from many production environments. This caveat should be stated right alongside the 50-70% figure for full disclosure.

I am going to look at the VMs that have the most data on the C: drive and see if I can move that to another drive.

I am looking forward to compression. My hope is in about a year I can trade my FAS3020 filer heads for newer ones like 3200s. At that time I will be running OnTap 8.x. How much compression can be had?

Dave

evilensky
9,973 Views

>>1. Many applications when installed on a server refuse to install on anything but the C: drive. The data saved by the application quite often cannot be put on anything but the C: drive. Yes, this is bad programming but a reality in many cases.

Curious, what is the definition of "often" and "many"?

>>2. Even if an application on a server and the data from that application can be put on a different drive, what are the advantages? If the C: drive could be reserved for strictly the OS then the dedup on the volume holding all of this would be excellent. The dedup on the volume holding the application data would be lousy. How is that any better than having the application data on the same volume as the OS and getting so-so dedup? It seems like the overall space savings either way is about the same. The only advantage I can think of is if the application generates a large volume of data and the C: drive was sized incorrectly, there is the possibility of filling the drive which could crash the server. With proper alarms set in VMWare this should never happen.

Straw man. Running out of space on your C:\ is a technical/management issue that has nothing to do with dedupe rates. You aren't wrong that the rates on data can be, and often are, lousy. How is it better? We moved much of our data to a much cheaper $/GB aggregate, so our C:\ drives always perform well while the data may in some cases not meet our top-tier SLA. But I'm only a customer speaking in favor of my own environment. You seem to be arguing against flexibility in general. Yes, it's extra work to do things correctly. You should take pride in trying to better your own environment rather than complaining that it's the vendor's fault.

>>3. All the Netapp documentation, case studies, and blog posts state the dedup savings on a VMWare volume is 50-70%. They never mention that the only way to achieve this savings is to put all pagefiles and application data on separate volumes. My guess is these figures were not drawn from many production environments. This caveat should be stated right alongside the 50-70% figure for full disclosure.

Your use of "all the Netapp documentation..." and "never mentioned" is factually inaccurate.

http://media.netapp.com/documents/tr-3505.pdf -

5     DEDUPLICATION AND VMWARE

VMware environments deduplicate extremely well. However, while working out the VMDK and data store layouts, keep the following points in mind:

Operating system VMDKs deduplicate extremely well because the binary files, patches, and drivers are highly redundant between virtual machines (VMs). Maximum savings can be achieved by keeping these in the same volume.

Application binary VMDKs deduplicate to varying degrees. Duplicate applications deduplicate very well; applications from the same vendor commonly have similar libraries installed and deduplicate somewhat successfully; and applications written by different vendors don't deduplicate at all.

When deduplicated, application data sets have varying levels of space savings and performance impact based on application and intended use. Careful consideration is needed, just as with nonvirtualized environments, before deciding to keep the application data in a deduplicated volume.

Transient and temporary data such as VM swap files, pagefiles, and user and system temp directories do not deduplicate well and potentially add significant performance pressure when deduplicated. Therefore NetApp recommends keeping this data on a separate VMDK and volume that are not deduplicated.

Data ONTAP 7.2.6 and 7.3.1 introduce a performance enhancement referred to as intelligent cache. Although it is applicable to many different environments, intelligent caching is particularly applicable to VM environments, where multiple blocks are set to zero as a result of system initialization. These zero blocks are all recognized as duplicates and are deduplicated very efficiently. The warm cache extension enhancement provides increased sequential read performance for such environments, where there are very large amounts of deduplicated blocks. Examples of sequential read applications that benefit from this performance enhancement include NDMP, SnapVault, some NFS-based application, and dump. This performance enhancement is also beneficial to the boot-up processes in VDI environments.

The expectation is that about 30% space savings will be achieved overall. This is a conservative number, and in some cases users have achieved savings of up to 80%. The major factor that affects this percentage is the amount of application data. New installations typically deduplicate extremely well, because they do not contain a significant amount of application data.

HendersonD
9,835 Views

Points well taken. I have at least three apps that come to mind where, during the install process, I am not given a choice of where to install. These are not big hitters in terms of placing a lot of data on the C drive. I will create another volume and start moving off the VMs with the largest amount of application data. For some of my VMs this should be easy; for example, moving the folder where websites are stored is not hard. For some of my applications it may mean a reinstall. Sorry I got a bit testy about this, but I have been chasing this dedup issue for months.

keitha
9,835 Views

I agree, something still seems odd. Most of my customers here up north do not separate the OS and data files and most are in the 60-70% savings range. So what is different about yours...

Wondering, did you P2V these VMs? Have you ever tried any sort of space reclamation tool? Here's a theory: you seem to have several large VMs, and I am wondering if the free space in your VMs isn't really free space and instead contains old data. Windows of course doesn't really delete anything; it just deletes the metadata, leaving the data behind and potentially hurting your dedupe. Now, we don't normally have to do this, but your case is puzzling. I wonder if it wouldn't be worth a try to zero out the free space on some of your VMs. This would ensure the free space is in fact deduping as it should. I need to re-read the thread, but did you already confirm that the VMs are aligned?
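
If you want to try it without hunting down a tool (Sysinternals sdelete can do the same job), the quick-and-dirty approach is to write a file full of zeros until the drive is nearly full and then delete it; once those zeroed blocks hit the filer they dedupe against each other very efficiently. A hypothetical sketch, run inside the guest, leaving yourself plenty of headroom:

import os, shutil

def zero_free_space(target_dir, leave_free_mb=1024, chunk_mb=64):
    # Fill the drive holding target_dir with zeros, keeping leave_free_mb free, then clean up.
    path = os.path.join(target_dir, "zerofill.tmp")
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    try:
        with open(path, "wb") as f:
            while shutil.disk_usage(target_dir).free > leave_free_mb * 1024 * 1024:
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())
    finally:
        if os.path.exists(path):
            os.remove(path)

# e.g. zero_free_space("C:\\") in each guest, then let the dedupe schedule run and re-check df -s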

Puzzling....
