ONTAP Discussions

VMWare volume, 16% space savings, should it be higher?

HendersonD

I have a 2 TB volume on a FAS3020 running ONTAP 7.3.2. This volume is shared via NFS and added as a datastore for my ESXi 4.1 hosts. I have 29 VMs stored on this NFS share, a mixture of Win2003 and Win2008 servers. I have had dedup enabled on this volume for several months. At the command line of the filer I ran:

df -s /vol/VMWare

It says there is a 16% space savings. This seems low; I thought I had read other accounts, as well as NetApp marketing material, claiming space savings for VMware workloads in the 40-50% range. Any ideas?

I do take snapshots on this volume using SMVI, but as I mentioned, I have had dedup turned on for months and only keep a few weeks' worth of snapshots.

1 ACCEPTED SOLUTION

datamanaged

Hi again HendersonD,

I'm not sure if my math is 100% correct here, but it's not far off.

Total volume usage - Snapshot Usage = Live Data

Dedupe Savings / (Live Data + Dedupe Savings) = Dedupe ratio excluding snapshots

For you:

1068 - 773 = 295

244 / (295 + 244) = 45.3% saved

The 19% you're seeing is the same 244 GB of savings, but measured against everything consumed in the volume: the 295 GB of live data, the 244 GB of savings, and the 773 GB of snapshot overhead (and for some reason it treats the snapshots as undeduplicated, when in reality they probably *are* deduplicated). If we factor snapshots into the equation above, 244 / (295 + 244 + 773), we get 19% after rounding.
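
To make the arithmetic concrete, here is a minimal Python sketch (the GB figures above are hard-coded purely as an example; df -s actually reports kilobytes) that reproduces both percentages:

used_total = 1068   # df -s "used" figure, which includes snapshot space (GB)
snap_used  = 773    # space held by snapshots (GB)
saved      = 244    # dedupe savings reported by df -s (GB)

live_data = used_total - snap_used                        # 295 GB of live data

pct_including_snaps = saved / (used_total + saved) * 100  # df -s style: ~19%
pct_excluding_snaps = saved / (live_data + saved) * 100   # snapshots excluded: ~45%

print(round(pct_including_snaps), round(pct_excluding_snaps))   # prints: 19 45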

We actually just noticed this with our DFM Provisioning Manager: whereas the CLI folds the snapshot space into used storage when computing the dedupe saving, DFM seems to exclude snapshots from its calculations. Our vSphere volume shows roughly 42% saved in the CLI but about 52% in DFM (we have a very short backup schedule).

If you really want to tell, delete your snapshots (probably not a good idea, though). Alternatively, try deleting a number of snapshots off the tail end (such as all of November, if you don't need them). That should give you an idea of how the dedupe percentage fluctuates.

Long story short, it's a bit of bad math in df -s in combination with your snapshots.

I hope we've answered the conundrum at this point. If you still have problems or questions, let us know.


Best Regards,

Adam S.


48 REPLIES

radek_kubka

Hi,

How about misalignment? Did you check/fix it on the 2003 VMs? (Apparently 2008 takes care of alignment on its own.)

It proved to be the culprit in some known cases, dramatically reducing the dedupe ratio.

Regards,
Radek

HendersonD

Many months ago I went through and aligned all of my 2003 VMs. In the meantime I have been making a steady transition to Server 2008. Out of my 29 VMs, 11 of them are 2008 now. Should I recheck alignment on my 2003 VMs?

I aligned them when I was running ESX 4.0 and could run mbrscan and mbralign from the ESX console. I am now running ESXi 4.1, which has a very limited console. How do I check alignment under ESXi? I am running VSC 2.0.1; can that check alignment?

radek_kubka

How do I check alignment under ESXi? I am running VSC 2.0.1, can that check alignment?

That's a good question on its own!

Have you seen this:

http://communities.netapp.com/click.jspa?searchID=371273&objectType=2&objectID=43696

kritchie

Can you share the output of "sis status -l" and "snap list" for your NFS volume?

Just curious if dedupe is actually scheduled to run on a regular basis.  Also curious if there are any stale snapshots on the volume.

HendersonD

Here is the output of sis status -l

Path:                    /vol/VMWare
State:                   Enabled
Status:                  Idle
Progress:                Idle for 15:46:36
Type:                    Regular
Schedule:                sun-sat@23
Minimum Blocks Shared:   1
Blocks Skipped Sharing:  0
Last Operation Begin:    Fri Dec  3 23:00:00 EST 2010
Last Operation End:      Fri Dec  3 23:43:14 EST 2010
Last Operation Size:     77 GB
Last Operation Error:    -
Changelog Usage:         0%
Checkpoint Time:         No Checkpoint
Checkpoint Op Type:      -
Checkpoint stage:        -
Checkpoint Sub-stage:    -
Checkpoint Progress:     -

Here is the output of snap list for the VMWare volume

Volume VMWare
working......

  %/used       %/total  date          name
----------  ----------  ------------  --------
  0% ( 0%)    0% ( 0%)  Dec 04 00:06  DRFiler1(0101198753)_VMWare.667 (snapmirror)
  0% ( 0%)    0% ( 0%)  Dec 04 00:04  smvi__ToDR_Snap_recent
  0% ( 0%)    0% ( 0%)  Dec 03 23:30  smvi_ToDR_NoSnap_novmsnap_recent
  0% ( 0%)    0% ( 0%)  Dec 03 00:03  smvi__ToDR_Snap_20101203000002
  0% ( 0%)    0% ( 0%)  Dec 02 23:30  smvi_ToDR_NoSnap_novmsnap_20101202233004
  0% ( 0%)    0% ( 0%)  Dec 02 00:03  smvi__ToDR_Snap_20101202000001
  0% ( 0%)    0% ( 0%)  Dec 01 23:30  smvi_ToDR_NoSnap_novmsnap_20101201233002
  0% ( 0%)    0% ( 0%)  Dec 01 00:03  smvi__ToDR_Snap_20101201000002
  0% ( 0%)    0% ( 0%)  Nov 30 23:30  smvi_ToDR_NoSnap_novmsnap_20101130233002
  0% ( 0%)    0% ( 0%)  Nov 30 00:03  smvi__ToDR_Snap_20101130000002
  0% ( 0%)    0% ( 0%)  Nov 29 23:30  smvi_ToDR_NoSnap_novmsnap_20101129233002
  0% ( 0%)    0% ( 0%)  Nov 29 00:30  smvi__Test_20101129003001
  0% ( 0%)    0% ( 0%)  Nov 29 00:04  smvi__ToDR_Snap_20101129000002
  0% ( 0%)    0% ( 0%)  Nov 28 23:30  smvi_ToDR_NoSnap_novmsnap_20101128233002
  0% ( 0%)    0% ( 0%)  Nov 28 17:07  smvi__Test_20101128170641
  0% ( 0%)    0% ( 0%)  Nov 28 16:58  smvi__Test_20101128165725
  0% ( 0%)    0% ( 0%)  Nov 28 16:52  smvi__Test_20101128165141
  0% ( 0%)    0% ( 0%)  Nov 28 13:58  smvi__Test_20101128135737
  0% ( 0%)    0% ( 0%)  Nov 28 13:47  smvi__Test_20101128134629
  0% ( 0%)    0% ( 0%)  Nov 28 13:13  smvi__Test_20101128131235
  0% ( 0%)    0% ( 0%)  Nov 28 09:59  smvi__Test_20101128095849
  0% ( 0%)    0% ( 0%)  Nov 28 09:51  smvi__Test_20101128095058
  0% ( 0%)    0% ( 0%)  Nov 28 00:04  smvi__ToDR_Snap_20101128000003
  0% ( 0%)    0% ( 0%)  Nov 27 23:30  smvi_ToDR_NoSnap_novmsnap_20101127233002
  0% ( 0%)    0% ( 0%)  Nov 27 00:03  smvi__ToDR_Snap_20101127000002
  0% ( 0%)    0% ( 0%)  Nov 26 23:30  smvi_ToDR_NoSnap_novmsnap_20101126233002
  0% ( 0%)    0% ( 0%)  Nov 26 00:03  smvi__ToDR_Snap_20101126000003
  0% ( 0%)    0% ( 0%)  Nov 25 23:30  smvi_ToDR_NoSnap_novmsnap_20101125233002
  0% ( 0%)    0% ( 0%)  Nov 25 00:03  smvi__ToDR_Snap_20101125000003
  0% ( 0%)    0% ( 0%)  Nov 24 23:30  smvi_ToDR_NoSnap_novmsnap_20101124233002
  0% ( 0%)    0% ( 0%)  Nov 24 00:03  smvi__ToDR_Snap_20101124000002
  0% ( 0%)    0% ( 0%)  Nov 23 23:30  smvi_ToDR_NoSnap_novmsnap_20101123233004
  0% ( 0%)    0% ( 0%)  Nov 23 00:03  smvi__ToDR_Snap_20101123000003
  0% ( 0%)    0% ( 0%)  Nov 22 23:30  smvi_ToDR_NoSnap_novmsnap_20101122233003
  0% ( 0%)    0% ( 0%)  Nov 22 00:03  smvi__ToDR_Snap_20101122000003
  0% ( 0%)    0% ( 0%)  Nov 21 23:30  smvi_ToDR_NoSnap_novmsnap_20101121233002
  0% ( 0%)    0% ( 0%)  Nov 21 00:03  smvi__ToDR_Snap_20101121000003
  0% ( 0%)    0% ( 0%)  Nov 20 23:30  smvi_ToDR_NoSnap_novmsnap_20101120233002
  0% ( 0%)    0% ( 0%)  Nov 20 00:04  smvi__ToDR_Snap_20101120000002
  0% ( 0%)    0% ( 0%)  Nov 19 23:30  smvi_ToDR_NoSnap_novmsnap_20101119233002
  0% ( 0%)    0% ( 0%)  Nov 19 00:07  smvi__ToDR_Snap_20101119000003
  0% ( 0%)    0% ( 0%)  Nov 18 23:30  smvi_ToDR_NoSnap_novmsnap_20101118233002
  0% ( 0%)    0% ( 0%)  Nov 18 00:06  smvi__ToDR_Snap_20101118000002
  0% ( 0%)    0% ( 0%)  Nov 17 23:30  smvi_ToDR_NoSnap_novmsnap_20101117233001
  0% ( 0%)    0% ( 0%)  Nov 17 00:09  smvi__ToDR_Snap_20101117000002
  0% ( 0%)    0% ( 0%)  Nov 16 23:30  smvi_ToDR_NoSnap_novmsnap_20101116233002
  0% ( 0%)    0% ( 0%)  Nov 16 00:03  smvi__ToDR_Snap_20101116000003
  0% ( 0%)    0% ( 0%)  Nov 15 23:30  smvi_ToDR_NoSnap_novmsnap_20101115233003
  0% ( 0%)    0% ( 0%)  Nov 15 00:03  smvi__ToDR_Snap_20101115000003
  0% ( 0%)    0% ( 0%)  Nov 14 23:30  smvi_ToDR_NoSnap_novmsnap_20101114233001
  0% ( 0%)    0% ( 0%)  Nov 14 00:03  smvi__ToDR_Snap_20101114000002
  0% ( 0%)    0% ( 0%)  Nov 13 23:30  smvi_ToDR_NoSnap_novmsnap_20101113233001
  0% ( 0%)    0% ( 0%)  Nov 13 00:03  smvi__ToDR_Snap_20101113000002
  0% ( 0%)    0% ( 0%)  Nov 12 23:30  smvi_ToDR_NoSnap_novmsnap_20101112233002
  0% ( 0%)    0% ( 0%)  Nov 12 00:02  smvi__ToDR_Snap_20101112000002
  0% ( 0%)    0% ( 0%)  Nov 11 23:30  smvi_ToDR_NoSnap_novmsnap_20101111233002
  0% ( 0%)    0% ( 0%)  Nov 11 00:03  smvi__ToDR_Snap_20101111000003
  0% ( 0%)    0% ( 0%)  Nov 10 23:30  smvi_ToDR_NoSnap_novmsnap_20101110233002
  0% ( 0%)    0% ( 0%)  Nov 10 00:02  smvi__ToDR_Snap_20101110000002
  0% ( 0%)    0% ( 0%)  Nov 09 23:30  smvi_ToDR_NoSnap_novmsnap_20101109233002
  0% ( 0%)    0% ( 0%)  Nov 09 00:03  smvi__ToDR_Snap_20101109000001
  0% ( 0%)    0% ( 0%)  Nov 08 23:30  smvi_ToDR_NoSnap_novmsnap_20101108233001
  0% ( 0%)    0% ( 0%)  Nov 08 00:02  smvi__ToDR_Snap_20101108000002
  0% ( 0%)    0% ( 0%)  Nov 07 23:30  smvi_ToDR_NoSnap_novmsnap_20101107233001
  0% ( 0%)    0% ( 0%)  Nov 07 00:04  smvi__ToDR_Snap_20101107000002
  0% ( 0%)    0% ( 0%)  Nov 06 23:30  smvi_ToDR_NoSnap_novmsnap_20101106233001
  0% ( 0%)    0% ( 0%)  Nov 06 00:02  smvi__ToDR_Snap_20101106000002
  0% ( 0%)    0% ( 0%)  Nov 05 23:30  smvi_ToDR_NoSnap_novmsnap_20101105233001
  0% ( 0%)    0% ( 0%)  Nov 05 00:03  smvi__ToDR_Snap_20101105000002
  0% ( 0%)    0% ( 0%)  Nov 04 23:30  smvi_ToDR_NoSnap_novmsnap_20101104233003
  0% ( 0%)    0% ( 0%)  Nov 04 00:03  smvi__ToDR_Snap_20101104000003
  0% ( 0%)    0% ( 0%)  Nov 03 23:30  smvi_ToDR_NoSnap_novmsnap_20101103233003
  0% ( 0%)    0% ( 0%)  Nov 03 00:03  smvi__ToDR_Snap_20101103000001
  0% ( 0%)    0% ( 0%)  Nov 02 23:30  smvi_ToDR_NoSnap_novmsnap_20101102233001
  0% ( 0%)    0% ( 0%)  Nov 02 00:02  smvi__ToDR_Snap_20101102000001
  0% ( 0%)    0% ( 0%)  Nov 01 23:30  smvi_ToDR_NoSnap_novmsnap_20101101233001
  0% ( 0%)    0% ( 0%)  Nov 01 00:02  smvi__ToDR_Snap_20101101000001
  0% ( 0%)    0% ( 0%)  Oct 31 23:30  smvi_ToDR_NoSnap_novmsnap_20101031233001
  0% ( 0%)    0% ( 0%)  Oct 31 08:30  smvi__ToDR_Snap_20101031082823
  0% ( 0%)    0% ( 0%)  Oct 31 07:33  smvi_ToDR_NoSnap_novmsnap_20101031073328

Here is the output of df -s for the volume VMWare

Filesystem                used      saved       %saved
/vol/VMWare/        1038750724  200134740          16%

I have NetApp's VSC 2.0.1 installed on our VirtualCenter server. It reports that dedup last ran on October 28th, but the sis status command shows that it ran last night. I have it scheduled to run each evening.

evilensky

Are any of these snapshots from before dedupe was enabled?

-----

FWIW, we have almost 3 TB of misaligned VMs deduping just fine up to 80%+ space savings...

HendersonD

I have had dedup enabled for over 6 months, so no, none of the snapshots predate dedup being turned on. This is puzzling; I was hoping for something in the 40-50% range. Is there a possibility that the df -s output is just incorrect and I really have great dedup?

pbhanu

Hendersond,

To check the read/write alignment at the storage system, you can directly view the counters lun:unaligned_reads, lun:unaligned_writes (stats show).

You can also install DFM (4.0) and use Performance Advisor (its performance diagnosis feature) to check for LUN misalignment.

Regards,

Bhanu

HendersonD

In my case, I am using an NFS share (file-level storage), not a LUN (block-level storage), for my VM storage. My understanding is that misalignment in my case can only happen at the guest operating system level, and only with Windows Server 2003; Windows Server 2008 is always aligned. How can I check OS alignment for my Win2003 servers when I am running ESXi 4.1? I do not think any of them are misaligned, but it is worth checking.

kritchie

Here is a link to a blog post that provides a workaround for MBRscan on NFS volumes:

http://blogs.netapp.com/storage_nuts_n_bolts/2009/10/esxi---mbrscanmbralign.html

Are there any large datasets in the volume that comprise a significant portion of the volume?  Exchange datastores, SQL databases, etc?

HendersonD

There are no large datasets on this volume that I can think of. My Exchange database and logs are on separate volumes (LUNs), and my SQL databases are also on separate volumes. I do have my Win2008 Server ISO sitting in this datastore; it is 3 GB in size. I also have my two template files, one for WinServer 2003 and one for Server 2008.

As I mentioned earlier, 11 of the 29 VMs on this datastore are running Win2008 R2 so they are aligned. Months ago before upgrading from ESX 4.0 to ESXi 4.1, I used mbralign to align all of my Server 2003 VMs. I guess there is a possibility that a few are not in alignment but would that really cause dedup for my VMWare volume to be sitting at 16%?

Any other suggestion? Things I can dig into to solve this mystery?

keitha

Do you happen to remember how you enabled dedupe? If you just turned it on, then it has only looked at the data written to the volume since you turned it on, not any of the VMs that were already there! You would have had to run sis start -s; the -s is for scan, which processes the existing data in the volume.

You could run that now and see if it improves the ratio. The savings will initially show up as a large snapshot, since the affected blocks are held until the snapshots expire; after that you will see the true savings.
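
For reference, the relevant 7-Mode commands look roughly like this (the volume name is the one from this thread; run the scan off-hours, since it reads every block in the volume):

filer> sis start -s /vol/VMWare
filer> sis status -l /vol/VMWare     (watch Progress until the scan finishes)
filer> df -s /vol/VMWare             (check savings again once the older snapshots expire)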

Keith

radek_kubka

Any other suggestion? Things I can dig into to solve this mystery?

Have you read this thread by any chance?

http://communities.netapp.com/message/14496#14496

In this particular case misalignment proved to be the culprit, indeed.

I am not saying there is 100% certainty it is your problem as well, but to be honest I am out of other clues.

HendersonD

Time for an update. It took me a while, but I am finally able to handle alignment for VMs under ESXi. I brought up a new VM running 32-bit Ubuntu, mounted my NFS VMWare datastore to it, and was able to use mbrscan and mbralign. I found only two VMs that were not aligned. For one of them I brought up a new Win2008 VM and transferred services, deleting the old misaligned 2003 VM entirely; the other misaligned VM is now aligned properly. My NFS datastore contains 31 VMs, 2 templates, and one WinServer 2008 R2 ISO. This is the split for the 31 VMs:

  • 16 WinServer 2003, mix of standard and enterprise
  • 14 WinServer 2008 R2
  • 1 Ubuntu 32 bit used for mbrscan and mbralign

The only misaligned VM in the bunch is the Ubuntu one that I am using to run mbralign and mbrscan. I did not try to align it since it seemed tricky to align a Linux VM.
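
For the record, the rough procedure from the Ubuntu VM was something like the following (the filer name, mount point, and VM names are just examples, and the exact mbrscan/mbralign options depend on the version of the NetApp tools you download):

mount -t nfs filer1:/vol/VMWare/VMWare /mnt/vmware     (mount the NFS datastore)
cd /mnt/vmware/SomeVM
./mbrscan SomeVM-flat.vmdk                             (reports each partition's offset and whether it is aligned)
./mbralign SomeVM.vmdk                                 (realign with the VM powered off; delete the -backup files it leaves once the VM checks out)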

Several weeks ago I ran sis start -s once again, just to make sure that all of the existing data in the datastore was captured for dedup purposes. My dedup runs each evening according to schedule. The result of all of this is:

filer2> df -s /vol/VMWare
Filesystem                used      saved       %saved
/vol/VMWare/        1126198796  263421172          19%

I still sit at only 19% space savings. Two other thoughts: the volume I am using is named VMWare, and I also have a qtree inside this volume that is also named VMWare. I added this storage to each of my ESXi hosts using the full path to the qtree, as shown in the attached file.

In retrospect, I am not sure why I created the qtree; it was not necessary. I could have just created the volume and presented it to the ESXi hosts. Would having this qtree interfere with dedup?

The VMWare volume is also NOT thin provisioned. Would this have any impact on dedup?

At this point I am almost ready to open a case with NetApp to see if one of their engineers can solve this mystery. I am still hoping to get dedup savings in the 40-50% range.

datamanaged

A couple of ideas.

Just in case you missed it: on the one misaligned VM, did you delete the backup .vmdk after the realign was done and you had confirmed the VM didn't have any problems? If that misaligned backup is still there, it could be throwing off your numbers. Same with the other VM you removed: did you just remove it from VMware's inventory, or did you also delete it from disk? I'm really surprised that that many of your 2003 VMs were properly aligned; in my experience with our infrastructure that usually isn't the case, and it would be 2 out of 16 that *were* aligned properly (could just be our infrastructure, though). I'd double-check manually in the individual VMs just to be sure (the NetApp alignment guide has instructions).

What kinds of data (above the OS level) do you have in these VMs? If you're hosting CIFS shares, large web directories, or large databases, these could be dragging the dedupe percentage down. It could also be filesystem cruft from deleted files; that is highly unlikely to explain that big a difference, but you could try zeroing the free space and then Storage vMotioning to thin format to re-thin the VMDKs.
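
If you try that, one common approach (shown here only as an example; check the SDelete documentation for your version, since the free-space switch has changed over time) is to zero the free space inside each Windows guest and then Storage vMotion the VM to a thin-provisioned disk:

C:\> sdelete.exe -z C:     (zero free space in the guest; on older SDelete releases the equivalent switch is -c)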

If you're saying the VMware volume on the filer is fully space-reserved, then no, this shouldn't have any effect. And no, the qtree will not have any noticeable effect on dedupe.

Regards,

Adam S.

Edit: This is why you shouldn't try to post while tired. I apologize, but apparently I missed a couple of the earlier posts in this thread where you answered these questions... My bad.

HendersonD

About 8 months ago I went through and aligned all of my VMs, and in the process I deleted the two backup files that mbralign creates for each VM. You are correct: when I went through the first time, nearly all of my Win2003 VMs were not aligned. That is why in this round I only found two that were not aligned.

I am not hosting any CIFS shares or large databases in this volume; my SQL and Exchange databases are hosted on different volumes. I do have an IIS server, but it only hosts 23 GB worth of sites.

One other thought I had is about my SMVI jobs. SMVI takes a VMware snapshot of each VM, then a filer-level snapshot of the volume, and then deletes the VMware snapshots. Some VMs do not like having a VMware snapshot taken; domain controllers and VMs with iSCSI LUNs attached, such as Exchange, are examples. I run two SMVI jobs:

  • One job where a vmware snapshot is taken before the filer volume snapshot
  • One job where the vmware snapshot is not taken, just the filer volume snapshot

Most of my VMs are in the first job, with just a handful in the second job. Here is the kicker, though: all of these VMs are in the same datastore. In other words, the first job runs and I get a filer snapshot of my VMWare volume; then the second job runs and I get another filer snapshot of the same VMWare volume. Of course, before either of these filer snapshots is taken, my dedup job runs against this VMWare volume.

Would taking two snapshots each evening of the same NFS volume used for VMWare manifest itself in a low dedup percentage?

chriskranz

Just out of interest, do you have any snapshots on this volume?

HendersonD

I do have snapshots on this volume, but I have been running dedup for months, so any old snapshots have aged off already.

radek_kubka

My two cents:

Even if de-duped blocks are locked in snapshots, df -s counts them as savings.

Regards,
Radek

datamanaged

Hi Radek,

I was curious about this, so I ran a trial. I dropped several dozen copies of the same couple of files onto a fresh test volume, ran dedupe, and got savings of around 90%. I then took a snapshot and deleted the files; df -s now shows 0% savings, 0 MB saved of the 22 MB used by the snapshot (I agree that it shows the deduped size). Next I added several dozen different files and deduped again; df -s now shows savings of 23%, with 12 MB saved. After deleting the snapshot (whose contents were already deduped by ~90%), my dedupe savings for the volume jumps up to 40%, nearly double. This leads me to believe that snapshots are not counted in any fashion in the dedupe percentage, which is what HendersonD is looking at.

Hendersond,

Can you post the output of df -h and df -sh for this volume?

Edit: Can you also attach the output of snap delta for this volume in a text file?
