2008-08-22 06:33 AM
In the NetApp University class on VMware, the subject of misalignment is discussed. The information covered about performance makes perfect sense. However, new information I hadn't heard before was also shared. Partners are being told that misalignment also affects data deduplication. If you review the attached lab manual and go to page 5-7 (PDF page 123) you will see the following section.
The two statements above lead to a lot of questions. The first point says it isn't completely clear what the impact is, and then it goes on to show a customer who had a huge space savings due to it. Well, if that really was related to misalignment, then it is pretty clear. Unless the point is that NetApp isn't really sure the misalignment was the issue.
While the statement alone would lead the reader to believe "it isn't clear", the instructor teaching the class made it seem much more like a fact that proper alignment helps deduplication.
To be clear, I'm all for proper alignment for performance reasons. However, I have 5 or 6 customers who have not aligned any of their VMs, and each is seeing a huge space savings with data deduplication.
Keep in mind that migrating from misaligned LUNs to aligned LUNs requires moving all the data. A number of things could have changed during that process to affect deduplication.
In the end I have the following questions:
2011-08-17 12:48 PM
I didn't do an exhaustive study on this, but I can tell you my experience. I have been told that Professional Services has a very well designed procedure any time they go into an alignment engagement. I haven't spoken to them, but I can now tell you with almost 100% certainty why they do. I encourage you to use them if you have a lot of data to realign and you are using deduplication!
Now keep in mind my testing was done in a lab with no serious consequences for any outcome.
Here's what I did - I ran an mbralign script against a shared NFS data store, in a loop on each of my 4 ESX hosts. So I was basically attempting to realign 400 hosts in 4 concurrent jobs. I knew at the outset I might be asking for trouble, but in a sense I was asking for trouble because I wanted to find out what would happen! I also knew that my Windows 2003 vmdk files were not properly aligned, since they were cloned from a misaligned template. Before the script ran I checked my A-SIS stats, and I was at 98% deduplication on that vol / data store. All of these hosts had been rapid cloned from a single template (you are using the Rapid Clone Utility, right?) -- so irrespective of alignment there was huge deduplication at the outset.
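For anyone wondering what "not properly aligned" means in numbers, here is a minimal sketch of the arithmetic. It assumes 512-byte sectors, the 4K block size mentioned later in this thread, and the traditional MBR default starting sector of 63 that older Windows releases used. The function name is only illustrative; it is not part of mbralign or any NetApp tool.

```python
SECTOR_BYTES = 512   # assumed logical sector size
BLOCK_BYTES = 4096   # 4K storage block size


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES, block_bytes=BLOCK_BYTES):
    """Return True if the partition's starting byte offset sits on a block boundary."""
    return (start_sector * sector_bytes) % block_bytes == 0


# Sector 63 is the classic MBR default (assumption for this example):
# 63 * 512 = 32256 bytes, which is not a multiple of 4096.
print("start sector 63 aligned:", is_aligned(63))

# Sector 64 gives 64 * 512 = 32768 bytes = 8 * 4096, so it lands on a boundary.
print("start sector 64 aligned:", is_aligned(64))
```

With a sector-63 start, every 4K guest block straddles two storage blocks, which is where the performance penalty comes from.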
As the mbralign scripts began to run, I started to see my deduplication ratio drop precipitously (88%, 76%, ...) and volume usage grow. Now keep in mind that mbralign keeps a copy of the original, so I was not only changing alignment, I was creating "new" vmdk files.
I thought to myself...well, this is a test environment, so it wouldn't be the end of the world if it ran out of space. Let's keep the scripts running and kick off a manual dedup run to hopefully free up some space. Went to lunch...came back...the volume was full and the mbralign scripts were choking. The dedup run had not produced the expected savings. I did some head scratching and then had an aha moment. I decided to manually update my A-SIS fingerprint database on that volume and re-run the deduplication job -- BINGO. The space usage then started to drop quickly, and I was able to continue with my mbralign jobs in a stair-step fashion, alternating mbralign jobs with fingerprint update / deduplication runs to free up the space that was lost in the alignment process. The fact that I had copies of all those vmdk files actually turned out to be negligible once this process was ironed out.
Immediately after I ran this entire process I was back to 98% (8.7 TB) space savings on the volume ...impressive considering all the "new" data I had created...but right where I started. So to me now in hindsight it only seems logical that this would be the case - the blocks didn't "change" so much as they were "shifted" or copied by mbralign - no big change from a block deduplication standpoint should have been expected. I only had to "re-teach" ONTAP about the new blocks to get the same savings I had been seeing (and this might have happened automatically over time had I had more patience...).
So -- lessons learned for me:
1) mbralign has a dramatic negative short-term impact on deduplication / space usage, which can be reversed by the methods described above (fingerprint update / dedup / remove copies of the vmdk files)
2) The process that updates the A-SIS fingerprint database may involve some sort of batch process that doesn't happen immediately, so a manual update might be required if you need to realize quick space savings (this conflicts with reading I have done - so I'm not 100% on this - just reporting what I saw)
3) there are a lot of good reasons to do an alignment project (performance and efficiency primarily), but space savings is not one of them.
4) my VMs performed better in subsequent tests than they did before in their misaligned state
5) if you are embarking on a large realignment project on a volume with A-SIS turned on, either a) take it slow and test results as you go or b) call Professional Services!
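The "take it slow" approach in lesson 5 can be sketched as a simple batch plan. This is only a rough model under stated assumptions: each realigned vmdk temporarily needs about its own size in extra space (because mbralign keeps a copy of the original), and a fingerprint update plus dedup run between batches brings free space back to roughly where it started. `plan_batches` is a hypothetical helper, not a NetApp tool, and it does not run mbralign or any storage command.

```python
def plan_batches(vmdk_sizes_gb, free_gb):
    """Group vmdk files into batches whose temporary copies fit in the free space.

    Assumes each realign needs roughly the vmdk's own size in extra space,
    and that free space is restored to about free_gb between batches.
    """
    batches, current, used = [], [], 0.0
    for size in vmdk_sizes_gb:
        if size > free_gb:
            raise ValueError(f"a {size} GB vmdk cannot fit in {free_gb} GB of free space")
        if used + size > free_gb:
            # Close the current batch; a dedup pass would run here.
            batches.append(current)
            current, used = [], 0.0
        current.append(size)
        used += size
    if current:
        batches.append(current)
    return batches


# Illustrative numbers only: ten 20 GB vmdk files, 50 GB of free space.
batches = plan_batches([20] * 10, free_gb=50)
for n, batch in enumerate(batches, 1):
    print(f"batch {n}: realign {len(batch)} vmdk files, then update fingerprints and run dedup")
```

Checking actual volume usage after each batch, rather than trusting the plan, is still the safer way to go.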
BTW - All of this was done on ONTAP 7.3.x.
YMMV on ONTAP 8 and above.
Hope this helps.
2011-08-23 07:12 AM
1 - Yes - Misalignment will not always result in poor dedup ratios. If we are talking about VMs, and they are all misaligned, the 4K blocks all look the same to dedup. So it is not unusual to see dedup work just fine with misaligned VMs.
2 - It may not at all, but it can, and has. A dedup of 64% is solid for a VM environment. You might get more when aligned; there is no way of knowing this until alignment is run.
3 - Yep - that is why your ratio is good.
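Point 1 can be illustrated with a small simulation. This is a sketch only: SHA-256 over fixed 4K blocks stands in for whatever fingerprinting the storage system really uses, random bytes stand in for guest file system contents, and `make_vmdk` and `fingerprints` are hypothetical helpers invented for this example.

```python
import hashlib
import random

BLOCK = 4096  # 4K block size


def fingerprints(image):
    """Hash every full fixed-size block, roughly how block-level dedup sees the data."""
    return {hashlib.sha256(image[i:i + BLOCK]).digest()
            for i in range(0, len(image) - BLOCK + 1, BLOCK)}


def make_vmdk(guest_data, start_offset):
    """Place guest data after a zero-filled gap of start_offset bytes, padded to a whole block."""
    image = bytes(start_offset) + guest_data
    return image + bytes(-len(image) % BLOCK)


rng = random.Random(0)
guest = bytes(rng.getrandbits(8) for _ in range(64 * BLOCK))  # stand-in guest contents

misaligned_a = make_vmdk(guest, 63 * 512)  # two clones of the same misaligned template
misaligned_b = make_vmdk(guest, 63 * 512)
aligned = make_vmdk(guest, 64 * 512)       # same guest data, aligned start

same = fingerprints(misaligned_a) & fingerprints(misaligned_b)
cross = fingerprints(misaligned_a) & fingerprints(aligned)

print("blocks shared by two misaligned clones:", len(same))
print("blocks shared by misaligned and aligned copies:", len(cross))
```

In this toy model the two misaligned clones share every block, while the misaligned and aligned copies share only the zero-filled padding block. That fits the observation that a set of uniformly misaligned clones can still dedupe very well against each other.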
A few things I'll suggest -
For the future, we are providing non-disruptive alignment capabilities. In the near term you will be able to align VMFS datastores without disruption. NFS follows after and will require ONTAP 8.1.1.