2009-09-21 09:42 AM
We are in the process of aligning our VMs on our NFS datastores. I read with interest the NetApp doc outlining the performance impact from the NetApp point of view, but I did not see the impact quantified.
Since the alignment process currently requires the VM to be down (integrate with Storage vMotion please?!), I decided to try to design a test of the impact of aligned vs. misaligned from the VM's point of view.
The setup, done on an otherwise quiesced lab system (a Dell 1950 x 2 cluster running vSphere against a NetApp 2020 serving NFS datastores):
1) Take a misaligned Linux VM (as checked by mbrscan)
2) clone the VM
3) align the clone with mbralign
Now we have two Linux VMs, (M)isaligned and (A)ligned. The mbrscan/mbralign commands looked roughly like the sketch below.
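(Example only: nfs_ds1/testvm is a made-up datastore/VM path, the tools run from wherever the NetApp utilities were unpacked on the ESX service console, and exactly which .vmdk file each tool takes can vary by version - check the NetApp docs. The VM must be powered off for mbralign.)
./mbrscan /vmfs/volumes/nfs_ds1/testvm/testvm-flat.vmdk     # reports aligned: Yes/No per partition
./mbralign /vmfs/volumes/nfs_ds1/testvm/testvm-flat.vmdk    # realigns the disk (VM powered off)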
I wanted a way to generate IO of varying sizes - I used this script:
[fcocquyt@lab-vm-01 ~]$ more generateIO.csh
set bs = 512                # starting size; the init and step didn't survive the paste, so these values are assumed
while ( $bs < 9000 )
    set x = 1
    while ( $x < 20 )
        dd if=/dev/zero of=tstfile$x bs=$bs count=10240
        @ x++
    end
    @ bs = $bs + 512
end
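To get per-run numbers I timed each full pass of the script; a harness like the following works (a sketch, assuming GNU /usr/bin/time in the guest; the 5-run count is arbitrary):
set i = 1
while ( $i <= 5 )
    /usr/bin/time -p -a -o timings.txt csh generateIO.csh >& /dev/null
    @ i++
end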
Across repeated runs of this script on both the M and A VMs, the misaligned VM took on average 18% longer to complete the same IO.
I also captured /usr/lib/vmware/bin/vscsiStats output, but interestingly those numbers (latency and outstandingIOs, for example) did not show the same result: average latency came out about the same for the M and A VMs.
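For reference, the collection goes roughly like this on the ESX host (the worldGroupID placeholder comes from the -l listing; the ioLength histogram is also relevant to the block size question below):
/usr/lib/vmware/bin/vscsiStats -l                              # list VMs and their worldGroupIDs
/usr/lib/vmware/bin/vscsiStats -s -w <worldGroupID>            # start collecting for that VM
(run generateIO.csh in the guest)
/usr/lib/vmware/bin/vscsiStats -p latency -w <worldGroupID>    # latency histogram
/usr/lib/vmware/bin/vscsiStats -p ioLength -w <worldGroupID>   # IO size histogram
/usr/lib/vmware/bin/vscsiStats -x                              # stop collection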
I welcome any and all comments on this analysis
One open area is block size: I suspect block size has a big effect on latency, since I observed the throughput varying quite a bit while the script was stepping through the block sizes.
But the finding of 18% impact is in line with my expectation for NFS datastores...
2009-09-23 09:29 PM
Very handy info -- thanks much for posting.
I generally tell people that you have a certain performance "ceiling" based on the filer head and/or number of spindles (usually driven more by spindle count). Misalignment just means that you'll hit that ceiling sooner than you would otherwise. Until you hit it you won't see a huge difference (although 18% is higher than I would have thought - very good to know). Once you do hit it, it's the same as maxing out your backend disk I/O under any other circumstances (i.e. things get very slow); you just get there faster due to the misalignment.
2010-07-23 01:14 PM
Turns out the impact of misaligned VMs can be more dramatic if the level of unaligned IO tips ONTAP into synchronous mode.
One indicator is the pw.over_limit stat. From the NetApp documentation:
1.4 Counters that indicate Improper Alignment
There are various ways of determining if you do not have proper alignment. Using perfstat counters, under the wafl_susp section, "wp.partial_writes", "pw.over_limit", and "pw.async_read" are indicators of improper alignment. The "wp.partial_writes" counter is the block count of unaligned I/O. If more than a small number of partial writes happen, then WAFL® will launch a background read; these are counted in "pw.async_read". "pw.over_limit" is the block count of writes waiting on disk reads.
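You can also eyeball these counters outside a full perfstat run; on a 7-Mode filer it goes roughly like this (advanced privilege, so use with care):
filer> priv set advanced
filer*> wafl_susp -w
(scan the output for wp.partial_writes, pw.over_limit and pw.async_read)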
This counter is not exposed via SNMP as standard, but can be trended as outlined here:
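For illustration only (this is not the linked walkthrough): a crude poller that appends the counter to a log every 5 minutes, assuming passwordless ssh to the filer and an ONTAP version that accepts chained commands over ssh:
while ( 1 )
    echo -n "`date '+%Y-%m-%d %H:%M:%S'` " >> pw_over_limit.log
    ssh root@filer 'priv set -q advanced; wafl_susp -w' | grep pw.over_limit >> pw_over_limit.log
    sleep 300
end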