outlining the performance impact from the NetApp point of view, but I did not see the impact quantified.
Since the alignment process currently requires the VM to be down (integrate with Storage vMotion, please?!),
I decided to design a test that measures, from the VM's point of view, the impact of aligned vs. misaligned.
The setup (I did this on an otherwise quiesced lab system: a 2-node Dell 1950 cluster running vSphere, with a NetApp 2020 serving NFS datastores):
1) Take a misaligned Linux VM (as checked by mbrscan)
2) clone the VM
3) align the clone with mbralign
Now we have two Linux VMs: M (misaligned) and A (aligned).
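For anyone wondering what "misaligned" means numerically, here is a rough sketch in Python. It is not how mbrscan or mbralign work internally (I have not looked); it only assumes 512-byte sectors and the 4 KB WAFL block size, and checks whether a partition's starting byte offset lands on a 4 KB boundary. The function name and the sample start sectors are my own illustration.

```python
SECTOR_BYTES = 512        # assumed classic sector size
WAFL_BLOCK_BYTES = 4096   # WAFL works in 4 KB blocks


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES, block_bytes=WAFL_BLOCK_BYTES):
    """Return True if the partition's starting byte offset sits on a block boundary."""
    return (start_sector * sector_bytes) % block_bytes == 0


# 63 is the traditional MBR default start sector; 64 and 2048 are common aligned choices.
for start in (63, 64, 2048):
    print(start, start * SECTOR_BYTES, is_aligned(start))
```

A partition starting at sector 63 begins at byte 32256, which is not a multiple of 4096, so every guest 4 KB block straddles two blocks on the filer.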
I wanted a way to generate IO of varying sizes - I used this script:
[fcocquyt@lab-vm-01 ~]$ more generateIO.csh
set x=1
set bs=1024
while ( $bs < 9000 )
    echo $bs
    while ( $x < 20 )
        dd if=/dev/zero of=tstfile$x bs=$bs count=10240
        sum tstfile$x
        @ x++
    end
    rm tstfile*
    @ bs+=1024
    set x = 1
end
What I found from repeated runs of this script on both the M and A VMs was that the misaligned VM took an average of 18% longer to complete the same I/O.
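For clarity on what "X% longer" means here, a minimal sketch of the comparison: take the mean wall-clock time of the runs on each VM and express the difference relative to the aligned mean. The timings in the example are placeholders I made up to show the arithmetic, not the measurements from the lab runs, and the function name is hypothetical.

```python
from statistics import mean


def percent_slower(misaligned_secs, aligned_secs):
    """Percent by which the mean misaligned run time exceeds the mean aligned run time."""
    aligned_mean = mean(aligned_secs)
    misaligned_mean = mean(misaligned_secs)
    return (misaligned_mean - aligned_mean) / aligned_mean * 100.0


# Placeholder timings in seconds (illustrative only, NOT the real results).
example_aligned = [100.0, 100.0, 100.0]
example_misaligned = [118.0, 117.0, 119.0]
print(round(percent_slower(example_misaligned, example_aligned), 1))
```

Wrapping each script run in `time` on both VMs and feeding the elapsed seconds into something like this would make the comparison repeatable.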
I also captured /usr/lib/vmware/bin/vscsiStats output, but interestingly those numbers (latency and outstandingIOs, for example) did not show the same result: they showed about the same average latency for the M and A VMs.
I welcome any and all comments on this analysis
One area: block size. I have a suspicion the block size has a big effect on the latency; while the script was stepping through the block sizes I observed the throughput varying quite a bit.
But the finding of 18% impact is in line with my expectation for NFS datastores...
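To illustrate why block size might matter, here is a back-of-the-envelope sketch. It assumes 4 KB storage blocks and a guest partition shifted by the classic 63-sector offset, and it counts how many storage blocks a single write covers only partially. It deliberately ignores the guest page cache, write coalescing, and NFS behaviour, so the I/O sizes that actually reach the filer may differ from the dd block size; treat it as intuition only. All names are hypothetical.

```python
BLOCK = 4096  # assumed 4 KB storage block


def partial_blocks(offset, length, block=BLOCK):
    """Count storage blocks that a write of `length` bytes at `offset` only partly covers."""
    end = offset + length
    first = offset // block
    last = (end - 1) // block
    if first == last:
        # The write stays inside one block; it is full only if it spans the whole block.
        return 0 if (offset % block == 0 and end % block == 0) else 1
    partial = 0
    if offset % block:
        partial += 1  # head of the write starts mid-block
    if end % block:
        partial += 1  # tail of the write ends mid-block
    return partial


SHIFT = (63 * 512) % BLOCK  # byte shift introduced by a 63-sector partition start

# Same block sizes the csh script steps through: 1024, 2048, ..., 8192.
for bs in range(1024, 9000, 1024):
    print(bs, partial_blocks(0, bs), partial_blocks(SHIFT, bs))
```

In this simplified model, a 4096- or 8192-byte write is clean when aligned but touches two partial blocks once shifted, while writes smaller than 4 KB are partial either way. That would be one possible reason the gap between M and A changes with block size.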
I generally tell people that you have a certain performance "ceiling" based on the filer head and/or # of spindles (more driven by spindle count usually). Misalignment just means that you'll hit that "ceiling" sooner than you would otherwise. Until you hit that ceiling you won't see a huge difference (although 18% is higher than I would have thought.....very good to know). Once you do hit that ceiling, it's the same as when you max out your backend disk I/O under any circumstances (i.e. things get very slow)....you're just going to get there faster than otherwise due to the impact of misalignment.
1.4 Counters that indicate Improper Alignment
There are various ways of determining if you do not have proper alignment. Using perfstat counters, under the wafl_susp section, “wp.partial_writes”, “pw.over_limit”, and “pw.async_read” are indicators of improper alignment. The “wp.partial_writes” counter is the block counter of unaligned I/O. If more than a small number of partial writes happen, then WAFL® will launch a background read. These are counted in “pw.async_read”; “pw.over_limit” is the block counter of the writes waiting on disk reads.
This counter is not exposed via SNMP as standard, but can be trended as outlined here: