VMware Solutions Discussions
Hi - we run a daily report using the mbrscan utility to check all vmdk files for alignment.
Recently (> 7.3.5?) NetApp added an nfsstat -d switch to report "Files Causing Misaligned IO's".
I am finding that mbrscan reports the vmdks as aligned: Yes
but nfsstat -d shows the misaligned IO counter for the same files increasing.
I zeroed out the stats with nfsstat -z to be sure, and yes, the counters increase by several thousand between nfsstat -d runs (about 1 minute apart), as in the case of mcomm below:
[root@backup-02 mcomm]# /opt/netapp/santools/mbrscan *flat*vmdk
--------------------
mcomm_1-flat.vmdk p1 (EBR ) lba:64 offset:32768 aligned:Yes
mcomm_1-flat.vmdk e1 (NTFS) lba:128 offset:65536 aligned:Yes
--------------------
mcomm-flat.vmdk p1 (NTFS) lba:64 offset:32768 aligned:Yes
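The lba and offset columns above can be checked by hand. A minimal sketch, assuming 512-byte sectors and a 4096-byte storage block; the function name is made up for illustration, and this is not a description of how mbrscan itself is implemented:

```python
SECTOR_BYTES = 512    # assumed sector size
BLOCK_BYTES = 4096    # assumed storage block size (4 KB)

def partition_alignment(start_lba):
    """Return (byte_offset, aligned) for a partition starting at start_lba."""
    offset = start_lba * SECTOR_BYTES
    return offset, offset % BLOCK_BYTES == 0

for lba in (63, 64, 128):
    offset, aligned = partition_alignment(lba)
    print(f"lba:{lba} offset:{offset} aligned:{'Yes' if aligned else 'No'}")
```

LBA 64 and 128 reproduce the offsets in the mbrscan lines above (32768 and 65536, both multiples of 4096). LBA 63 is included only as a contrasting case that does not land on a 4096-byte boundary.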
nfsstat -d output:
Files Causing Misaligned IO's
[Counter=3404], Filename=vm65/mcomm/mcomm-flat.vmdk
Which tool is correct?
FWIW - the Partial Write over limit (pwol) counter is not increasing:
http://www.vmadmin.info/2010/07/quantifying-vmdk-misalignment.html
Also, nfsstat -d lists a record without a filename. How do I determine what this is?
Files Causing Misaligned IO's
[Counter=4093], FSID=95966634, Fileid=21607809
thanks
Bump.
We're seeing the exact same results as above on our end as well. nfsstat -d is flagging Windows 2K8 R2 boxes as generating misaligned IO whereas mbrscan says that everything is aligned. Being that I know that the 2008 R2 boxes are ok and that mbrscan is giving the results that I would expect, I'm not sure how to read or trust the nfsstat output.
Friday 8pm we experienced a latency event (spike) which was logged by 30+ VMs
I've opened a case on this to see what role misaligned IO as reported by nfsstat -d is playing
thanks
Well, the NetApp engineer not so helpfully just emailed a link, completely ignoring the nfsstat -d output:
Please refer to the following knowledge base article link which shows how to identify and fix misaligned Windows Virtual Machine disks in your environment:
https://kb.netapp.com/support/index?page=content&id=1011402
Please let me know if you need any further assistance in this regard.
I just came across a "soon to come" teaser from http://www.vmdamentals.com/
"The devil is in the details: How aligned VMs may still be misaligned"
Sounds like our issue...
You can use vol read_fsid to find out which flexvol has that FSID:
netapp01*> vol read_fsid customer01_oralogtemp
Volume 'customer01_oralogtemp' has an FSID of 0x17383d29.
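One wrinkle when matching the two outputs: the FSID in the nfsstat -d record earlier in the thread looks like a decimal number, while vol read_fsid prints hex. A small sketch for converting between the two, assuming the nfsstat value really is plain decimal and that no byte swapping is involved (neither is confirmed in this thread); the helper names are hypothetical:

```python
def fsid_to_hex(fsid_decimal):
    """Format a decimal FSID in the 0x... style that vol read_fsid prints."""
    return hex(int(fsid_decimal))

def fsid_to_decimal(fsid_hex):
    """Parse a hex FSID such as '0x17383d29' into a decimal integer."""
    return int(fsid_hex, 16)

print(fsid_to_hex("95966634"))        # FSID from the nfsstat -d record
print(fsid_to_decimal("0x17383d29"))  # FSID from the vol read_fsid example
```

If the converted value does not match any volume, the decimal/hex assumption is probably wrong for that release and the numbers should not be trusted without checking with support.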
Plus, you can have aligned VMs but still have applications that generate non-aligned writes.
The example below is for an Oracle database on NFS. All writes are aligned (it's NFS), but the writes to the redo logs can be any size from 512b to ....., so a write can cross a block boundary.
Files Causing Misaligned IO's
[Counter=0], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g12m2.dbf
[Counter=44876], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g11m1.dbf
[Counter=36977], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g11m2.dbf
[Counter=85835], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g14m2.dbf
[Counter=100507], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogB/log_g14m1.dbf
[Counter=45690], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogB/log_g12m1.dbf
[Counter=102382], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g13m1.dbf
[Counter=89142], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g13m2.dbf
[Counter=38241], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g12m2.dbf
[Counter=3373], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g11m2.dbf
[Counter=3467], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g11m1.dbf
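Why a redo log can show up in this list even though the file system is aligned: a write only covers whole blocks if both its start offset and its end fall on block boundaries. A rough sketch, assuming 4096-byte blocks; the function and the sample offsets are illustrative and are not taken from the filer:

```python
BLOCK_BYTES = 4096  # assumed storage block size

def partial_blocks(offset, length, block=BLOCK_BYTES):
    """Count blocks that a write at (offset, length) only partly covers."""
    if length <= 0:
        return 0
    end = offset + length
    first_block = offset // block
    last_block = (end - 1) // block
    head_partial = offset % block != 0
    tail_partial = end % block != 0
    if first_block == last_block:
        # Single block touched: partial unless the write covers all of it.
        return 1 if (head_partial or tail_partial) else 0
    return int(head_partial) + int(tail_partial)

print(partial_blocks(0, 4096))      # whole block
print(partial_blocks(0, 512))       # small write inside one block
print(partial_blocks(3584, 1024))   # straddles a block boundary
```

Whether nfsstat -d counts exactly these cases is not stated anywhere in the thread, so treat this as an explanation of the idea rather than of the counter.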
Yes, so the GOS is aligned, but some operations may not be aligned (like Oracle logging)
Other Questions:
1) How do we gauge the relative significance of this unaligned IO?
2) Why are there multiple counters listed for the same file?
3) What do the values of the counters mean?
4) When should any action be taken on this data?
1) nfsstat -z, nfsstat -d over a 24hour period
2) no idea, possibly a bug
3) Files Causing Misaligned I/O's
List of filenames that are causing the most misaligned I/O's over NFS along with their corresponding heuristic counters. The higher the counter value, the higher the misaligned I/O requests for the corresponding file.
4) Are there any performance issues? If not, then don't try to fix it.
To answer number 2 (Why are there multiple counters listed for the same file?): this is because there's a counter for each CPU.
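If that per-CPU explanation is right, one way to compare files is to add up all records for the same file and look at how much the total grows between two nfsstat -d captures. A sketch under that assumption; the record format is copied from the output pasted in this thread, while the function names and sample values are made up:

```python
import re
from collections import Counter

# Matches "[Counter=N], Filename=..." and "[Counter=N], FSID=..., Fileid=...".
# The identifier stops at '[' or end of line, so two records fused onto one
# line are still split (a filename containing '[' would be cut short).
RECORD = re.compile(r"\[Counter=(\d+)\],\s*([^\[\n]+)")

def totals(nfsstat_text):
    """Sum all counters that share the same file identifier."""
    sums = Counter()
    for count, ident in RECORD.findall(nfsstat_text):
        sums[ident.strip()] += int(count)
    return sums

def deltas(before_text, after_text):
    """Per-file increase between two nfsstat -d captures."""
    before, after = totals(before_text), totals(after_text)
    return {name: after[name] - before.get(name, 0) for name in after}

# Hypothetical captures taken some time apart.
before = "[Counter=10], Filename=vol1/a-flat.vmdk\n[Counter=5], Filename=vol1/a-flat.vmdk\n"
after = ("[Counter=30], Filename=vol1/a-flat.vmdk[Counter=8], Filename=vol1/a-flat.vmdk\n"
         "[Counter=4], FSID=1, Fileid=2\n")
print(deltas(before, after))
```

Since the man page calls these heuristic counters, the deltas are probably only useful for ranking files against each other, not as an exact count of misaligned requests. A negative delta would just mean the stats were zeroed between captures.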
And what exactly do these counters count?
What I’ve heard is that they don’t mean anything to us regular users, those are just for engineering to understand how the algorithm works.
Sorry for the very late response on that one, but today was the first time I stumbled upon any info about it.
I am so glad to find this discussion. IBM Support spat out a list from nfsstat -d with no context around it as if it is the root of all my problems.
The man page for nfsstat says
Files Causing Misaligned I/O’s
List of filenames that are causing the most misaligned I/O’s over NFS along with their corresponding
heuristic counters. The higher the counter value, the higher the misaligned I/O requests for the
corresponding file.
http://psychology.about.com defines heuristic - A heuristic is a mental shortcut that allows people to solve problems and make judgments quickly and efficiently.
Therefore if the number is big it happens more or has a bigger impact, period. Avoid reading anything more into it.
The root of my problem probably lies elsewhere (spindles, cache, misalignment on the iSCSI luns, not NFS).
Thanks everyone, especially fletch2007 for raising the topic.
BUMP..
My colleague came across the same issue today... We are aligned in the OS but the underlying NFS says we are not.. nfsstat -d is reporting "Files Causing Misaligned IO's"
Anyone got to the very bottom of this??