VMware Solutions Discussions

Files Causing Misaligned IO's (nfsstat -d) and mbrscan don't agree

fletch2007

Hi - we run a daily report using the mbrscan utility to check all vmdk files for alignment.

Recently (in releases after 7.3.5?) NetApp added a -d switch to nfsstat that reports "Files Causing Misaligned IO's".

I am finding that mbrscan reports the vmdks as aligned:Yes, but nfsstat -d shows the misaligned IO counter for the same files increasing.

I zeroed out the stats with nfsstat -z to be sure, and yes, the counters increase by several thousand between nfsstat -d runs (about 1 minute apart) in the case of mcomm below:

[root@backup-02 mcomm]# /opt/netapp/santools/mbrscan *flat*vmdk

--------------------

mcomm_1-flat.vmdk p1 (EBR )    lba:64    offset:32768    aligned:Yes

mcomm_1-flat.vmdk e1 (NTFS)    lba:128    offset:65536    aligned:Yes

--------------------

mcomm-flat.vmdk p1 (NTFS)    lba:64    offset:32768    aligned:Yes
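
For reference, the arithmetic behind the aligned:Yes verdicts above can be sketched as follows. This is not mbrscan's actual implementation. It assumes 512-byte sectors (consistent with the lba and offset columns above, since 64 * 512 = 32768) and a 4 KiB block size on the storage side. The function name is made up for illustration.

```python
SECTOR_BYTES = 512   # assumption: 512-byte sectors, consistent with the output above
BLOCK_BYTES = 4096   # assumption: 4 KiB block size on the storage side


def partition_alignment(lba: int) -> tuple[int, bool]:
    """Return (byte offset, aligned?) for a partition starting at the given LBA."""
    offset = lba * SECTOR_BYTES
    return offset, offset % BLOCK_BYTES == 0


if __name__ == "__main__":
    # 64 and 128 come from the mbrscan output above; 63 is the classic
    # misaligned MBR start sector, added here only for contrast.
    for lba in (64, 128, 63):
        offset, aligned = partition_alignment(lba)
        print(f"lba:{lba} offset:{offset} aligned:{'Yes' if aligned else 'No'}")
```

A check like this only looks at where the partition starts. It says nothing about the size or offset of the individual IOs issued inside that partition.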

nfsstat -d output:

Files Causing Misaligned IO's

[Counter=3404], Filename=vm65/mcomm/mcomm-flat.vmdk

Which tool is correct?

FWIW - the Partial Write over limit (pwol) counter is not increasing:

http://www.vmadmin.info/2010/07/quantifying-vmdk-misalignment.html

Also, nfsstat -d lists a record without a filename. How do I determine what this is?

Files Causing Misaligned IO's

[Counter=4093], FSID=95966634, Fileid=21607809

thanks


chrism

Bump.

We're seeing the exact same results as above on our end as well. nfsstat -d is flagging Windows 2K8 R2 boxes as generating misaligned IO, whereas mbrscan says that everything is aligned. Since I know that the 2008 R2 boxes are OK and mbrscan is giving the results I would expect, I'm not sure how to read or trust the nfsstat output.

fletch2007

On Friday at 8pm we experienced a latency event (spike) that was logged by 30+ VMs.

I've opened a case on this to see what role the misaligned IO reported by nfsstat -d is playing.

thanks

fletch2007

Well, the NetApp engineer not so helpfully just emailed a link, completely ignoring the nfsstat -d output:

Please refer to the following knowledge base article link which shows how to identify and fix misaligned Windows Virtual Machine disks in your environment:

https://kb.netapp.com/support/index?page=content&id=1011402

Please let me know if you need any further assistance in this regard.

I just came across a "soon to come" teaser from http://www.vmdamentals.com/

"The devil is in the details: How aligned VMs may still be misaligned"

Sounds like our issue...

jcosta

You can use vol read_fsid to find out which flexvol has that FSID:

netapp01*> vol read_fsid customer01_oralogtemp

Volume 'customer01_oralogtemp' has an FSID of 0x17383d29.
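
vol read_fsid prints the FSID in hex, while the nfsstat -d record earlier in the thread shows FSID=95966634 without a 0x prefix. It is not clear from the output which base nfsstat uses, so a small sketch like this one computes both readings for comparison against the vol read_fsid values. The function name is hypothetical.

```python
def fsid_candidates(nfsstat_fsid: str) -> dict[str, str]:
    """Return hex forms of an nfsstat -d FSID under both possible readings."""
    # Reading 1: the value is already hex, just printed without "0x".
    candidates = {"if_hex": hex(int(nfsstat_fsid, 16))}
    # Reading 2: the value is decimal (only possible if it has no a-f digits).
    if nfsstat_fsid.isdigit():
        candidates["if_decimal"] = hex(int(nfsstat_fsid, 10))
    return candidates


if __name__ == "__main__":
    # FSID taken from the unnamed nfsstat -d record in the first post.
    for reading, value in fsid_candidates("95966634").items():
        print(f"{reading}: {value}")
    # Going the other way: the hex value printed by vol read_fsid, as decimal.
    print(int("0x17383d29", 16))
```

Whichever candidate matches the output of vol read_fsid for one of your volumes tells you which reading applies.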

Plus, you can have aligned VMs but still have applications that generate non-aligned writes.

The example below is for an Oracle database on NFS. The files themselves are aligned (it's NFS), but the writes to the redo logs can be any size from 512 bytes to ..... In this case, the writes can cross a block boundary.

Files Causing Misaligned IO's

[Counter=0], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g12m2.dbf

[Counter=44876], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g11m1.dbf

[Counter=36977], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g11m2.dbf

[Counter=85835], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g14m2.dbf

[Counter=100507], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogB/log_g14m1.dbf

[Counter=45690], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogB/log_g12m1.dbf

[Counter=102382], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g13m1.dbf

[Counter=89142], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g13m2.dbf

[Counter=38241], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g12m2.dbf

[Counter=3373], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g11m2.dbf

[Counter=3467], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g11m1.dbf
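
To make the point about redo log writes concrete, here is a minimal sketch of how a write's offset and length determine whether it covers only whole blocks. It assumes a 4 KiB block size on the storage side. The function name and the sample offsets are invented for illustration and do not come from the filer.

```python
BLOCK_BYTES = 4096  # assumption: 4 KiB storage block size


def partial_blocks(offset: int, length: int) -> int:
    """Count the blocks a write touches only partially (0 means fully aligned)."""
    if length <= 0:
        return 0
    end = offset + length
    first_block = offset // BLOCK_BYTES
    last_block = (end - 1) // BLOCK_BYTES
    head_partial = offset % BLOCK_BYTES != 0
    tail_partial = end % BLOCK_BYTES != 0
    if first_block == last_block:
        # The whole write sits inside a single block.
        return 1 if (head_partial or tail_partial) else 0
    return int(head_partial) + int(tail_partial)


if __name__ == "__main__":
    # (offset, length) pairs in bytes; hypothetical writes to a log file.
    for offset, length in ((0, 4096), (0, 512), (3584, 1024), (4096, 8192)):
        print(offset, length, partial_blocks(offset, length))
```

A 1024-byte write starting at offset 3584 touches two blocks and fills neither, even though the file itself starts on a block boundary.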

fletch2007

Yes, so the guest OS (GOS) is aligned, but some operations may not be aligned (like Oracle logging).

Other Questions:

1) How do we gauge the relative significance of this unaligned IO?

2) Why are there multiple counters listed for the same file?

3) What do the values of the counters mean?

4) When should any action be taken on this data?

http://vmadmin.info

jcosta

1) Run nfsstat -z, then collect nfsstat -d over a 24-hour period.

2) No idea, possibly a bug.

3) From the nfsstat man page:

Files Causing Misaligned I/O's

List of filenames that are causing the most misaligned I/O's over NFS along with their corresponding heuristic counters. The higher the counter value, the higher the misaligned I/O requests for the corresponding file.

4) Are there any performance issues? If not, then don't try to fix it.

larsson

To answer number 2 (why are there multiple counters listed for the same file?): this is because there's a counter for each CPU.
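
If that is right, the entries for one file can be added together to get a single figure per file. Here is a rough sketch, assuming the record format shown in the outputs earlier in the thread ([Counter=N], Filename=... or [Counter=N], FSID=..., Fileid=...). The function name is made up, and since the man page calls the counters heuristic, the totals are probably only meaningful as a relative ranking.

```python
import re
from collections import Counter

# One record looks like "[Counter=3404], Filename=vm65/mcomm/mcomm-flat.vmdk".
# The lookahead on "[Counter=" also copes with two records fused on one line.
RECORD = re.compile(r"\[Counter=(\d+)\],\s*(.*?)\s*(?=\[Counter=|$)", re.MULTILINE)


def sum_counters(nfsstat_output: str) -> Counter:
    """Add up all counters that belong to the same file or FSID/Fileid pair."""
    totals = Counter()
    for count, ident in RECORD.findall(nfsstat_output):
        totals[ident] += int(count)
    return totals


if __name__ == "__main__":
    # A few records copied from the Oracle example earlier in the thread.
    sample = """Files Causing Misaligned IO's
[Counter=0], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g12m2.dbf
[Counter=36977], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g11m2.dbf
[Counter=38241], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g12m2.dbf
[Counter=3373], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g11m2.dbf
"""
    for ident, total in sum_counters(sample).most_common():
        print(total, ident)
```

Running nfsstat -z first and noting the time gives the totals a known starting point, so two collections can be compared.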

aborzenkov

And what exactly do these counters count?

larsson

What I’ve heard is that they don’t mean anything to us regular users; they are just for engineering to understand how the algorithm works.

Sorry for the very late response on that one, but today was the first time I stumbled upon any info about it.

richard_mackerras

I am so glad to find this discussion. IBM Support spat out a list from nfsstat -d with no context around it as if it is the root of all my problems.

The man page for nfsstat says:

Files Causing Misaligned I/O’s

List of filenames that are causing the most misaligned I/O’s over NFS along with their corresponding heuristic counters. The higher the counter value, the higher the misaligned I/O requests for the corresponding file.

http://psychology.about.com defines heuristic: "A heuristic is a mental shortcut that allows people to solve problems and make judgments quickly and efficiently."

Therefore, if the number is big, it happens more or has a bigger impact, period. Avoid reading anything more into it.

The root of my problem probably lies elsewhere (spindles, cache, misalignment on the iSCSI LUNs, not NFS).

Thanks everyone, especially fletch2007 for raising the topic.

garymcleanuk

BUMP..


My colleague came across the same issue today. We are aligned in the OS, but the underlying NFS says we are not: nfsstat -d is reporting "Files Causing Misaligned IO's".

Has anyone got to the very bottom of this?
