Files Causing Misaligned IO's (nfsstat -d) and mbrscan don't agree

fletch2007 · ‎2011-06-20

Hi - we run a daily report using the mbrscan utility to check all vmdk files for alignment.

Recently (after 7.3.5?) NetApp added an nfsstat -d switch to report "Files Causing Misaligned IO's".

I am finding that mbrscan reports the vmdks as aligned: Yes

but nfsstat -d reports the misaligned IO counter for the same files as increasing.

I zeroed the stats with nfsstat -z to be sure, and yes, the counters increase by several thousand between nfsstat -d runs (about 1 minute apart), as in the case of mcomm below.

[root@backup-02 mcomm]# /opt/netapp/santools/mbrscan *flat*vmdk

--------------------

mcomm_1-flat.vmdk p1 (EBR ) lba:64 offset:32768 aligned:Yes

mcomm_1-flat.vmdk e1 (NTFS) lba:128 offset:65536 aligned:Yes

--------------------

mcomm-flat.vmdk p1 (NTFS) lba:64 offset:32768 aligned:Yes
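For reference, the kind of check a partition scanner applies can be sketched as follows. This is an assumption about the test, not mbrscan's actual code: a partition is treated as aligned when its starting byte offset (LBA times a 512-byte sector, which matches lba:64 giving offset:32768 above) is a multiple of the 4096-byte WAFL block size. The function name is hypothetical.

```python
SECTOR_SIZE = 512        # bytes per sector, consistent with lba:64 -> offset:32768
WAFL_BLOCK_SIZE = 4096   # filer block size the partition start is compared against

def partition_is_aligned(start_lba, block_size=WAFL_BLOCK_SIZE):
    """Return True if the partition's starting byte offset is a multiple of block_size."""
    offset = start_lba * SECTOR_SIZE
    return offset % block_size == 0

if __name__ == "__main__":
    # 64 and 128 are the LBAs reported above; 63 is the classic misaligned MBR start.
    for lba in (64, 128, 63):
        print(lba, lba * SECTOR_SIZE, partition_is_aligned(lba))
```

A check like this only looks at where the partition starts. It says nothing about the size or offset of the individual writes the guest issues afterwards.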

nfsstat -d output:

Files Causing Misaligned IO's

[Counter=3404], Filename=vm65/mcomm/mcomm-flat.vmdk

Which tool is correct?

FWIW - the Partial Write over limit (pwol) counter is not increasing:

http://www.vmadmin.info/2010/07/quantifying-vmdk-misalignment.html

Also, nfsstat -d lists a record without a filename. How do I determine what this is?

Files Causing Misaligned IO's

[Counter=4093], FSID=95966634, Fileid=21607809

thanks

chrism · ‎2011-06-22

Bump.

We're seeing the exact same results on our end. nfsstat -d is flagging Windows 2K8 R2 boxes as generating misaligned IO, whereas mbrscan says that everything is aligned. Since I know that the 2008 R2 boxes are OK and mbrscan is giving the results I would expect, I'm not sure how to read or trust the nfsstat output.

fletch2007 · ‎2011-06-28

Friday 8pm we experienced a latency event (spike) which was logged by 30+ VMs

I've opened a case on this to see what role misaligned IO as reported by nfsstat -d is playing

thanks

fletch2007 · ‎2011-07-01

Well, the NetApp engineer, not so helpfully, just emailed a link, completely ignoring the nfsstat -d output:

Please refer to the following knowledge base article link which shows how to identify and fix misaligned Windows Virtual Machine disks in your environment:

https://kb.netapp.com/support/index?page=content&id=1011402

Please let me know if you need any further assistance in this regard.

I just came across a "soon to come" teaser from http://www.vmdamentals.com/

"The devil is in the details: How aligned VMs may still be misaligned"

Sounds like our issue...

jcosta · ‎2011-07-04

You can use vol read_fsid to find out which flexvol has that FSID:

netapp01*> vol read_fsid customer01_oralogtemp

Volume 'customer01_oralogtemp' has an FSID of 0x17383d29.
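Note that vol read_fsid prints the FSID in hexadecimal, while the nfsstat -d record quoted earlier (FSID=95966634) looks like a decimal number. Assuming both are the same value printed in different bases (this is not confirmed anywhere in the thread, and byte order could also differ), a small conversion helps when matching them up. The function names are hypothetical.

```python
def fsid_to_hex(decimal_fsid):
    """Format a decimal FSID (as nfsstat -d appears to print it) in hex."""
    return hex(decimal_fsid)

def fsid_to_decimal(hex_fsid):
    """Parse a hex FSID string (as vol read_fsid prints it) into an integer."""
    return int(hex_fsid, 16)

if __name__ == "__main__":
    print(fsid_to_hex(95966634))          # 0x5b855aa
    print(fsid_to_decimal("0x17383d29"))  # decimal form of the example volume's FSID
```

If no volume matches after converting, the assumption about the base or byte order is probably wrong for that Data ONTAP release.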

Plus, you can have aligned VMs but still have applications that generate non-aligned writes.

The example below is for an Oracle database on NFS. There is no partition layer to misalign (it's NFS), but writes to the redo logs can be any size from 512 bytes to ....., so a write can cross a block boundary.

Files Causing Misaligned IO's

[Counter=0], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g12m2.dbf

[Counter=44876], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g11m1.dbf

[Counter=36977], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g11m2.dbf

[Counter=85835], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g14m2.dbf

[Counter=100507], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogB/log_g14m1.dbf

[Counter=45690], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogB/log_g12m1.dbf

[Counter=102382], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g13m1.dbf

[Counter=89142], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g13m2.dbf

[Counter=38241], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g12m2.dbf

[Counter=3373], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g11m2.dbf

[Counter=3467], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g11m1.dbf
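To illustrate how a write to a file with no partition offset can still be a partial or boundary-crossing write, here is a minimal sketch. It assumes a 4096-byte block size and a write length greater than zero. The function names are hypothetical, and this is not how nfsstat derives its counters, which the thread does not document.

```python
BLOCK_SIZE = 4096  # assumed filer block size in bytes

def is_partial_write(offset, length, block_size=BLOCK_SIZE):
    """True if the write does not both start and end on a block boundary."""
    return offset % block_size != 0 or (offset + length) % block_size != 0

def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """Number of blocks a write of `length` bytes at `offset` overlaps (length > 0)."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1

if __name__ == "__main__":
    # (offset, length) pairs: a full block, a small log-style write,
    # and a 1 KB write that straddles the boundary between blocks 0 and 1.
    for offset, length in ((0, 4096), (512, 512), (3584, 1024)):
        print(offset, length, is_partial_write(offset, length), blocks_touched(offset, length))
```

Small sequential log writes, like the redo log case above, will regularly land in the second and third categories even though the file itself starts on a block boundary.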

fletch2007 · ‎2011-07-04

Yes, so the GOS is aligned, but some operations may not be aligned (like Oracle logging)

Other Questions:

1) How do we gauge the relative significance of this unaligned IO?

2) Why are there multiple counters listed for the same file?

3) What do the values of the counters mean?

4) When should any action be taken on this data?

http://vmadmin.info

jcosta · ‎2011-07-04

1) Run nfsstat -z, then nfsstat -d over a 24-hour period

2) no idea, possibly a bug

3)

Files Causing Misaligned I/O's

List of filenames that are causing the most misaligned I/O's over NFS along with their corresponding heuristic counters. The higher the counter value, the higher the misaligned I/O requests for the corresponding file.

4) Are there any performance issues? If not, then don't try to fix it.
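The approach in answer 1 (zero the counters, wait, then read them) can be turned into a rough ranking with a few lines of script. This is a sketch under assumptions: the record format is taken from the output pasted in this thread, the capture is assumed to have been made a known number of seconds after nfsstat -z, and the function name is hypothetical. Because the man page calls the counters heuristic, the resulting numbers are only useful for comparing files against each other.

```python
import re

# Matches records like "[Counter=3404], Filename=..." or "[Counter=4093], FSID=..., Fileid=...".
# The identifier stops at the next "[" so that records fused onto one line are still split.
RECORD = re.compile(r"\[Counter=(\d+)\],\s*([^\[\n]+)")

def rank_by_rate(nfsstat_output, elapsed_seconds):
    """Return (identifier, counter increments per minute) pairs, busiest first."""
    records = [(m.group(2).strip(), int(m.group(1))) for m in RECORD.finditer(nfsstat_output)]
    rates = [(name, count * 60.0 / elapsed_seconds) for name, count in records]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    # Records quoted earlier in the thread, captured roughly 60 seconds after zeroing.
    sample = """Files Causing Misaligned IO's
[Counter=3404], Filename=vm65/mcomm/mcomm-flat.vmdk
[Counter=4093], FSID=95966634, Fileid=21607809
"""
    for name, rate in rank_by_rate(sample, 60):
        print(f"{rate:10.1f}/min  {name}")
```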

larsson · ‎2011-09-07

To answer number 2 (why are there multiple counters listed for the same file?): there's a counter for each CPU.
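If the repeated entries really are per-CPU counters, one way to get a single figure per file is to add them up. This is a sketch that assumes summing is meaningful, which nothing in this thread confirms. The record format is taken from the output pasted above and the function name is hypothetical.

```python
import re
from collections import Counter

# Only records that carry a filename are considered here.
RECORD = re.compile(r"\[Counter=(\d+)\],\s*Filename=([^\[\n]+)")

def total_per_file(nfsstat_output):
    """Sum every counter reported for the same filename."""
    totals = Counter()
    for match in RECORD.finditer(nfsstat_output):
        totals[match.group(2).strip()] += int(match.group(1))
    return totals

if __name__ == "__main__":
    # Two entries for the same redo log, taken from the listing earlier in the thread.
    sample = """[Counter=44876], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g11m1.dbf
[Counter=3467], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g11m1.dbf
"""
    for name, count in total_per_file(sample).most_common():
        print(count, name)
```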

aborzenkov · ‎2011-09-08

And what exactly do these counters count?

larsson · ‎2011-11-09

What I’ve heard is that they don’t mean anything to us regular users; they are just for engineering to understand how the algorithm works.

Sorry for the very late response on that one, but today was the first time I stumbled upon any info about it.

richard_mackerras · ‎2013-01-28

I am so glad to find this discussion. IBM Support spat out a list from nfsstat -d with no context around it as if it is the root of all my problems.

The man page for nfsstat says

Files Causing Misaligned I/O’s

List of filenames that are causing the most misaligned I/O’s over NFS along with their corresponding heuristic counters. The higher the counter value, the higher the misaligned I/O requests for the corresponding file.

http://psychology.about.com defines heuristic - A heuristic is a mental shortcut that allows people to solve problems and make judgments quickly and efficiently.

Therefore if the number is big it happens more or has a bigger impact, period. Avoid reading anything more into it.

The root of my problem probably lies elsewhere (spindles, cache, misalignment on the iSCSI LUNs, not NFS).

Thanks everyone, especially fletch2007 for raising the topic.

garymcleanuk · ‎2014-06-12

BUMP..

My colleague came across the same issue today. We are aligned in the OS, but the underlying NFS says we are not: nfsstat -d is reporting "Files Causing Misaligned IO's".

Has anyone gotten to the very bottom of this?