We had some discussions about LUN misalignment this week during a PAD course, and I did some research to establish the facts.
From what I've read in the community and on other forums, these are my findings, based on the stats show lun:*:* command in diag mode (the result is a point-in-time picture). If I am wrong, please correct me.
If you find a value other than 0% on the lines histo.1 through histo.7 for read and write, then you have a misalignment!?
So if you have a value between 0% and 100% on histo.0 and nothing on the other histo.x entries, that means correct alignment!? The closer the histo.0 value is to 100%, the better the result.
Listing below is from a test lun, with no activity (that's why no real values are measured).
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.0:0% <- a value of 100% would mean good alignment
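For anyone who would rather scan this output with a script than by eye, here is a rough Python sketch that parses a line in the format shown above. The name parse_align_histo is made up for illustration, and I am assuming the write counters use the same layout with write_align_histo in place of read_align_histo.

```python
import re

# Pattern for one histogram line, for example:
#   lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.0:0%
# Assumption: write counters look the same, with "write_align_histo".
_HISTO_RE = re.compile(
    r"^lun:(?P<lun>.+):(?P<op>read|write)_align_histo\.(?P<bucket>\d+):(?P<pct>\d+)%"
)


def parse_align_histo(line):
    """Return (lun, op, bucket, percent) for a histogram line, else None."""
    m = _HISTO_RE.match(line.strip())
    if m is None:
        return None
    return m["lun"], m["op"], int(m["bucket"]), int(m["pct"])


sample = "lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.0:0%"
print(parse_align_histo(sample))
```

Anything after the percent sign is ignored, so trailing notes on a line do not break the parse.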
I see in our environment that some LUNs connected to Windows servers have bad alignment. This surprises me because we have the SnapDrive/SnapManager products installed and the LUNs are created with SnapDrive!?
I'm seeing a very similar thing. It's made me question whether my LUNs are truly aligned properly. When I run the lun alignment show command, most of my LUNs appear aligned. The big offenders in my case are all of my VMware LUNs.
I find this strange because I created all of my LUNs using the VMware LUN type (which supposedly aligns them properly) and all of my VMFS partitions using vCenter (which supposedly aligns the VMFS partitions). I double-checked all of my VMFS partitions and they have a starting offset of 128.
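A starting offset can be sanity-checked with plain arithmetic. The sketch below assumes the 128 is counted in 512-byte sectors (no unit is given, so that is an assumption) and that the storage block is 4k. The name is_partition_aligned is only illustrative, and 63 is just a contrasting value.

```python
def is_partition_aligned(start_sector, sector_bytes=512, block_bytes=4096):
    """True if the partition's first byte falls on a block boundary."""
    return (start_sector * sector_bytes) % block_bytes == 0


# 128 sectors * 512 bytes = 65536 bytes, an exact multiple of 4096.
print(is_partition_aligned(128))
# 63 sectors * 512 bytes = 32256 bytes, which is not a multiple of 4096.
print(is_partition_aligned(63))
```

Under those assumptions an offset of 128 lands on a 4k boundary, so the partition start itself would not explain a "misaligned" report.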
This makes me question the meaning of "misaligned" when given by the lun alignment command. I'm assuming it reports that when it sees a certain number of partial reads. Is it possible that this is normal on (heavily) deduplicated VMware LUNs? Also, I did not use SnapDrive to create my VMware LUNs since they were not going to be attached to Windows hosts. All of the documents I've read say that as long as you use the "VMware" LUN type, you are good to go.
Does anybody know where there is a more thorough explanation of the lun alignment show command? Or does anybody see anything I may have missed in my investigation of this?
In my case, I actually found that I have a handful of misaligned Windows Server 2003 VMs. This may be causing lun alignment show to give me the "misaligned" results I was seeing. I'm going to try to correct those machines and run it again. I can't speak for the other posters, but this may be my issue.
Some applications, databases in particular, will have I/Os that do not start on a 4k boundary and thus show up in one of the buckets other than align_histo.0. Generally it's a smallish amount and the .0 bucket has the highest percentage of I/Os, but not always. It just depends on the application doing the writes. There are also times when the OS itself or some other app on the boot drive does a small amount of I/O that doesn't start at the beginning of a block, which is why you might see 1 or 2 percent in some buckets.
The align_histo buckets show where within a 4k WAFL block reads or writes started. So, the .0 bucket means the I/O started (was read from or started writing at) the beginning of a block. .1 means it started (4k / 8) * 1 = 512 bytes into the block, .2 is 1024 bytes into the block, and so on.
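The bucket arithmetic can be written out as a small Python sketch. The function names are hypothetical. The block size and bucket count are taken from the description above (a 4k block split into 8 buckets of 512 bytes each).

```python
WAFL_BLOCK_BYTES = 4096  # 4k block, as described above
BUCKETS = 8              # histo.0 through histo.7
BUCKET_BYTES = WAFL_BLOCK_BYTES // BUCKETS  # 512


def bucket_offset_bytes(bucket):
    """Byte offset into the 4k block that a given bucket represents."""
    if not 0 <= bucket < BUCKETS:
        raise ValueError("bucket must be between 0 and 7")
    return BUCKET_BYTES * bucket


def bucket_for_offset(byte_offset):
    """Bucket that an I/O starting at this byte offset would land in."""
    return (byte_offset % WAFL_BLOCK_BYTES) // BUCKET_BYTES


for b in range(BUCKETS):
    print(f"histo.{b}: I/O starts {bucket_offset_bytes(b)} bytes into the block")
```

Going the other way, bucket_for_offset shows which bucket an I/O lands in for a given starting byte, which helps when working back from a known partition offset.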
"How misaligned" would be determined by the percentage of I/O not in the .0 bucket. However, misalignment is (with the exception of limited cases involving multiple partitions per LUN) a yes/no question, not a question of how much. Generally in a misaligned scenario you will see one or two of the buckets with 90%+ of the I/O. If all of the buckets have a significant percentage, it is more likely that you have an application with a specific write pattern (as in the case of databases I mentioned before) and not a misaligned partition.