Data Backup and Recovery
Hello,
We had some discussions about LUN misalignment this week during a PAD course, and I did some research to find solid facts.
From what I've read in the community and on other forums, these are my findings, based on the stats show lun:*:* command in diag mode (the result is a point-in-time picture). If I am wrong, please correct me.
If you find a value other than 0% on any of the lines histo.1 through histo.7, for either read or write, does that mean you have a misalignment?
So if you have a value between 0% and 100% on histo.0 and nothing on the other histo.x entries, does that mean the alignment is correct? The bigger the value for histo.0 (close to 100%), the better the result.
The listing below is from a test LUN with no activity (which is why all measured values are 0%).
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.0:0% <- value of 100% would mean good alignment
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.1:0% <-
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.2:0% <-
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.3:0% <-
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.4:0% <- any value on one of these entries would mean bad alignment
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.5:0% <-
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.6:0% <-
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.7:0% <-
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:write_align_histo.0:0%
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:write_align_histo.1:0%
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:write_align_histo.2:0%
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:write_align_histo.3:0%
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:write_align_histo.4:0%
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:write_align_histo.5:0%
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:write_align_histo.6:0%
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:write_align_histo.7:0%
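Because every counter above follows the same lun:<path>:<counter>:<value>% pattern, the 16 histogram lines can be collected per LUN instead of being read by eye. A minimal sketch in Python, assuming the output has been captured as text lines exactly in the form shown; parse_align_histo and HISTO_RE are hypothetical names, not part of ONTAP.

```python
import re

# Hypothetical parser for lines such as
#   lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.0:0%
# Anything after the "%" (like the "<-" annotations above) is ignored.
HISTO_RE = re.compile(
    r"lun:(?P<lun>.+):(?P<op>read|write)_align_histo\.(?P<bucket>[0-7]):(?P<pct>\d+)%"
)


def parse_align_histo(lines):
    """Return {lun_name: {"read": [8 percentages], "write": [8 percentages]}}."""
    luns = {}
    for line in lines:
        m = HISTO_RE.match(line.strip())
        if m is None:
            continue  # other counters, e.g. read_partial_blocks, are skipped
        hist = luns.setdefault(m.group("lun"), {"read": [0] * 8, "write": [0] * 8})
        hist[m.group("op")][int(m.group("bucket"))] = int(m.group("pct"))
    return luns


sample = [
    "lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_align_histo.0:0% <- value of 100% would mean good alignment",
    "lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:write_align_histo.0:0%",
    "lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_partial_blocks:0%",
]
print(parse_align_histo(sample))
```

The LUN key keeps the serial-number suffix (-dgFvGobm8KhE) as it appears in the counter name, so two LUNs with the same path but different serials stay separate.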
Does the difference between the displayed value and the 100% maximum show up as the partial_read or partial_write percentage?
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:read_partial_blocks:0%
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:write_partial_blocks:0%
Question: partial reads don't necessarily mean there is a problem, since the whole 4 KB block may not be needed for a read action?
I think the above conclusion matches the LUN Alignment graph in the Performance Advisor module of NetApp Management Console, since its legend shows only histo.0 as aligned WAFL ops, and all the other histo buckets are probably counted as unaligned WAFL ops.
I also found the following command (I don't know whether it works on every ONTAP version):
priv set -q advanced; lun show -v
/vol/test/qt_test/test.lun 199g (213674622976) (r/w, online, mapped)
Comment: "ibm p750"
Serial#: dgFvGobm8KhE
Share: none
Space Reservation: enabled (not honored by containing Aggregate)
Multiprotocol Type: aix
Maps: IBM_P750=0
Occupied Size: 98.8g (106063941632)
Creation Time: Wed Mar 23 09:24:08 CET 2011
Alignment: aligned
Cluster Shared Volume Information: 0x0
/vol/rmgsql02_Snapinfo/qt_rmgsql02_Snapinfo/Snapinfo.lun 35.0g (37589529600) (r/w, online, mapped)
Comment: " "
Serial#: dgFvGocLfaL6
Share: none
Space Reservation: enabled (not honored by containing Aggregate)
Multiprotocol Type: windows_gpt
Maps: viaRPC.iqn.1991-05.com.microsoft:rmgsql02.rmg.be=3
Occupied Size: 35.0g (37602512896)
Creation Time: Thu Apr 21 12:06:49 CEST 2011
Alignment: misaligned
Cluster Shared Volume Information: 0x0
The Alignment field tells you whether each LUN is aligned or misaligned.
I see in our environment that some LUNs connected to Windows servers are misaligned. This surprises me, because we have the SnapDrive/SnapManager products installed and the LUNs were created with SnapDrive.
Regards,
Geert
Hi Geert,
Just to answer one thing: no matter how you create a LUN, it comes with NO file system on top of it; it's raw blocks. Only when an OS attaches to it is a file system put in place. Hence, to my knowledge (I could be wrong, of course), using SnapDrive or SnapManager will not counter this issue, as the issue arises when Windows attaches to the LUN.
The default Windows partition starting offset doesn't divide evenly by 4K (WAFL uses 4K blocks, as you know), hence the need to align every time you attach a Windows host to a LUN. If you are running in a virtual environment, make sure your template is aligned correctly.
Hope this helps; it's been a while since I dealt with this issue in detail.
Eric
Message was edited by: eric barlier
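The "doesn't divide evenly by 4K" point can be checked with simple arithmetic: a partition is aligned when its starting byte offset is a multiple of the 4096-byte WAFL block. A minimal sketch in Python, assuming 512-byte sectors; the three starting sectors are the commonly cited defaults (63 for Windows 2003, 2048 or 1 MiB for Windows 2008) plus the VMFS offset of 128 mentioned later in this thread. partition_is_aligned is a hypothetical helper.

```python
SECTOR_BYTES = 512       # assumption: 512-byte logical sectors
WAFL_BLOCK_BYTES = 4096  # WAFL block size


def partition_is_aligned(start_sector: int) -> bool:
    """True if a partition starting at start_sector begins on a 4K WAFL boundary."""
    return (start_sector * SECTOR_BYTES) % WAFL_BLOCK_BYTES == 0


# Commonly cited starting offsets, in 512-byte sectors
examples = {
    "Windows 2003 default (63 sectors)": 63,
    "Windows 2008 default (2048 sectors = 1 MiB)": 2048,
    "VMFS created in vCenter (128 sectors)": 128,
}
for name, sector in examples.items():
    offset = sector * SECTOR_BYTES
    print(f"{name}: offset {offset} bytes, aligned={partition_is_aligned(sector)}")
```

63 sectors is 32256 bytes, which leaves a 3584-byte remainder against 4096, so every 4K file-system block straddles two WAFL blocks.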
Hello Eric,
Thanks for your feedback.
If you create a LUN with SnapDrive on a Windows OS, the LUN is automatically formatted with NTFS.
So shouldn't you expect the alignment to be correct then?
Regards,
Geert
There should be no misalignment issues when SnapDrive is used to create a LUN and the host is physical, not virtual:
https://kb.netapp.com/support/index?page=content&id=3011201
Regards,
Radek
I discovered another command available in diag mode:
> lun alignment show
...
/vol/Labo_rmgsql01_MSSQL_Log/qt_Labo_rmgsql01_MSSQL_Log/MSSQL_Log.lun
Multiprotocol type: windows_2008
Alignment: aligned
Write alignment histogram percentage: 63, 1, 1, 1, 0, 2, 0, 0
Read alignment histogram percentage: 47, 0, 0, 0, 0, 0, 0, 0
Partial writes percentage: 28
Partial reads percentage: 50
...
However, the output surprised me, and from what I see, maybe my first conclusion was wrong. The command says the LUN is aligned, but the histogram shows a different pattern than I had in mind:
"Write alignment histogram percentage: 63, 1, 1, 1, 0, 2, 0, 0" probably translates to:
write_align_histo.0 : 63%
write_align_histo.1 : 1%
write_align_histo.2 : 1%
write_align_histo.3 : 1%
write_align_histo.4 : 0%
write_align_histo.5 : 2%
write_align_histo.6 : 0%
write_align_histo.7 : 0%
write_partial_blocks : 28%
How can this be aligned, then?
In the meantime, I've been informed that some SnapDrive versions have a bug: "Bug ID 103555, Some version of SnapDrive misalign data on LUNs".
Solution: upgrade to (or install) SnapDrive 6.3.
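The hand translation of the comma-separated histogram into write_align_histo.0 through .7 can be sketched in Python. This assumes, as in the post, that position i in the list is bucket i, and that the stanza looks exactly like the lun alignment show output quoted above; parse_alignment_block and to_counters are hypothetical helpers, not NetApp tools.

```python
def parse_alignment_block(text):
    """Parse one LUN stanza of 'lun alignment show' output into a dict."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip().rstrip("| ")  # drop stray '|' copy-paste residue, if any
        if not line:
            continue
        if ":" not in line:
            info["path"] = line  # the LUN path line has no "key: value" form
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key.endswith("histogram percentage"):
            info[key] = [int(v) for v in value.split(",")]
        elif key.endswith("percentage"):
            info[key] = int(value)
        else:
            info[key] = value
    return info


def to_counters(hist, op):
    """Assumes position i in the list corresponds to <op>_align_histo.i."""
    return {f"{op}_align_histo.{i}": pct for i, pct in enumerate(hist)}


block = """/vol/Labo_rmgsql01_MSSQL_Log/qt_Labo_rmgsql01_MSSQL_Log/MSSQL_Log.lun
Multiprotocol type: windows_2008
Alignment: aligned
Write alignment histogram percentage: 63, 1, 1, 1, 0, 2, 0, 0
Read alignment histogram percentage: 47, 0, 0, 0, 0, 0, 0, 0
Partial writes percentage: 28
Partial reads percentage: 50"""

info = parse_alignment_block(block)
for name, pct in to_counters(info["Write alignment histogram percentage"], "write").items():
    print(f"{name} : {pct}%")
print(f"write_partial_blocks : {info['Partial writes percentage']}%")
```

The printed lines reproduce the write_align_histo.0 ... write_partial_blocks list from the post, which makes it easy to compare the two commands side by side.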
Hi Geert,
Just one point for clarity's sake: Microsoft resolved the LUN alignment issue in Windows 2008; it's only on Windows 2003 that you run into misalignment issues.
Regards,
Jeff
I'm seeing a very similar thing, and it has made me question whether my LUNs are truly aligned. When I run the lun alignment show command, most of my LUNs appear aligned. The big offenders in my case are all of my VMware LUNs.
I find this strange because I created all of my LUNs using the VMware LUN type (which supposedly aligns them properly) and all of my VMFS partitions using vCenter (which supposedly aligns the VMFS partitions). I double-checked all of my VMFS partitions, and they have a starting offset of 128.
This makes me question the meaning of "misaligned" as reported by the lun alignment command. I'm assuming it reports that when it sees a certain number of partial reads. Is it possible that this is normal on (heavily) deduplicated VMware LUNs? Also, I did not use SnapDrive to create my VMware LUNs, since they were not going to be attached to Windows hosts. All of the documents I've read say that as long as you use the "VMware" data type, you are good to go.
Does anybody know where to find a more thorough explanation of the lun alignment show command? Or does anybody see anything I may have missed in my investigation?
I have also been looking into the reported alignment data and came to the same questions. For example, for several SQL Server log LUNs created by SnapDrive on Windows 2008, I found similar write histograms:
Multiprotocol type: windows_2008
Alignment: aligned
Write alignment histogram percentage: 83, 2, 1, 2, 1, 1, 2, 1
Read alignment histogram percentage: 0, 0, 1, 0, 0, 0, 0, 0
Partial writes percentage: 4
Partial reads percentage: 96
Either Windows is doing something odd or ONTAP is incorrect. I think small deviations are not an indication of misalignment, but a statement from NetApp would be appreciated.
Only when you see data like this do you know the alignment is definitely wrong:
Multiprotocol type: linux
Alignment: misaligned
Write alignment histogram percentage: 0, 0, 0, 0, 0, 0, 0, 100
Read alignment histogram percentage: 0, 0, 0, 0, 0, 0, 0, 100
Partial writes percentage: 0
Partial reads percentage: 0
In my case, I actually found a handful of misaligned Windows Server 2003 VMs. These may be causing lun alignment show to give me the "misaligned" results I was seeing. I'm going to try to correct those machines and run it again. I can't speak for the other posters, but this may be my issue.
Some applications, databases in particular, issue I/Os that do not start on a 4k boundary, and these show up in one of the buckets other than align_histo.0. Generally it's a small amount and the .0 bucket has the highest percentage of I/Os, but sometimes not; it depends on the application doing the writes. There are also times when the OS itself, or some other application on the boot drive, does a small amount of I/O that doesn't start at the beginning of a block, which is why you might see 1 or 2 percent in some buckets.
In short, the general rule is: if it's not a database and the vast majority of I/O falls in the .0 bucket, then it is aligned. A few percent in the other buckets is OK. If it is a database, you are better off actually checking the partition offset in the OS and the OS type of the LUN. See KBs https://kb.netapp.com/support/index?page=content&actp=LIST&id=1010717 (Linux) and https://kb.netapp.com/support/index?page=content&id=1010803 (Windows) for more info, or open a case with support.
Cool, thanks.
What do the different value buckets actually mean?
lun:/vol/test/qt_test/test.lun-dgFvGobm8KhE:write_align_histo.0 - .7
Is each bucket equal to 100 / 8 = 12.5%, meaning bucket 0 is 0% misaligned, bucket 1 is up to 12.5% misaligned, etc.?
A 4K WAFL block divided into 512-byte units gives the eight buckets, 0-7.
They show where on a 4k WAFL block reads or writes started. So, the .0 bucket means the I/O started (was read from or started writing at) the beginning of a block. .1 means it started ((4k / 8)*1) = 512 bytes into the block, .2 is 1024 bytes into the block, and so on.
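So the buckets are positions inside a block, not percentage ranges. A minimal Python model of that mapping, assuming an I/O is binned purely by its starting byte offset within the LUN; bucket_for_offset is a hypothetical helper and not ONTAP's actual implementation.

```python
WAFL_BLOCK_BYTES = 4096
BUCKETS = 8
BUCKET_BYTES = WAFL_BLOCK_BYTES // BUCKETS  # 512 bytes per histogram bucket


def bucket_for_offset(lun_byte_offset: int) -> int:
    """Bucket (0-7) an I/O starting at this byte offset in the LUN would land in."""
    return (lun_byte_offset % WAFL_BLOCK_BYTES) // BUCKET_BYTES


for b in range(BUCKETS):
    print(f"histo.{b}: I/O starts {b * BUCKET_BYTES} bytes into a 4K WAFL block")

# A 4K-aligned file-system write on a partition that starts at sector 63
# (63 * 512 = 32256 bytes into the LUN) lands in the last bucket:
print(bucket_for_offset(63 * 512 + 8192))  # -> 7
```

Under this model, a file system doing 4K I/O on a 63-sector partition puts nearly everything in .7, which is the same shape as the misaligned linux LUN quoted earlier in the thread.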
"How misaligned" would be determined by the percentage of I/O not in the .0 bucket. However, misalignment is (with the exception of limited cases involving multiple partitions per LUN) a yes/no question, not a question of how much. Generally in a misaligned scenario you will see one or two of the buckets with 90%+ of the I/O. If all of the buckets have a significant percentage, it is more likely that you have an application with a specific write pattern (as in the case of databases I mentioned before) and not a misaligned partition.
Makes sense, thanks!