Network and Storage Protocols
Hi team,
I have an ESXi 6.7 U3 host with 4 x 10GbE NICs dedicated to NetApp traffic, using the software iSCSI adapter, connected to a 2-node FAS2554 cluster with 4 x 10GbE. NetApp Release 9.8P20, RAID-DP with 11 x SAS HDDs. The cluster serves both iSCSI and NFS, with one volume and a LUN for iSCSI and one volume for NFS.
I have a Windows 10 VM with one .vmdk file on the iSCSI LUN and one .vmdk file on the NFS volume. Both are formatted with NTFS. My source storage is also 10GbE.
When I copy a file from my source to the .vmdk on the iSCSI LUN, I get an average write speed of 40-65 MB/s.
When I copy a file from my source to the .vmdk on the NFS volume, I get an average write speed of 200-250 MB/s.
Here is what I found from the NetApp cluster console with the `stats show` command:
volume:Datastore2_vol:total_protocol_write_histo.< 4KB:4
volume:Datastore2_vol:total_protocol_write_histo.= 4KB:118
volume:Datastore2_vol:total_protocol_write_histo.< 8KB:0
volume:Datastore2_vol:total_protocol_write_histo.= 8KB:78
volume:Datastore2_vol:total_protocol_write_histo.< 16KB:0
volume:Datastore2_vol:total_protocol_write_histo.= 16KB:0
volume:Datastore2_vol:total_protocol_write_histo.< 32KB:0
volume:Datastore2_vol:total_protocol_write_histo.= 32KB:0
volume:Datastore2_vol:total_protocol_write_histo.< 64KB:0
volume:Datastore2_vol:total_protocol_write_histo.= 64KB:2888
volume:Datastore2_vol:total_protocol_write_histo.< 256KB:0
volume:Datastore2_vol:total_protocol_write_histo.= 256KB:0
volume:Datastore2_vol:total_protocol_write_histo.< 1024KB:0
volume:Datastore2_vol:total_protocol_write_histo.= 1024KB:0
volume:Datastore2_vol:total_protocol_write_histo.> 1024KB:0
volume:Datastore2_vol:total_protocol_misaligned_writes.misaligned 4KB:0
volume:Datastore2_vol:total_protocol_misaligned_writes.misaligned 8KB:638976
volume:Datastore2_vol:total_protocol_misaligned_writes.misaligned 16KB:0
volume:Datastore2_vol:total_protocol_misaligned_writes.misaligned 32KB:0
volume:Datastore2_vol:total_protocol_misaligned_writes.misaligned REST:0
----
volume:Datastore2_vol:iscsi_protocol_write_histo.< 4KB:4
volume:Datastore2_vol:iscsi_protocol_write_histo.= 4KB:118
volume:Datastore2_vol:iscsi_protocol_write_histo.< 8KB:0
volume:Datastore2_vol:iscsi_protocol_write_histo.= 8KB:78
volume:Datastore2_vol:iscsi_protocol_write_histo.< 16KB:0
volume:Datastore2_vol:iscsi_protocol_write_histo.= 16KB:0
volume:Datastore2_vol:iscsi_protocol_write_histo.< 32KB:0
volume:Datastore2_vol:iscsi_protocol_write_histo.= 32KB:0
volume:Datastore2_vol:iscsi_protocol_write_histo.< 64KB:0
volume:Datastore2_vol:iscsi_protocol_write_histo.= 64KB:2888
volume:Datastore2_vol:iscsi_protocol_write_histo.< 256KB:0
volume:Datastore2_vol:iscsi_protocol_write_histo.= 256KB:0
volume:Datastore2_vol:iscsi_protocol_write_histo.< 1024KB:0
volume:Datastore2_vol:iscsi_protocol_write_histo.= 1024KB:0
volume:Datastore2_vol:iscsi_protocol_write_histo.> 1024KB:0
volume:Datastore2_vol:iscsi_protocol_misaligned_writes.misaligned 4KB:0
volume:Datastore2_vol:iscsi_protocol_misaligned_writes.misaligned 8KB:638976
volume:Datastore2_vol:iscsi_protocol_misaligned_writes.misaligned 16KB:0
volume:Datastore2_vol:iscsi_protocol_misaligned_writes.misaligned 32KB:0
volume:Datastore2_vol:iscsi_protocol_misaligned_writes.misaligned REST:0
volume:Datastore2_vol:iscsi_protocol_write_latency.<2us:0
volume:Datastore2_vol:iscsi_protocol_write_latency.<6us:0
volume:Datastore2_vol:iscsi_protocol_write_latency.<10us:0
volume:Datastore2_vol:iscsi_protocol_write_latency.<14us:0
volume:Datastore2_vol:iscsi_protocol_write_latency.<20us:0
volume:Datastore2_vol:iscsi_protocol_write_latency.<40us:0
volume:Datastore2_vol:iscsi_protocol_write_latency.<60us:0
volume:Datastore2_vol:iscsi_protocol_write_latency.<80us:0
volume:Datastore2_vol:iscsi_protocol_write_latency.<100us:0
volume:Datastore2_vol:iscsi_protocol_write_latency.<200us:5
volume:Datastore2_vol:iscsi_protocol_write_latency.<400us:128
volume:Datastore2_vol:iscsi_protocol_write_latency.<600us:515
volume:Datastore2_vol:iscsi_protocol_write_latency.<800us:590
volume:Datastore2_vol:iscsi_protocol_write_latency.<1ms:285
volume:Datastore2_vol:iscsi_protocol_write_latency.<2ms:839
volume:Datastore2_vol:iscsi_protocol_write_latency.<4ms:443
volume:Datastore2_vol:iscsi_protocol_write_latency.<6ms:128
volume:Datastore2_vol:iscsi_protocol_write_latency.<8ms:97
volume:Datastore2_vol:iscsi_protocol_write_latency.<10ms:48
volume:Datastore2_vol:iscsi_protocol_write_latency.<12ms:2
volume:Datastore2_vol:iscsi_protocol_write_latency.<14ms:2
volume:Datastore2_vol:iscsi_protocol_write_latency.<16ms:0
volume:Datastore2_vol:iscsi_protocol_write_latency.<18ms:1
volume:Datastore2_vol:iscsi_protocol_write_latency.<20ms:1
volume:Datastore2_vol:iscsi_protocol_write_latency.<40ms:4
---
lun:6fb03ed4-d42a-45b7-8a42-dae58e6823c0:avg_latency:5527.17us
lun:6fb03ed4-d42a-45b7-8a42-dae58e6823c0:total_ops:1018/s
lun:6fb03ed4-d42a-45b7-8a42-dae58e6823c0:avg_read_latency:206.49us
lun:6fb03ed4-d42a-45b7-8a42-dae58e6823c0:avg_write_latency:5401.01us
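To make sense of the counters, here is a rough sketch of how I read them (the parsing and the totals are my own interpretation, not an official ONTAP formula). Almost all writes in the interval land in the 64 KB bucket, while the misaligned 8 KB counter is far larger, which suggests it is cumulative rather than per-interval:

```python
# Rough sketch: parse the non-zero "stats show" counter lines above and
# compare the interval write-size histogram with the misaligned-write counter.
# Counter names and values are copied from the output above.
stats = """\
volume:Datastore2_vol:iscsi_protocol_write_histo.< 4KB:4
volume:Datastore2_vol:iscsi_protocol_write_histo.= 4KB:118
volume:Datastore2_vol:iscsi_protocol_write_histo.= 8KB:78
volume:Datastore2_vol:iscsi_protocol_write_histo.= 64KB:2888
volume:Datastore2_vol:iscsi_protocol_misaligned_writes.misaligned 8KB:638976
"""

histo_writes = 0
misaligned = 0
for line in stats.splitlines():
    # the value is everything after the last colon; the counter name keeps
    # its internal colons (volume:<name>:<counter>)
    name, _, value = line.rpartition(":")
    if "write_histo" in name:
        histo_writes += int(value)
    elif "misaligned" in name:
        misaligned += int(value)

print(histo_writes)  # 3088 writes in the interval histogram
print(misaligned)    # 638976 misaligned writes, a very different scale
```

The mismatch in scale is why I assume the misaligned counter accumulates since boot; either way, it is non-zero only in the 8 KB bucket.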
I see the same low read/write speed if I migrate VM files to or from the iSCSI LUN.
I spent several nights investigating, and it all comes down to the NetApp or ESXi iSCSI implementation (switching to an iSCSI HBA NIC does not change anything), so I believe it's on the NetApp side.
I checked the initiator group settings; the OS type is LINUX. VMFS partition alignment looks OK (starting sector 2048, which is odd, since according to the ONTAP manual it should be 0), so I can't think of anything besides misalignment.
Please help me find out why it's slow with iSCSI (╯°□°)╯
> it all comes down to the NetApp or ESXi iSCSI implementation (switching to an iSCSI HBA NIC does not change anything), so I believe it's on the NetApp side.
Interesting.
> I checked the initiator group settings; the OS type is LINUX. VMFS partition alignment looks OK (starting sector 2048, which is odd, since according to the ONTAP manual it should be 0),
Partition alignment recommendations for various operating systems can be found here:
https://kb.netapp.com/onprem/ontap/da/SAN/How_to_identify_unaligned_IO_on_LUNs
For ESXi, I'd try 0 as suggested.
For Windows partitions on ESXi VMDKs, I'd try alignment recommendations for Windows.
Using the ESXi CLI partitioning tool (although you probably wouldn't need it to start at 0):
https://kb.vmware.com/s/article/1036609
As always, don't try new stuff on production LUNs or datastores!
Good morning Sir,
Yep, I have read those recommendations several times. According to the NetApp manual, when we select initiator group/volume type = Linux or VMware, it internally aligns the volume for the subsequent host partition-create procedure. So we jump on the host and create a VMFS6 partition on the newly discovered LUN. The partition is created with a start sector offset of 2048, which is not 0 but is absolutely fine (2048 sectors x 512 B = 1,048,576 B, which divides evenly into WAFL 4,096 B blocks). So there should be no misalignment.
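That divisibility check is easy to express directly. A minimal sketch, assuming 512-byte logical sectors and the 4,096-byte WAFL block size:

```python
SECTOR_BYTES = 512  # logical sector size reported by the LUN
WAFL_BLOCK = 4096   # WAFL block size in bytes

def is_aligned(start_sector: int) -> bool:
    """True when the partition's byte offset falls on a WAFL block boundary."""
    return (start_sector * SECTOR_BYTES) % WAFL_BLOCK == 0

print(is_aligned(2048))  # True:  2048 * 512 = 1,048,576 B, a multiple of 4096
print(is_aligned(63))    # False: 63 * 512 = 32,256 B, the classic legacy MBR offset
```

Any start sector that is a multiple of 8 passes this check, which is why both 0 and 2048 are fine, while the old Windows default of 63 is the textbook misaligned case.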
I also checked the Active IQ Unified Manager VM I have; it does not complain about anything. It sees partial writes on the volume, but that is considered OK.
I even recreated my volume/LUN from scratch several times on two different aggregates on this NetApp; nothing changed.
My question: why is the I/O speed awful compared to NFS? What is wrong with it?
PS.
> As always, don't try new stuff on production LUNs or datastores!
Did Windows select automatic partitioning for the volume, and is the Windows partition GPT?
I don't have a place to try this, so I hope somebody with a similar environment will be able to check or speak from experience.
Correct, all VMs are Windows Server 2016 with their native GPT, and the starting sector offset is aligned to a full 4,096-byte block.
Does anyone know what "WAFL replayed messages" are?
When I collect node statistics with the `statit` command, I see "147015902374658177.82 replayed messages" under the `WAFL Statistics (per second)` section. This happens in both the read and the write cases.
It could be a hardware problem. I haven't seen this before, but I don't gather those statistics often.
Maybe contact Support?