
FAS2554 Slow iSCSI performance

didi

Hi team,

 

I have an ESXi 6.7u3 host with 4 x 10GbE NICs dedicated to NetApp traffic, using the software iSCSI adapter, connected to a 4 x 10GbE FAS2554 two-node cluster. It runs NetApp Release 9.8P20 with RAID-DP on 11 x SAS HDDs. The cluster has iSCSI and NFS VMs, with 1 volume and a LUN for iSCSI and 1 volume for NFS.

 

I have a Windows 10 VM with 1 x .vmdk file on the iSCSI LUN and 1 x .vmdk file on the NFS volume attached. Both disks are formatted with NTFS. My source is also on 10GbE storage.

When I copy a file from my source to the .vmdk on the iSCSI LUN, I get an average write speed of 40-65MB/s.

When I copy a file from my source to the .vmdk on the NFS volume, I get an average write speed of 200-250MB/s.

 

Here is what I got from the STATS SHOW command on the NetApp cluster console:

volume:Datastore2_vol:total_protocol_write_histo.< 4KB:4

volume:Datastore2_vol:total_protocol_write_histo.= 4KB:118

volume:Datastore2_vol:total_protocol_write_histo.< 8KB:0

volume:Datastore2_vol:total_protocol_write_histo.= 8KB:78

volume:Datastore2_vol:total_protocol_write_histo.< 16KB:0

volume:Datastore2_vol:total_protocol_write_histo.= 16KB:0

volume:Datastore2_vol:total_protocol_write_histo.< 32KB:0

volume:Datastore2_vol:total_protocol_write_histo.= 32KB:0

volume:Datastore2_vol:total_protocol_write_histo.< 64KB:0

volume:Datastore2_vol:total_protocol_write_histo.= 64KB:2888

volume:Datastore2_vol:total_protocol_write_histo.< 256KB:0

volume:Datastore2_vol:total_protocol_write_histo.= 256KB:0

volume:Datastore2_vol:total_protocol_write_histo.< 1024KB:0

volume:Datastore2_vol:total_protocol_write_histo.= 1024KB:0

volume:Datastore2_vol:total_protocol_write_histo.> 1024KB:0

volume:Datastore2_vol:total_protocol_misaligned_writes.misaligned 4KB:0

volume:Datastore2_vol:total_protocol_misaligned_writes.misaligned 8KB:638976

volume:Datastore2_vol:total_protocol_misaligned_writes.misaligned 16KB:0

volume:Datastore2_vol:total_protocol_misaligned_writes.misaligned 32KB:0

volume:Datastore2_vol:total_protocol_misaligned_writes.misaligned REST:0

----

volume:Datastore2_vol:iscsi_protocol_write_histo.< 4KB:4

volume:Datastore2_vol:iscsi_protocol_write_histo.= 4KB:118

volume:Datastore2_vol:iscsi_protocol_write_histo.< 8KB:0

volume:Datastore2_vol:iscsi_protocol_write_histo.= 8KB:78

volume:Datastore2_vol:iscsi_protocol_write_histo.< 16KB:0

volume:Datastore2_vol:iscsi_protocol_write_histo.= 16KB:0

volume:Datastore2_vol:iscsi_protocol_write_histo.< 32KB:0

volume:Datastore2_vol:iscsi_protocol_write_histo.= 32KB:0

volume:Datastore2_vol:iscsi_protocol_write_histo.< 64KB:0

volume:Datastore2_vol:iscsi_protocol_write_histo.= 64KB:2888

volume:Datastore2_vol:iscsi_protocol_write_histo.< 256KB:0

volume:Datastore2_vol:iscsi_protocol_write_histo.= 256KB:0

volume:Datastore2_vol:iscsi_protocol_write_histo.< 1024KB:0

volume:Datastore2_vol:iscsi_protocol_write_histo.= 1024KB:0

volume:Datastore2_vol:iscsi_protocol_write_histo.> 1024KB:0

volume:Datastore2_vol:iscsi_protocol_misaligned_writes.misaligned 4KB:0

volume:Datastore2_vol:iscsi_protocol_misaligned_writes.misaligned 8KB:638976

volume:Datastore2_vol:iscsi_protocol_misaligned_writes.misaligned 16KB:0

volume:Datastore2_vol:iscsi_protocol_misaligned_writes.misaligned 32KB:0

volume:Datastore2_vol:iscsi_protocol_misaligned_writes.misaligned REST:0

volume:Datastore2_vol:iscsi_protocol_write_latency.<2us:0

volume:Datastore2_vol:iscsi_protocol_write_latency.<6us:0

volume:Datastore2_vol:iscsi_protocol_write_latency.<10us:0

volume:Datastore2_vol:iscsi_protocol_write_latency.<14us:0

volume:Datastore2_vol:iscsi_protocol_write_latency.<20us:0

volume:Datastore2_vol:iscsi_protocol_write_latency.<40us:0

volume:Datastore2_vol:iscsi_protocol_write_latency.<60us:0

volume:Datastore2_vol:iscsi_protocol_write_latency.<80us:0

volume:Datastore2_vol:iscsi_protocol_write_latency.<100us:0

volume:Datastore2_vol:iscsi_protocol_write_latency.<200us:5

volume:Datastore2_vol:iscsi_protocol_write_latency.<400us:128

volume:Datastore2_vol:iscsi_protocol_write_latency.<600us:515

volume:Datastore2_vol:iscsi_protocol_write_latency.<800us:590

volume:Datastore2_vol:iscsi_protocol_write_latency.<1ms:285

volume:Datastore2_vol:iscsi_protocol_write_latency.<2ms:839

volume:Datastore2_vol:iscsi_protocol_write_latency.<4ms:443

volume:Datastore2_vol:iscsi_protocol_write_latency.<6ms:128

volume:Datastore2_vol:iscsi_protocol_write_latency.<8ms:97

volume:Datastore2_vol:iscsi_protocol_write_latency.<10ms:48

volume:Datastore2_vol:iscsi_protocol_write_latency.<12ms:2

volume:Datastore2_vol:iscsi_protocol_write_latency.<14ms:2

volume:Datastore2_vol:iscsi_protocol_write_latency.<16ms:0

volume:Datastore2_vol:iscsi_protocol_write_latency.<18ms:1

volume:Datastore2_vol:iscsi_protocol_write_latency.<20ms:1

volume:Datastore2_vol:iscsi_protocol_write_latency.<40ms:4
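To make the latency histogram above easier to read, here is a minimal Python sketch that turns the bucket counts into cumulative shares. The counts are copied from the output above. The helper name `share_below` is made up for illustration, and reading each label such as `<2ms` as the bucket's upper bound is an assumption about how the counter is reported.

```python
# Upper bucket bounds (microseconds) and counts copied from the
# iscsi_protocol_write_latency output above. The empty buckets
# below 200us are left out.
buckets = [
    (200, 5), (400, 128), (600, 515), (800, 590), (1000, 285),
    (2000, 839), (4000, 443), (6000, 128), (8000, 97), (10000, 48),
    (12000, 2), (14000, 2), (16000, 0), (18000, 1), (20000, 1),
    (40000, 4),
]


def share_below(limit_us, hist):
    """Fraction of writes in buckets whose upper bound is <= limit_us.

    Assumption: each label is the upper bound of its bucket.
    """
    total = sum(count for _, count in hist)
    below = sum(count for bound, count in hist if bound <= limit_us)
    return below / total


total_writes = sum(count for _, count in buckets)
print(f"writes sampled: {total_writes}")
for limit_us in (1000, 2000, 10000):
    share = share_below(limit_us, buckets)
    print(f"under {limit_us / 1000:g} ms: {share:.1%}")
```

With these counts, the 3088 sampled writes match the total of the write size histogram, and about three quarters of them fall into buckets under 2 ms.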

---

lun:6fb03ed4-d42a-45b7-8a42-dae58e6823c0:avg_latency:5527.17us

lun:6fb03ed4-d42a-45b7-8a42-dae58e6823c0:total_ops:1018/s

lun:6fb03ed4-d42a-45b7-8a42-dae58e6823c0:avg_read_latency:206.49us

lun:6fb03ed4-d42a-45b7-8a42-dae58e6823c0:avg_write_latency:5401.01us
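As a rough cross-check, the write size histogram and the LUN `total_ops` value can be combined into an estimated throughput. This is only a sketch under stated assumptions: that nearly all of the 1018 ops/s are writes, that both counters were sampled during the same copy, and that writes in the `< 4KB` bucket can be counted as 4 KB. The function names are hypothetical.

```python
# Write sizes (KB) and counts from iscsi_protocol_write_histo above.
# Assumption: the four "< 4KB" writes are counted as 4 KB each.
write_sizes_kb = {4: 4 + 118, 8: 78, 64: 2888}

# lun total_ops from the output above.
# Assumption: almost all of these operations are writes.
total_ops_per_s = 1018


def mean_io_kb(hist):
    """Count-weighted average I/O size in KB."""
    ops = sum(hist.values())
    return sum(size * count for size, count in hist.items()) / ops


def est_throughput_mb_s(ops_per_s, io_kb):
    """Estimated throughput, using 1 MB = 1024 KB."""
    return ops_per_s * io_kb / 1024


avg_kb = mean_io_kb(write_sizes_kb)
estimate = est_throughput_mb_s(total_ops_per_s, avg_kb)
print(f"average write size: {avg_kb:.1f} KB")
print(f"estimated throughput: {estimate:.1f} MB/s")
```

If the assumptions hold, the estimate comes out at roughly 60 MB/s, inside the 40-65MB/s range seen during the copy. That would suggest the copy is bounded by how many 64 KB writes complete per second rather than by link bandwidth.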

 

I get the same low write/read speed when I migrate VM files from or to the iSCSI LUN.

I spent several nights investigating, and it all comes down to either the NetApp or the ESXi iSCSI implementation. Switching to an iSCSI HBA NIC does not change anything, so I believe it's NetApp's fault.

I checked the initiator group settings: the OS type is LINUX. VMFS partition alignment is OK (the starting sector is 2048, which is odd because according to the ONTAP manual it should be 0), so I can't think of anything besides misalignment.

 

Please help me to find out the reason it's slow with iSCSI (╯°□°)╯


elementx

> it all comes to NETAPP or ESXi iSCSI implementation (but switching to iSCSI HBA NIC does not change anything), so I believe it's Netapp fault.

 

Interesting.

 

> I checked Initiator group settings, it's LINUX. VMFS partition alignment is ok (starting sector 2048, but it's weird, according to ONTAP manual it should be 0), 

 

Partition alignment recommendations for various operating systems can be found here:

https://kb.netapp.com/onprem/ontap/da/SAN/How_to_identify_unaligned_IO_on_LUNs

 

For ESXi, I'd try 0 as suggested.

 

For Windows partitions on ESXi VMDKs, I'd try alignment recommendations for Windows.

 

Using ESXi CLI partitioning tool (although you probably wouldn't need it to start at 0):

https://kb.vmware.com/s/article/1036609

 

As always, don't try new stuff on production LUNs or datastores!

 

didi

Good morning Sir,

 

Yep, I read those recommendations several times. According to the NetApp manual, when we set the initiator group/volume OS type to Linux or VMware, ONTAP should internally align the volume for the partition the host creates later. So we go to the host and tell it to create a VMFS6 partition on the newly discovered LUN. The partition is created with a starting sector offset of 2048, which is not 0 but is absolutely OK: 2048 sectors x 512 B = 1,048,576 B, which divides evenly into 4096-byte WAFL blocks. So there should be no misalignment.
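The arithmetic above can be written as a small check. This is just a sketch: the function name `is_aligned` is made up, and sector 63 (the classic legacy MBR start sector) is not from this setup; it is included only as a counter-example.

```python
SECTOR_BYTES = 512       # sector size used in the calculation above
WAFL_BLOCK_BYTES = 4096  # WAFL block size used in the calculation above


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES,
               block_bytes=WAFL_BLOCK_BYTES):
    """True if the partition's byte offset is a multiple of the block size."""
    offset_bytes = start_sector * sector_bytes
    return offset_bytes % block_bytes == 0


# 0 and 2048 are the values discussed in this thread.
# 63 is an illustrative legacy start sector, not taken from this setup.
for sector in (0, 63, 2048):
    offset = sector * SECTOR_BYTES
    print(f"start sector {sector}: offset {offset} B, "
          f"aligned: {is_aligned(sector)}")
```

Both 0 and 2048 pass this check, so by this measure a start sector of 2048 is just as aligned as 0.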

I also checked the Active IQ Unified Manager VM I have, and it does not complain about anything. It sees partial writes on the volume, but that is considered OK.

 

I even recreated my volume/LUN from scratch several times on two different aggregates on this NetApp, and nothing changed.

 

My question: why is the I/O speed so awful compared to NFS? What is wrong with it?

 

ps.

As always, don't try new stuff on production LUNs or datastores!


elementx

Did Windows select automatic partitioning for the volume and is the Windows partition GPT?

 

I don't have a place to try so I hope somebody who has a similar environment will be able to check or speak from experience.

 

didi

Correct, all VMs are Windows Server 2016 with their native GPT, and the starting sector offset is aligned to a full 4 KB (4096-byte) block.

didi

Does anyone know what "WAFL replayed messages" are?

When I collect node statistics with the STATIT command, I see "147015902374658177.82 replayed messages" under the `WAFL Statistics (per second)` section. This happens during both reads and writes.

elementx

Could be a hardware problem. I haven't seen this before, but I don't gather those logs often.

Maybe contact Support?
