A number of factors could be responsible for the slow bandwidth you are seeing. The very first thing I'd ask you to check is flow control (just in case you haven't considered it yet).
Flow control: what are the flow-control settings end-to-end in your environment?
Over the years, NetApp's recommendations for flow control have evolved. At present, NetApp only recommends disabling flow control on cluster ports (where it is disabled by default); the remaining ports, such as management and data ports, should be in line with the rest of the settings in your network.
My advice would be to check: 1) What are the flow-control settings on the ESXi host interfaces? 2) What are the flow-control settings on the switch? 3) What are the flow-control settings on the NetApp?
To identify the flow-control settings on the NetApp, run these commands:
First, identify the physical ports that are part of the ifgrp serving ESXi: ::> network port ifgrp show -node node-xx
Next, check flow control on the physical ports bonded into that ifgrp: ::> network port show -fields flowcontrol-admin,flowcontrol-oper -node node-xx
Please note: flow control only applies to physical ports; it is not applicable to interface groups (ifgrps) or VLANs, so you don't need to note their values.
Once you know the current flow-control settings on the NetApp side (physical ports), make sure they are the same end-to-end. For example, if flow control is disabled (set to 'none') on the NetApp, disable it on the switch and on the host side as well.
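As a quick sketch of the host-side check on a Linux box (interface names will differ in your environment, and the `ethtool -A` line at the end is shown only as a comment, not something to run blindly):

```shell
#!/bin/sh
# List the pause-frame (flow-control) state of every NIC on a Linux host,
# so it can be compared against the switch and NetApp settings.
for dev in /sys/class/net/*; do
    name=$(basename "$dev")
    [ "$name" = "lo" ] && continue        # skip loopback
    echo "=== $name ==="
    ethtool -a "$name" 2>/dev/null || echo "(ethtool unavailable for $name)"
done
# To disable flow control on the host (matching a 'none' setting on the
# NetApp/switch side), the usual command would be something like:
#   ethtool -A eth0 rx off tx off
```

On the ESXi side, `esxcli network nic pauseParams list` shows the equivalent information per vmnic.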
The more recent thinking among network engineers and in use-case recommendations is that TCP is the 'real' end-to-end flow-control mechanism: it is more granular and scalable, and handles congestion better higher up the stack (instead of pausing traffic for an entire port, backpressure is applied per connection by TCP). The recommendation would therefore be to disable flow control end-to-end, i.e. on the NetApp, the switch, and the host.
Give this a try and see if it makes any difference. If it's already done (flow control disabled) and you are still experiencing slowness, please log a ticket with NetApp.
Thanks for the update. We'll have to investigate further here.
In theory, a 10 Gb link should deliver around 1.25 GB/s.
You are getting 5.6 Gbps = 700 MB/s, which is around 56% of the theoretical value.
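The arithmetic behind those figures, spelled out (pure unit conversion, nothing environment-specific):

```shell
# 10 Gbps line rate vs. the 5.6 Gbps measured: divide bits by 8 for bytes.
awk 'BEGIN {
    line = 10.0; meas = 5.6                            # Gbps
    printf "theoretical max : %.2f GB/s\n", line / 8          # 1.25 GB/s
    printf "measured        : %.0f MB/s\n", meas / 8 * 1000   # 700 MB/s
    printf "utilisation     : %.0f%%\n",  meas / line * 100   # 56%
}'
```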
May I ask a couple of questions: as I understand it, you are only able to achieve 5.6 Gbps? Is the host pushing enough data to saturate the pipe? I'm just trying to understand the issue so we can isolate the cause further.
Q1) Is the ifgrp (10G ports) dedicated to NFS alone, or are other protocols/services riding on it? Q2) What MTU is set on the storage, clients, and data switches?
Could you give us the output of the following: 1) ::> ifgrp show -fields node,ifgrp,ports,mode 2) ::> vlan show -port <ifgrp_port> 3) ::> node run -node <node_where_the_ports_exist> sysconfig -a
Also, could you try the following: 1) Carve out a 500 GB NFS volume and mount it on a Linux machine over the same ifgrp/VLAN port. 2) Go to /mnt/nfs-volume. 3) Dump some big chunks of data (around 100 GB) and check the network throughput. 4) During the copy, how is the CPU utilization on your NetApp node?
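Steps 2-3 above can be sketched like this (assuming the volume is already mounted; `/mnt/nfs-volume` and the file size are placeholders here, and you would scale the size up towards 100 GB for a realistic run):

```shell
#!/bin/sh
# Quick sequential-write throughput check. Point TARGET at the NFS mount
# for the real test; /tmp is used as the default only so the sketch runs
# anywhere.
TARGET=${TARGET:-/tmp}
dd if=/dev/zero of="$TARGET/ddtest.bin" bs=1M count=100 2>&1   # ~100 MB
ls -l "$TARGET/ddtest.bin"
rm -f "$TARGET/ddtest.bin"
# While the copy runs, the CPU/disk load on the NetApp node can be watched
# from the nodeshell with, e.g.:
#   ::> node run -node node-xx -command sysstat -x 1
```

dd prints its own MB/s figure on completion, which you can compare against what the network counters report.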