VMware Solutions Discussions

Flow Control

andrew_stack
16,513 Views

This is a direct rebuttal to:

What are the Best Practices for Gigabit flow control configuration for optimum performance on the NetApp appliance and switch

Solution ID: kb22926

Last updated: 9 JUL 2010

In which it states:

Solution

Flow control can be configured on interfaces operating at or above 1,000 Mbps. For proper Ethernet operation on NetApp appliances, it is highly recommended that full (send and receive) flow control be enabled on the NetApp appliance, switch ports and hosts.

Whoever wrote this needs to re-evaluate this statement, as it is not a trivial matter. I have run into issues with this recommended setting, namely iSCSI timeouts, NFS timeouts, etc., with EtherChannel flow control set to ON. It can wreak havoc in your shop. So beware!

NetApp really needs to think this through further and should, frankly, consider striking this statement from its recommendations until it can present to its community a more thorough understanding of its own recommendation.

Essentially, with Flow Control set to ON (both send and receive), a switch can send a "pause frame" to any attached host whose traffic it can no longer keep up with. Imagine that you have a VIF with several 1 Gb uplinks carrying iSCSI traffic to and from 20 ESX hosts. Each ESX server may in turn host 20 VMs, and these VMs are typically using the 10G virtual NIC. If one of your uplinks becomes saturated, the switch can literally send a pause frame to the filer. The filer in turn will stop transmitting ALL traffic on that particular uplink, so every host communicating over that uplink stalls, not just the one causing the congestion (I'm not sure of the exact pause period, but suffice it to say there's a drop and it's noticeable, to say the least). This situation is called head-of-line blocking, and it is the major reason why Ethernet Flow Control is somewhat dangerous to use.
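On the pause period mentioned above: as I understand IEEE 802.3x, a pause frame carries a 16-bit timer counted in quanta of 512 bit times, so the upper bound per frame can be worked out from the link speed. Here is a rough Python sketch that builds such a frame and computes that bound. The constants come from the 802.3x standard, not from this thread, and the source MAC address is made up for illustration.

```python
import struct

# Constants as defined by IEEE 802.3x (MAC Control PAUSE); not from this thread.
PAUSE_DEST_MAC = bytes.fromhex("0180c2000001")  # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
QUANTUM_BITS = 512  # one pause quantum = 512 bit times


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Return a 60-byte PAUSE frame (the 4-byte FCS is not included)."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("quanta must fit in 16 bits")
    header = PAUSE_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    # Pad with zeros up to the minimum Ethernet frame size (without FCS).
    return (header + payload).ljust(60, b"\x00")


def pause_seconds(quanta: int, link_bps: float) -> float:
    """Time the link partner is asked to stay silent for one pause frame."""
    return quanta * QUANTUM_BITS / link_bps


if __name__ == "__main__":
    # Hypothetical source MAC, chosen only for the example.
    frame = build_pause_frame(bytes.fromhex("02a098000001"), 0xFFFF)
    print(len(frame), frame[:18].hex())
    for name, bps in (("1 Gb/s", 1e9), ("10 Gb/s", 1e10)):
        print(f"{name}: max pause {pause_seconds(0xFFFF, bps) * 1e3:.2f} ms")
```

So a single pause frame can hold off a 1 Gb/s link for at most roughly 33.5 ms. A congested device can keep sending pause frames to extend the stall, and a frame with a timer of 0 tells the partner to resume, so the real outage depends on how long the congestion lasts.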

I base my statement on both personal experience and research. For those who want to investigate this claim further, I encourage reviewing the following to get a better grasp of the subject:

http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html

http://www.networkworld.com/netresources/0913flow2.html

The intent is not to stir controversy but rather to get a better grasp of the subject matter at hand and open the topic up to further discussion. Feel free to correct me if I'm wrong. My basic statement, however, remains the same: Flow Control needs much more research before the above recommendation can be put forth, and for the time being I would approach it with caution.

1 ACCEPTED SOLUTION

vmsjaak13
16,514 Views

TR-3749: NetApp and VMware vSphere Storage Best Practices recommends Flow Control to be set to 'send' on NetApp and VMware hosts, and 'receive' on the switch.

Also see the excellent explanation given by klem here:

http://communities.netapp.com/message/37173#37173

Regards,

Niek

5 REPLIES 5

andrew_stack
16,513 Views

Sure, for ESX that may apply. But the same problem exists: you have multiple vendors with multiple suggestions. As far as ESX goes, that may be ideal. But the filer may also be handling SQL, Oracle, Exchange, etc. NetApp has to understand this and therefore needs to take the whole picture into consideration.

After having looked it over a bit, the conclusion I came to is to forgo it altogether. I'm not sure that's the best setting, but it has certainly helped my environment compared with the FULL flow control setting we had implemented earlier per KB 22926.

Comments welcome.

madden
16,513 Views

Hi,

Section 6 of TR-3802 (http://www.netapp.com/us/media/tr-3802.pdf) has a universal recommendation to set flowcontrol to none throughout the network. So it doesn't matter whether it's SQL, Exchange, etc.; just set it to none and let the higher layers manage congestion more gracefully.

Regards,

Chris

bbjholcomb
16,306 Views

Flow control appears to be a little more confusing for cDOT: one KB/best practice tells us to turn it off only for the cluster ports, not for the ports that data traffic (NFS/CIFS/iSCSI) goes over. The only way I found to see whether the change will have an impact is to look at the ifstat -a output; in 7-Mode the field is pause frame, in cDOT it's Xoff. In our case we see no received pause frames, but we do transmit pause frames, though only a few: about 1 for every 141M frames on the cluster ports and 1 for every 5.5M frames on the data ports (NFS/CIFS/iSCSI).
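If you want to compare ports the same way, the ratio is simple arithmetic on two counters read off ifstat -a. A minimal Python sketch follows. The counter values are made up and chosen only to reproduce the ratios quoted above, and the function name is hypothetical.

```python
def frames_per_pause(total_frames: int, pause_frames: int) -> float:
    """Average number of transmitted frames per transmitted pause (Xoff) frame."""
    if pause_frames == 0:
        return float("inf")  # no pause frames seen at all
    return total_frames / pause_frames


if __name__ == "__main__":
    # Made-up counters, picked to match the ratios reported in the post.
    ports = {
        "cluster port": (282_000_000, 2),
        "data port": (55_000_000, 10),
    }
    for name, (total, pauses) in ports.items():
        ratio = frames_per_pause(total, pauses)
        print(f"{name}: 1 pause frame per {ratio / 1e6:.1f}M frames")
```

Ratios this high suggest pause frames are rare on both port types, which is how I read the numbers above.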

clayton123
11,745 Views

Although this is a somewhat old thread, it's still very relevant. I believe madden's comment should be marked as the correct answer.
