What are the Best Practices for Gigabit flow control configuration for optimum performance on the NetApp appliance and switch
Solution ID: kb22926
Last updated: 9 JUL 2010
The KB article states:
Flow control can be configured on interfaces operating at or above 1,000 Mbps. For proper Ethernet operation on NetApp appliances, it is highly recommended that full (send and receive) flow control be enabled on the NetApp appliance, switch ports and hosts.
Whoever wrote this needs to re-evaluate this statement, as it is not a trivial matter. I have run into issues with this recommended setting, namely iSCSI timeouts, NFS timeouts, etc. with EtherChannel flow control set to ON. It can wreak havoc in your shop. So beware!
NetApp really needs to think this through further and should, frankly, consider striking this statement from its recommendations until it can present its community with a more thorough understanding of its own recommendation.
Essentially, setting Flow Control to ON (both send and receive) on a switch allows the switch to send a "pause frame" to a link partner that is sending faster than the switch can buffer or forward. Imagine that you have a VIF with several 1 Gb uplinks hosting iSCSI traffic to and from 20 ESX hosts. They in turn may have 20 VMs per ESX server. These VMs are typically using the 10G virtual NIC. If one of your uplinks becomes saturated, the switch can literally send a pause frame to the filer. The filer in turn will stop transmitting ALL traffic on that particular uplink, not just the traffic bound for the congested destination. Everything on the uplink stalls (I'm not sure of the pause period, but suffice it to say there's a drop and it's noticeable, to say the least). This situation is called head-of-line blocking, and it is the major reason why Ethernet Flow Control is somewhat dangerous to use.
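On the question of the pause period: as far as I understand IEEE 802.3x, a pause frame carries a 16-bit pause time measured in quanta of 512 bit times, so the upper bound for a single frame follows from the link speed. The snippet below is only a rough sketch of that arithmetic; the function name is made up for illustration, and a switch can keep re-sending pause frames, so the real stall can last longer than one frame's worth.

```python
# Sketch: how long a single 802.3x pause frame can halt a link.
# Assumes the standard pause quantum of 512 bit times and a 16-bit pause_time field.
# pause_duration_seconds is a hypothetical helper, not part of any NetApp or switch tool.

PAUSE_QUANTUM_BITS = 512   # one pause quantum = 512 bit times
MAX_PAUSE_QUANTA = 0xFFFF  # largest value the 16-bit pause_time field can hold


def pause_duration_seconds(quanta, link_bps):
    """Return how long `quanta` pause quanta last on a link of `link_bps` bits/s."""
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("pause_time must fit in 16 bits")
    if link_bps <= 0:
        raise ValueError("link speed must be positive")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


if __name__ == "__main__":
    for name, bps in (("1 Gb", 1_000_000_000), ("10 Gb", 10_000_000_000)):
        ms = pause_duration_seconds(MAX_PAUSE_QUANTA, bps) * 1e3
        print(f"{name}: max pause per frame is about {ms:.2f} ms")
```

So on a 1 Gb uplink a single pause frame can ask for roughly 33.5 ms of silence at most, which is a long time for iSCSI or NFS if the pauses are renewed back to back.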
I base my statement both on personal experience and research. For those that want to investigate this claim further I encourage that the following be reviewed in order to get a better grasp of the subject:
The intent is not to stir controversy but rather to get a better grasp of the subject matter at hand and open the topic up to further discussion. Feel free to correct me if I'm wrong. My basic statement, however, remains the same: Flow Control needs much more research before the above recommendation can be put forth, and for the time being I would use it with caution.
Sure, for ESX that may apply. But the same problem exists. You have multiple vendors with multiple suggestions. As far as ESX goes, that may be ideal. But the filer may also be handling SQL, Oracle, Exchange, etc. NetApp has to understand this and therefore needs to take the whole picture into consideration.
After having looked it over a bit, the conclusion I came to is to forgo it altogether. I'm not sure that's the best setting, but it has certainly helped my environment compared with the FULL flow control setting we had implemented earlier per KB 22926.
Section 6 of TR-3802 (http://www.netapp.com/us/media/tr-3802.pdf) has a universal recommendation to set flowcontrol to none throughout the network. So whether it's SQL, Exchange, etc. doesn't matter: just set it to none and let higher layers manage congestion more gracefully.
It appears to be a little more confusing for cDOT flow control: one KB/best practice tells us to turn it off for the cluster ports only, not for the ports that data traffic (NFS/CIFS/iSCSI) goes over. The only way I found to see whether the change will have an impact is to look at the ifstat -a output; in 7-Mode the field is pause frame, in cDOT it's Xoff. In our case we see no received pause frames. We do transmit pause frames, but only a few: 1 for every 141M frames on the cluster ports and 1 for every 5.5M frames on the data ports (NFS/CIFS/iSCSI).
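For anyone who wants to work out the same ratio from their own counters, here is a minimal sketch of the arithmetic. It assumes you have already read the total frame count and the pause (or Xoff) count off ifstat -a by hand; the function name and the sample numbers are hypothetical, and it does not try to parse ifstat output.

```python
# Sketch: express pause frames as "1 per N frames" from two counters.
# frames_per_pause is a hypothetical helper; the counters below are made-up sample values.


def frames_per_pause(total_frames, pause_frames):
    """Return the number of frames per pause frame, or None if no pauses were seen."""
    if total_frames < 0 or pause_frames < 0:
        raise ValueError("counters cannot be negative")
    if pause_frames == 0:
        return None  # no pause frames at all on this port
    return total_frames / pause_frames


if __name__ == "__main__":
    # Hypothetical counters: (total transmitted frames, transmitted pause/Xoff frames)
    ports = {"cluster port": (282_000_000, 2), "data port": (55_000_000, 10)}
    for name, (total, pauses) in ports.items():
        ratio = frames_per_pause(total, pauses)
        if ratio is None:
            print(f"{name}: no pause frames")
        else:
            print(f"{name}: 1 pause frame per {ratio / 1e6:.1f}M frames")
```

Comparing this number before and after changing the flow control setting, over the same kind of workload, is probably the simplest way to judge whether the change matters on a given port.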