iSCSI test tool

BrendonHiggins

I have now worked on two different systems that performed poorly when using iSCSI.  In both cases the issue was found to be the LAN.  Do you know of {recommend} any testing tools for iSCSI?

I currently use

  • Error logs on VM, ESX, Switch, NetApp filer
  • statit, sysstat, lun stats, etc. on filer
  • esxtop on ESX box
  • PKTT on filer with Wireshark
  • SQLIO.exe
  • Luck

Seems like I have to prove what is not the cause {LUN alignment, etc.} before saying yes, it is the LAN.

It would be nice to have a simple tool that generates traffic between the filer and a host {Linux boot from CD} and could be used to test a 'known good' config with a traffic stream suitable for benchmarking, i.e. always the same.
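
To make the idea concrete, here is a minimal sketch of what such a fixed workload could look like, assuming the LUN is already mapped and mounted on the test host. It is not an existing NetApp or VMware tool; the function name `fixed_write_benchmark` and its parameters are made up for illustration. It writes the same number of identical blocks on every run and forces them to storage with fsync, so results from a 'known good' config and a suspect one can be compared.

```python
import os
import time


def fixed_write_benchmark(path, block_size=64 * 1024, block_count=256, fill_byte=0xA5):
    """Write an identical stream of blocks to `path` and time it.

    Hypothetical helper: the workload is always block_count blocks of
    block_size bytes, each filled with fill_byte, so every run is the same.
    """
    block = bytes([fill_byte]) * block_size
    total = block_size * block_count

    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(block_count):
            written = 0
            # os.write may write fewer bytes than requested, so loop until done.
            while written < block_size:
                written += os.write(fd, block[written:])
        # Flush to the device so the timing includes the trip to storage,
        # not just the host's page cache.
        os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start

    mb_per_s = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return {"bytes": total, "seconds": elapsed, "mb_per_s": mb_per_s}
```

A call such as `fixed_write_benchmark("/mnt/iscsi_lun/bench.bin")` (the mount point is a placeholder) returns the byte count, elapsed seconds, and MB/s. This only exercises sequential writes from one host, so it is a rough baseline rather than a replacement for SQLIO or the filer-side statistics.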

Brendon

1 ACCEPTED SOLUTION

treyl

Brendon

The tools you are using to troubleshoot are excellent.  This is one of those areas where convergence has created a need for expertise in two areas of technology.  Several years before joining NetApp I designed and built IP telephony networks.  Voice was converging onto data networks, and there was a need for individuals with a mixed skillset who knew how to troubleshoot both voice and data.  You needed to understand how both worked when combined; from that experience a new set of skills evolved to manage and troubleshoot those new infrastructures.  iSCSI is a similar animal.  It has been around for some time, but it requires two skillsets: one part data networking and one part storage.

In my IP telephony days, we had a few environments where phones were simply plugged into the same network (and ultimately the same VLAN) as the users.  A user would fire off a large print job or do something else that generated broadcast traffic, causing choppiness or snap, crackle, and pop on the line.  In voice networks, when you hear the snap, crackle, and pop you are hearing packets being lost or discarded.  We pushed very hard for architectures to evolve so that IP telephony endpoints were dedicated to phone VLANs and call processing equipment was dedicated to its own VLANs.  We succeeded in that push, as architectures and technologies were designed to ensure the reliable transport of voice.

There are several ongoing efforts at NetApp to evolve and adapt our best practices for building networks that transport all Ethernet-based storage protocols.  There is an excellent partnership between Cisco and NetApp that is beginning to produce a new generation of published best practices for general storage networking.  Stay tuned for those documents.  Specific to virtualization, there is a great effort to communicate networking best practices for VMware and Citrix as those technologies relate to enabling Ethernet storage protocols.

Citrix XenServer and NetApp Storage Best Practices - Jointly written by NetApp and Citrix

NetApp and VMware Virtual Infrastructure 3 Storage Best Practices - Written by NetApp and reviewed by VMware prior to release.

The above documents have begun to address very specific network configuration best practices.  TR-3428 is an excellent example of this, and I know that the upcoming revision of TR-3428 has even more networking configuration best practices.

Back to your original question regarding the problems you experienced.  In your scenarios it sounds as though you connected the storage array to the network and then the hosts.  Performance was degraded, and with a very good suite of tools you identified the network as the problem.  From that description it sounds as though you were offered ports to connect the hosts and controllers without much involvement in configuring the network to support the iSCSI fabric.  Through your troubleshooting you identified problems that led to changes being made to the LAN to resolve the issue.  My suggestion in those scenarios is to get actively involved in understanding the configuration of the network your storage traffic is going to be transported on.  That is your storage fabric.  It happens to be on a general-purpose Ethernet fabric, but you must influence its construction so that storage traffic is transported optimally.  A first suggestion is to request isolation from other devices.  Continue to use the troubleshooting tools that you have assembled.  I think your suggestion of a tool to test or generate load is a great one, and I will take it back to our folks internally to see if that is something we could work on.

In the interim, feel free to lean on NetApp support and experts in the field to provide the air cover you need with networking configurations where a network is producing performance results below your expectations.  NetApp has a deep support organization with very solid networking experience throughout.

Hope this helps and thank you for the suggestion,

Trey
