We are trying to deploy VMs on NFS, but we are seeing slow network performance when transferring data from one NFS datastore to another hosted on two different filers, mainly when transferring multiple VMDK files simultaneously. Both filers and the ESX servers are on the same network, and the data transfer rate is around 7 mbps.
I ran some tests and noticed that if we transfer one VMDK at a time between the filers, the rate is around 22 mbps, whereas a transfer between the same pair of filers and the same volumes using ndmpcopy goes up to 130 mbps. Even when migrating VMDKs from the DMX to the NetApp we get only up to 40 mbps.
We are using GBIC connections throughout, with ESX version 3, in our environment.
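To put the reported numbers side by side, here is a minimal sketch that expresses each path as a slowdown factor relative to ndmpcopy. The rates are the ones quoted above, with the unit left exactly as written; the labels and the helper name `slowdown_vs` are made up for illustration.

```python
# Rates as reported in the post; the unit ("mbps") is kept as written,
# so only the ratios between them are meaningful here.
rates = {
    "multiple VMDKs at once": 7,
    "single VMDK": 22,
    "DMX to NetApp": 40,
    "ndmpcopy": 130,
}


def slowdown_vs(baseline: str, reported: dict) -> dict:
    """Return how many times slower each path is than the baseline path."""
    base = reported[baseline]
    return {name: base / rate for name, rate in reported.items()}


for name, factor in slowdown_vs("ndmpcopy", rates).items():
    print(f"{name}: {factor:.1f}x slower than ndmpcopy")
```

Because only ratios are computed, the result does not depend on whether the figures are megabits or megabytes per second.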
Please let me know if you need some more information.
I have seen this issue caused by the NIC teaming policy on the vSwitches in the ESX/ESXi servers (route based on IP hash) combined with physical switches that are not properly stacked. When EtherChannel comes into play it fails intermittently, which leads to performance issues. You won't notice the failure, because the ESXi servers keep sending traffic through the NIC that gets it to the filer, and the other one will be tried and silently
If your network is set up properly (switches configured for EtherChannel link aggregation), then make sure your ESX/ESXi servers are configured to match, including the management network: http://kb.vmware.com/kb/1022751
The ESX/ESXi TCP/IP stack in the vmkernel is one of the best implementations in the industry, and I don't think the handling of NFS traffic is the problem at all; it must be a software or hardware configuration issue. In fact, every day I see more serious implementations replacing FC with NFS in upgrades, and the performance is great, especially when you leverage EtherChannel.
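To see why a teaming problem can hurt the path to one filer but not the other, here is a rough sketch of how "route based on IP hash" pins each source/destination pair to one uplink. It assumes a simplified model (XOR of the two IPv4 addresses taken as integers, modulo the number of uplinks); the real vSwitch behaviour may differ in detail, and the addresses and the function name `ip_hash_uplink` are hypothetical.

```python
import ipaddress


def ip_hash_uplink(src: str, dst: str, uplinks: int) -> int:
    """Simplified IP-hash model: XOR both addresses, then modulo uplink count.

    This is an illustrative assumption, not the exact vSwitch implementation.
    """
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % uplinks


# Hypothetical VMkernel port and two hypothetical filer addresses.
vmkernel_ip = "192.168.10.21"
filer_ips = ["192.168.10.50", "192.168.10.51"]

for filer_ip in filer_ips:
    uplink = ip_hash_uplink(vmkernel_ip, filer_ip, uplinks=2)
    print(f"{vmkernel_ip} -> {filer_ip}: uplink {uplink}")
```

In this model a given VMkernel/filer pair always lands on the same uplink, so a single session never spreads across both NICs, and a bad member link only affects the pairs that hash to it.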
I have seen this happen many times on different installations (different filer models and even different physical servers), so I would be surprised if you find any issue with the networking. In speaking with VMware consultants (I am a NetApp guy), the feeling is that the vmkernel performs these transfers at a very low priority, hence the speed.
This is further backed up by the fact that if you use one of the flavours of ESX with a console and mount the NFS export on the ESX machine from its Linux underpinnings, you will see a much higher speed (as this will be a cp run by the user, not controlled by the vmkernel).
I would have to agree here. Another point is that throughput will increase dramatically with a multithreaded application such as rsync or RichCopy. So if you were able to rsync between the two mounts from the ESX console, you would likely see excellent throughput, since the underlying NFS stack on ESX is highly optimized and rsync is multithreaded.
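The idea of running several copy streams at once between two mounted datastores can be sketched roughly as follows. This is only an illustration using Python's standard library, not what rsync or RichCopy do internally; the function name `parallel_copy`, the worker count, and the mount paths in the usage comment are all hypothetical.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src_dir: str, dst_dir: str, workers: int = 4) -> list:
    """Copy every regular file in src_dir to dst_dir using several threads.

    Returns the names of the copied files, in sorted order.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src.iterdir() if p.is_file())

    def copy_one(path: Path) -> str:
        # Each worker copies one file; the copies overlap in time.
        shutil.copyfile(path, dst / path.name)
        return path.name

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, files))


# Example call with hypothetical mount points:
# parallel_copy("/mnt/filer1_datastore/vm01", "/mnt/filer2_datastore/vm01")
```

Whether this helps in practice depends on where the bottleneck is; if a single stream is being throttled, several streams may add up, but if the link or the disks are saturated they will not.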
We had a similar problem, not with VMware but with Oracle databases. Our filer has dual 10G NICs that are trunked, but we were getting deplorable throughput. Working with NetApp tech support, we discovered that if we set flow control to none, we got the desired results.
We need to know much more about the environment. It could be the networking, but there are many other possibilities. I would recommend opening a case with NetApp support to have them take a look, as this sort of troubleshooting is likely out of scope for this forum. For example, is the controller busy? Are the disks? How is the VLAN routed? What is the switch fabric like? How did you connect the datastores to the ESX servers: by IP address or by host name?
Hi Lovik, You might want to check the network section in TR-3428 and verify that you have followed the best practices for your VMkernel network configuration. The next step would be to open a support case. Having said that, you might consider taking a look at the Rapid Cloning Utility 2.0 for virtual machine deployment. While this won't directly address the issue you mentioned here, it will dramatically decrease the amount of time, resources, and capacity required to provision virtual machines on the same controller.
Thanks for your reply. However, we are already following the best practices, as the connections are going through vifs in a separate VLAN. In addition, when we use the recommended window size of 64240 (TR-3705) the performance is low, so we changed the filer to use the ONTAP default window size of 65940, as we see ESX is using 65535.
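For a rough feel of how much the window size can matter, a single TCP connection can send at most about one window per round trip. The sketch below applies that bound to the three window sizes mentioned above. The 0.5 ms round-trip time is purely an assumption for illustration, not a measurement from this environment, and the function name is made up.

```python
def max_throughput_mb_per_s(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP connection: one window per round trip.

    Returns megabytes per second (1 MB = 1,000,000 bytes).
    """
    rtt_seconds = rtt_ms / 1000.0
    return window_bytes / rtt_seconds / 1_000_000


# Window sizes quoted in the thread; the RTT is an assumed value.
assumed_rtt_ms = 0.5
for window in (64240, 65535, 65940):
    bound = max_throughput_mb_per_s(window, assumed_rtt_ms)
    print(f"window {window} bytes: at most {bound:.1f} MB/s per connection")
```

Under this simplified model the three window sizes differ by only a few percent, so it may be worth measuring the actual round-trip time between the ESX hosts and the filers before attributing a large difference to the window setting alone.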