High iowait in Red Hat VM

cjeff · ‎2010-10-13

I have a customer who wants to deploy Domino on a Red Hat VM and mount the Domino data from an NFS volume.

They are using LoadRunner for the testing and are trying to scale up to 1,200 users.

During the testing, iowait on the Red Hat VM (the wa column in vmstat) becomes high, at about 50% to 70%.
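For reference, the wa figure that vmstat reports can be approximated from two samples of the aggregate cpu line in /proc/stat. The following is a minimal Python sketch, not a replacement for vmstat; the function names and the sample counter values are made up for illustration.

```python
def parse_cpu_line(line):
    """Return the first eight counters (user .. steal) from a /proc/stat 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Field order: user, nice, system, idle, iowait, irq, softirq, steal.
    # Later fields (guest, guest_nice) are already counted in user/nice, so skip them.
    counters = [int(x) for x in fields[1:9]]
    if len(counters) < 5:
        raise ValueError("cpu line has no iowait field")
    return counters


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [a - b for b, a in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


# Hypothetical samples taken a few seconds apart (values are illustrative only).
sample_1 = "cpu 100 0 50 800 50 0 0 0"
sample_2 = "cpu 150 0 70 900 280 0 0 0"
print(round(iowait_percent(sample_1, sample_2), 1))
```

On a live system the two samples would be the first line of /proc/stat, read a few seconds apart while the load test is running.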

The netstat -s output includes the following counter:


524298 packets collapsed in receive queue due to low socket buffer
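Because netstat -s counters are cumulative, a single reading does not show whether the problem is still occurring. A rough way to check is to compare two snapshots taken some time apart, for example before and after a tuning change. The Python sketch below assumes the output of netstat -s has already been captured as text; the function names and the second sample value are hypothetical.

```python
import re

# Matches the counter line quoted above from `netstat -s`.
COLLAPSED_RE = re.compile(
    r"(\d+) packets collapsed in receive queue due to low socket buffer"
)


def collapsed_count(netstat_output):
    """Return the 'packets collapsed' counter, or None if the line is absent."""
    match = COLLAPSED_RE.search(netstat_output)
    return int(match.group(1)) if match else None


def collapsed_delta(before_output, after_output):
    """Return how much the counter grew between two snapshots, or None if either is missing."""
    before = collapsed_count(before_output)
    after = collapsed_count(after_output)
    if before is None or after is None:
        return None
    return after - before


# First value is from the post; the second is an invented later reading.
snapshot_1 = "    524298 packets collapsed in receive queue due to low socket buffer\n"
snapshot_2 = "    524410 packets collapsed in receive queue due to low socket buffer\n"
print(collapsed_delta(snapshot_1, snapshot_2))
```

A delta that stays at zero during the test would suggest the counter was accumulated earlier and is not related to the current iowait.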

We have made the changes below, but the result is the same.
net.ipv4.tcp_rmem = 8192 873800 8738000
net.ipv4.tcp_wmem = 4096 655360 6553600
net.core.rmem_max = 8738000
net.core.wmem_max = 6553600
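One thing that may be worth ruling out is that the new values never took effect, for example if /etc/sysctl.conf was edited but not reloaded with sysctl -p. The Python sketch below compares the intended settings against the live values under /proc/sys. It is only an illustration: the helper names are hypothetical, and it checks nothing beyond the four keys listed above.

```python
def parse_sysctl_lines(text):
    """Parse 'key = v1 v2 ...' lines into a dict of key -> list of value strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.split()
    return settings


def proc_path(key):
    """Map a sysctl key such as net.core.rmem_max to its /proc/sys path."""
    return "/proc/sys/" + key.replace(".", "/")


def read_live(key):
    """Read the current value of a sysctl key from /proc/sys as a list of strings."""
    with open(proc_path(key)) as handle:
        return handle.read().split()


# The settings quoted in the post.
WANTED = """
net.ipv4.tcp_rmem = 8192 873800 8738000
net.ipv4.tcp_wmem = 4096 655360 6553600
net.core.rmem_max = 8738000
net.core.wmem_max = 6553600
"""

if __name__ == "__main__":
    for key, wanted in parse_sysctl_lines(WANTED).items():
        try:
            live = read_live(key)
        except OSError as exc:
            print(f"{key}: could not read ({exc})")
            continue
        status = "ok" if live == wanted else "MISMATCH"
        print(f"{key}: wanted={' '.join(wanted)} live={' '.join(live)} {status}")
```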

The storage system seems OK: the NFS read latency is less than 9 ms and the write latency is less than 1 ms.

How can we identify the problem? Could it be caused by too many client connections?

Thanks,
Jeff Cai

spinks · ‎2010-10-13

Jeff,

Would you mind sharing the server specifics? What is the processor / RAM?

What version of Domino is being used?

What Storage System is being used?

NetApp, IBM, and VMware are in the process of publishing a whitepaper on Virtualizing Lotus Domino 8.5.1 on VMware vSphere 4.

This paper is in draft and should be published by the end of the year.

We did the testing on a FAS3020 under a Windows VM with 1 to 4 vCPUs and 16GB RAM, with tests scaling up to 4,000 users.

The sweet spot in terms of performance was 2,000 users on 2vCPU and 8GB RAM.

Unfortunately, I do not have iowait stats.

Feel free to contact me via email and I can get in touch with the teams that I worked with during this testing to try and determine where the issue lies.

Thanks,

John

bikash · ‎2010-10-13

Have you set the right NFS tuning parameters using VSC 2.0? Have you also set the Linux I/O timeout values with the tools in VSC 2.0? Also, is the RH install NUMA-aware? If it is, have you taken care to set up the memory appropriately? For example, if the ESX host has 2 sockets and 8 GB of memory, 4 GB is local to each socket under the NUMA configuration. If you give the VM 6 GB of memory, it will take 4 GB from one NUMA node and 2 GB from the other, which causes more overhead because the cache will keep switching between the two NUMA nodes.
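The NUMA arithmetic above can be sketched in a few lines of Python. This uses a simple fill-one-node-first model purely to illustrate the 4 GB + 2 GB example; it is an assumption for illustration and does not describe how the ESX scheduler actually places memory. The function names are hypothetical.

```python
def numa_split(vm_gb, node_gb, nodes):
    """Fill nodes one at a time and return the GB taken from each node."""
    if vm_gb > node_gb * nodes:
        raise ValueError("VM memory exceeds total host memory")
    placement = []
    remaining = vm_gb
    for _ in range(nodes):
        take = min(remaining, node_gb)
        placement.append(take)
        remaining -= take
    return placement


def spans_nodes(vm_gb, node_gb, nodes):
    """True if the VM's memory cannot fit within a single NUMA node."""
    used = [gb for gb in numa_split(vm_gb, node_gb, nodes) if gb > 0]
    return len(used) > 1


# The example from the post: 2 sockets, 4 GB local to each, VM sized at 6 GB.
print(numa_split(6, 4, 2))   # [4, 2]
print(spans_nodes(6, 4, 2))  # True
print(spans_nodes(4, 4, 2))  # False
```

Under this model, sizing the VM at or below the memory local to one node (4 GB in the example) keeps it on a single node.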