We have a software build host running CentOS 6.7 with 24 CPU cores, and daily regression jobs run on it with 24 threads. The workload is mixed reads and writes. The data used to live on an old FAS3140 filer (all SAS disks); we copied it to a new FAS8080EX running ONTAP 9.1RC1 and remounted it over NFS from the new storage. The software team has noticed that the 24-thread regression jobs now run much slower. I don't see any obvious bottleneck in the network, the storage, or the filer CPU, and nothing changed on the build host other than the mount points. One observation: if the regression job runs with 17 or fewer threads, it runs as expected, but beyond 17 threads it is considerably slower than on the old storage, and some processes on the build host sit in I/O wait. The sysstat output on the nodes involved does not show anything unusual. Are there any limits, such as a maximum number of NFS connections per host, that might contribute to this issue? Any other suggestions for troubleshooting it?
Good to hear from you again! There are definitely tunables that adjust how the Linux client interacts with the storage.
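If you want to see what the client is using today before touching anything, a read-only check along these lines might help. This is only a sketch: the function names are made up for illustration, and the sunrpc slot table settings are just my example of a client-side tunable worth looking at, not a diagnosis of your problem. It assumes a Linux host where /proc/mounts and (when the sunrpc module is loaded) /proc/sys/sunrpc are readable.

```python
from pathlib import Path


def parse_nfs_mounts(mounts_text):
    """Return {mount_point: {option: value}} for NFS entries.

    Expects text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Keep only NFS client mounts (skips ext4, nfsd, etc.).
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for item in fields[3].split(","):
            # Flags such as "rw" have no value and map to "".
            key, _, value = item.partition("=")
            options[key] = value
        result[fields[1]] = options
    return result


def read_sunrpc_settings(base="/proc/sys/sunrpc"):
    """Read RPC slot table settings; None means the file was not readable."""
    settings = {}
    # Assumption: these two files are the ones of interest; the second
    # may not exist on older kernels.
    for name in ("tcp_slot_table_entries", "tcp_max_slot_table_entries"):
        try:
            settings[name] = (Path(base) / name).read_text().strip()
        except OSError:
            settings[name] = None
    return settings


if __name__ == "__main__":
    try:
        mounts = parse_nfs_mounts(Path("/proc/mounts").read_text())
    except OSError:
        mounts = {}
    for mount_point, opts in mounts.items():
        shown = {k: opts.get(k) for k in ("vers", "proto", "rsize", "wsize")}
        print(mount_point, shown)
    print(read_sunrpc_settings())
```

Nothing here changes any setting; it only prints the mount options and slot table values so you have them handy for a support case or for comparing against the old mount.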
Disclaimer: I would suggest opening a performance ticket so they can investigate the host / controller performance before randomly tuning your production build environment if its criticality is very high. =D
But here are two great NFS tuning documents to read to make the conversation so much more fun!