ONTAP Discussions

Performance Issues over NFS

tvaillancourt
8,477 Views

Hi Everyone,

My first post, but I need some advice:

We have a FAS2020

We have 3 frontend (content/web) servers running CentOS 5.2 that use our NetApp over a 1000 Mbps internal Cisco VLAN. We noticed today that if we move a static HTML file (no parsing/PHP/Perl/SSI, basically just text) from our NFS mount to the local SATA RAID array on each of the frontend servers, the page load time drops from 1.3 - 1.5 seconds to 0.2 - 0.4 seconds.

Things we have tried with no change in performance (i.e. it is still dramatically slower than going direct to disk) include:

1) Increasing our rsize/wsize to 32768, up from our old value of 16384

2) Disabling/enabling NFS client caching

3) Double- and triple-checking our network connections/speed. No packets are getting dropped or lost on either the filer or the client side (roughly the checks sketched just below)
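
The sort of client-side checks we mean, with eth0 just a placeholder for whichever NIC carries the NFS traffic:

nfsstat -rc                      (RPC retransmissions/timeouts as seen by the NFS client)
netstat -s | grep -i retrans     (TCP segment retransmits)
ifconfig eth0                    (check the errors/dropped counters)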

Each of our content servers runs CentOS 5.2 and mounts over NFSv3. Here is how the partition is mounted:

technical -rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=600,tcp 10.0.0.10:/technical

(we use internal IP addresses for our filer and mounts)
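
If it helps, one way to confirm the options the kernel actually negotiated (rather than what we asked for at mount time) on a CentOS 5 client should be:

nfsstat -m
grep nfs /proc/mounts

Both should show the effective rsize/wsize and protocol for each NFS mount.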

Our NetApp is running at about 6% CPU, has plenty of memory, and isn't even working hard.

If anyone has any ideas/suggestions about what could be causing such a dramatic difference in speed and performance, we would be all ears.

Thanks

TV

4 REPLIES

rmharwood
8,372 Views

Hi Tim,

I don't think I have any groundbreaking advice to offer you.

I certainly don't see anything unusual with your mount options.

I'm not certain that you are going to see faster performance from an NFS server than from local disk, especially with smaller files. How well does the NFS backend operate with, say, a large binary file?
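
Something simple along these lines would do for a rough large-file test (the path and sizes are just examples):

dd if=/dev/zero of=/mnt/nfs/bigfile.bin bs=1M count=1024     (sequential write of ~1 GB over the NFS mount)
dd if=/mnt/nfs/bigfile.bin of=/dev/null bs=1M                (read it back)

If large sequential transfers look healthy, the penalty you're seeing on small pages is more likely per-request latency than raw bandwidth.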

Are the NFS and the HTTP/S traffic going over different NICs? Again, I wouldn't expect this to be significant with small files on a system that isn't loaded, but it depends on whether you're concerned with performance at a small or large scale.

You could put a CentOS NFS server on your network and use that as a comparison against the NetApp's performance.
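
A minimal export on that box would be enough for a rough comparison, e.g. (path and subnet are placeholders):

/export/test  10.0.0.0/24(rw,sync,no_root_squash)     (line in /etc/exports)
exportfs -ra                                          (re-read the exports)

Then mount it from a web server with the same rsize/wsize options you use for the filer.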

Richard

jamesphanson
8,372 Views

First post,

I am also having NFS performance problems and I need some help.

We have a 3040 running Data ONTAP 8.x, with a 10 Gigabit Ethernet connection to a Dell R900 running an Oracle database.

I have tested throughput with dd if=/dev/zero of=./1Gb.file bs=8192 count=131072 and consistently get ~50 MB/s with no load and ~35 MB/s under full database load.

This seems very slow, but I don't know what a reasonable expectation is. What rough range of throughput should I be shooting for?
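
For what it's worth, I realise that with bs=8192 and no sync some of this could be client-side caching; a run like the one below (assuming the dd on this box supports conv=fsync) flushes the file before dd reports the rate, which should be closer to what the wire and filer actually sustain:

dd if=/dev/zero of=./1Gb.file bs=65536 count=16384 conv=fsync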

Thank you,

James

BrendonHiggins
8,372 Views

Is it just the first page that is slow, with all subsequent pages fast? (Thinking DNS or security lookups on the first hit.)

Are there any routers between the servers and the SAN? What do your ping times look like between the servers and the SAN?

tvaillancourt
8,372 Views

Hey Guys,

Thanks for the great responses.

We do in fact have our SQL, NFS and HTTP connections running on the same VLAN, which could be an issue, but at the same time I am able to sustain 50 MB/s using scp to copy a 3 GB file from one webserver to another over our LAN.

I tested using a Linux NFS server running SuSE on the same 1000 Mbps VLAN and got these results; the first value in each pair is the SuSE machine (/root/nfs), the second is the NetApp (/home/technical). I used dd to write a 100 MB file to each NFS mount, with the above NFS mount options for the NetApp mount and the defaults for the SuSE box. The SuSE NFS server is running on an HP Smart Array 5i with a 4-disk RAID 5.

Results:

[root@content1 nfs]# for i in "/root/nfs" "/home/technical"; do dd if=/dev/zero of=$i/test.file bs=1024 count=100000; done
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 1.55867 seconds, 65.7 MB/s
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 4.82861 seconds, 21.2 MB/s
[root@content1 nfs]# for i in "/root/nfs" "/home/technical"; do dd if=/dev/zero of=$i/test.file bs=1024 count=100000; done
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 2.97338 seconds, 34.4 MB/s
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 3.94693 seconds, 25.9 MB/s
[root@content1 nfs]# for i in "/root/nfs" "/home/technical"; do dd if=/dev/zero of=$i/test.file bs=1024 count=100000; done
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 1.45381 seconds, 70.4 MB/s
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 4.20845 seconds, 24.3 MB/s
[root@content1 nfs]# for i in "/root/nfs" "/home/technical"; do dd if=/dev/zero of=$i/test.file bs=1024 count=100000; done
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 1.50167 seconds, 68.2 MB/s
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 3.87372 seconds, 26.4 MB/s
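
One caveat on my own numbers: with bs=1024 and no sync, a fair bit of this is client caching and commit behaviour rather than raw back-end speed, so a run like this (assuming the dd on CentOS 5 supports conv=fsync) would probably be a fairer comparison:

for i in "/root/nfs" "/home/technical"; do dd if=/dev/zero of=$i/test.file bs=32768 count=3200 conv=fsync; done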

Mount options (10.0.0.10 = NetApp, 10.0.0.50 = SuSE Box):

[root@content1 ~]# mount | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.0.0.10:/httpdlogs on /home/httpdlogs type nfs (rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=600,tcp,noac,addr=10.0.0.10)
10.0.0.10:/technical on /home/technical type nfs (rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=600,tcp,addr=10.0.0.10)
10.0.0.50:/home/nfs/technical on /root/nfs type nfs (rw,addr=10.0.0.50)

Trace/Ping to NetApp:

[root@content1 ~]# ping 10.0.0.10
PING 10.0.0.10 (10.0.0.10) 56(84) bytes of data.
64 bytes from 10.0.0.10: icmp_seq=1 ttl=255 time=0.113 ms
64 bytes from 10.0.0.10: icmp_seq=2 ttl=255 time=0.075 ms
64 bytes from 10.0.0.10: icmp_seq=3 ttl=255 time=0.101 ms
64 bytes from 10.0.0.10: icmp_seq=4 ttl=255 time=0.096 ms

--- 10.0.0.10 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.075/0.096/0.113/0.015 ms

[root@content1 ~]# traceroute 10.0.0.10
traceroute to 10.0.0.10 (10.0.0.10), 30 hops max, 40 byte packets
1 athena (10.0.0.10) 0.102 ms 0.085 ms 0.076 ms

So the network doesn't look too bogged down. We run 600-700 NFS ops/sec at 3-6% CPU utilization. We have 6 x 500 GB SATA drives in our FAS2020 with dual parity and 1 hot spare, so we have 3 data disks, 2 parity disks and a spare. I know this isn't a lot of spindles, but the HP only has 4 drives, and being RAID 5 it has to do its parity within those 4 drives, so I figured the spindle count is equal or better on the NetApp. But maybe with dual parity and snapshots, 3 data spindles isn't enough?

Does anyone have thoughts on our spindle count and how little performance we get from it?
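
If it helps, my understanding is that the filer console can show whether those three data spindles are actually the bottleneck (statit needs priv set advanced):

sysstat -x 1     (watch the disk utilization column for a minute)
priv set advanced
statit -b        (start collecting)
statit -e        (print per-disk statistics after a minute or so)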

Has anyone seen huge performance gains from using Multimode VIFs at our current ops/sec? They advise we try that, but I doubt we have enough ops/sec for it to help.
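
For reference, my understanding is that a multimode VIF on the 2020 is only a couple of console commands, plus a matching EtherChannel config on the Cisco side (names and addresses below are just examples):

vif create multi vif0 -b ip e0a e0b
ifconfig vif0 10.0.0.10 netmask 255.255.255.0 up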

Thanks again!

T V
