VMware Solutions Discussions

Performance issue FAS3240

TOBIAS1979
7,044 Views

Hi,

We are hitting some performance issues with virtual machines using the NFS protocol and CIFS shares.

I am in need of some tips on how to design our storage to minimize our performance issues. We have decided to use the NFS protocol as far as possible.

We have three different SLA/function levels: high, medium and low.

High is SAS disks only.

Medium is one volume with SAS and one with SATA.

Low is only SATA.

We have put most of the servers, around 80 out of 100, in the low tier.

We have SAS = 113 x 190 = 21,470 IOPS, plus ~35-40% cache.

SATA = 40 x 40 = 1,600 IOPS, plus ~35-40% cache.
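The spindle math above can be sketched as a quick calculation. The disk counts and per-disk IOPS figures are taken from the post; treating the cache hit rate as a simple read-offload multiplier is my own assumption, not a NetApp sizing rule:

```python
# Rough aggregate IOPS estimate per tier: disk_count x per-disk IOPS,
# optionally uplifted by an assumed cache hit percentage.

def raw_iops(disk_count: int, iops_per_disk: int) -> int:
    """Raw spindle IOPS before any cache benefit."""
    return disk_count * iops_per_disk

def with_cache(raw: int, cache_hit_pct: float) -> float:
    """Very rough effective IOPS if that fraction of reads is served from cache,
    so the spindles only see the remainder of the load."""
    return raw / (1.0 - cache_hit_pct)

sas_raw = raw_iops(113, 190)   # 113 SAS disks x ~190 IOPS each
sata_raw = raw_iops(40, 40)    # 40 SATA disks x ~40 IOPS each

print(sas_raw)   # 21470
print(sata_raw)  # 1600
print(round(with_cache(sas_raw, 0.40)))  # ~35783 with a 40% cache hit rate
```

The gap is the point: even before cache, the SATA tier has roughly 7% of the SAS tier's spindle IOPS, yet it carries 80 of the 100 servers.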

We have problems with slow copying of small files, and some latency issues.

Rgds

Tobias

10 REPLIES

lmunro_hug
7,024 Views

Hi,

Are the SAS and SATA disks owned by the same controller? If so, this is most likely your problem. When doing writes, NVRAM is shared, and the slower SATA disks will affect performance across the entire controller, even for volumes using SAS disks.

We have this same issue; the best bet is to remove the SATA disks... or find a way not to use them very much.

Luke

TOBIAS1979
7,024 Views

Yes, I have both SATA and SAS on the same controller. Is it not recommended to have both SATA and SAS on the same controller?

Tobias

radek_kubka
7,024 Views

It's more complicated than that IMHO.

In some cases SATA can 'slow down' SAS disks - *if* SATA disks are heavily used & NVRAM flushes are slowed down by waiting to write to them.

Can you post some basic performance stats? E.g. output of "sysstat -x 1" when performance problems occur?

Regards,

Radek

lmunro_hug
7,024 Views

Keen to see the outcome, and NetApp staff comments on using SAS and SATA disks on the same controller.

Luke

davidrnexon
7,024 Views

Hi, can you please post the following output: jump into priv set diag and run sysstat -M 1 for about 20 seconds during a period of high latency or slowness.

Do you monitor the network interfaces on the NetApp? Can you let us know what the utilization is? Is it set up as a vif?

TOBIAS1979
7,024 Views

Here's the output:

faspri02> sysstat -x 1

CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s
                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out
12%    973      2      0    1007    1986  53985   15548      0       0      0    41s    53%    0%  -    25%       0      0     32       0      0       0   8389
  9%   1063      0      0    1099    3350  55988   11056      0       0      0    41s    53%    0%  -    31%       0      2     34       2      0       1   8389
11%   1124      1      0    1165    5765  58253   10536     24       0      0    41s    54%    0%  -    27%       0      0     40       0      0      80   7864
18%   2093     10      0    2140   25023  70143   12108      0       0      0    42s    57%    0%  -    37%       3      1     33       1      0       1   8389
15%   1327      0      0    1356    7933  68789   11417      8       0      0    44s    53%    0%  -    32%       0      0     29       0      0       0   7595
20%   1449      3      0    1492    1511  88703   17468  50376       0      0     1     65%   79%  Tf   59%       0      9     31       6      0       0   8126
17%   1499      0      0    1539    1299  93663   11752  38848       0      0    43s    64%   72%  :    39%       0      1     39       1      0      39   8389
28%   1948      0      0    1984   22133  96155   11105      0       0      0     1     59%    0%  -    27%       0      0     36       0      0       3   8265
65%   1666      0      0    1707   15153  84199   12444     24       0      0     1     72%    0%  -    25%       5      0     36       0      0      13   8389
50%   1280      2      0    1314    2307  70227   10376      8       0      0    43s    71%    0%  -    27%       0      0     32       0      0       0   8323
  9%   1136      0      0    1171    1154  66437   10900      0       0      0    43s    53%    0%  -    23%       0      1     34       1      0       1   8389
12%   1168      3      0    1206    1259  65226   12852     24       0      0    44s    53%    0%  -    32%       0      2     33       2      0       1   8389
10%   1378      0      0    1423    1659  69824   13396      0       0      0    44s    53%    0%  -    31%       0      9     36       5      0       3   8126
13%   1128      1      0    1192    2061  57144   12268      0       0      0    44s    52%    0%  -    34%       4     25     34      29      1       5   8389
10%   1190      1      0    1220    2963  50226   14260     32       0      0    44s    53%    0%  -    43%       0      0     29       0      0       0   7602
13%   1450      1      0    1485    2358  65201   10692      0       0      0    45s    54%    0%  -    32%       0      0     34       0      0       2   7356
21%   1727      1      0    2106    6868  60533   17604  32692       0      0    45s    60%   29%  Tn   67%     349      0     29       0      0       0   7602
28%   2159      0      0    2195   14922  63263   17532  45436       0      0    45s    75%   84%  :    61%       0      0     36       0      0       3   8126
19%   2294      0      0    2331    8297  66457   11928      0       0      0    45s    57%    0%  -    32%       0      1     36       1      0      13   8126
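For output like the above, a rough way to spot the samples that matter is to flag rows where CP time or disk utilisation spikes (the consistency-point rows with 79%/84% CP time stand out here). This is a hypothetical helper, not a NetApp tool, and the column indices assume the standard sysstat -x layout shown in this thread:

```python
# Hypothetical triage of "sysstat -x 1" output: flag samples where CP time
# or disk utilisation crosses a threshold. Assumes the standard -x column
# order (CPU, NFS, CIFS, HTTP, Total, Net in/out, Disk read/write,
# Tape read/write, Cache age, Cache hit, CP time, CP ty, Disk util, ...).

def flag_busy(lines, disk_util_max=50, cp_time_max=70):
    busy = []
    for line in lines:
        f = line.split()
        if not f or not f[0].endswith('%'):
            continue  # skip headers and blank lines (data rows start with CPU%)
        cp_time = int(f[13].rstrip('%'))    # CP time column
        disk_util = int(f[15].rstrip('%'))  # Disk util column
        if disk_util > disk_util_max or cp_time > cp_time_max:
            busy.append((cp_time, disk_util))
    return busy

sample = [
    "12%    973      2      0    1007    1986  53985   15548      0       0      0    41s    53%    0%  -    25%       0      0     32       0      0       0   8389",
    "20%   1449      3      0    1492    1511  88703   17468  50376       0      0     1     65%   79%  Tf   59%       0      9     31       6      0       0   8126",
]
print(flag_busy(sample))  # [(79, 59)] - only the heavy-write CP row is flagged
```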

radek_kubka
7,024 Views

I'd say it looks pretty healthy:

- low CPU utilisation

- low disk utilisation

- high cache age & cache hit ratio

Where is the high latency you mentioned in the original post being reported?

lmunro_hug
7,024 Views

It would be interesting to see the same sysstat results when the hosts are doing a large number of writes to the SATA disks. If you are an MS SQL shop, try a large DB backup to local disks, or snapshot the VM, leave it for a while to build up a few GB of changes, and then commit the snapshot with sysstat -x 1 running.

radek_kubka
7,024 Views

Don't be cruel 😉 - SATA disks aren't really designed for these types of tasks!

tmaddox
4,469 Views

Just a few thoughts:

  1. Keep all of your VMs on non-SATA drives, regardless of any internal SLAs you may have established. SATA doesn't perform fast enough and can cause all sorts of slowness in your filer environment. Base your SLA tiers on resource pools within your virtual management application.
  2. As for NFS, NetApps run great using the protocol. We have multiple filers using NFS and have no issues with it. If you are using VMware, make sure you download and install the NetApp Virtual Storage Console for vSphere and run the tool to set all of your NFS timeouts and settings. Also verify that you have your VIF configured correctly on the filer and that you are teaming it properly across your network switches.
  3. MOST IMPORTANT!! Make sure that all of your VMs are properly aligned. Misaligned VMs cause a tremendous amount of double work on the filer and can severely hinder its overall performance. There are a number of resources out there on how to align your VMs, along with built-in tools in the Virtual Storage Console.
  4. Verify there are no other workloads/FlexClones running out of control and causing extra load on your filer.
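On point 3, the alignment issue comes down to WAFL's 4 KB block size: if a guest partition starts at a byte offset that is not a multiple of 4,096, each guest I/O straddles two filer blocks and can turn into double work on the back end. A sketch of the check, assuming 512-byte guest sectors (this function is illustrative, not part of the Virtual Storage Console):

```python
# Hypothetical alignment check: WAFL stores data in 4 KB blocks, so a guest
# partition is "aligned" when its starting byte offset is a multiple of 4096.

WAFL_BLOCK = 4096

def is_aligned(start_sector: int, sector_size: int = 512) -> bool:
    """True if the partition's starting byte offset falls on a WAFL block boundary."""
    return (start_sector * sector_size) % WAFL_BLOCK == 0

print(is_aligned(63))    # False - classic MBR default (offset 32256 bytes)
print(is_aligned(2048))  # True  - 1 MiB-aligned partition start
```

The classic offender is the old MBR default of starting the first partition at sector 63, which is why older Windows and Linux guests typically need realignment while newer OS installers (1 MiB alignment) do not.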

Cheers,

-Thom
