VMware Solutions Discussions

Performance issue FAS3240

TOBIAS1979

Hi,

We are hitting some performance issues with virtual machines using the NFS protocol and CIFS shares.

I need some tips on how to design our storage to minimize these performance issues. We have decided to use the NFS protocol as far as possible.

We have three different levels of SLA and function: high, medium and low.

High is on SAS disks.

Medium is one volume on SAS and one on SATA.

Low is SATA only.

We have put most of the servers on the low volume: around 80 out of 100.

We have SAS = 113 x 190 = 21470 IOPS + 35-40% cache

SATA = 40 x 40 = 1600 IOPS + 35-40% cache
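
As a rough sanity check on the figures above, the arithmetic can be sketched as below. This assumes the numbers mean spindle count times a per-disk IOPS rule of thumb (113 SAS disks at about 190 IOPS each, 40 SATA disks at about 40 IOPS each). The 80-out-of-100 split comes from the post; putting the remaining 20 servers entirely on SAS and spreading load evenly are simplifications made only for illustration, and cache is ignored.

```python
# Back-of-the-envelope spindle IOPS, ignoring cache and RAID overhead.
# Assumption: "113 x 190" means 113 SAS disks at ~190 IOPS each,
# and "40 x 40" means 40 SATA disks at ~40 IOPS each.

def raw_iops(disks: int, iops_per_disk: int) -> int:
    """Aggregate raw IOPS for a group of identical disks."""
    return disks * iops_per_disk

sas_iops = raw_iops(113, 190)
sata_iops = raw_iops(40, 40)

# 80 of 100 servers sit on the SATA-only "low" tier (from the post).
# Hypothetical simplification: the remaining 20 all sit on SAS.
sata_per_vm = sata_iops / 80
sas_per_vm = sas_iops / 20

# Share of the total raw IOPS that the SATA tier contributes.
sata_share = sata_iops / (sas_iops + sata_iops)

print(f"SAS:  {sas_iops} IOPS total, {sas_per_vm:.1f} per VM")
print(f"SATA: {sata_iops} IOPS total, {sata_per_vm:.1f} per VM")
print(f"SATA share of raw IOPS: {sata_share:.1%}")
```

On those assumptions, about 80% of the servers share roughly 7% of the raw spindle IOPS.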

Copying small files is slow, and we also see some latency problems.

Rgds

Tobias

10 REPLIES

lmunro_hug

Hi,

Are the SAS and SATA disks owned by the same controller? If so, this is most likely your problem. For writes, the NVRAM is shared, so the slower SATA disks will affect the performance of the entire controller, including volumes on SAS disks.

We have this same issue. The best bet is to remove the SATA disks, or find a way not to use them very much.

Luke

TOBIAS1979

Yes, I have both SATA and SAS on the same controller. Is it not recommended to have both SATA and SAS on the same controller?

Tobias

radek_kubka

It's more complicated than that IMHO.

In some cases SATA can 'slow down' SAS disks - *if* SATA disks are heavily used & NVRAM flushes are slowed down by waiting to write to them.

Can you post some basic performance stats? E.g. output of "sysstat -x 1" when performance problems occur?

Regards,

Radek

lmunro_hug

Keen to see the outcome, and any NetApp staff comments on using SAS and SATA disks on the same controller.

Luke

davidrnexon

Hi, can you please post the following output: switch to diag mode (priv set diag) and run sysstat -M 1 for about 20 seconds during times of high latency or slowness?

Do you monitor the network interfaces on the NetApp? Can you let us know what the utilization is, please? Is it set up as a vif?

TOBIAS1979

Here's the output:

faspri02> sysstat -x 1

 CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s
                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out
 12%    973      2      0    1007    1986  53985   15548      0       0      0    41s    53%    0%  -    25%       0      0     32       0      0       0   8389
  9%   1063      0      0    1099    3350  55988   11056      0       0      0    41s    53%    0%  -    31%       0      2     34       2      0       1   8389
 11%   1124      1      0    1165    5765  58253   10536     24       0      0    41s    54%    0%  -    27%       0      0     40       0      0      80   7864
 18%   2093     10      0    2140   25023  70143   12108      0       0      0    42s    57%    0%  -    37%       3      1     33       1      0       1   8389
 15%   1327      0      0    1356    7933  68789   11417      8       0      0    44s    53%    0%  -    32%       0      0     29       0      0       0   7595
 20%   1449      3      0    1492    1511  88703   17468  50376       0      0     1     65%   79%  Tf   59%       0      9     31       6      0       0   8126
 17%   1499      0      0    1539    1299  93663   11752  38848       0      0    43s    64%   72%  :    39%       0      1     39       1      0      39   8389
 28%   1948      0      0    1984   22133  96155   11105      0       0      0     1     59%    0%  -    27%       0      0     36       0      0       3   8265
 65%   1666      0      0    1707   15153  84199   12444     24       0      0     1     72%    0%  -    25%       5      0     36       0      0      13   8389
 50%   1280      2      0    1314    2307  70227   10376      8       0      0    43s    71%    0%  -    27%       0      0     32       0      0       0   8323
  9%   1136      0      0    1171    1154  66437   10900      0       0      0    43s    53%    0%  -    23%       0      1     34       1      0       1   8389
 12%   1168      3      0    1206    1259  65226   12852     24       0      0    44s    53%    0%  -    32%       0      2     33       2      0       1   8389
 10%   1378      0      0    1423    1659  69824   13396      0       0      0    44s    53%    0%  -    31%       0      9     36       5      0       3   8126
 13%   1128      1      0    1192    2061  57144   12268      0       0      0    44s    52%    0%  -    34%       4     25     34      29      1       5   8389
 10%   1190      1      0    1220    2963  50226   14260     32       0      0    44s    53%    0%  -    43%       0      0     29       0      0       0   7602
 13%   1450      1      0    1485    2358  65201   10692      0       0      0    45s    54%    0%  -    32%       0      0     34       0      0       2   7356
 21%   1727      1      0    2106    6868  60533   17604  32692       0      0    45s    60%   29%  Tn   67%     349      0     29       0      0       0   7602
 28%   2159      0      0    2195   14922  63263   17532  45436       0      0    45s    75%   84%  :    61%       0      0     36       0      0       3   8126
 19%   2294      0      0    2331    8297  66457   11928      0       0      0    45s    57%    0%  -    32%       0      1     36       1      0      13   8126

radek_kubka

I'd say it looks pretty healthy:

- low CPU utilisation

- low disk utilisation

- high cache age & cache hit ratio

Where is the high latency you mentioned in the original post being reported?
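
For anyone who wants to check figures like these over a longer capture, here is a minimal sketch that pulls peak CPU, peak disk utilisation and the number of intervals with a consistency point (CP) in progress out of a few of the rows posted above. The column positions are an assumption based on the header shown in this thread and may differ between Data ONTAP versions; the function and variable names are made up for illustration.

```python
# Summarise a few "sysstat -x 1" rows copied from the output in this thread.
# Assumption: after splitting on whitespace, CPU is field 0, cache hit is
# field 12, "CP ty" is field 14 and "Disk util" is field 15.

SAMPLE_ROWS = """\
12%    973      2      0    1007    1986  53985   15548      0       0      0    41s    53%    0%  -    25%       0      0     32       0      0       0   8389
20%   1449      3      0    1492    1511  88703   17468  50376       0      0     1     65%   79%  Tf   59%       0      9     31       6      0       0   8126
65%   1666      0      0    1707   15153  84199   12444     24       0      0     1     72%    0%  -    25%       5      0     36       0      0      13   8389
21%   1727      1      0    2106    6868  60533   17604  32692       0      0    45s    60%   29%  Tn   67%     349      0     29       0      0       0   7602
28%   2159      0      0    2195   14922  63263   17532  45436       0      0    45s    75%   84%  :    61%       0      0     36       0      0       3   8126
"""

def parse_row(line: str) -> dict:
    """Extract the handful of columns used in the summary below."""
    fields = line.split()
    return {
        "cpu_pct": int(fields[0].rstrip("%")),
        "nfs_ops": int(fields[1]),
        "cache_hit_pct": int(fields[12].rstrip("%")),
        "cp_type": fields[14],
        "disk_util_pct": int(fields[15].rstrip("%")),
    }

rows = [parse_row(line) for line in SAMPLE_ROWS.splitlines() if line.strip()]

peak_cpu = max(r["cpu_pct"] for r in rows)
peak_disk_util = max(r["disk_util_pct"] for r in rows)
# "-" in the CP ty column means no consistency point during that interval.
cp_intervals = sum(1 for r in rows if r["cp_type"] != "-")

print(f"peak CPU: {peak_cpu}%")
print(f"peak disk util: {peak_disk_util}%")
print(f"intervals with a CP in progress: {cp_intervals} of {len(rows)}")
```

In these sample rows, the highest disk utilisation figures fall in the intervals where a CP was running.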

lmunro_hug

It would be interesting to see the same sysstat results when the hosts are doing a large number of writes to the SATA disks. If you are an MS SQL shop, try a large DB backup to local disks, or snapshot a VM, leave it for a while to build up a few GB of changes, and then commit the snapshot with sysstat -x 1 running.

radek_kubka

Don't be cruel 😉 - SATA disks aren't really designed for these types of tasks!

tmaddox

Just a few thoughts:

  1. Keep all of your VMs on non-SATA drives, regardless of any internal SLAs you may have established. SATA drives don't perform fast enough and can cause all sorts of slowness in your filer environment. Base your SLA tiers on resource pools within your virtual management application.
  2. As for NFS, NetApp filers run great using the protocol. We have multiple filers using NFS and have no issues with it. If you are using VMware, make sure you download and install the NetApp Virtual Storage Console for vSphere and run the tool to set all of your NFS timeouts and settings. Also verify that your VIF is configured correctly on the filer and that it is teamed properly on your network switches.
  3. MOST IMPORTANT!! Make sure that all of your VMs are properly aligned. Misaligned VMs cause a tremendous amount of double work on the filer and can severely hinder its overall performance. There are a number of resources out there on how to align your VMs, along with built-in tools in the Virtual Storage Console.
  4. Verify that no other workloads or FlexClones are running out of control and causing extra load on your filer.
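
To illustrate the alignment point in item 3: a guest partition counts as aligned when its starting byte offset is a multiple of the filer's block size. A minimal sketch, assuming 512-byte sectors and a 4 KB (4096-byte) block size on the filer. The function name is made up for illustration, and the Virtual Storage Console tools mentioned above remain the proper way to check and fix real VMs.

```python
# Check whether a partition's starting sector lands on a 4 KB boundary.
# Assumptions: 512-byte sectors in the guest, 4096-byte blocks on the filer.

SECTOR_BYTES = 512
FILER_BLOCK_BYTES = 4096

def is_aligned(start_sector: int) -> bool:
    """True if the partition's byte offset is a multiple of the block size."""
    return (start_sector * SECTOR_BYTES) % FILER_BLOCK_BYTES == 0

# Sector 63 is the classic legacy MBR starting offset; 64 and 2048 are
# common aligned alternatives.
for start in (63, 64, 2048):
    state = "aligned" if is_aligned(start) else "MISALIGNED"
    print(f"start sector {start:>4}: offset {start * SECTOR_BYTES:>7} bytes, {state}")
```

With a misaligned start, a guest I/O that would fit in one filer block straddles two of them, which is the double work described above.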

Cheers,

-Thom
