Data Protection

SQL Server (Microsoft Dynamics NAV) Performance

reoveiling
12,587 Views

Hi,

Current situation :

We recently bought a NetApp FAS 3140 Metrocluster with 26 x 500Gb sata disk for each side. This system is
used to store all our VMware server in NFS volumes. We have a few iSCSI connected LUN's for the Exchange

and SQL Server 2008 (requirement for snapmanagers).

Database info :

  • Instance 1 = Database for our WMS system
  • Instance 2 = Database for Microsoft Dynamics NAV 5.0
  • Instance 3 = Our main database, not yet on this system (soon to be)

Since we migrated our WMS and Dynamics databases to the new SQL Server (on the LUN's) we started having
performance problems when a high load was created on the Microsoft Dynamics system. The WMS and
Dynamics database both slowed down below useable limits.

I have contacted support and got a lot of performance tracking information. At the moment I'm monitoring network,

cpu, disk I/O on the SQL Server and the NetApp system and got lot's of data, but can really firgure out what is

causing the problems so far. So I would be interessed in hearing from anyone who has experience with Micrsoft

Dynamics NAV on NetApp volumes and its performance.

What I have tried so far :

  • Tracking latency with perfmon.msc on the NetApp directly (iscsi_read_latency, iscsi_write_latency, iscsi_ops)
  • Tracking latency with perfmon.msc on the SQL Server (Avg. Disk sec/Transfer on LogicalDisk)
  • Running 'LUN STATS' on the affected LUN
  • Running 'SYSSTAT'

The results :

  • On the high load moments I can see that the latency on the affected LUN volums is peaking at 300ms and isn't
    dropping under 90-100ms. That ofcourse is ways to much above 20ms.
  • When trying to generate load on the database by running simple SQL commands outside Dynamics I wrote
    around 100.000 records to a new table without seeing the latency go above 2ms.
  • Same thing with simple reading of bulk amounts of records.
  • When copying files to that volume (2GB file) I can see that the average latency is around 2ms when reading it
    and around 300ms when writing it. Don't know why this is like this ?

Questions :

  • Is anybody having baseline performance information that I can compare my information to ?
  • Is anybody running Dynamics or a high load SQL Server on SATA disks ?
  • Are there any performance tuning parameters that I could check/try ?
  • I know the peak latencies aren't good/normal, but how can I make sure that it is connected to let's say the
    SATA disks and not anything else on the storage or the VMware system ?

Thanks in advance for any extra information on this topic...

19 REPLIES 19

radek_kubka
12,481 Views

Hi and welcome to the Communities!

This is a tricky one, as there may be a gazillion of reasons why performance is poor - e.g. it may be storage, but equally it may be iSCSI connectivity (say a lack of jumbo frames).

In general using SATA disks for running any sort of databases where a lot of random IO activity occurs is not recommended.

Can you post a small sample of your sysstat -x 1 output, ideally taken when performance sucks?

Regards,

Radek

reoveiling
12,481 Views

Hi,

The peak loads for today had already past by, but I repeated one of the tests with copying a file to the affected volume.

The output of the command can be seen in this table and look the same as what we see when performance is low.

I will capture again during peak load tomorrow (around 9AM belgian time) and update this info, maybe you already see
something strange on the data below ?

The iSCSI colum is showing latency of the disks ?

  6%   560     1     0     561   496 16417  13460  23152     0     0    32   98% 100%  :f  29%      0     0     0     0
  3%    75     1     0      84   148   571   1760  10204     0     0    32   87%  76%  :   42%      0     8     0     0
  2%   124     1     0     129  1158  2749   2540      0     0     0    32   95%   0%  -    8%      0     4     0     0
  7%    46     1     0     153 27884   550    132      0     0     0    32   96%   0%  -    4%      0   106     0     0
  3%    35     1     0      59  6413   202     72     48     0     0    32   97%   0%  -   11%      0    23     0     0
  9%    61     1     0     190 29446   712    196      0     0     0    32   96%   0%  -    4%      0   128     0     0
CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out
12%    31     1     0     164 34715   765   1528   3048     0     0    34   98%  11%  Mn  13%      0   132     0     0
25%   101     1     0     233 35267   722   4996 144216     0     0    20   98% 100%  :s  79%      0   131     0     0
11%    22     1     0     152 34998   683    572  39464     0     0    20   98% 100%  :f  45%      0   129     0     0
23%    18     1     0     146 34179   711   2540 131608     0     0    20   97%  99%  Mn  81%      0   127     0     0
17%    37     1     0     157 35126   705   1156  91584     0     0    20   98% 100%  :v  82%      0   119     0     0
10%    50     1     0     175 29853   605    912   6744     0     0    20   98%  21%  Mn  18%      0   124     0     0
24%    82     1     0     100 11843   403   7572 172752     0     0    20  100% 100%  :f  88%      0    17     0     0
  9%    19     1     0     148 29070   568    824  18704     0     0    20   93%  61%  :   48%      0   128     0     0
  8%    11     1     0     143 35020   699    188      0     0     0    20   94%   0%  -    5%      0   131     0     0
32%    24     1     0     153 34914   749   6176 161456     0     0    20   98%  98%  Mf  93%      0   128     0     0
12%    21     1     0     150 34963   734    784  37240     0     0    20   92%  66%  :   62%      0   128     0     0
12%    41     1     0     170 34394   708   3356  19608     0     0    20   96%  14%  Mn  16%      0   128     0     0
27%    35     1     0     164 35448   711   1548 177400     0     0    20   98% 100%  :f  90%      0   128     0     0
  9%    29     1     0     161 34982   804    396    168     0     0    20   96%  24%  Mn  27%      0   131     0     0
29%    17     1     0     146 34699   675   4032 174432     0     0    20   99% 100%  :f  92%      0   128     0     0
10%    53     1     0     164 35225   694    864  22440     0     0    20   94%  57%  :   39%      0   110     0     0
  2%    49     1     0      72  5830   144     24     48     0     0    20   98%   0%  -   11%      0    22     0     0
11%    36     1     0     165 28677   577   3584  16208     0     0    20   98%  16%  Mn  14%      0   128     0     0
29%    31     1     0     162 35510   711   3228 181952     0     0    20   98%  98%  :   85%      0   130     0     0
10%    29     1     0     158 34903   732    384   1726     0     0    20   95%   5%  Mn  13%      0   128     0     0
CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out
27%    43     1     0     172 36628   807   4963 158817     0     0    20   98% 100%  :f  90%      0   128     0     0
11%    55     1     0     184 35113   830    756  36312     0     0    20   95%  84%  :   55%      0   128     0     0
14%    50     1     0     165 34250   746   6326  19285     0     0    20   98%  21%  Mn  24%      0   114     0     0
23%    38     1     0      56  6389   151   4540 180120     0     0    20   99% 100%  :v  87%      0    17     0     0
  2%    73     1     0      77   425   525    552      0     0     0    20   93%   3%  :    5%      0     3     0     0
  1%    56     1     0      61   174    78    120     64     0     0    20  100%   0%  -   10%      0     4     0     0
  1%    51     1     0      55    73    33    108      0     0     0    20   97%   0%  -    3%      0     3     0     0
  1%    16     1     0      18    57    41     44      0     0     0    20  100%   0%  -    3%      0     1     0     0
  1%    43     1     0      47   330    83    120     48     0     0    20   94%   0%  -   11%      0     3     0     0

radek_kubka
12,481 Views

iSCSI column shows the number of iSCSI operations per second.

Quick analysis:

Although you are saying this is not during peak, disks are rather busy I would say (Disk util column), so it *may* simply mean you have too few / too slow spindles to cope with the performance requirement.

thomas_glodde
12,481 Views

Hi,

You really only have S-ATA disks attached? They arent the right disk type for database applications as their read latency is quite high and slow write performance (your issues as it seems).

Besides taht, you hit quite a few M CP types:

M CP caused by low mbufs; writes data to the disk in order to prevent an out of memory buffers situation.

Possibly something netapp global support should take care of.

Kind regards

Thomas

reoveiling
12,481 Views

We only have S-ATA disk connected to our NetApp. Out hardware vendor did assure us
that the performance on these disk would do for our implementation. I'm trying to figure

out if this is indead a problem with the SATA disk or if something else is causing the

current issues ...

With what I hear now, adding an extra shelf with smaller, but ofcourse, faster FC disk could

be the simple solution (If the money is made available ..).

I will try to recreate these stats tommorow during a peak load on the SQL Server itself and post

them back at this location. Maybe you guys can then have a look at that data again. I will

capture as much data as possible during that period.

Thanks for the extra information so far.

reoveiling
12,480 Views

Hi,

Today I tried to create the logs during the periode that the batch process is running. Unfortunatly the performance problems where not that bad

as the previous days so I don't know if anything will be visible on this output.

NORMAL LOAD @ 8:30 AM :

You can see a lot of NFS traffic at that moment. About 2 hours later this traffic goes below 100ops for almost the rest of the working day.

CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out
  8%   745     1     0     750  2149 16003  15524  12776     0     0     2   88%  88%  :   47%      0     4     0     0
  8%  1484     1     0    1493  6045  6213   7652      0     0     0     2   89%   0%  -   26%      0     8     0     0
10%  1646     1     0    1647  6410  5234  13008     48     0     0     2   85%   0%  -   28%      0     0     0     0
12%  2514     1     0    2515  9203 12811  16100      0     0     0     2   80%   0%  -   25%      0     0     0     0
  8%  1412     1     0    1413  5121 29599   9396      0     0     0     2   91%   0%  -   28%      0     0     0     0
  5%   487     1     0     489   564 15234  16916     64     0     0     5   73%   0%  -   23%      0     1     0     0
  3%   337     1     0     338   337  5083   5804      0     0     0     5   62%   0%  -   24%      0     0     0     0
  3%   275     1     0     276   667  3462   3952      0     0     0     5   69%   0%  -   23%      0     0     0     0
  4%   279     1     0     280   227  8092  10740     48     0     0     5   61%   0%  -   23%      0     0     0     0
  4%   313     1     0     314   271 10446  11876      0     0     0     5   64%   0%  -   34%      0     0     0     0
  7%   525     1     0     528   531 23875  25428     24     0     0     5   79%   7%  Tn  31%      0     2     0     0
  7%   398     1     0     399   250 17349  23188  13368     0     0     5   86% 100%  :n  46%      0     0     0     0
  8%   264     1     0     265   191 12111  14456  51560     0     0     5   89% 100%  :s  74%      0     0     0     0
  8%   491     1     0     492   712 22276  26940   9504     0     0     2   83% 100%  :v  60%      0     0     0     0
  3%   283     1     0     284   338  5606   8236     48     0     0     2   65%   4%  :   30%      0     0     0     0
  6%   555     1     0     556   567 15916  19336     16     0     0     5   70%   0%  -   32%      0     0     0     0
  5%   469     1     0     470   342 22930  23652      0     0     0     5   81%   0%  -   22%      0     0     0     0
  7%   515     1     0     516   463 25817  27928     48     0     0     2   79%   0%  -   27%      0     0     0     0
  8%   745     1     0     746   825 36942  35008      0     0     0     5   89%   0%  -   30%      0     0     0     0
  8%   857     1     0     858   479  7441  21836   9156     0     0     2   83% 100%  :f  84%      0     0     0     0
  8%  1010     1     0    1012   381  8164  25584     64     0     0     2   65%  53%  :  100%      0     1     0     0
15%  1439     1     0    1440  1319 87287  82236      0     0     0     2   97%   0%  -   28%      0     0     0     0
15%  1420     1     0    1421  1631 84154  78760      0     0     0     2   98%   0%  -   29%      0     0     0     0
15%  1402     1     0    1403  2694 84899  79608     64     0     0     2   98%   0%  -   26%      0     0     0     0
  8%   651     1     0     652   707 32614  33692      0     0     0     7   70%   0%  -   45%      0     0     0     0
  9%  1338     1     0    1339  1032 33431  34968     48     0     0     7   76%   0%  -   38%      0     0     0     0
19%  2480     1     0    2489  6439 46294  52404   3344     0     0     6   82%  14%  Tn  47%      0     8     0     0
17%  2370     1     0    2371  8721 34540  37732  19560     0     0     6   87% 100%  :f  71%      0     0     0     0
17%  2902     1     0    2903  7834 53625  48624   5024     0     0     2   86%  58%  :   53%      0     0     0     0
14%  2144     1     0    2145  6831 47622  41276      0     0     0     2   83%   0%  -   56%      0     0     0     0
10%  1201     1     0    1202   632 50712  37496      0     0     0     1   75%   0%  -   47%      0     0     0     0
13%  1451     1     0    1452   890 61862  60376     48     0     0     1   78%   0%  -   44%      0     0     0     0
  8%   827     1     0     828   591 30466  32964     16     0     0     1   70%   0%  -   48%      0     0     0     0
11%  1199     1     0    1200   772 43039  42096      0     0     0     8   65%   0%  -   43%      0     0     0     0
10%   859     1     0     860   571 41098  40604     48     0     0     6   68%   0%  -   38%      0     0     0     0
  7%   627     1     0     628   681 31307  32304      0     0     0     6   71%   0%  -   39%      0     0     0     0
  8%   582     1     0     583   447 28556  27900      0     0     0     6   68%   0%  -   40%      0     0     0     0
11%   875     1     0     876   805 47706  53906     64     0     0     1   84%   0%  -   39%      0     0     0     0


HIGHER LOAD @ 09:30 AM :

This is when the Dynamics system was running one of it's dayly batch processes.

Here you can see the iSCSI traffic from the SQL server is doing something, but less then I expected to see.

CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out
20%   682     1     0     881  1716 82960  83396  13356     0     0     5   72%  75%  Tv  61%      0   198     0     0
17%   627     0     0     847  1146 92676  89804     36     0     0    17s  60%  14%  :   47%      0   220     0     0
15%   131     2     0     391   864 63381  62456     64     0     0     5   42%   0%  -   47%      0   258     0     0
15%   193     1     0     483   809 67962  68804      0     0     0     3s  42%   0%  -   50%      0   289     0     0
16%   304     1     0     568  1051 72821  70496      0     0     0     3s  45%   0%  -   41%      0   263     0     0
16%   239     1     0     523   840 71232  68920     48     0     0     3s  44%   0%  -   50%      0   283     0     0
14%   199     1     0     424  1044 66370  64080      0     0     0     3s  42%   0%  -   47%      0   224     0     0
16%   256     1     0     545  1992 74446  72452     16     0     0     5   43%   0%  -   41%      0   288     0     0
15%   276     1     0     496   974 66720  65076     48     0     0     3s  43%   0%  -   51%      0   219     0     0
16%   244     1     0     466  1193 67027  64692      0     0     0     2s  43%   0%  -   42%      0   221     0     0
13%   205     1     0     425  1522 58624  56116      0     0     0     3s  39%   0%  -   39%      0   219     0     0
16%   382     1     0     535  1540 55811  60020  13872     0     0     3s  70%  84%  Tf  65%      0   152     0     0
15%   153     1     0     389   808 66030  64792   4624     0     0     2s  40%  46%  :   53%      0   235     0     0
20%   221     1     0     549  1236 91968  89680      0     0     0     2s  40%   0%  -   40%      0   327     0     0
18%   137     1     0     468  1413 82047  79788     48     0     0     2s  37%   0%  -   38%      0   330     0     0
18%   308     1     0     581  2641 73683  71204      0     0     0     1s  48%   0%  -   61%      0   272     0     0
18%   267     1     0     556  1309 82919  80448      0     0     0     2s  42%   0%  -   58%      0   288     0     0
21%   339     1     0     672  1334 94787  91556     64     0     0     1s  40%   0%  -   51%      0   332     0     0
18%   185     1     0     492   973 84921  84532      0     0     0     5   39%   0%  -   51%      0   306     0     0
19%   183     1     0     507  1328 84116  81860      0     0     0     1s  37%   0%  -   42%      0   323     0     0
22%   225     1     0     580  1386 99981  95552     48     0     0     1s  37%   0%  -   50%      0   354     0     0
22%   319     1     0     674  1302 107788 105248     0     0     0     1s  43%   0%  -   42%      0   354     0     0
21%   139     1     0     454   956 83748  83740   9832     0     0     1s  51%  53%  Tn  53%      0   314     0     0
19%   153     1     0     495   905 78429  75816  12284     0     0     1s  49% 100%  :f  63%      0   341     0     0
16%   206     1     0     484   832 67512  67864     64     0     0     1s  40%  19%  :   52%      0   277     0     0
17%   918     1     0    1207  1191 73375  69785      0     0     0     1s  35%   0%  -   70%      0   288     0     0
23%   196     1     0     587  1209 106831 103288    48     0     0     1s  36%   0%  -   50%      0   390     0     0
19%   629     1     0     902  1068 81943  79732     16     0     0     1s  38%   0%  -   71%      0   272     0     0
23%   203     1     0     626  1382 110019 106493     0     0     0     0s  33%   0%  -   46%      0   422     0     0
22%   283     1     0     620  1192 100199  97040    48     0     0     0s  38%   0%  -   53%      0   336     0     0
20%   197     1     0     542  1422 92367  89644      0     0     0     1s  34%   0%  -   46%      0   344     0     0
22%   215     1     0     571  1723 101591  97360     0     0     0     0s  39%   0%  -   59%      0   355     0     0

Do you guys see anything strange on the NetApp's performance ?

thanks !

radek_kubka
12,480 Views

Quick shot at these stats:

Interestingly enough, disk utilisation today @9:30 is lower with more iSCSI operations, than yesterday (with less iSCSI ops) & we can say the filer is coping just fine.

My informed guess is that disk reads vs. writes make the whole difference - there were mostly reads in today's stats & many writes yesterday.

That would somewhat fit into the nature of SATA drives which doesn't cope well with random writes.

Regards,

Radek

jausch
12,481 Views

One thing that I didn't see in the thread was the type and size of SQL server.  One thing that really affects the amount of disk load SQL places onto the storage system is the amount of RAM in the box.  In general, more RAM for SQL means lower IOPS pushed down to the storage subsystem because SQL can be smarter about caching and write ordering and etc.

Are you running x64 SQL?  How much RAM is assigned to the VM?  These days, I almost always run x64 SQL and then tune RAM based on observed performance.  If adding RAM does not help performance, then you can back off a bit.  May require some tweaking.

Alex

reoveiling
11,060 Views

Our server is running Windows Server 2008 x64 and SQL Server 2008 x64. The machine has 12GB of RAM assigned to it and is running as only VM on that physical server (Server has 24GB, so 12GB unused at the moment, as part of the disaster recovery plan).

The amount of RAM assigned to each Instance is :

  • Microsoft Dynamics : 6GB
  • WMS Database : 3GB
  • Main Database : 3GB (only 1GB used because still in test)

So let's way we up the memory of the complet server with 6GB and assign 4GB of our Dynamics database.

We would need to see a drop in IOPS to the NetApp ?

radek_kubka
11,060 Views
So let's way we up the memory of the complet server with 6GB and assign 4GB of our Dynamics database.

We would need to see a drop in IOPS to the NetApp ?

I am a tad skeptical about the outcome, as writes, not reads seem to be the main culprit - for that reason host-side caching (achieved by increasing available memory) not necessarily will improve a lot (if anything).

Having said that, this is a virtual environment, so I see some sense in at least trying this.

Regards,
Radek

thomas_glodde
11,060 Views

Your NetApp machine seems fine, there is no disk utilization above 80%. Adding more memory wont help you with writing, helps with read only usualy.

Your disk read/write amount is around 80-100MB each, which fits exactly to 2 x 1GBit ethernet links. What & how many links do you use for iSCSI traffic? You might hit a link saturation. Have you read the Best Practices for VMware & NetApp iSCSI connectivity?

Besides that, your "sysstat -x 1" does not show iSCSI kB/s in/out statistics, means you are on an older version of data ontap. Consider upgrading to the latest 7.3.4 release.

jausch
11,060 Views

16GB is a pretty healthy size for SQL.  Do you have any idea how large your working set is?  If it's less than that, adding more RAM won't help.  If you're willing to take a reboot, adding more RAM won't HURT anything.  SQL certainly LOVES RAM.

Also, is SQL in Simple Recovery mode or standard?

Increasing your RAM will increase your checkpoint depth which means that DB writes will be held in memory longer.  However, Log writes will always continue immediately.  Are you writing logs to the same LUN as your database?  How many logs are you generating per hour?

What is perfmon reporting as your average disk latency on your logical disk object?  I'd love to see a dump of the whole logical disk object.  How many IOPS report that you are consuming?

As others have pointed out, you may simply be exceeding the IOPS budget for these relatively slow spindles.  Separating log writes from the DB onto a separate aggregate may help you get around this if you haven't done that already.

Alex

carstensejer
11,060 Views

Seems like the 12 GB is all allocated to SQL server instances. There need to be some memory for Windows too. With 12 GB total no more than 10 GB should be for SQL server (see http://www.sqlservercentral.com/blogs/glennberry/archive/2009/10/29/suggested-max-memory-settings-for-sql-server-2005_2F00_2008.aspx)

I often hear people complaining about the SAN/storage when their SQL server performs badly. Most of the times it can be fixed by optimizing the Windows, SQL server or connectivity to the storage.

Do you have several vCPU's and files for the databases and tempdb?

A great tool for troubleshooting a SQL server problem is Spotlight from Quest. It gives you a great overview (almost on a manager can understand ) and enables you to drill down where the red lights/problems are. There's a 30 days trial version on their website.

-Carsten

reoveiling
11,059 Views

Problem solved : Replaced NetApp with HP P4500 Multi-Site SAN

nigelg1965
9,933 Views

And that has SAS disks, right?

reoveiling
9,933 Views

indeed, 4 x P4500 Controller, with each 12 x 600GB 15K rpm SAS disks,

thomas_glodde
9,933 Views

hey,

dont get me wrong but you had a problem with an apple and now bought an orange to solve it ...?!

peter_lehmann
9,933 Views

No worries, he replaced a wrong sized NetApp with a "storage system" from HP... Poor guy

Public