2011-02-18 05:50 AM
I've been digging through the Internet for three weeks and couldn't find any reasonable answer, so in the end I decided to post here. We're in the middle of virtualizing our servers. We've chosen a NetApp FAS2020 (12x300GB SAS, iSCSI 1Gbit) and XenServer running on top of 3x IBM x3550 M3 servers. One of our goals is virtualizing SQL Server, and here our problem appears: we get TPS (transactions per second) below our expectations.
Currently, our test setup is built to eliminate most problem-causing factors: the FAS is connected directly to the NIC (no switch), no bonds (though we did test with them), no VIFs (tested), no jumbo frames (tested), no VLANs (tested). SQL now runs inside a VM, but we've also tested a bare-metal config with roughly the same results. We have separate LUNs for log and data, both connected with the iSCSI Initiator; partitions are NTFS formatted (4k block) and aligned (strangely, we see no performance difference when the partition is misaligned). Our aggregate consists of 9 spindles: 8 in RAID-DP plus 1 spare, to max out performance.
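For anyone who wants to double-check alignment themselves: read the partition's StartingOffset (e.g. with `wmic partition get Name, StartingOffset` on Windows) and verify it divides evenly by the filer's 4k block size. A minimal sketch - the offsets below are just illustrative values, not from our setup:

```python
def is_aligned(starting_offset_bytes: int, block_size: int = 4096) -> bool:
    """True if the partition start lines up with the storage's 4k blocks."""
    return starting_offset_bytes % block_size == 0

# Classic old-Windows default: 63 sectors * 512 B = 32256 B -> misaligned
print(is_aligned(32256))    # False
print(is_aligned(1048576))  # True (1 MiB offset)
```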
What was thought-provoking: our SQL benchmark script (~70k inserts, updates and deletes) executed as a single transaction finished in 8 s on our reference database server (x3650, 2x RAID1 - 15k RPM log, 10k RPM data) and 7 s on the virtualized SQL Server, while without the transaction (so every SQL statement becomes its own transaction?!) it took 21 s on the reference and 40+ s on the virtualized one. It turned out that log writes are most likely the bottleneck. We benchmarked the log LUN with sqlio (6GB file, 8 threads) and checked what sysstat -x 3 says. Here's what we got (for full benchmark results see the attachment):
Current database server (log RAID):

| Access pattern | Write IOPS | Write MB/s | Read IOPS | Read MB/s |
| --- | --- | --- | --- | --- |
| Sequential, 2k block | 17394.26 | 33.97 | 11684.93 | 22.82 |
| Sequential, 8k block | 10264.61 | 80.19 | 4412.93 | 34.47 |
| Sequential, 64k block | 1168.91 | 73.05 | 1283.50 | 80.21 |
| Random, 2k block | 328.37 | 0.64 | 668.33 | 1.30 |
| Random, 8k block | 323.24 | 2.52 | 616.62 | 4.81 |
| Random, 64k block | 254.78 | 15.92 | 510.80 | 31.92 |
NetApp FAS2020 (log LUN):

| Access pattern | Write IOPS | Write MB/s | Write processor % | Write disk % | Read IOPS | Read MB/s | Read processor % | Read disk % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sequential, 2k block | 5235.93 | 10.22 | ~98% | ~13% | 7936.50 | 15.50 | ~65% | ~4% |
| Sequential, 8k block | 3378.50 | 26.39 | ~98% | ~25% | 6131.30 | 47.90 | ~83% | ~9% |
| Sequential, 64k block | 786.60 | 49.16 | ~100% | ~44% | 1483.43 | 92.71 | ~90% | ~16% |
| Random, 2k block | 1028.96 | 2.00 | ~55% | ~73% | 1037.59 | 2.02 | ~36% | ~70% |
| Random, 8k block | 785.73 | 6.13 | ~63% | ~75% | 1010.30 | 7.89 | ~37% | ~69% |
| Random, 64k block | 457.30 | 28.58 | ~84% | ~76% | 880.20 | 55.01 | ~63% | ~74% |
As you can see, random access performs much better, while the key factor for database performance - sequential log writes - is a few times worse. While 8 threads write sequentially at 10.2 MB/s, a single thread sits at around 3 MB/s. During my benchmark query I can see writes going to the log (and mostly to the log) at 1.5 MB/s.
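A back-of-envelope calculation from those numbers suggests the log-flush latency alone could explain the 40+ s run. This is only a rough model - it assumes every autocommitted statement forces one synchronous ~2 KB transaction-log flush, and that a single outstanding writer sees the single-thread sequential rate of ~3 MB/s measured above:

```python
# Does per-statement log flushing explain the 40+ s autocommit run?
# Assumptions (not measured directly): one synchronous ~2 KB log flush
# per statement, at the single-thread sequential rate of ~3 MB/s.
single_thread_mb_s = 3.0
flush_bytes = 2 * 1024
statements = 70_000

flushes_per_s = single_thread_mb_s * 1024 * 1024 / flush_bytes  # 1536 flushes/s
latency_s = 1 / flushes_per_s                                   # ~0.65 ms per flush
total_s = statements * latency_s                                # ~46 s total

print(f"{flushes_per_s:.0f} flushes/s, {latency_s * 1000:.2f} ms each, "
      f"~{total_s:.0f} s for {statements} autocommitted statements")
```

That ~46 s estimate lands close to the observed 40+ s, which is why log writes look like the bottleneck.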
My questions are: is this typical performance for this setup? What can I do to improve it? I'm a bit worried about running more VMs on this filer when, with only SQL Server on board, performance is already at the lower edge of acceptability.
2011-02-18 11:13 PM
We use SQL in VMware and it works great so the idea of a SQL server in a VM is not the issue. If the results are consistent you must be hitting a bottleneck and it is just a question of finding out 'where'.
This is a great guide into filer performance. It will help you to have a snoop around the filer and see if it is the cause of the issue. http://communities.netapp.com/docs/DOC-1051
I would also look at the Xen server and the network. If no link shows high utilization, you are probably looking at a configuration or driver-version issue. Work your way through all the links of the chain with this tool. https://now.netapp.com/NOW/products/interoperability/
Hope it helps
2011-02-19 03:20 AM
Thank you for your answer - it gives me some hope. I'll check these tools. Moreover, I'd really like to hear what network equipment you use (iSCSI/FC, type and speed of NIC, FAS model). It would also be very helpful if someone could run a simple test of a FAS2020 with sqlio:
sqlio -kW -LS -fsequential -Fparam.txt -b2
using a 2-4GB file on a LUN. I'd like to be sure there is some possibility of resolving our problem.
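For reference, the param.txt for that command would look something like this - path and size are placeholders, and each line follows sqlio's `<file> <threads> <mask> <size in MB>` format:

```text
T:\testfile.dat 8 0x0 2048
```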
2011-02-19 05:26 AM
We are a big international company and have lots of toys to play with. We have over 150 different SQL servers running as physical servers connected to NetApp disk as LUNs over both iSCSI and FCP, plus SQL in VMs stored in both NFS and FCP LUN datastores. And just to be sure we have everything, we also run SQL in VMs with NetApp LUNs connected to the VM via the Windows iSCSI initiator or NetApp FCP pass-through from the ESX server.
I wrote this a while ago about SQL / NetApp performance.
It may help.
Also, there have been many issues posted on the community about iSCSI performance. 9 times out of 10 the cause was a network issue. Search for "slow iSCSI" and have a look at some of the threads.
2011-02-19 06:14 AM
Could you please take a look at the attachment? It looks like neither the processor nor the disks were running at full capacity. Maybe, as Brendon suggests, it's something with the network...
As for maxing out the controller, just a stupid question... is it possible to mix a FAS2020 controller (filer 1) with a FAS2040 controller (filer 2) in an active/active config?
2011-02-21 11:39 PM
Having considered maxing out the controller (CPU), I concluded you might be correct; however, one thread (e.g. the benchmark script) can't fully exploit it. I'm wondering what causes such a load during small writes (VLAN?) and whether there is any chance to optimize these writes. Moreover, maybe there is a way to improve the controller usage by SQL? Any answer would be appreciated, even: "NO"
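One SQL-side knob worth considering (an assumption on my part, not something we have tested): batching many statements per transaction, so the log is flushed once per COMMIT instead of once per statement. A rough model of the effect, reusing the ~3 MB/s single-thread figure from the sqlio results and an assumed ~2 KB flush size:

```python
# Rough model: fewer commits -> fewer synchronous log flushes.
statements = 70_000
single_thread_mb_s = 3.0   # single-thread sequential rate from sqlio above
flush_bytes = 2 * 1024     # assumed size of one log flush
flush_latency_s = flush_bytes / (single_thread_mb_s * 1024 * 1024)

for per_txn in (1, 100, 1000):        # statements batched per transaction
    flushes = statements // per_txn
    print(f"{per_txn:>5} stmts/txn -> {flushes:>6} log flushes, "
          f"~{flushes * flush_latency_s:.2f} s spent flushing")
```

That would be consistent with the single-transaction run finishing in 7-8 s: with one commit, log flushing simply stops being the bottleneck.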
2011-07-12 04:21 AM
Did you ever get any closure on your problem?
I am also running a FAS2020, and I'm in the process of replacing it - partly due to the performance of the head when running SQL tasks such as large inserts, reads or SMSQL verifies (which I believe are sequential read commands).
We've had exactly the same issues - virtual SQL servers placed on an NFS datastore, with iSCSI LUNs for the databases/logs/snapinfo etc. The databases are only small (approx. 10 GB), and we see extremely high CPU on the controller when any significant SQL tasks are running. And this is only one SQL server. Add in 20-30 other virtual servers, plus Exchange, more SQL servers, dedupe schedules, reallocate schedules, and database verifications for Exchange/SQL - and the CPU is under constant pressure.
We also notice that high CPU on the FAS2020 generally leads to high latency across the other protocols.
Unfortunately I can't provide any benchmarks as it's my production environment.
I am guessing you've come to the same conclusion as me: that the FAS2020 is not powerful enough for what you want to use it for?
I am interested to know what conclusions you came to!