
High latency & queue length on 2050

We've built a new SQL Server 2008 box and we're seeing high latency and low throughput from it to our NetApp 2050. During the restore of a 250GB database backup file from DAS to SAN, Windows Resource Monitor shows disk latency over 200ms and a queue length near 10 on the NetApp LUNs. The server is an IBM x3850 with IBM's re-branded Emulex 8Gb/s FC card, connected to a Cisco 9120 FC switch (which only supports 2Gb/s, but we hope to upgrade those switches soon), which in turn connects to the SAN. On a file copy within the OS, I'm seeing these throughput rates:

Single 30GB file, C/D/E are DAS, R/S/T are SAN.

E: > D: 200-300MB/s

E: > R: 200-300MB/s

E: > S: 200-300MB/s

E: > T: 200-350MB/s

T: > S: <10MB/s

T: > R: 10-20MB/s

T: > E: 10-20MB/s

T: > D: 10-20MB/s

The attached file contains sysstat and lun stats, captured during the same DB restore. I need help with the commands to troubleshoot the bottleneck and with understanding how to resolve it.

Thanks!

Jim

Re: High latency & queue length on 2050

Is this one that your team handles, or should the question be asked in the NetApp Support Community?

Re: High latency & queue length on 2050

Jim,

Your filer is running flat out on both CPU and disk I/O. What is the layout of the aggregate, and what kind of disks are they?

Can you run the same sysstat command without the backup/restore running, so that we can see the baseline?

Any deduplication or reallocate jobs running? (sis status, reallocate status -v)

How full or empty are the containing volume and its containing aggregate? (df will show you this.) If there are no background jobs going, the simultaneous high disk and CPU load, with lots of disk reads despite only a small amount of incoming read activity, makes me wonder whether WAFL tetris is making the thing busy.
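On the console those checks would look something like this (Data ONTAP 7-mode syntax, just a sketch based on the commands above; double-check against the man pages for your ONTAP version):

```
sysstat -x 1            # extended CPU/disk/network stats at a 1-second interval
sis status              # is any dedupe job currently running?
reallocate status -v    # is any reallocate scan currently running?
df -h                   # how full each volume is
df -A                   # how full each aggregate is
```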

Re: High latency & queue length on 2050

I tried to grab the info from the filer with your commands; hope this is what you're requesting. We've ordered a replacement NetApp system along with the forklift upgrade, but I don't think it's going to arrive before we want to put this server into production. However, it appears the old SQL server's data is on the same aggregate as the new server's, so perhaps only having one of those SQL servers hit hard at a time will get us by.

Thanks!

Jim

Re: High latency & queue length on 2050

Jim,

It's interesting. Your system is very busy even with the backup not running. Some comments.

Since the 2050 can take 20 internal drives, I'm assuming you have 20x300GB SAS disks in the main shelf and you've added an FC shelf with 13 FC drives. Is that right, or is there more than one extra shelf? And have you got hot spares?

The filer isn't generating enough I/O to max out a 2Gb/s FC switch, so I'm not sure that upgrading the switches will help a lot.

It looks like your load in terms of IOPS is within the capability of the disks you have. The imbalanced RAID groups make me wonder whether you gradually added drives to the first aggregate; if so, there may be hot spindles, and a "reallocate" may help. Since you're not using snapshots, it's an easier decision. I'm not sure whether imbalanced RAID groups come with any problems of their own; someone else might be able to comment on that.

Can you check how long your dedupe jobs are taking to run? Your disk reads are generally about 20-30% higher than your FCP reads, and your disk writes are substantially higher still. Your sis status shows one volume in the "merging" phase; I reckon it's writing out the metadata to merge the duplicate blocks. sis status -l gives a more detailed readout, and sis config shows the schedule. If your jobs are taking longer than a couple of hours to run, you might want to try turning dedupe off. This may be the source of the heavy CPU usage.
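For example (7-mode syntax, using your user_data volume as a stand-in; substitute whichever volumes have sis enabled):

```
sis status -l /vol/user_data    # detailed per-volume dedupe progress and last-run info
sis config                      # the dedupe schedule for every sis-enabled volume
sis off /vol/user_data          # disable scheduled dedupe on a volume
```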

Given that you're all FC with no NAS going on, it takes a bit more care and thought to get value out of deduplication; you only see real savings if you use thin-provisioned LUNs/volumes. Given that your user volume is about 1TB and shows 75% full, it looks like you're getting a bit of dedupe but you're not returning that space to the aggregate. One thing you might try is turning dedupe off altogether.

The RAID tetris part depends, I guess, on exactly how full the filesystems within each of your LUNs are.

If you haven't already done so, raising a support ticket with NetApp and sending them a perfstat will almost certainly lead to some good advice. They can take you through checking whether a reallocate will help, and pin down exactly what the source of the high CPU and poor throughput is.

Re: High latency & queue length on 2050

BTW Jim, in your original post you say that copying from a DAS drive to the SAN gets you 200-300MB/s. Are you sure about that? It's a hell of a job for 2Gb/s FC to do 200MB/s, let alone more than that.
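The back-of-envelope arithmetic behind that skepticism (these are standard FC line-rate figures, not anything measured from your setup):

```shell
# 2Gb/s Fibre Channel signals at 2.125 Gbaud and uses 8b/10b encoding,
# so only 8 of every 10 bits on the wire carry payload.
line_mbaud=2125                            # 2.125 Gbaud, expressed in Mbaud
payload_mbit=$(( line_mbaud * 8 / 10 ))    # 1700 Mbit/s of payload
payload_mbyte=$(( payload_mbit / 8 ))      # 212 MB/s
echo "2Gb FC ceiling: ${payload_mbyte} MB/s"
# Roughly 212 MB/s per direction before framing overhead, so a sustained
# 200-300 MB/s copy through this switch cannot all be real disk I/O.
```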

Re: High latency & queue length on 2050

The throughput I mentioned in the original post was the peak range reported by the Windows GUI during the copy, but in hindsight the cache was skewing those numbers. The current utilization on netapp001 this morning is in the 60-70% range. I've re-run the file copies, and the results are below. The test file is a 29.9GB SQL log file.

I've included the same commands and the additional commands you requested in some of the attached logs. Netapp001_3.txt is without me testing anything; Netapp002_1.txt is during the E>R file copy; Netapp001_4.txt is during the E>S file copy; Netapp002_2.txt is during the E>T file copy; Netapp002_3.txt is post-copy.

E>R: 7m56s to copy, averaging 64MB/s. The copy indicated a very fast start and then dropped down to the 60MB/s range.

E>S: 12m3s to copy, averaging 42MB/s. Same thing: fast start, then a drop to the 40MB/s range.

E>T: 7m27s to copy, averaging 68MB/s. Same again, dropping to the 60MB/s range.
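For what it's worth, those averages line up with plain file-size-over-elapsed-time division (29.9GB file, times as above):

```shell
# 29.9 GB test file; elapsed times from the three copies above.
size_mb=$(( 299 * 1024 / 10 ))               # 29.9 GB is roughly 30617 MB
echo "E>R: $(( size_mb / (7*60+56) )) MB/s"  # 476 s -> 64 MB/s
echo "E>S: $(( size_mb / (12*60+3) )) MB/s"  # 723 s -> 42 MB/s
echo "E>T: $(( size_mb / (7*60+27) )) MB/s"  # 447 s -> 68 MB/s
```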

Re: High latency & queue length on 2050

I'm not sure how to determine whether we have hot spares. Maybe I found it in FilerView under Storage > Disks > Manage > View Spare Disks? It shows a spare in shelf 2, bay 4, channel fc:a, 300GB; another spare on the same shelf, bay 13; and a third spare on shelf 1, bay 13, fc:a, 450GB.

The 2050 appears to have 20 drives in it, but while the light on the far right is blinking green, there don't appear to be any individual drive activity lights. In addition, we have three shelves of 14x300F, two shelves of 14x450A, and one shelf of 14x500A. I was told this morning that the order for new NetApp equipment was sent in last night. We've already received a pair of new Brocade 8Gb/s 24-port FC switches.

Re: High latency & queue length on 2050

Jim, (reposting with the right account!)

Yeah, I had thought that caching on the client side might make your I/O look fast, but given that it was a 30GB file, I thought it would be much too large to be buffered all at once and that the real write speed would have become noticeable quite early in the copy.

Now that we have eliminated deduplication and reallocation from the enquiry, it's a case of identifying which of the volumes is causing all the activity.

The 001_3 log shows that the CPU is reasonably busy, hovering around the 70% mark on average, which to me is high. There is continuous read activity of around 40MB/s, and that read activity is a constant theme through all of your logs. I notice in this log that the FCP out is roughly consistent with the disk reads, so this is data being pulled straight off the disk and read out without much extra background work happening.

Looking at the corresponding lun stats, it looks like roughly 20-30MB/s of that activity is consistently down to your user_data and kronos_servers LUNs.

As it stands, I think you are reaching the upper limit of what the filer can do. My guess is that your next steps are:

- look into what the busy LUNs are actually doing. Is someone running a spurious process, a backup/copy operation, or something else that is constantly and unnecessarily reading?

- on the filer side, reallocate may help you a lot. See https://communities.netapp.com/docs/DOC-4977. Try the "measure" operation on your volumes and report back on how you get on (you might want to do this outside business hours). In addition, you may want to look at enabling the "read_reallocate" option on your volumes; this causes the filer to restripe the busy sections of a LUN so that they can be read quickly. See https://communities.netapp.com/docs/DOC-6265.

Google the read_reallocate and reallocate commands and you can find out a bit more. There's a good chance that your LUNs will benefit.
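Concretely, that would be along these lines (7-mode syntax, using user_data as a stand-in for whichever volumes turn out to be busy; note that reallocate measure generates I/O of its own, hence the out-of-hours suggestion):

```
reallocate measure -o /vol/user_data        # one-off measurement of how optimized the layout is
vol options user_data read_reallocate on    # restripe hot regions of LUNs as they are read
```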

Regarding drives: the 2050 has a front bezel which you can remove; you'll see all the drives and their activity LEDs behind it. The light on the far right is the controller's activity light.

It sounds like a lot of the shelves you have are not connected to this controller. Do you have another controller handy? To find out, try sysconfig -A -r -d. This will dump out a lot of information about your controllers, shelves, disks and so on. "vol status" will tell us how your volumes are set up, which might also be useful. Have you checked the cabling around the back to make sure the shelves are all connected back to the controller correctly? (Note that you'll probably need downtime if you find you need to make any changes here.)

I'm getting the sense that the guy who set this up is no longer around and you've been tasked with figuring it out, yes?

Re: High latency & queue length on 2050

I did remove the 2050 bezel, but still the only light activity I saw was the controller light. The guy who set it up under NetApp's guidance is still here, but he's been busy with other tasks, and I need to learn more about it anyway so I can cover when he's out. I haven't checked the cabling. Our system doesn't like those sysconfig parameters...

---

fl2000-netapp001> sysconfig -A -r -d

The -c, -d, -m, -r, -t, and -V options are mutually exclusive.

usage: sysconfig [ -A | -c | -d | -h | -m | -r | -t | -V ]

       sysconfig [ -av ] [ <slot> ]

fl2000-netapp001> vol status

         Volume State           Status            Options

    esx_upgrade online          raid_dp, flex     nosnap=on, nosnapdir=on,

                                                  create_ucode=on

new_yardi_data online          raid_dp, flex     nosnap=on, create_ucode=on,

                                                  guarantee=none,

                                                  fractional_reserve=0

vmware_datastore_2 online          raid_dp, flex     nosnap=on, nosnapdir=on,

                                                  create_ucode=on

fl2000_vmware_vol1 online          raid_dp, flex     nosnap=on, create_ucode=on

           root online          raid_dp, flex     root, create_ucode=on

fl2000_sql002_log online          raid_dp, flex     nosnap=on, nosnapdir=on,

                                                  create_ucode=on

kronos_servers online          raid_dp, flex     nosnap=on, create_ucode=on

     yardi_data online          raid_dp, flex     nosnap=on, create_ucode=on

                                sis

     vtl_backup online          raid_dp, flex     nosnap=on, create_ucode=on

      user_data online          raid_dp, flex     nosnap=on, create_ucode=on

                                sis

fl2000-netapp001>

---

fl2000-netapp002> sysconfig -A -r -d

The -c, -d, -m, -r, -t, and -V options are mutually exclusive.

usage: sysconfig [ -A | -c | -d | -h | -m | -r | -t | -V ]

       sysconfig [ -av ] [ <slot> ]

fl2000-netapp002> vol status

         Volume State           Status            Options

      doc1_data online          raid_dp, flex     nosnap=on, create_ucode=on,

                                sis               guarantee=none

fl2000_esx002_volume_1 online          raid_dp, flex     nosnap=on, nosnapdir=on,

                                                  create_ucode=on

       bmbo_log online          raid_dp, flex     nosnap=on, create_ucode=on,

                                                  guarantee=none

      yardi_log online          raid_dp, flex     nosnap=on, create_ucode=on

  new_yardi_log online          raid_dp, flex     nosnap=on, create_ucode=on,

                                                  guarantee=none,

                                                  fractional_reserve=0

      bmbo_data online          raid_dp, flex     nosnap=on, create_ucode=on,

                                                  guarantee=none

vmware_storage_2 online          raid_dp, flex     nosnap=on, create_ucode=on,

                                                  guarantee=none,

                                                  fractional_reserve=0

    yardi_trans online          raid_dp, flex     nosnap=on, create_ucode=on,

                                                  guarantee=none,

                                                  fractional_reserve=0

vmware_server_storage_1 online          raid_dp, flex     nosnap=on, create_ucode=on

sharepoint_servers_volume online          raid_dp, flex     nosnap=on, create_ucode=on

fl2000_sql002_data online          raid_dp, flex     nosnap=on, nosnapdir=on,

                                                  create_ucode=on

           root online          raid_dp, flex     root, create_ucode=on

    doc1_backup online          raid_dp, flex     nosnap=on, create_ucode=on

new_yardi_trans online          raid_dp, flex     nosnap=on, create_ucode=on,

                                                  guarantee=none,

                                                  fractional_reserve=0

      quest_sql online          raid_dp, flex     nosnap=on, create_ucode=on

boot_volume_vmware_1 online          raid_dp, flex     nosnap=on, create_ucode=on

boot_volume_vmware_3 online          raid_dp, flex     nosnap=on, nosnapdir=on,

                                                  create_ucode=on

boot_volume_vmware_2 online          raid_dp, flex     nosnap=on, nosnapdir=on,

                                                  create_ucode=on

boot_volume_vmware_5 online          raid_dp, flex     nosnap=on, create_ucode=on

boot_volume_vmware_4 online          raid_dp, flex     nosnap=on, create_ucode=on

   dw001_backup online          raid_dp, flex     nosnap=on, create_ucode=on,

                                                  guarantee=none,

                                                  fractional_reserve=0

fl2000-netapp002>

---