ONTAP Discussions

High latency & queue length on 2050

jimmueller
9,734 Views

We've built a new SQL 2008 server and we are seeing high latency and low throughput from this server to our NetApp 2050. During the restore of a 250GB database backup file from DAS to SAN, Windows Resource Monitor shows disk latency >200ms and a queue length near 10 on the NetApp LUNs. We are using an IBM x3850 server with IBM's re-branded Emulex 8Gb/s FC card, connected to a Cisco 9120 FC switch (which only supports 2Gbps, though we hope to upgrade those switches soon), which in turn connects to the SAN. On a file copy within the OS, I'm seeing these throughput rates:

Single 30GB file, C/D/E are DAS, R/S/T are SAN.

E: > D: 200-300MB/s

E: > R: 200-300MB/s

E: > S: 200-300MB/s

E: > T: 200-350MB/s

T: > S: <10MB/s

T: > R: 10-20MB/s

T: > E: 10-20MB/s

T: > D: 10-20MB/s

The attached file contains sysstat and lun stats; I captured this data during the same db restore. I need help with the commands to troubleshoot the bottleneck, and with understanding how to resolve it.

Thanks!

Jim

20 REPLIES

crocker
9,726 Views

Is this one that your team handles, or should the question be asked in the NetApp Support Community?

brendanheading
9,637 Views

Jim,

Your filer is running flat out on both CPU and disk I/O. What is the layout of the aggregate and what kind of disks are they ?

Can you run the same sysstat command without the backup/restore running so that we can see the baseline ?

Any deduplication or reallocate jobs running ? (sis status, reallocate status -v)

How full or empty is the containing volume and its containing aggregate (df will show you this)? If there are no background jobs going, the simultaneous high disk/CPU load, with lots of disk reads despite only a small amount of front-end read activity, makes me wonder if WAFL tetris is keeping the thing busy.
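For reference, here is roughly what I'd capture for a baseline on 7-mode (the "filer>" prompt is generic, and the notes after each command are mine rather than part of the syntax; let it run for a few minutes with no backup/restore active):

filer> sysstat -x 1           <- extended view (CPU, FCP ops, disk read/write, CP time) at 1-second intervals
filer> sis status             <- any dedupe jobs active or pending, per volume
filer> reallocate status -v   <- any reallocation scans in progress
filer> df -g                  <- per-volume used/available in GB
filer> df -Ag                 <- the same per aggregate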

jimmueller
9,636 Views

I tried to grab the info from the filer with your commands; hope this is what you're requesting. We've ordered a replacement NetApp system as a forklift upgrade, but I don't think it's going to arrive before we want to put this server into production. However, it appears the old SQL server's data is on the same aggregate as the new server's, so perhaps having only one of those SQL servers hit hard at a time will get us by.

Thanks!

Jim

brendanheading
9,636 Views

Jim,

It's interesting. Your system is very busy even with the backup not running. Some comments.

Since the 2050 can take 20 internal drives, I'm assuming you have 20x300GB SAS disks in the main shelf and you've added an FC shelf with 13 FC drives. Is that right, or is there more than one extra shelf? Have you got hot spares?

The filer isn't generating enough IO to max out a 2Gb/sec FC switch. So I'm not sure that bumping the switch will help a lot.

It looks like your load in terms of IOPs is within the capability of the disks you have. The imbalanced RAID groups make me wonder if you gradually added drives to the first aggregate, if so there may be hot spindles and doing a "reallocate" may help; since you're not using snapshots it's an easier decision. I am not sure if imbalanced RAID comes with any problems of its own. Someone else might be able to comment on that.

Can you check how long your dedupe jobs are taking to run ? Your disk reads are generally about 20-30% more than your FC reads, and your writes are substantially higher. Your sis status shows one volume in the "merging" phase, I reckon it's writing out the metadata to merge the duplicate blocks. sis status -l gives a more detailed readout, sis config should show the schedule. If your jobs are taking longer than a couple of hours to do, you might want to try turning this off. This may be the source of the heavy CPU usage.

Given that you're all FC with no NAS going on, it takes a bit more care and thought to get the value out of deduplication; you can only get real savings if you use thin provisioned LUNs/volumes. Given that your user volume is about 1TB and shows at 75% full it looks like you're getting a bit of dedupe but you're not returning that space to the aggregate. One thing you might try is turning dedupe off altogether.
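If you want to dig into the dedupe angle before deciding, something like the following should do it on 7-mode (I'm using user_data as the example path since it's one of your sis-enabled volumes; yardi_data would be the same):

filer> sis status -l /vol/user_data   <- detailed status: progress, schedule, last operation begin/end
filer> sis config                     <- the dedupe schedule for every sis-enabled volume
filer> df -s                          <- space actually saved by dedupe, per volume
filer> sis off /vol/user_data         <- stops scheduled runs; blocks already deduplicated stay shared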

The RAID tetris part depends, I guess, on exactly how full the filesystems within each of your LUNs are.

If you haven't already done so, raising a support ticket with NetApp and sending them a perfstat will almost certainly lead to some good advice. They can take you through checking to see whether or not a reallocate will help, and telling you what exactly the source of the high CPU and poor throughput is.

brendanheading
9,636 Views

BTW Jim, in your original post you're saying that copying from a DAS drive to the SAN gets you 200-300MB/sec. Are you sure about that? It's a hell of a job for 2Gb/sec FC to do 200MB/sec, let alone more than that.
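To put numbers on it: 2Gb FC uses 8b/10b encoding, so a single link tops out at roughly 200MB/sec of payload in one direction. A sustained 250-300MB/sec through one 2Gb port simply isn't possible, which is why I'd suspect the client-side cache is flattering those figures.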

jimmueller
9,636 Views

The throughput I mentioned in the original post was the peak range reported by the Windows GUI during the copy, but in hindsight the cache is skewing those numbers. The current utilization on netapp001 this morning is in the 60-70% range. I've re-run the file copies and the results are below. The test file is a 29.9GB SQL log file.

I've included the same commands, plus the additional commands you requested, in the attached logs. Netapp001_3.txt is without me testing anything; Netapp002_1.txt is during the E>R file copy; Netapp001_4.txt is during the E>S file copy; Netapp002_2.txt is during the E>T file copy; Netapp002_3.txt is post-copy.

E>R : 7m56s to copy, averaging 64MB/s. The copy indicated a very fast start and then it dropped down to the 60MB/s range.

E>S: 12m3s to copy, averaging 42MB/s. Same thing, fast start, drops to 40MB/s range.

E>T: 7m27s to copy, averaging 68MB/s. Same, drops to 60MB/s range

brendanheading
9,636 Views

Jim, (reposting with the right account!)

Yeah, I had thought that caching on the client side might make your IO look fast, but given that it was a 30GB file I thought it would be much too large to be buffered all at once and that the real write speed would have been noticeable quite early on in the copy.

Now that we have eliminated deduplication and reallocation from the enquiry it is a case of identifying which of the volumes is causing all the activity.

The 001_3 log shows that the CPU is reasonably busy, hovering around the 70% mark on average, which to me is high. There is continuous read activity of around 40MB/sec. That read activity seems to be a constant theme through all of your logs. I notice in this log that the FCP out is roughly consistent with the disk read so this is data being pulled straight off the disk and being read out without much extra background stuff happening.

Looking at the corresponding LUN stats, it looks like roughly 20-30MB of the activity is consistently down to your user_data and kronos_servers LUNs.

As it stands I think you are reaching the upper limit of what the filer can do. My guess is that your next steps are :

- look into what the busy LUNs are actually doing. Is someone running a spurious process, backup/copy operation or something else which is constantly/unnecessarily reading ?

- on the filer side, reallocate may help you a lot. See https://communities.netapp.com/docs/DOC-4977. Try the "measure" operation for your volumes and report back on how you get on (you might want to do this outside of business hours). In addition, you may want to look at enabling the "read_reallocate" flag on your volumes. This causes the filer to restripe the busy sections of a LUN so that they can be read quickly. See https://communities.netapp.com/docs/DOC-6265.

Google for the read_reallocate and reallocate command and you can find out a bit more. There's a good chance that your LUNs will benefit.
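To sketch it concretely (user_data is just an example path here; double-check the exact syntax against your ONTAP release):

filer> reallocate measure /vol/user_data       <- measure-only job, no data is moved
filer> reallocate status -v                    <- shows the optimization rating once the job completes
filer> vol options user_data read_realloc on   <- the CLI spelling of the read_reallocate feature

The measure job reports an optimization rating where 1 means well laid out; if it comes back around 4 or higher, a full "reallocate start /vol/user_data" is likely to be worth doing.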

Regarding drives: the 2050 has a front bezel which you can remove; you'll see all the drives and their activity LEDs behind it. The light on the far right is the controller's activity light.

It sounds like a lot of the shelves you have are not connected to this controller - do you have another controller handy? To find out, try sysconfig -A -r -d. This'll dump out a lot of information about your controllers, shelves, disks and so on. "vol status" will tell us how your volumes are set up, which might also be useful. Have you checked the cabling around the back to make sure the shelves are all connected back to the controller correctly? (Note that you'll probably need downtime if you find you need to make any changes here.)

I'm getting the sense that the guy who set this up is no longer around and you have been tasked with figuring it out, yes ?

jimmueller
9,636 Views

I did remove the 2050 bezel, but still the only light activity I saw was on the controller light. The guy who set it up under NetApp's guidance is still here but he's been busy with other tasks and I need to learn more about it anyway so I can cover when he's out. I haven't checked the cabling. Our system doesn't like those sysconfig parameters...

---

fl2000-netapp001> sysconfig -A -r -d
The -c, -d, -m, -r, -t, and -V options are mutually exclusive.
usage: sysconfig [ -A | -c | -d | -h | -m | -r | -t | -V ]
       sysconfig [ -av ] [ <slot> ]

fl2000-netapp001> vol status
         Volume State           Status            Options
    esx_upgrade online          raid_dp, flex     nosnap=on, nosnapdir=on,
                                                  create_ucode=on
 new_yardi_data online          raid_dp, flex     nosnap=on, create_ucode=on,
                                                  guarantee=none,
                                                  fractional_reserve=0
vmware_datastore_2 online          raid_dp, flex     nosnap=on, nosnapdir=on,
                                                  create_ucode=on
fl2000_vmware_vol1 online          raid_dp, flex     nosnap=on, create_ucode=on
           root online          raid_dp, flex     root, create_ucode=on
fl2000_sql002_log online          raid_dp, flex     nosnap=on, nosnapdir=on,
                                                  create_ucode=on
 kronos_servers online          raid_dp, flex     nosnap=on, create_ucode=on
     yardi_data online          raid_dp, flex     nosnap=on, create_ucode=on
                                sis
     vtl_backup online          raid_dp, flex     nosnap=on, create_ucode=on
      user_data online          raid_dp, flex     nosnap=on, create_ucode=on
                                sis
fl2000-netapp001>

---

fl2000-netapp002> sysconfig -A -r -d
The -c, -d, -m, -r, -t, and -V options are mutually exclusive.
usage: sysconfig [ -A | -c | -d | -h | -m | -r | -t | -V ]
       sysconfig [ -av ] [ <slot> ]

fl2000-netapp002> vol status
         Volume State           Status            Options
      doc1_data online          raid_dp, flex     nosnap=on, create_ucode=on,
                                sis               guarantee=none
fl2000_esx002_volume_1 online          raid_dp, flex     nosnap=on, nosnapdir=on,
                                                  create_ucode=on
       bmbo_log online          raid_dp, flex     nosnap=on, create_ucode=on,
                                                  guarantee=none
      yardi_log online          raid_dp, flex     nosnap=on, create_ucode=on
  new_yardi_log online          raid_dp, flex     nosnap=on, create_ucode=on,
                                                  guarantee=none,
                                                  fractional_reserve=0
      bmbo_data online          raid_dp, flex     nosnap=on, create_ucode=on,
                                                  guarantee=none
vmware_storage_2 online          raid_dp, flex     nosnap=on, create_ucode=on,
                                                  guarantee=none,
                                                  fractional_reserve=0
    yardi_trans online          raid_dp, flex     nosnap=on, create_ucode=on,
                                                  guarantee=none,
                                                  fractional_reserve=0
vmware_server_storage_1 online          raid_dp, flex     nosnap=on, create_ucode=on
sharepoint_servers_volume online          raid_dp, flex     nosnap=on, create_ucode=on
fl2000_sql002_data online          raid_dp, flex     nosnap=on, nosnapdir=on,
                                                  create_ucode=on
           root online          raid_dp, flex     root, create_ucode=on
    doc1_backup online          raid_dp, flex     nosnap=on, create_ucode=on
new_yardi_trans online          raid_dp, flex     nosnap=on, create_ucode=on,
                                                  guarantee=none,
                                                  fractional_reserve=0
      quest_sql online          raid_dp, flex     nosnap=on, create_ucode=on
boot_volume_vmware_1 online          raid_dp, flex     nosnap=on, create_ucode=on
boot_volume_vmware_3 online          raid_dp, flex     nosnap=on, nosnapdir=on,
                                                  create_ucode=on
boot_volume_vmware_2 online          raid_dp, flex     nosnap=on, nosnapdir=on,
                                                  create_ucode=on
boot_volume_vmware_5 online          raid_dp, flex     nosnap=on, create_ucode=on
boot_volume_vmware_4 online          raid_dp, flex     nosnap=on, create_ucode=on
   dw001_backup online          raid_dp, flex     nosnap=on, create_ucode=on,
                                                  guarantee=none,
                                                  fractional_reserve=0
fl2000-netapp002>

---

brendanheading
7,940 Views

Jim,

OK, it could be that the 2050 was shipped without internal disks and you've just added the shelves; or that the 2050 drives haven't been provisioned, which would be odd.

Try sysconfig with one parameter at a time, e.g.

sysconfig -A

sysconfig -r

sysconfig -d

jimmueller
7,940 Views

See attached...wait, let me clean that up

brendanheading
7,940 Views

Jim,

I'm not persuaded that there's much more that can be done here about your actual problem beyond the reallocate suggestion I mentioned above, so at this stage we're just understanding your configuration a bit better. The other option is to raise a support request with NetApp, as they'll be able to analyze the state of the whole system and give some useful suggestions, rather than the piecemeal approach we're taking here.

OK, so crunching that output, we can see that:

- you have two controllers, i.e. you are running active-active.

- on the 001 you have one 20-disk aggregate (4 parity, 16 data) of 15KRPM FCAL drives, and one 13-disk aggregate of 1TB SATA drives (2 parity, 11 data). You have two hot FCAL spares and one hot SATA spare.

- on 002 you have a 24-disk aggregate (4 parity, 20 data) and a 23-disk aggregate (4 parity, 19 data) and one hot spare.

That accounts for your 70 FCAL and 14 SATA drives and confirms that there are no disks installed in the main unit. There are a few very minor niggles: best practice is to have two hot spares of each type of disk used in a given controller, but it's not strictly essential. You might want to run "disk zero spares" on the 001 controller to zero out the SATA spare; it'll make things a bit quicker when a rebuild is required.
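As an aside, you can sanity-check the spares from the CLI on each head; on 7-mode something like:

filer> vol status -s       <- lists the spare disks and flags any that aren't zeroed
filer> disk zero spares    <- zeroes all non-zeroed spares in the background

should confirm what you've got.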

This is really just reinforcing my view that your problem probably isn't to do with your disks being maxed out; using the usual rule of thumb for working out IOPs, across your system your FCAL disks should give you a total capability in the region of 10,000 IOPs, and you're not pushing anything like that. Your SATA aggregate, on the other hand, will give less than 10% of that.
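(The arithmetic behind that, for what it's worth: a 15K FC spindle is typically credited with something like 170-180 random IOPs, and you have roughly 55 FCAL data spindles across your three FC aggregates, so 55 x ~175 comes to about 9,600 - call it 10,000. A 7.2K SATA spindle is nearer 75-80 IOPs, so your 11 SATA data spindles manage only around 850 between them.)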

"aggr status -v" will list the volumes contained in each aggregate, so we can check that to see if any of the stuff you've been testing has been going to the slower SATA drives.

jimmueller
7,940 Views

So the physical system can support more IOPS but we can't isolate the reason for the high utilization. We've already talked to NetApp and their answer was to add more drives. Because the system is coming off lease we're not adding more drives to it. Below is the core of the replacement system:

(2) 24x2TB 7200 shelves

(2) 24x600GB 15K shelves

(2) FAS3240

We're getting rid of the existing SATA shelves and replacing them with the SATA shelves above, and we're keeping two existing FC shelves.

fl2000-netapp001> aggr status -v
           Aggr State           Status            Options
  aggr1_fc1_001 online          raid_dp, aggr     nosnap=off, raidtype=raid_dp,
                                                  raidsize=16,
                                                  ignore_inconsistent=off,
                                                  snapmirrored=off,
                                                  resyncsnaptime=60,
                                                  fs_size_fixed=off,
                                                  snapshot_autodelete=on,
                                                  lost_write_protect=on

                Volumes: yardi_data, fl2000_vmware_vol1, fl2000_sql002_log,
                         vmware_datastore_2, esx_upgrade, kronos_servers,
                         new_yardi_data

                Plex /aggr1_fc1_001/plex0: online, normal, active
                    RAID group /aggr1_fc1_001/plex0/rg0: normal
                    RAID group /aggr1_fc1_001/plex0/rg1: normal

aggr1_sata1_001 online          raid_dp, aggr     root, diskroot, nosnap=off,
                                                  raidtype=raid_dp, raidsize=14,
                                                  ignore_inconsistent=off,
                                                  snapmirrored=off,
                                                  resyncsnaptime=60,
                                                  fs_size_fixed=off,
                                                  snapshot_autodelete=on,
                                                  lost_write_protect=on

                Volumes: root, vtl_backup, user_data

                Plex /aggr1_sata1_001/plex0: online, normal, active
                    RAID group /aggr1_sata1_001/plex0/rg0: normal

fl2000-netapp002> aggr status -v
           Aggr State           Status            Options
  aggr1_fc1_002 online          raid_dp, aggr     root, diskroot, nosnap=off,
                                                  raidtype=raid_dp, raidsize=16,
                                                  ignore_inconsistent=off,
                                                  snapmirrored=off,
                                                  resyncsnaptime=60,
                                                  fs_size_fixed=off,
                                                  snapshot_autodelete=on,
                                                  lost_write_protect=on

                Volumes: root, yardi_log, doc1_data, boot_volume_vmware_1,
                         boot_volume_vmware_2, fl2000_esx002_volume_1,
                         fl2000_sql002_data, sharepoint_servers_volume,
                         boot_volume_vmware_3, boot_volume_vmware_4

                Plex /aggr1_fc1_002/plex0: online, normal, active
                    RAID group /aggr1_fc1_002/plex0/rg0: normal
                    RAID group /aggr1_fc1_002/plex0/rg1: normal

  aggr2_fc1_002 online          raid_dp, aggr     nosnap=off, raidtype=raid_dp,
                                                  raidsize=16,
                                                  ignore_inconsistent=off,
                                                  snapmirrored=off,
                                                  resyncsnaptime=60,
                                                  fs_size_fixed=off,
                                                  snapshot_autodelete=on,
                                                  lost_write_protect=on

                Volumes: vmware_server_storage_1, doc1_backup, bmbo_data,
                         bmbo_log, vmware_storage_2, quest_sql,
                         boot_volume_vmware_5, yardi_trans, dw001_backup,
                         new_yardi_log, new_yardi_trans

                Plex /aggr2_fc1_002/plex0: online, normal, active
                    RAID group /aggr2_fc1_002/plex0/rg0: normal
                    RAID group /aggr2_fc1_002/plex0/rg1: normal

fl2000-netapp002>

brendanheading
7,940 Views

Jim,

I'll look at the aggregate output in a sec, but the 3240 is a rather dramatic uplift in processing power and throughput from the 2050 - which is five years old - and it should cope with that workload with one hand tied behind its back. It's not even in the same league really.

I'm going to go out on a limb here and say I don't think NetApp have met expectations based on what you are saying there. Do you think they investigated this matter fully - did they not even suggest a reallocate ? Did they at least get a perfstat from you ? I'm wondering if they knew you were about to come off lease and left it that way.

jimmueller
7,940 Views

To be quite honest, I've never been impressed with the current NetApp solution; I was happy with the prior Hitachi SAN, and I hope the 3240 is everything and more that we're paying for. The other guy who normally deals with this usually opens the NetApp cases, so I'd have to get that extra info from him, but my impression is that we've typically found NetApp's support just average. NetApp wasn't our team's chosen solution for the new lease; I think it was the CIO who made the comparison with the other vendor and made the final call. I'm sure it'll be fast enough, but I find the filer and console management to be a PITA.

brendanheading
6,524 Views

Jim,

Sorry to hear that. I think the NetApp solution is better than anything that Hitachi do, but aside from the support falling short (which is the principal reason for spending so much on this kind of enterprise kit), I think your local NetApp reps have been less than diligent about taking you through the feature set. That's a shame. You should complain to your reseller about this and get them to put it right. It's a pity this didn't come up before the 3240 deal was sealed, as you would probably have found it easier to get traction in getting issues sorted. If you don't get anywhere with the reseller, see if you can call up NetApp and get hold of the regional sales manager for your part of the world.

The solution really comes into its own if you use it to the full, and in your case you are really using only a small part of the feature set. For example, it looks like you are hosting home directories over the SAN on a file server, when you could be hosting them right off the NetApp via NAS. You've obviously got VMware going - there are a whole bunch of tools that make VMware on NetApp a breeze to use (especially over NFS) - but there is no sign they are in use; likewise for SQL Server. Snapshots, to me, are the real killer app on NetApp, and you don't have any of those enabled. Deduplication has been enabled, but it looks like it has not been set up in a way that lets you get any benefit.

Have you tried using System Manager for managing it? FilerView is pretty primitive and has been removed from the latest releases of the OS for that reason.

jimmueller
6,524 Views

There are a number of features we didn't buy with the 2050 purchase because of their cost. It's my understanding that they've changed their feature licensing since we purchased the 2050 and now all features are included. I have not seen System Manager. On a side note, we had a controller failure last night. Because we are buying a new system, we didn't renew the maintenance on the old one. Now we have to call tech support, give them the old serial # and new order # to get a case #, then send that to our sales rep so he can pull strings to get the old equipment covered. Glad they're still finding a way to cover it, but it seems a little like the scenic route.

radek_kubka
6,526 Views

System Manager is free of charge, regardless of your licensing.

brendanheading
6,524 Views

Jim,

That's a sad case of Murphy's law. It's just not a good idea to run a system (of any kind) without support unless you can deal with the consequences. I wouldn't say that NetApp could reasonably be expected to help in this case; they have probably opted to do so because you put some business their way recently.

Like I said above, I don't think you've had the best experience. Someone should have taken you through all the options and explained the pros and cons, and shown you all the tools and possibilities when the deal was originally being worked out. You might want to consider having an expert come in and take your team through the new filer when it is installed.

jimmueller
5,759 Views

I'm told by our IT Director that we were prepared to pay for 3 months of support on the old system, but NetApp recommended doing it this way instead. We had a NetApp engineer come out to install the original 2050, and I think for every additional tray as well. The new purchase includes their upgrade & head-swap services. One of the four HBA ports on the controller failed, so we moved it to another port.

I guess there's nothing further to discuss here. I really appreciate all the feedback you've given!

Jim

jimmueller
9,636 Views

I'm not sure how to determine whether we have hot spares. Maybe I found it in FilerView under Storage > Disks > Manage > View Spare Disks? It shows a spare in shelf 2, bay 4, chan fc:a, 300GB, another spare on the same shelf in bay 13, and a third spare on shelf 1, bay 13, fc:a, 450GB.

The 2050 appears to have 20 drives in it, but while the light on the far right is blinking green, there don't appear to be any individual drive activity lights. In addition, we have three shelves of 14x300F, two shelves of 14x450A, and one shelf of 14x500A. I was told this morning that the order for the new NetApp equipment was sent in last night. We've already received a pair of new 24-port Brocade 8Gbps FC switches.
