ONTAP Discussions

reallocate TR?

danpancamo

After fighting with our DBAs for months about IO performance, we finally narrowed a growing performance issue down to disk fragmentation. We accidentally discovered that a copy of a database was 2x faster to back up than the original, which led us to the conclusion that we really did have an IO issue.

We ran a reallocate measure, which came back with a threshold of 3. I'm not sure exactly how this is calculated, but the result is obviously misleading; it should have been a 10 IMHO.

We ran reallocate start -f -p on the volume and it immediately cut the backup time in half. Disk utilization, disk caching, and latency were all significantly better after the reallocate completed.
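For anyone wanting to try the same thing, the 7-mode command sequence looks roughly like this (the volume name is a placeholder, not our actual volume):

```shell
# Measure the current layout; reports an optimization rating
# (1 = optimal, 10 = very unoptimized)
reallocate measure /vol/sybase_db

# Force a one-time full reallocation (-f) and use physical reallocation (-p)
# so blocks locked in snapshots are moved without ballooning snapshot usage
reallocate start -f -p /vol/sybase_db

# Watch progress
reallocate status -v
```

Note that -f runs the reallocation even if the measured rating is below the threshold that would normally trigger it.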

It appears that the Sybase full reindex basically tries to optimize by rewriting data... this process somehow causes the disk fragmentation.

I've only been able to find limited information on the reallocate command. Given this significant performance increase, there should be a whitepaper on the subject that covers the effects of reallocate on cache, PAM, dedupe, VMware, databases, Exchange, etc.

Is there a TR document in the works on reallocate?  If not, someone should start one.


__frostbyte_9045

I'll second the need for further documentation. As part of my PM work (things are slow right now) I found that some of my volumes came back with 6s and 7s. However, the documentation does seem to be very light! I've been playing around but don't know if it is really helping, since we didn't do any benchmarking before reallocate was run.

BrendonHiggins

Hi

I am just posting as I would be keen to know more about reallocate. I have used the command a couple of times in the past and have had issues due to aggregates being created under 7.1 and the command being run on 7.3 filers.

How did you "finally narrow down a growing performance issue to disk fragmentation"? Are you using statit and looking at chain lengths and RAID stats?
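For reference, this is roughly how I'd collect those stats with statit (advanced privilege is required; the workload you run in between is up to you):

```shell
# statit lives in advanced mode
priv set advanced

# Begin collecting per-disk statistics
statit -b

# ...run the suspect workload for a few minutes...

# End collection and print the report; the chain column shows average
# blocks per I/O -- consistently short read chains can indicate fragmentation
statit -e
```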

Thanks

Bren

jasonczerak

I'd have to agree. I'm looking into performance tuning now that our main business application will be moving to RAC and NetApp this summer. Highly transactional kind of stuff.

jeremypage

Hey NetApp, please document this better - most of your customers are not willing to read these boards to find stuff like this, and the docs in the 7.3 manual are very sparse. It would be nice to have a decent scheduling system set up too.

And the same thing was true for our system, reallocate made a huge difference in sequential read type stuff.

jeremypage

In fact I'd be happy just to know if aggr reallocates take care of everything under them. I can handle one large flood of snapshots if I'm ready for them but I'd prefer not to get ready if it's not worth my time...

aborzenkov

In fact I'd be happy just to know if aggr reallocates take care of everything under them. I can handle one large flood of snapshots if I'm ready for them but I'd prefer not to get ready if it's not worth my time...


According to the official NetApp manuals, aggregate reallocation does not optimize file layout (which is logical when you think about it - the aggregate does not know anything about the files layered above it). It compacts used blocks to create more contiguous free space.

So aggregate reallocation may help with disk writes, but it shouldn't have any effect on large sequential disk reads.
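To make the distinction concrete, the two operations are invoked differently (volume and aggregate names here are placeholders):

```shell
# Volume-level reallocation: optimizes the layout of files/LUNs in the
# volume, which is what helps large sequential reads
reallocate start -f -p /vol/dbvol

# Aggregate-level reallocation (-A): compacts free space within the
# aggregate, which helps write allocation rather than reads
reallocate start -A aggr1
```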

jasonczerak

We all know NetApp filers need massive help with writes.

I just have a problem with the resource impact that reallocate -A has. 7.3.3 is supposed to make it better... 8.0 completely solves it.

jeremypage

Not sure what you mean by that, what problems do you have with writes? Sized properly the NVRAM should be handling most of that load.

jasonczerak

Whenever large write workloads are kicked off - say, building up temp table space, table splits, or file copies - we see problems. Once write throughput reaches 200MB/sec we start to see some increased read latency; once it's at 250MB/sec the NVRAM cannot keep up, even on disks that are not utilized (under 10% IO and space utilization). Filer-wide latency is increased. This is on a 6080 filer running 7.3.1.1. Average throughput 9-5 is 400MB/sec on each 6080 node in the cluster; at times we push well over 500, with a 50-75MB/sec average write workload. Write workloads just kill things when pushed.

We've worked to limit writes to off hours and whatnot, so it's not a big deal.

jeremypage

I gotcha. I think ONTAP is probably tuned to expect the NVRAM to keep up with the writes, and it sounds like you're going well beyond its ability. Maybe you can get some inside-out PAM cards.

We're super (read "filer used as RAM because the DBA has no clue") read intensive here, so I don't see that problem. Our biggest single system is an Oracle 10g DB that can peak in the 200MB/sec range but is usually between 85 and 100 - 98% of that is reads, and 90% of those are being serviced by the cache. The sad part is the AIX host that Oracle is running on has at least 10 gig of free memory.

jasonczerak

We migrated from HP + Oracle 9 to Linux + Oracle 10g + RAC + NFS + 10GbE, and were new to NetApp at the same time. After a year we started to explore some tuning. Just doubling the SGA or the like dropped IO usage 50% on the filer side. The DBAs were new to RAC and used "monolithic tuning" on RAC at first. It was a safe call at the time, and the new environment was 150% faster (before the memory changes) than the old one, so there wasn't much call to tune further.

We'll be doing some more tuning on the Linux side and the NetApp side this summer if we can find the time.

jeremypage

I'd kill to get rid of our AIX + 10g, or at least move it to NFS. Right now the DBAs are terrified of IP storage (10GbE); it's so slow compared to 2-gig FC...

Nevermind that they don't know what a zone or an MTU size is, they just know it's slower. Fibre is a pain in the butt.

jasonczerak

Tell the DBAs you'll handle the infrastructure. Put your foot down! LOL

If they had a clue, they'd know that the majority of Oracle's own DB servers at Oracle run over NFS.

A friend of mine needed some help selling NFS to a client. The guy was scared of corrupting data when packets went missing, and of stale file handles. Yeah, if you used UDP and NFS version 1, sure. That's just not the case anymore. He even suggested trying out FCoE - WTF? Why? What a useless stopgap idea.

You have to tune your Oracle to use bigger blocks and bump up the MTU. Save costs on infrastructure and win on flexibility. Why wouldn't you go NFS?
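As a sketch of what that tuning looks like on the Linux side (filer name, export path, and interface are placeholders; the mount options are the ones commonly recommended for Oracle data files over NFSv3):

```shell
# Example /etc/fstab entry for an Oracle data file mount over NFSv3:
# hard,nointr for reliability, actimeo=0 to disable attribute caching
# (important for RAC), 32KB rsize/wsize to match larger Oracle I/Os
# filer1:/vol/oradata  /u02/oradata  nfs  rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0  0 0

# Enable jumbo frames on the storage interface (must match the switch
# and filer MTU end to end)
ifconfig eth2 mtu 9000
```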

Right now all we have on FC is Exchange on NetApp (Windows 08 wouldn't do iSCSI LUNs and support Exchange 2010 when we deployed it) and some old Oracle DBs that were on aging EMC disk, soon to be moved to RAC.

__frostbyte_9045

Snip <Why wouldn't you go NFS?> snip

We are using FC because I can buy 3 whole sets of 8GB Fibre switches for what NetApp wants to charge for the NFS license for our 3140 cluster. Not to mention the cost of 10GbE switch ports. Plus, being a SQL shop <not virtualized>, we could only benefit from NFS on our vSphere infrastructure <everything but SQL>.

Also, I've enjoyed this discussion. It has provided some interesting insights into the odd and undocumented aspects of WAFL.

jasonczerak

Folks are starting to virtualize SQL these days. It's on the drawing board over here.

Yeah, the NFS license is insane. Just about any software license is 40k.

__frostbyte_9045

We would discuss it if the processor licensing model changed to per physical CPU rather than per vCPU. Because of the way it is licensed, we run a two-node active/active SQL cluster. Isolation is facilitated by having multiple SQL instances, which works out to a manual load-balancing act, much like NetApp's active/active clustering.

jeremypage

Our main SQL 2008 server and our DWH are both VMs, albeit larger than most. They are relatively low utilization though; the SQL server averages in the 2k IOPS range, and the DWH bounces up and down but never exceeds 4k (which is spindle limited, but I'm guessing it will shoot through the roof when we get our PAM II cards).

I did a very non-scientific test with an Intel X25 (in poor hardware; it was obviously bottlenecked at the controller) where I ran SQLIO against it under Win2008 R2 x64 and averaged right at 2800 IOPS across 5 tests. Then I exported it via NFS and ran the same thing in a VM over my storage IP network - a single 1g from the workstation and then 2x 10g to the filer. Just over 2700 IOPS after 5 tests.
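For anyone wanting to run a comparable test, an SQLIO invocation along these lines would do it (I'm not claiming these were my exact parameters; the test file is a placeholder):

```shell
# -kR = read test, -frandom = random I/O pattern, -b8 = 8KB I/Os,
# -o8 = 8 outstanding I/Os per thread, -t4 = 4 threads,
# -s300 = run for 300 seconds, -BN = no OS buffering
sqlio -kR -frandom -b8 -o8 -t4 -s300 -BN testfile.dat
```

Make the test file large enough that it can't be served entirely from cache, or the numbers will flatter the storage.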

I *just* got a better system set up today to test with. I'll report back on the results, but with paravirtual devices and a well-tuned VM you can certainly have a decent-sized database running in a VM over NFS with minimal loss in efficiency. And you get all the goodies like snapshots and replication. Good stuff!

dnewtontmw

FWIW, in our environment (3160, 15K FC, 28 spindles) I ran SQLIO tests against our DW recently - making the tests long/large enough to saturate the cache - and we're seeing 1800-4800 IOPS, depending on the test.

amiller_1

For what it's worth, bundles are great here - ask your NetApp rep about them next time you're looking at an upgrade and/or new system. Basically, if you're buying a couple of key pieces of software you're into bundle territory, where you get most stuff included. There are a couple of different bundles, my favorite being the Complete Bundle, since then I know the customer has every possible piece of software and I can purely talk technical architecture rather than getting sidetracked on pricing.

jeremypage

I'm not the greatest storage administrator out there, but I did work at NetApp for a few years and still have contacts there. Although an aggregate-level reallocate does not explicitly give you better read performance, it can if you've added new disks (which is what I said earlier), because it DOES move data more or less evenly across them. So instead of reading only from the old spindles, it will now be able to pull data from the newly added ones as well.

I have not verified this first hand with testing, but it makes sense. In addition, it probably reduces seek times depending on how full your aggregates are, simply because the heads don't have to travel as far to reach the next block, although that's purely speculation and I am not sure there would be a measurable difference.

As far as MSSQL, are you running it on a VM or a LUN? Is it deduped or not? If you're running it on a non-ASIS LUN I'd do a reallocate measure and see; a volume-level reallocate made a substantial difference on our Oracle LUNs.
