Re: reallocate TR?

danpancamo · ‎2009-12-19

After fighting with our DBAs for months about IO performance, we finally narrowed a growing performance issue down to disk fragmentation. We accidentally discovered that a copy of a database was 2x faster to back up than the original, which led us to the conclusion that we did actually have an issue with IO.

We ran a reallocate measure which resulted in a threshold of 3. I'm not sure exactly how this is calculated, but obviously the result is misleading. The results should have been 10 IMHO.

We ran reallocate start -f -p on the volume and it immediately cut the backup time in half. Disk utilization, disk caching, and latency were all significantly better after the reallocate completed.

It appears that the Sybase full reindex basically tries to optimize by rewriting data... This process somehow causes the disk fragmentation.

I've only been able to find limited information on the reallocate command. However, with this significant performance increase, there should be a whitepaper on the subject that includes the effects of reallocate on Cache, PAM, Dedup, VMware, Databases, Exchange, etc...

Is there a TR document in the works on reallocate? If not, someone should start one.

erick_moore · ‎2010-04-01

Don't confuse aggregate reallocation vs volume level. This should help explain it better at a high level: http://www.theselights.com/2010/03/understanding-netapp-volume-and.html

As for the NetApp not being able to handle writes, it is actually a write-optimized SAN. You will probably get better write performance on a NetApp system than you will on any other SAN. One of the biggest problems they faced was sequential read after random write, but as of OnTap 7.3 they added the read_realloc option to volumes, which will sequentially reallocate data after it has been read once. The best way to check the performance issues with such a heavy write workload is to do a: sysstat -x -s -c 60 1

Look at the column CP ty. I am curious to see if you are experiencing any back-to-back CPs (B) or deferred back-to-back CPs (b).
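For anyone who wants to tally those letters over a long capture, here is a minimal Python sketch. It assumes you have already pulled the CP ty values out of the sysstat output yourself (the column layout is not parsed here), and the function name and sample values are made up for illustration.

```python
def count_back_to_back(cp_types):
    """Tally back-to-back (B) and deferred back-to-back (b) CPs.

    cp_types: strings taken from the 'CP ty' column, one per sysstat
    interval. Extracting that column is left to the caller (assumption).
    """
    result = {"back_to_back": 0, "deferred_back_to_back": 0, "samples": 0}
    for value in cp_types:
        value = value.strip()
        result["samples"] += 1
        if value.startswith("B"):
            result["back_to_back"] += 1
        elif value.startswith("b"):
            result["deferred_back_to_back"] += 1
    return result


if __name__ == "__main__":
    # Hypothetical sample, not taken from a real filer.
    print(count_back_to_back(["T", "-", "B", "b", "B", "F"]))
```

A steady stream of B or b entries over the 60 samples is the thing to look for; an occasional one is less interesting.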

jasonczerak · ‎2010-04-01

Right now writes are not very back-to-back or deferred; we tuned our apps (and users). Since Flex share is a piece of crap and fails, we have to manually handle things more than we figured we would.

I'll see where I can induce sio load on the 3160 cluster that's not in prod yet and get it close to what I've seen on the 6080.

jeremypage · ‎2010-04-01

I'm not confusing the different allocation methods, I am just trying to dispel some of the misinformation posted previously in this thread. I verified what I posted with the people who write those portions of Ontap so I'm reasonably certain it's accurate. In most cases an aggregate level reallocate does not do any good but when adding new disks to an existing RG it can be worth while. Ontap will eventually spread the data across all the disks anyways so it may not be worth the resources to run it, YMMV.

In short:

Running an aggregate level reallocate will make the filer attempt to make the blocks on disk contiguous. That is all; it knows nothing about file systems, so this is not going to give any benefits to sequential reads (unless the reduction in seek time makes a difference). It is supposed to spread the existing data across all the disks in the RGs that belong to the aggr, allowing your reads to be done against more spindles.

Running a volume level reallocate will try and make the file systems contiguous - this is different than above since it actually should make you have more sequential reads and is far more likely to improve performance.

erick_moore · ‎2010-04-01

Jeremy, I hate to say it, but you are incorrect about the aggregate reallocate; please read the link I posted.

You stated, "In most cases an aggregate level reallocate does not do any good but when adding new disks to an existing RG it can be worth while. "

This is from the NetApp manual regarding aggregate reallocation: "Do not use -A after growing an aggregate if you wish to optimize the layout of existing data; instead use `reallocate start -f /vol/<volname>' for each volume in the aggregate."

Doing an aggregate reallocate when you grow an aggregate will not spread existing data across the new disks. On the HP EVA there is a process called "leveling". This is basically what a volume reallocate does: it spreads the data out across all the spindles in the aggregate. I think NetApp needs to change the terminology for these similar but different processes. Perhaps aggregate reallocate should be "reallocate", and volume reallocate should be "redistribute".
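The "for each volume in the aggregate" part of the manual quote can be sketched in a few lines of Python. This only builds the command strings to paste into the filer console; it does not talk to the filer, and the function name and volume names are hypothetical.

```python
def volume_reallocate_commands(volumes):
    """Build one 'reallocate start -f /vol/<volname>' line per volume.

    volumes: bare volume names in the grown aggregate (supplied by the
    caller; how you list them is outside this sketch).
    """
    commands = []
    for name in volumes:
        name = name.strip()
        if not name or "/" in name:
            raise ValueError(f"expected a bare volume name, got {name!r}")
        commands.append(f"reallocate start -f /vol/{name}")
    return commands


if __name__ == "__main__":
    # Hypothetical volume names, for illustration only.
    for line in volume_reallocate_commands(["vol_sql", "vol_vmware"]):
        print(line)
```

Printing the commands rather than running them makes it easy to start them one at a time and watch the load in between.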

jeremypage · ‎2010-04-01

I think you're confusing optimizing the file system with spreading it across spindles. Aggregates can't optimize a filesystem since there is no concept of a filesystem at the aggregate level. That does not mean it can't help things run faster under certain conditions.

Having power to my filer does not optimize my filesystems either but it makes it run a heck of a lot better.

erick_moore · ‎2010-04-01

OK, maybe we are saying the same thing, but I, like you, want to clear up any misinformation that is lingering in this post. You do not run an aggregate reallocate after growing an aggregate. That will not gain you anything, and it says as much in the manual. That is not me saying it, that is NetApp. If you are having performance issues with something like an SQL LUN, you would start by checking some LUN stats:

lun stats -o -i 1 /vol/volname/lunname

Then check the reallocation level of the volume or the file in the volume (a LUN in this case):

reallocate measure /vol/volname/lunname

If you want to optimize performance to that LUN you would then run a reallocate against it:

reallocate start -f -p /vol/volname/lunname

Additionally you may want to setup scheduled reallocation jobs, with a threshold setting (3 in this case) to run during off-hours, like every Saturday night at 23:00:

reallocate start -t 3 -p /vol/volname/lunname

reallocate schedule -s "0 23 * 6" /vol/volname/lunname

Best Regards,

Erick
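To make the schedule string above easier to read, here is a small Python sketch that decodes it. It assumes the four fields are minute, hour, day of month and day of week, with 0 meaning Sunday, which is consistent with "0 23 * 6" being described as Saturday at 23:00; the function name is hypothetical and only `*` and comma-separated numbers are handled.

```python
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def describe_schedule(schedule):
    """Decode a 'minute hour dayofmonth dayofweek' schedule string.

    Assumption: 0 is Sunday. Ranges and step values are not handled.
    A field of '*' is returned as None, meaning "any".
    """
    fields = schedule.split()
    if len(fields) != 4:
        raise ValueError("expected: minute hour dayofmonth dayofweek")

    def numbers(field):
        if field == "*":
            return None
        return sorted(int(part) for part in field.split(","))

    minute, hour, day_of_month, day_of_week = (numbers(f) for f in fields)
    weekdays = None if day_of_week is None else [DAY_NAMES[d] for d in day_of_week]
    return {
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "weekdays": weekdays,
    }


if __name__ == "__main__":
    print(describe_schedule("0 23 * 6"))
```

Checking a schedule this way before applying it is a cheap guard against an off-by-one day of week.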

jeremypage · ‎2010-04-01

I certainly agree with all of those points. In addition, to be clear, the benefit of -A is not so much at the aggr level as at the RAID group level, because really that's where spindle count comes in, which is the only thing -A should affect (well, with the possibility of seek time, but I think that's a minimal impact and not an issue for most people - and if it is you probably should not be using a NetApp).

aborzenkov · ‎2010-04-02

You do not run an aggregate reallocate after growing an aggregate. That will not gain you anything, and it says as much in the manual. That is not me saying it, that is NetApp.

E-h-h - no, it is not what NetApp is saying, it is how you read it. NetApp says: Do not use -A after growing an aggregate if you wish to optimize the layout of existing data. But that is exactly what Jeremy was telling you all the time. Aggregate reallocation won't improve the layout of data - but it will improve the distribution of data over disks.

If you are having performance issues

I would stop here and ask - which performance issues? Not all performance problems are the same. I have customers who never run reallocate and are quite happy - for their specific workload.

erick_moore · ‎2010-04-02

Yowsa, this topic is crazy. More proof that there needs to be a TR. Phrasing is very important when talking about this topic. I think Jeremy and I are on the same page, and I shouldn't have said it won't do anything if you add new disks to an aggregate. The fact is, it won't do anything to the new disks you added, but it will make free space contiguous on the existing disks. Since the newly added disks already have nothing but free space, there is nothing for a reallocate -A to do on those spindles.

radek_kubka · ‎2010-04-02

Guys,

This thread is absolutely priceless - it is close to becoming a TR!

Re expanding aggregates:

If you haven't done it yet, have a read of a very interesting blog post from Chris (plus all the comments)

http://communities.netapp.com/groups/chris-kranz-hardware-pro/blog/2010/03/11/hot-spindles

Regards,

Radek

joebutchinski · ‎2010-06-07

Erick Moore wrote:

Additionally you may want to setup scheduled reallocation jobs, with a threshold setting (3 in this case) to run during off-hours, like every Saturday night at 23:00:

reallocate start -t 3 -p /vol/volname/lunname

reallocate schedule -s "0 23 * 6" /vol/volname/lunname

This caught my attention. The manpage says "Reallocation processing operates as a background task." so I've always scheduled with the assumption that file service would trump the reallocate. I wonder if anyone has observed a negative impact on performance during reallocation. I haven't formally tested this but have received no complaints about performance during a reallocate.

jasonczerak · ‎2010-06-07

It causes all kinds of latency issues on a 6080 cluster on 7.3.1.1L1P2 if it's doing more than one volume at a time ON THE FILER, not just per aggr.

We'll be bumping to 7.3.3P-something this weekend. I might kick off a few and see what happens

igor · ‎2011-01-10

Hello Erick,

I ran the commands as you suggested against one of the LUNs here, setting the threshold to 4 and establishing a twice-a-week schedule - at 11PM on Sundays and Wednesdays:

reallocate start -t 4 -p /vol/TEST/test.lun
reallocate schedule -s "0 23 * 3,0" /vol/TEST/test.lun

I had expected the reallocation (optimization) process to commence automatically once the threshold is reached, but it doesn't. I only keep getting system messages in my Autosupport, advising me to run reallocate:

Wed Dec 29 23:00:00 CET [wafl.scan.start:info]: Starting WAFL layout measurement on volume TEST.
Wed Dec 29 23:10:19 CET [wafl.reallocate.check.highAdvise:info]: Allocation check on '/vol/TEST/test.lun' is 4, hotspot 19 (threshold 4), consider running reallocate.

Sun Jan  2 23:00:00 CET [wafl.scan.start:info]: Starting WAFL layout measurement on volume TEST.
Sun Jan  2 23:10:16 CET [wafl.reallocate.check.highAdvise:info]: Allocation check on '/vol/TEST/test.lun' is 5, hotspot 19 (threshold 4), consider running reallocate.

Surely this should've been done automatically by now?

Cheers,

Igor
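For tracking how the measured value moves between scheduled checks, the highAdvise messages above can be picked apart with a short Python sketch. The regular expression is written against the two sample lines in this post only, so treat the message format as an assumption; the function name is hypothetical. It only extracts the numbers and says nothing about why a reallocation did or did not start.

```python
import re

# Pattern derived from the sample log lines quoted in this post.
ADVISE_RE = re.compile(
    r"\[wafl\.reallocate\.check\.highAdvise:info\]: "
    r"Allocation check on '(?P<path>[^']+)' is (?P<value>\d+), "
    r"hotspot (?P<hotspot>\d+) \(threshold (?P<threshold>\d+)\)"
)


def parse_advise(line):
    """Return the fields of a highAdvise message, or None if no match."""
    match = ADVISE_RE.search(line)
    if match is None:
        return None
    value = int(match.group("value"))
    threshold = int(match.group("threshold"))
    return {
        "path": match.group("path"),
        "value": value,
        "hotspot": int(match.group("hotspot")),
        "threshold": threshold,
        "at_or_above_threshold": value >= threshold,
    }


if __name__ == "__main__":
    sample = ("Sun Jan  2 23:10:16 CET [wafl.reallocate.check.highAdvise:info]: "
              "Allocation check on '/vol/TEST/test.lun' is 5, hotspot 19 "
              "(threshold 4), consider running reallocate.")
    print(parse_advise(sample))
```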

dnewtontmw · ‎2010-03-30

Any updated documentation or thinking on this topic?

We're running SQL Server 2005 on a NetApp FAS3160 server. We do SQL index rebuilds on the weekends, and I wonder if it's same as Sybase under the covers, with regards to how it behaves at the storage level...

BrendonHiggins · ‎2010-04-06

As part of this SQL LUN latency issue http://communities.netapp.com/thread/7456 I will be running the volume reallocate against the LUN tomorrow night. I will post back the result of the work at the end of the week. Should be a good test of whether or not it works as described.

Bren

BrendonHiggins · ‎2010-04-08

Long story short, looks like a 20% improvement in LUN latency times for FREE! Full details on other post.

http://communities.netapp.com/thread/7456

Bren

radek_kubka · ‎2010-04-08

Hi Bren,

I'm loving your attitude - some people (occasionally) get angry coz their LUNs are fragmented & perform badly, whilst you are looking at the bright side!

"20% latency improvement for free" is the best reallocate description I've ever heard

Cheers,

Radek

lwei · ‎2010-04-08

Hi Bren,

Thank you for the posting. I'm glad it helped in your environment.

Regards,

Wei

amiller_1 · ‎2010-04-18

I just wanted to chime in (after a long hiatus) to say that I would REALLY love to see a TR on this. I'm currently working through reallocate questions for multiple customers with multiple scenarios (straight-up reallocate, reallocate after adding 1-2 disks to an aggr (disks were waiting until the 7.3 upgrade allowed a bit larger aggrs), dedup and reallocate, reallocate and VMware, etc.).

erick_moore · ‎2010-06-07

A 1-2 disk add on an aggregate will require a reallocate on every volume in that aggregate. NetApp PS recommends never adding fewer than 4 disks at a time to an aggregate, but depending on the rate of change even that could be too low for some workloads. Also, dedup blocks will never get reallocated.

jasonczerak · ‎2010-06-07

We took 2 64-disk aggrs and added an entire RG of 16 disks to each. 6TB per aggr took nearly a week to reallocate, manually one volume at a time so as not to impact anything else, on a 6080.

It's a good idea, it's just badly implemented. Like scrubbing, wouldn't it make sense to use idle-ish IO to keep data optimized?