ONTAP Discussions

Windows NTFS fragmentation + file system full degrades performance?

richardl

Hi,

I have a customer (IHAC) currently using a NetApp FC SAN connected to Windows 2003. They are running backup to disk on top of the LUN (NTFS formatted).

After running for a while, the FC LUN write performance dropped significantly (from close to 100MB/s to around 40MB/s). A closer look found that the LUN is highly fragmented and, due to the staging unit design, it is always running with the file system 90% full.

Our competitor is telling them that this behaviour appears on NetApp FC SAN only.

I am urgently looking for any third-party documentation that supports the theory that a highly fragmented NTFS LUN, together with a file system close to or above 90% full, will introduce significant write performance degradation, and therefore that the above issue will appear regardless of which SAN vendor they are using.

Any input is highly appreciated!!!

Thanks.

Richard


chriskranz

Is the competitor HP by any chance?

I'm not sure if I have any supporting evidence to prove the point, but it is simple storage economics. RAID algorithms always attempt to group writes together to improve disk layout and read performance. The more free space you have, the better write and read performance is. The less space you have, the smaller the RAID stripes that can be written in succession, and this affects read performance. This is a global challenge of any RAID system really.

Having said that, the NetApp storage system utilizes free space in the entire aggregate. So although it is best practice to keep the volumes with a fair proportion of free space, so long as your aggregate has adequate space, the NetApp storage should not be the bottleneck or cause for fragmentation.

It is much more likely that Windows is causing this issue. Windows never really deletes data blocks when you remove data, so the LUN is constantly filling to 100% before data is getting purged by Windows. This would cause Windows itself to fragment the data significantly.

If large amounts of data are being written and removed from the Windows file system, you can take several steps using NetApp tools and technology to improve this. You could schedule a regular reallocation scan from the NetApp side to greatly improve the performance of this LUN. In ONTAP 7.3 this is completely transparent to any snapshots, so this would definitely be a recommended step. http://now.netapp.com/NOW/knowledge/docs/ontap/rel7311/html/ontap/sysadmin/tuning/task/t_oc_tun_reallocate-creating-scan-schedule.html

If you are using SnapDrive, you could also look at scheduling a space_reclamation more regularly. This allows the blocks that Windows doesn't delete to be freed up properly and allows new writes to be laid out better. If you have a scheduled batch job, you'd want to time the space_reclaimer to run before it.

I'll never tire of hearing "WAFL degrades performance over time", and I've yet to see it actually proved when you follow the best practices. These best practices aren't too tricky to deploy or even restrictive on storage utilisation. Let me know if you need any more information on this, and I'd be interested to know the results.

radek_kubka

Hi,

As much as I love NetApp, I hate what seems to be some kind of pact of silence around fragmentation issues in LUN environments.

It is a problem, and yes, it can & should be cured via regular reallocation. IMHO messaging around this is far too weak & many people are getting stung by it due to a lack of properly advertised, decent & clear information.

Re ballooning snapshots due to reallocate being run - I believe the only way to dodge this is to use physical reallocation:

http://now.netapp.com/NOW/knowledge/docs/ontap/rel731/html/ontap/sysadmin/tuning/concept/c_oc_tun_reallocate-reasons-physical-scans.html#c_oc_tun_real...

Regards,

Radek

amiller_1

So....I'm a bit more mixed here. I ran a SAN-only FAS3050 for years (4+ to be precise before we added in NFS in addition) and never had performance problems although we ran volumes pretty full.

As a partner engineer now, I like knowing about reallocate as an option (and like to use Performance Advisor to help people understand before just jumping to it) but so far I haven't seen many cases where fragmentation for SAN use cases has actually been an issue.

radek_kubka

From my experience fragmentation manifests itself under specific circumstances, so I agree it may be a non-existing issue for many installations.

What Richard has described though perfectly fits the bill: pure sequential read/write operations tend to perform poorly if a LUN is heavily fragmented. So anything around backup to / from LUN (performed via external host, i.e. not snapshots) is likely to be affected.

One of my customers is doing SQL dumps every day (they do not use SMSQL) & then stream them to tape via NDMP. They run reallocate every Sunday & the performance is always the best on Monday, gradually degrading over the week.
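A rough way to track that kind of Monday-to-Sunday trend from the host side is to time a fixed sequential write on the LUN's file system at the same point each day. This is only a sketch: the function name, sizes and target directory are illustrative assumptions, and it measures the whole host-to-disk path rather than fragmentation itself.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, total_mb=256, block_kb=1024):
    """Write total_mb of data sequentially into directory and return MB/s."""
    # One random block reused for every write, so the data is not all zeros.
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())  # push the data out of the OS cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the temporary test file
    return (blocks * block_kb / 1024) / elapsed
```

Calling it as, say, `sequential_write_mb_per_s("E:\\dumps")` (a hypothetical path on the LUN) once a day and logging the result should show whether throughput really peaks after the Sunday reallocate.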

Kind regards,
Radek

__frostbyte_9045

Our solution to the SQL writes, not that performance was an issue, was to write the SQL data out to a CIFS share. Vol size is set to 700 gigs with 80% reserved. The DBA overwrites his files each day and then we can use snapshots if needed. This way, we don't have LUN reserve issues and the databases from multiple SQL servers (our SQL cluster plus a few little SQL instances) can be written to a single spot. We then snapmirror this data to DR.

danpancamo

We got stung by this issue... took months to finally discover that fragmentation was the issue...

I believe a TR on the subject is due.

VMG_sroth

We are seeing the same issue backing up 600GB LUNs on Windows 2003 servers with millions of small files. We researched the issue for months and believe it's related to fragmentation. If we restore the data from tape to a LUN we can back it up in about 8 hours instead of 30. When I run a reallocate measure on the LUN it comes back with a 3, which means it's not fragmented. We also run Windows defrag, which tells me the same thing: the LUN is not fragmented. Does running a reallocate with -p make a difference? Also, for those who run a reallocate on a weekly basis, do you delete all your snapshots before running it? I was told I need to delete the snapshots for the reallocate to run.
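To put the 8-hour versus 30-hour figures in throughput terms, here is a quick back-of-the-envelope calculation. It assumes the full 600GB is read in each backup and uses decimal units, so treat the absolute numbers as approximate; the function name is just for illustration.

```python
def throughput_mb_per_s(size_gb, hours):
    """Average throughput in MB/s, using decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / (hours * 3600)


fresh = throughput_mb_per_s(600, 8)   # LUN freshly restored from tape
aged = throughput_mb_per_s(600, 30)   # LUN in its current state
print(f"fresh: {fresh:.1f} MB/s, aged: {aged:.1f} MB/s, slowdown: {fresh / aged:.2f}x")
# prints "fresh: 20.8 MB/s, aged: 5.6 MB/s, slowdown: 3.75x"
```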

Thanks

Stephen

chriskranz

Millions of small files is never going to perform fantastically. I reckon you may find that a small level of fragmentation is causing a snowball effect on performance. I'd certainly give a reallocate a test and see if it makes a difference, especially if a fresh recovery from tape backs up very well subsequently.

I've run through reallocations a couple of times and not needed to delete the snapshots. I think this recommendation is because the snapshot can grow significantly in size after the reallocate, so it may cause you space issues. I've seen pretty decent results when you run 7.3.2.

All I can recommend is to give it a test and see what happens. You could look at running off a FlexClone and splitting it off to give yourself a volume to play with.

radek_kubka

Hi,

reallocate -p option (physical) gives the benefit of snapshots not growing - hence it is fine to run it & leave snapshots as they are.

Re running reallocate on a FlexClone volume: I doubt this provides any level of separation from an original volume, as FlexClone will point to 'live' blocks anyway & they will get reallocated (hmm, if running reallocate against a FlexClone is possible at all...).

Here is a good story from Bren describing the problem, the solution & one happy customer:

http://communities.netapp.com/message/24795#24795

Regards,

Radek

rickymartin

I'll weigh in on this, though by now the question is probably moot. Before I do, I'd like to say that fragmentation isn't necessarily a bad thing: fragmenting one chunk of data and storing it on multiple spindles can actually improve performance. Fragmentation on a single drive (which is what most of us suffer from) is almost universally bad. WAFL has something analogous, which is a combination of having fewer good areas to write full stripes and an increased metadata load, and it builds up over time until it reaches a steady state. A better term for this is "dataset ageing". Having said that, let me answer the question as best I can.

1. The performance of backup-to-disk targets on NTFS filesystems degrades over time, and in many cases the degradation is significant. The fuller the NTFS filesystem is, the faster this happens. From long experience, NTFS really needs at least 20% of the available space to remain free in order to retain high performance. This happens regardless of the underlying storage platform. I've seen this happen on DAS, EMC CX, EMC Symmetrix, HP-EVA, and FAS.

2. This performance degradation may be more pronounced on an ONTAP based system, especially if the aggregate which hosts the LUN has fewer than 20 or so spindles (rule of thumb). In an aggregate with more than 20 spindles (or two shelves in the old DS14 days), the additional spindles, and the optimisations available through advanced readahead algorithms and readsets, will generally allow ONTAP based systems to exceed the performance of most other storage architectures.

3. For more typical NTFS workloads, the way WAFL reorganises random writes into sequential writes actually offsets the effects of NTFS fragmentation. In general, data is requested in the same temporal patterns that it was written in. For example, a database record and its associated index are likely to be written to two quite different physical locations/LUNs; however, as they are written at the same time, WAFL will write these close together on disk (hence maintaining temporal rather than spatial locality of reference). Typically when the index is read the database record will also be read, so rather than issuing two separate read requests for this data, the entire set is predictively gathered in a single sequential read operation. WAFL also effectively cancels out NTFS's nasty habit of fragmenting free space by ignoring the extra whitespace that NTFS adds to the end of every file.

4. I personally am a big fan of "Write After Read" reallocation for many workloads. This has been available as a volume option since 7.3.1 and is a lightweight way of reducing the impact of WAFL ageing, especially for NAS workloads.

5. For some other workloads (Exchange comes to mind), regular nightly reallocation scans are an excellent idea, especially for systems with aggregates containing fewer than 20 spindles. I've even seen recommendations similar to the following.

"In the absence of specific problems or detailed knowledge of the application, all volumes containing SAN data (that is, LUNs rather than files), and all volumes containing database tables hosted on NFS, should be set up with a default, once a day, check and conditional reallocate process. The reallocate command should be run on other volumes only when the administrator has specific application knowledge suggesting that there will be a performance benefit."

6. Ever since 7.3.1 (maybe 7.3.0), there has been a -p option available which removes the old limitations of reallocate that used to cause snapshots and snapmirror updates to blow out. There are some limits on which volumes can do this (IIRC the -p option works with volumes created in ONTAP 7.2 and above). In the past the snapshot/snapmirror blowout probably made NetApp a little wary of recommending wholesale use of reallocate. However, now that 7.3 has been a GD release for quite some time and has a large install base, my personal feeling is that we should be more proactive in our recommendations.

7. Always remember that your mileage may vary, and that you should always exercise "Rule 0": use good judgement. If you're worried about your current performance, and are concerned about the potential implications of running reallocate on your production system, collect a perfstat and raise a call with the Global Support Center. They should be able to tell you whether reallocate is a good option for you.
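As a small illustration of the 20% free-space rule of thumb from point 1, something like the following could be run on the Windows host to flag a backup-to-disk volume that is getting too full. It is a minimal sketch: the function names and the default threshold are my own assumptions, not anything shipped by NetApp or Microsoft.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the file system holding path that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def below_free_target(path, min_free=0.20):
    """True when free space has dropped under min_free (20% by default)."""
    return free_fraction(path) < min_free
```

For example, `below_free_target("E:\\")` (with a hypothetical drive letter for the staging LUN) would return True for the 90%-full case described at the top of the thread.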
