De-Dupe volume sizes

My understanding is that there are limits for the size of volumes that can be de-duped. Is that right? What are those limits and where do they come from?

Thanks!

Dave

Re: De-Dupe volume sizes

It all depends on the version of Data ONTAP you are running and the model of the filer you own. For instance, we run DOT 7.2.6 on a FAS3020 and the dedupe limit is a 1TB volume; however, if we were to upgrade to 7.3.1 (which isn't in General Deployment yet), we could dedupe a 2TB volume.


Re: De-Dupe volume sizes

Information on maximum volume sizes for all supported platforms and Data ONTAP versions is available in the Dedupe Deployment and Implementation Guide (DIG): http://media.netapp.com/documents/tr-3505.pdf.

Re: De-Dupe volume sizes

NetApp deduplication is the #1 implementation of deduplication on primary storage, meaning it is being used on production systems to deduplicate active data. With well over 30,000 licenses installed, it is a proven technology.

A key factor in the success of NetApp deduplication for primary storage is that the resources of each storage system model, such as system memory, are considered so that they are not oversubscribed. NetApp deduplication for FAS uses different max volume sizes for different models to help ensure resource availability, so that the performance of the primary storage system is maintained.

It is worth noting that this max volume size limits only the physical size of the volume. That is, even though a volume may be limited to 3TB, it can still store more than 3TB of deduplicated data. For example, you might see 5TB of data being stored while using only 2TB of physical storage, thanks to deduplication.
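To put the example above into numbers, here is a tiny illustrative sketch (the 5TB/2TB figures are from the example; the function name is made up):

```python
# Illustrative only: compute the savings for the example above,
# where 5 TB of logical data occupies 2 TB of physical space.

def dedupe_savings(logical_tb, physical_tb):
    """Return (TB saved, savings as a percentage of logical data)."""
    saved = logical_tb - physical_tb
    return saved, 100.0 * saved / logical_tb

saved_tb, pct = dedupe_savings(logical_tb=5, physical_tb=2)
print(saved_tb, pct)  # 3 60.0
```

So in that example the volume holds 5TB logically while consuming only 2TB physically, a 60% saving, well under a 3TB cap.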

Below are the max vol sizes by model and version of Data ONTAP.

Data ONTAP 7.2.X (starting with 7.2.5.1) and Data ONTAP 7.3.0

FAS2020                                  0.5TB
FAS3020 / N5200 / FAS2050                1TB
FAS3050 / N5500                          2TB
FAS3040 / FAS3140 / N5300                3TB
R200                                     4TB
FAS3070 / N5600 / FAS3160                6TB
FAS6030 / FAS6040 / N7600 / FAS3170      10TB
FAS6070 / FAS6080 / N7800                16TB

Data ONTAP 7.3.1 or higher

FAS2020                                  1TB
FAS3020 / N5200 / FAS2050                2TB
FAS3050 / N5500                          3TB
FAS3040 / FAS3140 / N5300                4TB
R200                                     4TB
FAS3070 / N5600 / FAS3160                16TB
FAS6030 / FAS6040 / N7600 / FAS3170      16TB
FAS6070 / FAS6080 / N7800                16TB
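For quick scripted lookups, the two tables above can be folded into a small helper. This is just a sketch (the dict and function names are made up; the numbers are transcribed straight from the tables):

```python
# Max dedupe volume size in TB, transcribed from the tables above.
# Each entry maps a platform group to (7.2.5.1+/7.3.0 limit, 7.3.1+ limit).
MAX_DEDUPE_VOL_TB = {
    "FAS2020":                       (0.5, 1),
    "FAS3020/N5200/FAS2050":         (1, 2),
    "FAS3050/N5500":                 (2, 3),
    "FAS3040/FAS3140/N5300":         (3, 4),
    "R200":                          (4, 4),
    "FAS3070/N5600/FAS3160":         (6, 16),
    "FAS6030/FAS6040/N7600/FAS3170": (10, 16),
    "FAS6070/FAS6080/N7800":         (16, 16),
}

def max_dedupe_vol_tb(model, ontap_731_or_later):
    """Look up the max dedupe volume size (TB) for a platform model."""
    for group, (older, newer) in MAX_DEDUPE_VOL_TB.items():
        if model in group.split("/"):
            return newer if ontap_731_or_later else older
    raise KeyError(model)

print(max_dedupe_vol_tb("FAS3140", False))  # 3
print(max_dedupe_vol_tb("FAS3170", True))   # 16
```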

Re: De-Dupe volume sizes

Thanks, Carlos!

A couple of follow up questions:

Where do these limits come from, and why do they vary from system to system? For example, why is the limit 4TB on the FAS3140 and 16TB on the FAS3170?

Is there any work-around for this? For example, if I have a bunch of VMware boot images that would dedupe down to 5TB of real disk space, is there any way to do that (on a FAS3140)? If the data that's common between images is about 1TB each, it would be a shame to have to duplicate that data several times over because of this limit.

Re: De-Dupe volume sizes

Hi Carlos,

Two things:

NetApp deduplication for FAS uses different max volume sizes for different models to help ensure resource availability so that the performance of the primary storage system is maintained.

Does this relate to system resources during the actual deduplication run, or outside of that process? The former may or may not be a problem: in a non-24/7 environment, hammering the system for, say, 8 hours just to dedupe the data could be 100% feasible. The latter is the subject of discussion in two separate threads here, and I have yet to hear a firm answer.

For example, you might see 5 TB of data being stored, but it would only be using 2TB of storage thanks to deduplication.

Well, nice. The problem is that A-SIS is post-process deduplication, so if the above scenario happens on one of the smaller filers, it may mean repeatedly adding un-deduped data to the volume, running A-SIS against it, adding more data, and so on. A bit tedious, and not every admin will have the time or patience to actually do this.
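The fill-dedupe-fill cycle described here can be sketched as a toy capacity model. Purely illustrative: it assumes every batch dedupes at a fixed ratio and that each batch must first fit in the volume in raw, un-deduped form (all names and numbers below are made up):

```python
# Toy model of post-process dedupe on a capped volume: each batch of
# new data must fit in raw form before A-SIS runs and shrinks it.

def cycles_needed(cap_tb, batch_tb, total_tb, dedupe_ratio):
    """Count load/dedupe cycles to store total_tb of logical data."""
    loaded_tb = 0.0
    cycles = 0
    while loaded_tb < total_tb:
        physical_tb = loaded_tb / dedupe_ratio  # after last A-SIS pass
        if physical_tb + batch_tb > cap_tb:
            raise RuntimeError("next un-deduped batch won't fit")
        loaded_tb += batch_tb  # write raw data...
        cycles += 1            # ...then run A-SIS to shrink it
    return cycles

# Hypothetical: a 1 TB-capped volume, 0.25 TB batches, 2 TB of logical
# data that dedupes 4:1 -> eight separate load/dedupe cycles.
print(cycles_needed(cap_tb=1, batch_tb=0.25, total_tb=2, dedupe_ratio=4))  # 8
```

Which is exactly the tedium being described: the smaller the filer's cap relative to the incoming raw data, the more of these cycles an admin has to babysit.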

Do not get me wrong - I love A-SIS - but what I am saying is that capping volume sizes can make people's lives harder, so the question is whether there is a good reason behind it.

Regards,
Radek

Re: De-Dupe volume sizes

> Where do these limits come from and why do they vary from system to system? For example, why is the limit 4TB on the FAS 3140 and 16TB on the FAS 3170?

The limits come from the available resources on the systems (CPU, memory, etc.).

> Is there any work-around for this?

The limits are volume based, so one can break things into multiple volumes. That obviously has some trade-offs in terms of deduplication (blocks are only shared within a single volume) as well as in other areas, but it is certainly a possibility.
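As a back-of-the-envelope check of the multiple-volumes workaround, using Dave's numbers (a hypothetical sketch; the function name is made up, and it ignores the cross-volume dedupe loss just mentioned):

```python
import math

# Illustrative: minimum number of capped volumes needed to hold a
# dataset, given the per-volume max dedupe size for the model.

def volumes_needed(physical_tb, per_volume_cap_tb):
    """Minimum number of volumes to hold physical_tb of deduped data."""
    return math.ceil(physical_tb / per_volume_cap_tb)

# Dave's example: ~5 TB of deduped VMware images on a FAS3140,
# whose per-volume cap is 4 TB on Data ONTAP 7.3.1.
print(volumes_needed(5, 4))  # 2
```

Note that the roughly 1TB of blocks common to all the images would then be stored once per volume, which is exactly the duplication Dave was hoping to avoid.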

Re: De-Dupe volume sizes

The volume size limits help ensure that the actual process of deduplication does not oversubscribe the system resources.

Working with post-process deduplication means that you must take the initial size of the un-deduped data into consideration.

You will need to consider best practices for each specific scenario, as described in TR-3505, mentioned in a previous reply.

Re: De-Dupe volume sizes

Perfect table... I've found this information in multiple places, but that's the nicest representation so far.

And with 7.3.1+, it's ever so much less painful, since you can shrink a volume back down under the limit to turn on dedupe.