ONTAP Discussions

De-duping an existing 6.5 TB NAS volume - should I do an entire scan or just new writes?

ASHWINPAWARTESL

Hi,

I would appreciate any suggestions on this.

Scenario:

FAS3270, Data ONTAP 8.1.2

1. Planning to dedupe an existing 6.5 TB (nearly full) NAS volume [1.5 TB trapped in snapshots], which has never had dedupe enabled.

2. This volume is SnapMirrored to a DR filer on an hourly basis.

I am estimating at least 30% savings on this dataset [(6.5 TB - 1.5 TB) = 5 TB x 30% = 1.5 TB].

So I am hoping to save around 1.5 TB. Considering this change in the volume's data size:

1. What would be the SnapMirror impact when I actually kick in the process post-dedupe? [Assuming SnapMirror will be on hold until the entire dedupe finishes.]

2. This is a very key volume (NAS share) for the organization. Is it worth doing an entire scan [dedupe]? I am concerned about the [read] performance impact it may have post-dedupe. Is it worth it, or should I just dedupe all the new writes that come in?

I have read almost all the threads on performance-related issues, and it is said that there is no read performance impact; only new writes take about a 7% hit.

It would be great to see some use cases wherein a large NAS volume was deduped from scratch. Did it impact read performance post-dedupe?

Thanks,

-Ashwin

1 ACCEPTED SOLUTION

cscott

Hi Ashwin

1. What would be the SnapMirror impact when I actually kick in the process post-dedupe? [Assuming SnapMirror will be on hold until the entire dedupe finishes.]

      SnapMirror will see this as changed data, and the next update will be roughly equal to your dedupe savings. You will also see your snapshot usage grow by the same amount as you dedupe, until those snapshots roll off.
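      If you want a rough sense of how big that first post-dedupe transfer will be, you can check the changed-block rate on the source and the transfer details on the destination. A minimal sketch, assuming your 8.1.2 system runs 7-Mode; /vol/vol_nas and vol_nas_dr are placeholder names for your environment:

          # On the source: block-level change between snapshots and the
          # active file system (the churn from the dedupe scan shows up here)
          snap delta /vol/vol_nas

          # On the destination: last/current transfer size and lag time
          snapmirror status -l vol_nas_dr

      Expect that one update to be unusually large; once it completes, the hourly updates drop back to your normal change rate.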

2. This is a very key volume (NAS share) for the organization. Is it worth doing an entire scan [dedupe]? I am concerned about the [read] performance impact it may have post-dedupe. Is it worth it, or should I just dedupe all the new writes that come in?

      Read performance won't be impacted. Reads go through the cache, so deduped data is read into cache and served out that way just like anything else, and if a block isn't in cache, it won't take any longer to load a deduped block than an inflated one. We have 14 TB volumes deduped to just over 1 TB with no noticeable difference.

I have read almost all the threads on performance-related issues, and it is said that there is no read performance impact; only new writes take about a 7% hit.

      I cannot speak to that figure. We dedupe on a schedule, so there is no inline write impact that any customer has been able to identify, and they do not see any impact while the scheduled job is running either.
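      If you want to take the same approach, post-process dedupe on an off-hours schedule is just a config setting. Again a sketch in 7-Mode syntax, with a placeholder volume name:

          # Run dedupe every night at 23:00 rather than during the day
          sis config -s sun-sat@23 /vol/vol_nas

          # Verify the schedule and the current state
          sis config /vol/vol_nas
          sis status -l /vol/vol_nas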

It would be great to see some use cases wherein a large NAS volume was deduped from scratch. Did it impact read performance post-dedupe?

     As I stated above, we have turned dedupe on for multiple 14 TB volumes, with nearly 14:1 dedupe ratios in almost all cases on NFS data (all volumes contained extremely similar data), without the customer ever seeing any impact. I have also deduped 4 TB of running ESX datastore data with no impact, and a 12 TB volume that didn't dedupe well at all (only about 5% savings), and again the customer never knew it had happened. My most recent was nearly 9 TB deduped down to 5 TB, and the best part is the customer actually asked me when I was going to run it... three days after the process had finished, the schedule was running, and the change was closed out.
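     To spell out the two options from your subject line: enabling dedupe by itself only processes new writes on the configured schedule, while starting it with the -s flag scans the existing data as well. A sketch in 7-Mode syntax; /vol/vol_nas is a placeholder:

          # Option A: dedupe new writes only, per the schedule
          sis on /vol/vol_nas

          # Option B: also scan the existing 6.5 TB once
          sis on /vol/vol_nas
          sis start -s /vol/vol_nas

          # Watch scan progress, then check the savings
          sis status /vol/vol_nas
          df -s /vol/vol_nas

     Either way the process is transparent to clients; the only difference is whether the existing data gets processed.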

- Scott


2 REPLIES

ASHWINPAWARTESL

This is exactly what I needed to know. Thank you, Scott. I really appreciate it. [Glad to see that 14:1 ratio!]
