ONTAP Discussions

Deduplication Performance Impact?

amiller_1
20,817 Views

What kind of performance impact does deduplication have? (both during the scheduled deduplication process and also during "business hours")

19 REPLIES

lfreeman
20,757 Views

Andrew,

There are 3 key factors that affect the performance impact of dedupe:

1) The NetApp FAS or V-Series model

2) The amount of duplicate data in the volume

3) Other processes the system is servicing during the dedupe process

If we look at a typical scenario (impossible, I know, but bear with me) - let's say we have a FAS3070, a 1TB volume with 5% duplicate data, and the system is fairly quiet. This would be a typical setup for running dedupe overnight on a regular basis. I would expect this system to complete dedupe in less than an hour and have no impact on workloads (since there aren't any running).

On the other hand, if we have a FAS2050, 90% duplicate data, and the system is running at peak load - the dedupe process will take many hours and you will likely see some performance degradation resulting from dedupe.

The problem is that there are too many variables for us to give an exact number.  Instead, we recommend two things:

1) If your application or system is extremely performance-sensitive, don't run dedupe

2) If you are concerned that dedupe will create an excessive performance penalty, run a POC first

Also, remember that you can easily turn off dedupe, and/or "undo" dedupe if you don't like the results you get.
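Roughly, from the console, that looks like this (volume name is just an example; note that undoing needs enough free space to rehydrate the data, and 'sis undo' may require elevated privilege depending on your release):

    sis off /vol/myvol         # stop fingerprinting new writes
    priv set advanced          # 'sis undo' is not available at the admin level on all releases
    sis undo /vol/myvol        # re-expand previously deduped blocks
    priv set admin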

Hope that helps,

Larry

amiller_1
20,757 Views

Thanks -- very helpful.

Would it be safe to say that performance on a deduplicated volume should be a non-issue? (i.e. production usage of a volume with deduplicated data)

(I've got some experience here but am betting you'll be able to provide a more comprehensive answer.)

lfreeman
20,757 Views

Andrew,

In general, the answer is yes - a volume that has been deduped should not show any appreciable read performance degradation. Since WAFL is a random-layout filesystem, deduplication merely re-randomizes the data blocks. Also remember that NetApp dedupe does not use containers or lookup tables to rehydrate data; we just redirect the existing block pointer metadata. Having said that - I have seen a few cases where read performance degraded, but this is unusual and not predictable - it all depends on the block layout pattern and the pattern of read requests. And as I mentioned earlier - you can always undo dedupe if you don't like the results.

Another point worth mentioning is using dedupe together with the Performance Acceleration Module (PAM). PAM is dedupe-aware, so you can actually improve read performance after dedupe with this combination. We've run some tests (and I believe published them) that show dramatic improvement in VDI "boot storm" response times as a result of dedupe and PAM.

What has your experience been?

Larry

radek_kubka
20,758 Views

Hi Larry,

I got these numbers stuck in my mind - 0% performance degradation for writes & 7% for reads (de-duped volume vs. the original one).

Where did they come from? I heard this from one of the NetApp US folks during their visit to the UK about 2 (?) years ago (might that be you by any chance? 😉)

So the question is: are these numbers (the one for reads in particular) anywhere close to today's A-SIS reality?

Regards,

Radek

aarondelp
20,758 Views

Hey Radek - I think your numbers are actually backwards. You will see a small increase in CPU on writes, but you shouldn't see an increase on reads in most instances. The reason for the increase on writes is that when a block is written it is checked ("fingerprinted") to see if an identical block has already been written, making it eligible for de-dupe on the next pass.

Check question 15 in this de-dupe FAQ, very good read:

http://communities.netapp.com/docs/DOC-1701

As for your theory vs. reality question: I have numerous customers running de-dupe in many different forms (NFS shares for VMware, LUNs for VMware, CIFS, etc.). On the whole, they couldn't be happier with it. You do want to be careful on a filer that is already being hit hard, because the additional CPU overhead might push it over the edge. But, on the flip side, it is also very easy to turn off the fingerprint analysis if you suspect it is contributing to a greater problem.
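If you want to keep an eye on that, something along these lines from the console works (interval and volume name are just examples):

    sysstat -x 1          # watch CPU and disk utilization while a dedupe pass runs
    sis status            # see which volumes have dedupe enabled and whether a pass is active
    sis off /vol/busyvol  # stop the fingerprinting on a volume if the box is struggling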

Aaron

radek_kubka
20,758 Views

Hi Aaron,

So we have a proper discussion (at last)!

Let me actually question what Antoni wrote in his document, as my understanding of A-SIS is that there should be no write performance penalty. The reason for this is that A-SIS is a post-process de-duplication, so we are writing blocks which will be processed at a scheduled time & are not processed while they are being written to a volume.

Read penalty is definitely a hairier topic & I would really appreciate it if Larry came back to us and shed some additional light on it.

Regards,

Radek

lfreeman
20,758 Views

Hi Radek-

Let's break down what's happening during the pre- and post-deduplication stages; this should help explain the performance impact.

Remember that NetApp deduplication on FAS and V-Series systems involves 2 steps: 1) enable dedupe on a volume (sis on), then at some point 2) dedupe the data in that volume (sis start).
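On the console the two steps look like this (volume name is just an example):

    sis on /vol/myvol       # step 1: start fingerprinting incoming writes
    sis start /vol/myvol    # step 2: dedupe the data already in the volume
    sis status /vol/myvol   # check progress; 'df -s /vol/myvol' shows the savings afterwards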

When you 'sis on' a volume, the behavior of that volume changes. Every time it notices a block write request coming in, the sis process makes a call to Data ONTAP to get a copy of the fingerprint for that block so that it can store the fingerprint in its catalog file. This request interrupts the write stream and results in a 7% performance penalty for all writes into any volume with sis enabled. We know it's 7% because we measured it in our labs, and lab machines don't lie - however, every customer I've spoken to says they can't tell the difference; I guess we humans aren't quite so precise.

Now, at some point you'll want to dedupe the volume using the 'sis start' command. As sis goes through the process of comparing fingerprints, validating data, and dedupe'ing blocks that pass the validation phase - in the end all we are really doing is adjusting some inode metadata to say "hey, remember that data that used to be here? Well, it's over there now." Nothing about the basic data structure of the WAFL file system has changed, except that you are traversing a different path in the file structure to get to your desired data block. Like going to the grocery store: you can take Elm Street or Oak Street, and depending on traffic either way might get you there faster.

That's why NetApp dedupe *usually* has no perceivable impact on read performance - all we've done is redirect some block pointers. Accessing your data might go a little faster, a little slower, or more likely not change at all - it all depends on the pattern of the file system data structure and the pattern of requests coming from the application.

Larry

radek_kubka
20,759 Views

Hi Larry,

Thanks a million for your reply!

Although you proved me wrong 😉 I really appreciate that you refreshed my memory & attributed this magic 7% correctly.

I wasn't aware (or didn't remember) that fingerprints are collected upfront, whilst writes are coming in. Does it mean that (at least in theory) these new blocks will be processed faster during the actual de-dupe run vs. the first run on a 'fresh' volume with existing data but no de-dupe history?

Regards,
Radek

amiller_1
20,759 Views

Precisely -- that's why you have to do a "sis start -s" after you enable dedup on a volume with existing data (so all those fingerprints can get generated in the first pass -- the dedup itself then happens in the second pass and all later passes).
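For anyone who hasn't run it before, a quick sketch (volume name is just an example):

    sis on /vol/existingvol
    sis start -s /vol/existingvol   # -s scans the data already in the volume to build fingerprints
    sis status /vol/existingvol     # watch until the scan and dedupe passes finish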

As for my experience -- basically the same as Aaron's -- multiple (very happy) customers using it with no perceived performance impact.

Very good stuff overall....I especially LOVE it in VMware environments (after a robust NFS implementation, dedup is probably what I miss the most when working with VMware on other arrays).

radek_kubka
11,791 Views

OK, so here is a follow-up story / question:

Literally yesterday I had a chat with a customer who had tried de-dupe on an NFS datastore hosting their VMware VM golden images / templates. What they noticed, though, was a massive performance hit when cloning a template (via VirtualCenter, no FlexClone involved). So they pulled back & did not try to de-dupe their production VMs at all.

I do appreciate that the story may be (& typically is) different for production VMs, but what could be the explanation for their problems?

Looking at the discussion above, fingerprints are collected for all new writes when A-SIS is turned on - so is it as simple as that: VirtualCenter in essence writes a lot of new blocks at once when 'cloning' a template, so the performance hit is to be expected? (vs. production VMs, which are mostly read, not written to?)

Or does it have something to do with them being on a bit old-ish ONTAP version (7.2.x)? Or a combination of both?

Regards,
Radek

aarondelp
11,791 Views

They are probably hitting a bug in 7.2. We have a customer that had the exact same problem. A 20GB deployment would take 14 hours!! They went to 7.2.6 (7.2.6.1 maybe?) and the bug was fixed for them. If they are doing NFS and dedupe, I would highly recommend they go to 7.3.1.1. Again, from the same customer and from talking to NetApp tech support: NFS was changed in the 7.3 code and is much more efficient in a virtualized environment. The reason for 7.3.1.1 (P2 specifically) is that many de-dupe optimizations were made as well.
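A quick way to check where a filer sits before planning the upgrade:

    version        # prints the running release, e.g. something like 'NetApp Release 7.2.4'
    sysconfig -a   # model and config details if support asks for them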

Disclaimer here - my customer went to 7.2.6.1 and deployments were fixed; I don't have experience with 7.3.1.1 or higher, that is just what NetApp Support told our customer. They have not upgraded yet, but they plan to.

Also, make sure the templates are aligned. This could be contributing to the problem as well, but it is by no means the primary problem.

Regards,

Aaron

radek_kubka
11,791 Views

Hi Aaron,

Many thanks for this - as soon as I heard "we are on an early ONTAP version" the red light went on, but it's handy to know it also caused similar issues elsewhere!

Regards,
Radek

amiller_1
11,791 Views

Complete agreement again with Aaron (seems to be a trend here).

I would not run dedup on anything earlier than 7.2.5.1 under any circumstances (due to some nasty bugs) and would recommend at least 7.2.6.1P4 or 7.3.1.1 if possible (as there are some significant dedup speed optimizations...see the release notes).

I do have customers with dedup enabled (on later ONTAP versions) who definitely aren't seeing this issue.

danielmorgenstern
11,791 Views

We hit this bug as well....heavily deduped and running 7.2.4.

It was killing VM deployments as well as VCB backups....hours and hours for deployments from template, and in a couple of cases VCB backups taking 8-9 hrs for ~10GB vmdks.

Upgraded to 7.3.1P3 a couple of months ago and it has been great since. Depending on various factors, we now see template deployments of 20GB images take 30-40 mins and backups in the 10-20 minute range.

The particulars of the bug were related to prolonged sequential reads (not writes) of deduplicated data.  This is why it would impact operations such as deployments and backups with heavy reads of large chunks of deduplicated data as opposed to operations involving running VMs where the reads are not prolonged and sequential.

Dan

PS--As Aaron notes, alignment is a good idea, but we didn't find it to be the primary factor either...we chased that dragon's tail while waiting for confirmation on whether or not the bug was affecting us. While aligned vmdks fared somewhat better, operations were still taking hours and hours until we upgraded to 7.3.1P3 with the bug fix and other enhancements.

Sharon
8,338 Views

Thank you for such a clear-cut answer. I was able to understand dedupe in depth after seeing your reply.


@lfreeman wrote:

Let's break down what's happening during the pre- and post-deduplication stages; this should help explain the performance impact. [...]

gfz-marco
7,146 Views

Any recommendations on deduplicating ESX volumes on a 4-node MetroCluster?

Post-process, inline, or both?

Also, would cross-volume dedup be a good setting?

Even though this is a rather old topic, I hope I'll find answers...

erick_moore
9,423 Views

Well, I think performance is running fine for us, but we are seeing a very high queue depth when doing a "lun stats -o" on a de-duped volume with a single LUN in it. We aren't seeing it on every de-duped LUN, but on this one it is constantly over 20. I have been having a hard time tracking down the reason for the high queue depth. Anyone have any ideas?
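For reference, here's roughly how I'm sampling it (LUN path is just an example):

    lun stats -o -i 1 /vol/dedupvol/lun0   # extended stats once a second; queue depth shows up here
    lun stats -z /vol/dedupvol/lun0        # zero the counters so I can watch it climb again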

lrhvidsten
11,788 Views

Thought this was an interesting blog post about the possible performance benefits of dedupe. Having a PAM card, as mentioned before, would only magnify this effect, I would guess...

http://blogs.netapp.com/dropzone/2009/06/dedup-for-speed-higher-performance-through-deduplication.html

amiller_1
9,423 Views

Yes -- I've had this conversation a couple of times with customers, and it doesn't take long before you see the light come on in their eyes....if it's the same data, you're more likely to have it in cache when you need it (with or without a PAM card).

Quite nice....
