There are three key factors that affect the performance impact of dedupe:
1) The NetApp FAS or V-Series model
2) The amount of duplicate data in the volume
3) Other processes the system is servicing during the dedupe process
If we look at a typical scenario (impossible, I know, but bear with me): let's say we have a FAS3070, a 1TB volume with 5% duplicate data, and a fairly quiet system. This would be a typical setup for running dedupe overnight on a regular basis. I would expect this system to complete dedupe in less than an hour and have no impact on workloads (since there aren't any running).
On the other hand, if we have a FAS2050 with 90% duplicate data and the system running at peak load, the dedupe process will take many hours and you will likely see some performance degradation from dedupe.
The problem is that there are too many variables for us to give an exact number. Instead, we recommend two things:
1) If your application or system is extremely performance-sensitive, don't run dedupe
2) If you are concerned that dedupe will create an excessive performance penalty, run a POC first
Also, remember that you can easily turn off dedupe, and/or "undo" dedupe if you don't like the results you get.
In general, the answer is yes - a volume that has been deduped should not show any appreciable read performance degradation. Since WAFL is a random-layout filesystem, deduplication merely re-randomizes the data blocks. Also remember that NetApp dedupe does not use containers or lookup tables to rehydrate data; we simply redirect the existing block pointer metadata. Having said that, I have seen a few cases where read performance degraded, but this is unusual and not predictable - it all depends on the block layout pattern and the pattern of read requests. And as I mentioned earlier, you can always undo dedupe if you don't like the results.
Another point worth mentioning is using dedupe together with the Performance Acceleration Module (PAM). PAM is dedupe-aware, so you can actually improve read performance after dedupe with this combination. We've run (and I believe published) tests that show dramatic improvement in VDI "boot storm" response times as a result of combining dedupe and PAM.
Hey Radek - I think your numbers are actually backwards. You will see a small increase in CPU on writes, but you shouldn't see an increase on reads in most instances. The reason for the increase on writes is that when a block is written, it is checked ("fingerprinted") to see whether an identical block has already been written, which would make it eligible for dedupe on the next pass.
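The write-path fingerprinting described above can be sketched in a few lines of Python. This is a toy model only: the hash choice, class name, and catalog layout are my own illustrative assumptions, not ONTAP internals.

```python
import hashlib

BLOCK_SIZE = 4096  # WAFL blocks are 4 KB

class FingerprintCatalog:
    """Toy model of inline fingerprinting: every written block gets a
    fingerprint logged to a catalog, but no dedupe happens yet - that
    work is deferred to a later post-process pass."""

    def __init__(self):
        self.catalog = []  # list of (fingerprint, block_number) entries

    def on_write(self, block_number, data):
        # The write itself completes normally; the only extra work is
        # computing the fingerprint and appending it to the catalog,
        # which is where the small write-side CPU overhead comes from.
        fp = hashlib.sha256(data).hexdigest()
        self.catalog.append((fp, block_number))
        return fp

cat = FingerprintCatalog()
a = cat.on_write(0, b"x" * BLOCK_SIZE)
b = cat.on_write(1, b"x" * BLOCK_SIZE)  # identical data
c = cat.on_write(2, b"y" * BLOCK_SIZE)
# Identical blocks produce identical fingerprints, so block 1 becomes
# a dedupe candidate on the next pass; block 2 does not.
```

The point of the sketch is the deferral: the write path only records evidence, and the expensive comparison work happens later, on a schedule.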
Check question 15 on this dedupe FAQ - a very good read:
As for your theory-vs.-reality question: I have numerous customers running dedupe in many different forms (NFS shares for VMware, LUNs for VMware, CIFS, etc.). On the whole, they couldn't be happier with it. You do want to be careful on a filer that is already being hit hard, because the additional CPU overhead of fingerprinting might put it over the edge. But on the flip side, it is also very easy to turn off the fingerprint analysis if you suspect it is contributing to a larger problem.
Let me actually question what Antoni wrote in his document, as my understanding of A-SIS is that there should be no write performance penalty. The reason is that A-SIS is post-process deduplication, so blocks are written normally and processed at a scheduled time, not while they are being written to a volume.
Read penalty is definitely a hairier topic, and I would really appreciate it if Larry came back to us and shed some additional light on it.
Let's break down what's happening during the pre- and post-deduplication stages; this should help explain the performance impact.
Remember that NetApp deduplication on FAS and V-Series systems involves two steps: 1) enable dedupe on a volume (sis on), then at some point 2) dedupe the data in that volume (sis start).
When you 'sis on' a volume, the behavior of that volume changes. Every time it notices an incoming block write request, the sis process makes a call to Data ONTAP to get a copy of the fingerprint for that block so that it can store the fingerprint in its catalog file. This request interrupts the write stream and results in a 7% performance penalty for all writes into any volume with sis enabled. We know it's 7% because we measured it in our labs, and lab machines don't lie - however, every customer I've spoken to says they can't tell the difference; I guess we humans aren't quite so precise.
Now, at some point you'll want to dedupe the volume using the 'sis start' command. As sis goes through the process of comparing fingerprints, validating data, and deduplicating the blocks that pass the validation phase, in the end all we are really doing is adjusting some inode metadata to say, "hey, remember that data that used to be here? Well, it's over there now." Nothing about the basic data structure of the WAFL file system has changed, except that you are traversing a different path in the file structure to get to your desired data block. Like going to the grocery store: you can take Elm Street or Oak Street, and depending on traffic, either way might get you there faster.
That's why NetApp dedupe *usually* has no perceivable impact on read performance - all we've done is redirect some block pointers. Accessing your data might go a little faster, a little slower, or (more likely) not change at all; it all depends on the pattern of the file system data structure and the pattern of requests coming from the application.
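That post-process pass can be sketched in Python as well. This is a minimal toy model, assuming a `blocks` map and a `(fingerprint, block_number)` catalog; the function and data-structure names are hypothetical, not ONTAP's actual internals.

```python
import hashlib
from collections import defaultdict

def dedupe_pass(blocks, catalog):
    """Toy post-process dedupe pass: group catalog entries by
    fingerprint, byte-verify candidate duplicates, then redirect the
    duplicate's pointer to the surviving block. Only the pointer map
    (metadata) changes; no data is moved or rewritten."""
    by_fp = defaultdict(list)
    for fp, blkno in catalog:
        by_fp[fp].append(blkno)

    pointer = {blkno: blkno for _, blkno in catalog}  # identity map
    freed = 0
    for blknos in by_fp.values():
        keeper = blknos[0]
        for dup in blknos[1:]:
            # Fingerprints can collide, so validate byte-for-byte
            # before sharing a block.
            if blocks[dup] == blocks[keeper]:
                pointer[dup] = keeper  # metadata-only redirect
                freed += 1
    return pointer, freed

blocks = {0: b"A" * 4096, 1: b"A" * 4096, 2: b"B" * 4096}
catalog = [(hashlib.sha256(d).hexdigest(), n) for n, d in blocks.items()]
pointer, freed = dedupe_pass(blocks, catalog)
# pointer[1] now points at block 0, and one block's worth of space is
# freed - reads just follow a slightly different pointer path.
```

The design point the sketch illustrates: the heavy lifting (grouping and validation) is batch work done on a schedule, while the visible change to the filesystem is nothing more than a pointer redirect.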
Although you proved me wrong 😉 I really appreciate you refreshing my memory and putting this magic 7% in the right context.
I wasn't aware (or didn't remember) that fingerprints are collected up front, while writes are coming in. Does that mean (at least in theory) these new blocks will be processed faster during the actual dedupe run than on a first run against a 'fresh' volume with data but no dedupe history?
Precisely -- that's why you have to do a "sis start -s" after you enable dedup on a volume with existing data (so all those fingerprints can get generated in the first pass -- the dedup then happens in a second pass and all later passes).
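The two-pass behavior of "sis start -s" on pre-existing data can be sketched the same way. Again a toy model; the function and variable names are my own assumptions, not ONTAP's.

```python
import hashlib

def sis_start_scan(blocks):
    """Toy model of 'sis start -s' on a volume with pre-existing data.
    Pass 1 fingerprints every existing block - the work that blocks
    written after 'sis on' would already have had done inline.
    Pass 2 then finds duplicates from the completed catalog."""
    # Pass 1: build the fingerprint catalog for all existing blocks.
    catalog = [(hashlib.sha256(data).hexdigest(), blkno)
               for blkno, data in blocks.items()]
    # Pass 2: scan the catalog for duplicate fingerprints
    # (dedupe proper runs over these candidates).
    seen, duplicates = {}, []
    for fp, blkno in catalog:
        if fp in seen:
            duplicates.append((blkno, seen[fp]))
        else:
            seen[fp] = blkno
    return catalog, duplicates

blocks = {0: b"a" * 4096, 1: b"a" * 4096, 2: b"b" * 4096}
catalog, dups = sis_start_scan(blocks)
# Blocks written after 'sis on' skip pass 1, since their fingerprints
# were already logged at write time - which is why later runs are faster.
```

This also answers the question above: a volume with dedupe history already has pass 1's output sitting in the catalog, so only pass 2 remains.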
From my experience - basically the same as Aaron's - multiple (very happy) customers using it with no perceived performance impact.
Very good stuff overall... I especially LOVE it in VMware environments (after a robust NFS implementation, dedup is probably what I miss the most when working with VMware on other arrays).
Literally yesterday I had a chat with a customer who had tried dedupe on an NFS datastore hosting their VMware golden images / templates. What they noticed, though, was a massive performance hit when cloning a template (via VirtualCenter; no FlexClone involved). So they pulled back and did not try to dedupe their production VMs at all.
I do appreciate that the story may be (and typically is) different for production VMs, but what could explain their problems?
Looking at the discussion above, fingerprints are collected for all new writes when A-SIS is turned on - so is it simply that VirtualCenter in essence writes a lot of new blocks at once when 'cloning' a template, so the performance hit is to be expected? (versus production VMs, which are mostly read, not written to?)
Or does it have something to do with them being on a somewhat old ONTAP version (7.2.x)? Or a combination of both?
They are probably hitting a bug in 7.2. We have a customer that had the exact same problem: a 20GB deployment would take 14 hours!! They went to 7.2.6 (22.214.171.124 maybe?) and the bug was fixed for them. If they are doing NFS and dedupe, I would highly recommend they go to 126.96.36.199. Again, from the same customer and from talking to NetApp tech support: NFS was changed in the 7.3 code and is much more efficient in a virtualized environment. The reason for 188.8.131.52 (P2 specifically) is that many dedupe optimizations were made as well.
Disclaimer here - my customer went to 184.108.40.206 and deployments were fixed. I don't have experience with 220.127.116.11 or higher; that is just what NetApp Support told our customer. They have not upgraded, but they plan to.
Also, make sure the templates are aligned. This could be contributing to the problem as well but it is by no means the primary problem.
We hit this bug as well....heavily deduped and running 7.2.4.
It was killing VM deployments as well as VCB backups - hours and hours for deployments from template, and in a couple of cases VCB backups taking 8-9 hours for ~10GB vmdks.
Upgraded to 7.3.1P3 a couple of months ago and it has been great since. Depending on various factors, we now see template deployments of 20GB images take 30-40 minutes and backups in the 10-20 minute range.
The particulars of the bug were related to prolonged sequential reads (not writes) of deduplicated data. This is why it would impact operations such as deployments and backups with heavy reads of large chunks of deduplicated data as opposed to operations involving running VMs where the reads are not prolonged and sequential.
PS - As Aaron notes, alignment is a good idea, but we didn't find it to be the primary factor either... we chased that dragon's tail while waiting for confirmation on whether or not the bug was affecting us. While aligned vmdks fared somewhat better, operations were still taking hours and hours until we upgraded to 7.3.1P3 with the bug fix and other enhancements.
Complete agreement again with Aaron (seems to be a trend here).
I would not run dedup on anything other than 18.104.22.168 under any circumstances (due to some nasty bugs) and would recommend at least 22.214.171.124P4 or 126.96.36.199 if possible (as there are some significant dedup speed optimizations; see the release notes).
I do have customers with dedup enabled (on later ONTAP versions) who definitely aren't seeing this issue.