
Where does metadata for dedupe get stored in aggregates?

matthewt

We are working to manage a large SnapVault secondary environment that is deduplicated and thinly provisioned.

We are trying to manage the space in our aggregates through the use of a "blocker volume" or fully guaranteed volume that will ensure that the aggregates maintain free space blocks for performance reasons, and to act as an emergency reserve.

The question is, where exactly does all the metadata for the deduplication get stored?

We could alter our layout to include larger aggregate snapshot reserve space, or no reserved volume space if needed, if it would help the appliance store and process this metadata more efficiently.


amiller_1

As of Data ONTAP 7.3.x, metadata for dedup is actually stored in the aggregate, outside of the volume (that's why you can now shrink a volume under the dedup limit and enable dedup -- that wasn't possible under 7.2.x).

Your idea makes sense -- but what you more likely want to use and track is the aggregate snap reserve. It's set to 5% by default and generally 3% is quite safe. If you monitor the used space in the aggregate snap reserve (and how it changes as you enable dedup), that should give you some feel for the size of the dedup metadata.

matthewt

From a purely space management point of view, you would choose the Aggregate Snap Reserve as the tool for keeping aggregate free space, then?  If we increased the aggregate snap reserve to, say, 10%, would that provide the same effect as maintaining a fully reserved volume, while allowing the dedupe metadata to use the space?

What we see is some performance degradation, even with our reserved volume.  If we could retain the performance by using an increased Aggregate Snap Reserve, we would go that way.

madden

Check out TR-3505 for a detailed discussion of how deduplication is implemented; I think after reading it your architectural questions will be [mostly] answered.

For your metadata sizing question, page 17 of the TR says:

The guideline for the amount of extra space that should be left in the aggregate or volume for the deduplication metadata overhead is as follows:

o) If you’re running Data ONTAP 7.2.X, leave about 6% extra space inside the volume on which you plan to run deduplication.

o) If you’re running Data ONTAP 7.3, leave about 2% extra space inside the volume on which you plan to run deduplication, and around 4% extra space outside the volume in the aggregate, for each volume running deduplication.

Read more of the TR to understand these recommendations.
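To make the guideline concrete, here is a minimal sketch of the arithmetic. It assumes the percentages are applied to the amount of data in the volume; the quoted text does not spell out the base, so check the TR before relying on it. The function and table names are made up for illustration.

```python
# Percentages as quoted from the TR in the post above.
# ASSUMPTION: they are applied to the amount of data in the volume (in GiB).
GUIDELINE_PCT = {
    "7.2": {"in_volume": 6, "in_aggregate": 0},
    "7.3": {"in_volume": 2, "in_aggregate": 4},
}


def metadata_headroom_gib(data_gib, release):
    """Return (in_volume_gib, in_aggregate_gib) to leave free for dedup metadata."""
    pct = GUIDELINE_PCT[release]
    in_volume = data_gib * pct["in_volume"] / 100
    in_aggregate = data_gib * pct["in_aggregate"] / 100
    return in_volume, in_aggregate


if __name__ == "__main__":
    # Hypothetical volume holding 1,000 GiB of data.
    for release in ("7.2", "7.3"):
        in_vol, in_aggr = metadata_headroom_gib(1000, release)
        print(f"{release}: {in_vol:.0f} GiB in volume, {in_aggr:.0f} GiB in aggregate")
```

The total is about 6% in both releases; what changes in 7.3 is where the space has to be free.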

WAFL has access to any unallocated block in the aggr and will select the specific ones to use in an optimized manner.  So for your question about maintaining free space via aggr reserve or a FlexVol with guarantee=volume, it really doesn't matter.  In both situations you're reserving the right to use blocks (rather than any specific blocks).  You should, however, be careful not to run completely out of space in either your vol or aggr active filesystem.  Remember, if you reserve blocks (via aggr reserve or your guarantee=volume volume), these are not available for other volumes' SnapVault updates or any A-SIS work.  I'd monitor and manage your aggr free space to begin with rather than reserving gobs extra "just in case".
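A rough sketch of that accounting, under simplifying assumptions: "used" excludes the reserved space, the blocker volume is completely empty, and all sizes are hypothetical. The function name is made up for illustration; it is not an ONTAP interface.

```python
def aggregate_free_gib(total_gib, used_gib, snap_reserve_pct=0, guaranteed_unused_gib=0):
    """Space in the aggregate that is neither used nor reserved (simplified model)."""
    snap_reserve_gib = total_gib * snap_reserve_pct / 100
    return total_gib - snap_reserve_gib - guaranteed_unused_gib - used_gib


# Hypothetical 10,000 GiB aggregate with 7,000 GiB in use.
via_reserve = aggregate_free_gib(10_000, 7_000, snap_reserve_pct=10)
via_blocker = aggregate_free_gib(10_000, 7_000, guaranteed_unused_gib=1_000)
no_reservation = aggregate_free_gib(10_000, 7_000)

print(via_reserve, via_blocker, no_reservation)
```

In this model a 10% aggr snap reserve and a 1,000 GiB blocker volume leave exactly the same free space, and both leave less than making no reservation at all.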

eric_barlier

Matthew,

It sounds like your environment is similar to where we are heading. We're on 7.2 now, thin provisioned and deduped. In the near future we are upgrading to 7.3.2 and will use SnapVault.

From Madden's explanation it seems we can deduce that there is a 6% metadata cost of running dedupe. From 7.3 the aggr takes over 4% of this. That's the change there.

You then said you have somewhat degraded performance, which is interesting to me. How do you perceive this, and why is it linked to dedupe?

I'm trying to be proactive here and get a better understanding of what can happen going forward.

Thanks,

Eric

matthewt

I want to preface by saying that my "degraded performance" reference is very specific.  We have four 6080 SnapVault targets running 7.3.1.1 that are fully thin provisioned and run deduplication.  Our SnapVault functions perform just fine.

I have been amassing experience with the deduplication processes and trying to ensure that we are using the best practices possible.  There is a fair amount of variability in deduplication performance on our systems and I am trying to understand why.  We have some challenges with capacity, and have found that keeping adequate aggregate free space is critical (and that's not just for deduplication).

I am still looking for other ways of tracking and improving the performance of deduplication specifically.  In our experience, it's much harder to "catch up" if your deduplication processing falls behind than it is to stay current.  If we can avoid falling behind by addressing issues proactively, it would simplify our management tasks a great deal.

If you would like to compare notes with me about how our implementation is structured, and the experiences we have had with running deduplication on our secondaries, please feel free to contact me offline.  I have been collecting our experiences so that we can share them internally.  I have some material and am working on a KB article, etc.

amiller_1

I'm afraid I fell off this thread for a bit (not that it mattered, as you got better answers than I could supply).

I think I'd personally use aggregate snap reserve -- you need it anyway, and it leaves one less thing to manage.

Ultimately, both methods accomplish the same thing; it's more a matter of how you like to track and manage it, I think.

dwarburton

I have a similar question regarding our 2050s, which we are using as SnapVault secondary targets.

We've had a NetApp consultant recommend that we set Aggregate Snap Reserve to 0% for pretty much every single aggregate. He said that it was only used for synchronous snap mirror (metro clustering), which we don't use and almost certainly never will.

In fact I was also, based on this advice, going to set the aggr snap reserve to 0% for our Exchange aggregate (on a FAS3050) as we are migrating to Exchange 2007 and space is critically low at the moment.

From reading this thread I'm starting to get the feeling that leaving the aggr snap reserve is generally a good thing and indeed necessary for de-dupe. Is that right?

radek_kubka

Hi David & welcome to the forums!

"From reading this thread I'm starting to get the feeling that leaving the aggr snap reserve is generally a good thing and indeed necessary for de-dupe. Is that right?"

I think it isn't, actually. Basically, if you have the aggregate snap reserve set to anything bigger than 0%, then some space is reserved, i.e. not available, i.e. not free. What other people are saying in this thread is that you need a certain amount of *free* space within the aggregate to run de-dupe.

You can run the df -A command both with & without snap reserve to see the difference. Only the amount of free space in the first row makes any difference from the perspective of this conversation.

[edit: OK, I read all the posts in the thread more carefully, & some people are suggesting aggregate snap reserve space can be used for storing de-dupe metadata. I personally think this is not the case, as aggr snap reserve can only be used for what it was meant for, i.e. storing aggr snapshots]

Regards,
Radek

eric_barlier

Radek is right, I believe. Snap reserve is exactly that: a read-only reserve for snaps. So I'm not sure how metadata could be written in that space.

Personally I feel that the best option is to turn aggr snap reserve OFF, unless of course you do metroclustering, where it's good to have.

It will facilitate space management and free up some much needed space of course.

Eric

matthewt

I have also further clarified this question with engineers.

Aggregate Snap Reserve is not used for deduplication metadata storage.  The storage of deduplication metadata is split roughly between space in the specific volume being deduplicated and Aggregate Free Space.  That space must be genuinely free: it cannot be blocks that are merely unused but reserved, such as unused space in a guaranteed FlexVol or aggregate snap reserve space.

I have put this into practice in our environment and seen improved results.

Whether or not to set your Aggregate Snap Reserve space to a value greater than 0% is a different question.  That is a question best answered by people planning for DR.  The primary purpose of Aggregate Snapshots is the recovery of FlexVol configurations during certain extreme WAFL issues, or (as mentioned above) managing the snapshots associated with SyncMirror.

Using the Aggregate Snap Reserve as a way of guaranteeing deduplication metadata storage capacity is not valid.
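As a rough planning aid, a check along these lines can flag an aggregate that is short on genuinely free space. This is only a sketch: it takes the "around 4% in the aggregate per deduplicated volume" figure quoted earlier in the thread, assumes it applies to the data in each volume, and uses a made-up function name and hypothetical sizes.

```python
def dedup_headroom_ok(aggr_free_gib, dedup_volume_data_gib, aggr_pct=4):
    """Compare truly free aggregate space against the estimated metadata need.

    aggr_free_gib must already exclude aggr snap reserve and guaranteed-volume space.
    Returns (ok, needed_gib).
    """
    needed_gib = sum(dedup_volume_data_gib) * aggr_pct / 100
    return aggr_free_gib >= needed_gib, needed_gib


# Hypothetical aggregate hosting three deduplicated volumes (data sizes in GiB).
volumes = [4000, 2500, 1500]
print(dedup_headroom_ok(500, volumes))
print(dedup_headroom_ok(200, volumes))
```

With these numbers the estimate is 320 GiB, so 500 GiB of free space passes and 200 GiB does not.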
