We recently upgraded from 9.2P4 to 9.3P10 on our 4-node AFF 8080 cluster. One of the features I wanted to implement was volume-level background deduplication. I removed scheduling from every volume and set everything to the auto policy. I thought it was great that I wouldn't have to manage scheduling of deduplication jobs anymore.
After several weeks, I'm noticing a sharp uptick in the number of volumes alerting that they are running out of space. In each case, my first instinct is to run a quick manual deduplication job just to make sure I really need to resize the volume. In every case so far, the alerting volumes were "deprioritized" by the auto policy so I couldn't even run dedupe manually without promoting the volume.
As I reviewed this situation, I noticed how the "auto" policy actually works. I thought it effectively eliminated the need for scheduled deduplication - i.e. that essentially each volume would just do the inline dedupe/compression and get the same benefits it would have had previously had I done that + scheduled dedupe. What I discovered was regular deduplication jobs running at very random times (in addition to the inline dedupe). Those random jobs might run hours after the nightly backup, so they miss some of the savings they would have achieved if they had run before the snapshots were generated.
The last straw was this morning when one of my VMware datastore volumes alerted that it was low on space. Even it was deprioritized, and these volumes have the highest rate of dedupe/compression savings on our cluster. Although I don't want to, I'm starting to think I need to revert this feature and go back to scheduling.
Does anyone have insight into this issue? In particular, are there any improvements to this feature in 9.4 or 9.5? Any suggestions or feedback is appreciated!
Postprocess compression and deduplication share the same scheduler and can be scheduled to run in one of five different ways:
• Inline
• Scheduled on specific days and at specific times
• Manually, by using the command line
• Automatically, when 20% new data has been written to the volume
• SnapVault software based, when used on a SnapVault destination
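To make the automatic trigger in the list above concrete, here is a rough sketch of the arithmetic in Python. This is not ONTAP code: the function name `auto_dedupe_due`, the parameter names, and the choice of baseline (what the 20% is measured against) are all assumptions for illustration; only the 20% figure comes from the quoted text.

```python
def auto_dedupe_due(new_data_bytes: int, baseline_bytes: int,
                    threshold_pct: float = 20.0) -> bool:
    """Hypothetical check: has enough new data arrived to trigger a run?

    baseline_bytes is an assumed reference size for the volume; the quoted
    text does not say exactly what the 20% is measured against.
    """
    if baseline_bytes <= 0:
        return False
    # Compare new data against threshold_pct percent of the baseline.
    return new_data_bytes * 100 >= threshold_pct * baseline_bytes


TIB = 1024 ** 4

# A 10 TiB baseline needs 2 TiB of new writes before a run is due.
print(auto_dedupe_due(1 * TIB, 10 * TIB))  # below the threshold
print(auto_dedupe_due(2 * TIB, 10 * TIB))  # at the threshold
```

Under these assumptions, a large volume with a modest daily change rate could go a long time between runs, which would fit the pattern of jobs firing at seemingly random times rather than just before the nightly snapshots.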
Thanks @christsai. The phrase in the TR "Automatic background deduplication performs continuous background deduplication on all enabled volumes, without manual configuration" (emphasis mine) is what threw me off. Apparently "continuous" means "if there's 20% of change". I appreciate the insight and I will definitely be changing back to scheduled!