Inline Compaction and Existing Data Timing

Rowl · 2017-04-24

I have been trying compaction on some existing volumes that seem like good candidates: multiple TB in size, millions of small files, that sort of thing. I know this will take time to process, but does anyone have experience with this and can give some estimates? Even wild ones are better than nothing.

One of these volumes has an average file size of around 700 bytes and about 4 TB of data space used. Taking a conservative estimate of packing 2 files into 1 block, my aggregate-level savings should be at least 50% of the data set size. Ideally we should pack 5 of these in a block, but I am just working on an estimate to keep the math simple. At the rate space savings are showing up on my aggregate, it looks like we are getting up to 500 MB/hour back. This looks like it will take many months to finish. I am curious whether this is a multi-stage process that may speed up at some point, or whether this is similar to what others have seen.
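To make the estimate above concrete, here is a rough back-of-the-envelope sketch in Python. It only restates the figures from this post (700-byte files, 4 TB used, 50% savings, 500 MB/hour reclaimed). The 4 KiB block size and the decimal TB/MB units are assumptions on my part, and the function names are made up for illustration.

```python
# Back-of-the-envelope estimate using the rough figures from the post.
# Assumptions: 4 KiB blocks, decimal units (1 TB = 10**12 bytes).
BLOCK_SIZE = 4096
TB = 10**12
MB = 10**6


def files_per_block(avg_file_bytes, block_size=BLOCK_SIZE):
    """How many whole files of this size fit into one block."""
    return block_size // avg_file_bytes


def hours_to_reclaim(used_bytes, savings_fraction, rate_bytes_per_hour):
    """Hours needed to reclaim the expected savings at a constant rate."""
    return used_bytes * savings_fraction / rate_bytes_per_hour


best_case_packing = files_per_block(700)          # ideal packing for 700-byte files
hours = hours_to_reclaim(4 * TB, 0.5, 500 * MB)   # conservative 2-files-per-block case
days = hours / 24

print(f"files per block (ideal): {best_case_packing}")
print(f"estimated time: {hours:.0f} hours, about {days:.0f} days")
```

With these inputs the conservative case comes out to roughly 4000 hours, or about five and a half months, assuming the reclaim rate stays constant. If the packing is closer to the ideal case, there is more space to reclaim and the estimate gets longer, not shorter.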

I did some testing with small files on a non-prod environment, and when creating new files this worked as one would expect. So for new volumes this looks like a big win, but not sure if this is even worth running for existing volumes if it is going to take months to process a volume. I have many multi TB volumes with small archived image files where this could really save us a lot of space.

Since compaction takes place at the aggregate level, how does this work with SnapMirror? Is it going to mirror logical data to the target, where it will need to be compacted as it is ingested? If a source volume has compaction enabled, and the target aggregate also has compaction enabled, does SnapMirror enable compaction on the target volume? Or if the source volume does not have compaction enabled, but the target aggregate and volume do, will the target get compaction disabled because the source volume has it disabled?

-Rowl

asulliva · 2017-04-24

Hello @Rowl,

Compaction will not ingest existing data and rewrite it to save space. It will only work with writes which happen after the feature is enabled. That being said, ahem, *cough*.

Andrew

Rowl · 2017-04-25

The command "volume efficiency start -vserver -volume -scan-old-data true -compaction true" appears to work for existing data per the link you shared. It just seems the time it will take to work makes it impractical to try.

colin_graham · 2017-04-25

It is worth noting that when the compaction scanner is running, it shows progress in the "wafl scan status" output of the nodeshell (advanced priv mode):

    FAS8040*> wafl scan status
    Volume TEST_VSC_PROV:
     Scan id   Type of scan   progress
         872   compact        block 727592 of 29127120 (2%)

Rowl · 2017-04-25

Thank you, that is most useful. I was wondering if there was a way to see progress at the volume level.

    Volume mordor_prod:
     Scan id   Type of scan   progress
      331312   compact        block 425629945 of 2164155552 (20%)
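For anyone who wants to track this over time, here is a small Python sketch that pulls the block counts out of a "compact" progress line like the one above and extrapolates a remaining time from two samples. The regular expression is based only on the output format shown in this thread, the function names are hypothetical, and the extrapolation assumes the scanner keeps a constant pace, which may not hold in practice.

```python
import re

# Matches lines such as:
#   331312 compact block 425629945 of 2164155552 (20%)
# The pattern is an assumption based on the output pasted in this thread.
PROGRESS_RE = re.compile(r"^\s*(\d+)\s+compact\s+block\s+(\d+)\s+of\s+(\d+)")


def parse_compact_progress(line):
    """Return (scan_id, blocks_done, blocks_total, fraction_done) or None."""
    match = PROGRESS_RE.match(line)
    if match is None:
        return None
    scan_id, done, total = (int(g) for g in match.groups())
    return scan_id, done, total, done / total


def eta_hours(done_then, done_now, total, hours_between):
    """Linear extrapolation of remaining hours from two progress samples."""
    rate = (done_now - done_then) / hours_between  # blocks per hour
    if rate <= 0:
        return None  # no forward progress between samples
    return (total - done_now) / rate


sample = "331312 compact block 425629945 of 2164155552 (20%)"
scan_id, done, total, fraction = parse_compact_progress(sample)
print(f"scan {scan_id}: {done} of {total} blocks ({fraction:.1%})")
```

To get an estimate, note the "block N" value at two points in time and pass both to eta_hours along with the total and the hours elapsed between the samples.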
