Let's say I do a full backup of a DB on a daily basis. Will the second full backup's data be processed against the deduplication database created by the first day's full backup?
If so, the second day's data should be reduced a lot. Correct?
My second question:
I read the AltaVault document, which says we should turn off compression in the backup software. How can I be sure that AltaVault deduplication will work more efficiently that way?
1. Yes, the second full backup will be deduplicated against the previous full, which should significantly reduce the amount of data written by AltaVault (i.e., increase the dedupe rate).
2. AltaVault deduplication is class-leading compared to the deduplication methods used by many other vendors. In short, you'll get a better squeeze from AltaVault's deduplication and compression than from a dedupe method implemented in backup software. This is discussed in the technology overview TR, which you can read here: http://www.netapp.com/us/media/tr-4427.pdf
You can of course test this to make sure it holds for your workloads (we've had this done in sales cycles, confirming the statement above).
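To see why a second full backup writes so little new data, here is a toy fixed-block, hash-indexed dedupe store (an illustration only; AltaVault's actual algorithm, block sizes, and index design are not described here and the names below are hypothetical). Most blocks of the second full already exist in the index, so only references are recorded for them.

```python
import hashlib
import os

class DedupeStore:
    """Toy fixed-block dedupe store: unique blocks keyed by SHA-256."""
    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block bytes (the "dedupe DB")

    def ingest(self, data):
        """Store data; return (bytes_actually_written, bytes_received)."""
        written = 0
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:      # new unique block -> write it
                self.blocks[fp] = block
                written += len(block)
            # duplicate block -> only a reference is needed, nothing written
        return written, len(data)

store = DedupeStore()
full1 = os.urandom(256 * 1024)             # day 1: first full backup
w1, t1 = store.ingest(full1)               # every block is new
full2 = full1[:-4096] + b"\x00" * 4096     # day 2: same data, one block changed
w2, t2 = store.ingest(full2)               # only the changed block is written
print(w1, w2)
```

With this sketch, day 1 writes all 256 KiB while day 2 writes only the single changed 4 KiB block, mirroring the high dedupe rates you should see on repeated fulls.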
Thanks Chris. Two follow-ups:
1. Is AltaVault dedupe global? That is, regardless of protocol (NFS or CIFS) and regardless of which host the data comes from, will all incoming data be deduplicated against the same dedupe database?
2. Where is the dedupe database located: in memory or in the local appliance cache?
Q1. Is AltaVault dedupe global? That is, regardless of protocol (NFS or CIFS) and regardless of which host the data comes from, will all incoming data be deduplicated against the same dedupe database?
A1. Yes, dedupe is global regardless of the protocol that sends the data to AltaVault on the front end.
Q2. Where is the dedupe database located: in memory or in the local appliance cache?
A2. A portion of the appliance cache is reserved for the dedupe indexes, but this isn't part of the "usable" capacities as reported in the spec sheet materials and other presentations you see for AltaVault. AltaVault does load information into memory for improved lookup, but all data is flushed to cache to ensure no data is lost. The RAID card also has a super capacitor backup to ensure writes are flushed in the event of an outage.
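The pattern described above, where every index entry is persisted but a subset is held in memory for fast lookup, can be sketched as a write-through index with a bounded LRU memory cache. This is an illustration of the general technique only, not AltaVault's actual index implementation; all class and method names are invented for the example.

```python
from collections import OrderedDict

class FingerprintIndex:
    """Toy write-through fingerprint index: every entry goes to the on-disk
    store immediately (nothing is lost on power failure), while a bounded
    LRU cache in memory accelerates repeat lookups."""
    def __init__(self, cache_entries=4):
        self.disk = {}              # stands in for the on-disk index
        self.cache = OrderedDict()  # in-memory LRU subset of the index
        self.cache_entries = cache_entries

    def put(self, fingerprint, location):
        self.disk[fingerprint] = location   # write-through: persist first
        self._cache(fingerprint, location)

    def get(self, fingerprint):
        if fingerprint in self.cache:       # fast path: memory hit
            self.cache.move_to_end(fingerprint)
            return self.cache[fingerprint]
        loc = self.disk.get(fingerprint)    # slow path: read from disk
        if loc is not None:
            self._cache(fingerprint, loc)   # promote into memory
        return loc

    def _cache(self, fingerprint, location):
        self.cache[fingerprint] = location
        self.cache.move_to_end(fingerprint)
        while len(self.cache) > self.cache_entries:
            self.cache.popitem(last=False)  # evict least recently used

idx = FingerprintIndex(cache_entries=2)
for i in range(5):
    idx.put(f"fp{i}", i)
print(idx.get("fp0"), len(idx.cache), len(idx.disk))
```

Because writes go to the persistent store before the memory cache, losing the in-memory portion (e.g., on a power outage) costs only lookup speed, not index entries, which is the property the super-capacitor-backed flush is protecting.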
We did some deduplication testing with servers using SQLSafe writing directly to an SMB share on the AltaVault. We determined that leaving the deduplication option on in SQLSafe actually gave us better results than disabling it. So in effect SQLSafe deduped the data before sending it to the AltaVault, which then deduped it again against the data still on disk. This worked for us because of a short retention period: the database team only wanted to keep 8 days of SQLSafe backups.
If we extended the retention to 14+ days, it was more efficient to turn off deduplication in SQLSafe and rely only on the AltaVault dedupe.
I have a follow-up on Q2/A2.
The deduplication database essentially contains entries for all unique data blocks; repeated data should come and go as retention expires. As I understand it, the deduplication DB always stays in the local cache, and entries are purged when no data references them. My question is the following:
Could the dedupe DB grow so large that the local cache cannot contain it, and it has to be pushed to the cloud?
We have a 400 appliance, which I believe has 80TB of local cache. There is 150TB of data in the cloud now.
The deduplication DB is stored on disk space of the AltaVault appliance (separate and apart from the 80TB of disk space that acts as the data cache), and at any point a subset of the DB is in memory cache for lookup operations. The deduplication DB is not pushed to the cloud. Thus, if you suffer the loss of an appliance, the data can still be reconstructed on a new AltaVault appliance, but the deduplication DB entries will be limited to whatever data is recovered back from the cloud, or created as new backups are generated within that recovered environment.