I'm working on a new SnapProtect 10 SP6 setup and battling a lack of information.
Here is the scenario: I have an Office and a Datacenter connected with a 100Mbit WAN Link about 500km apart. I am trying to build a protection scenario where a CIFS Volume in the Office gets snapshots hourly / daily / weekly, and a copy is snapvaulted to the datacenter weekly. So far I have that working. Eventually I might add a cut to tape from the datacenter as an additional layer but for the moment it's not in play. The primary SnapProtect server / Media Agent is in the Datacenter.
My question has to do with catalogs. The volume in play has over 7 million files on it. If I configure the jobs to perform cataloging, the indexing process takes many, many hours to complete... when the goal is hourly snapshots, that's obviously unworkable.
At first I wondered if I even needed catalogs, but it seems like without them you lose a lot of the search capabilities when it comes to restores. If I know the filenames and their locations I can obviously just mount a snap to restore, but we frequently get much fuzzier restore requests.
So I need to figure out a way to improve the performance of cataloging. Here's what I've tried:
Added a Media Agent in the Office, and configured things so the indexing happens there (taking the WAN link out of the equation). This roughly doubled performance, but it's still taking hours.
Configured jobs so cataloging happens only on the weekly snaps, and not for the dailies and hourlies. This at least lets the hourlies complete. It's an OK solution, but I suspect I'll only have search capabilities for the weekly backups... which isn't really all that great.
I've read that if you configure the hourly / daily jobs as Incremental and the weeklies as Full, the cataloging will somehow only process the changed data (and therefore complete much faster). I'm going to try this but have no data yet. I'm still perplexed by the concept of how a backup can be incremental when a snapshot is always more or less a "full" copy...
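One way to picture how that can work: the snapshot itself is always a complete, consistent image of the volume, but the catalog only needs to process the delta between two snapshots. Here's a toy sketch of that idea (this is NOT SnapProtect or SnapDiff code; every name in it is made up for illustration — in the real product the filer reports the delta itself rather than anyone walking both trees):

```python
# Toy illustration of diff-based cataloging. All names are invented for
# this example; this is not SnapProtect/SnapDiff internals.

def diff_snapshots(prev_listing, curr_listing):
    """Return (added/changed files, removed files) between two snapshots.

    Each listing maps path -> modification time. The snapshots are both
    "full" images of the volume, but the catalog work is only the delta.
    """
    changed = {path: mtime for path, mtime in curr_listing.items()
               if prev_listing.get(path) != mtime}
    removed = set(prev_listing) - set(curr_listing)
    return changed, removed

prev = {"/vol/a.txt": 100, "/vol/b.txt": 200}
curr = {"/vol/a.txt": 100, "/vol/b.txt": 250, "/vol/c.txt": 300}

changed, removed = diff_snapshots(prev, curr)
print(changed)  # {'/vol/b.txt': 250, '/vol/c.txt': 300}
print(removed)  # set()
```

So even though both snapshots contain all 7 million files, only the two changed/new entries would need indexing work.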
I'm wondering if anyone has any tips, best practices, or advice with regards to configuring SnapProtect jobs for cataloging. One of the volumes I'm eventually going to need to throw in here has 26 million files on it (almost 4 times the one I'm struggling with now!).
Some more info (answering some of my own questions):
I started testing with a smaller volume (~800,000 files) and sat watching the CVNasFileScan.log file while a few things ran.
Ran a "Full" backup with Cataloging on. Seemed to take about 5 minutes to catalog things (seems faster than the other volume but I'll ignore that for now). I watched the index process in the log.
Next ran an "Incremental" backup with Cataloging on. The log noted it was doing a SnapDiff (presumably comparing to the last full). Noted no files changed, indexing said successful, took very little time. Ok, so looks like this does what I thought.
Next ran another "Full", as I presumed this would do another full catalog. I was surprised to see that the indexer did another SnapDiff (comparing to ???) and noted no changed files, said indexing was successful, and it took very little time.
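If I'm reading the logs right, a guess at what's happening (this is my interpretation, not documented behavior): the indexer diffs against the last *cataloged* snapshot, whichever job type produced it, so even a second "Full" has almost no work to do. A toy model of that, with invented names:

```python
# Conceptual model only; not SnapProtect internals. Assumes the indexer
# keeps the last cataloged listing as the baseline for the next diff.

class CatalogIndex:
    def __init__(self):
        self.entries = {}        # path -> mtime as indexed
        self.last_snapshot = {}  # baseline listing for the next diff

    def catalog(self, snapshot):
        """Index only paths that differ from the last cataloged snapshot.

        Returns the number of files actually processed, regardless of
        whether the job was labeled Full or Incremental.
        """
        work = 0
        for path, mtime in snapshot.items():
            if self.last_snapshot.get(path) != mtime:
                self.entries[path] = mtime
                work += 1
        for path in set(self.last_snapshot) - set(snapshot):
            self.entries.pop(path, None)
            work += 1
        self.last_snapshot = dict(snapshot)
        return work

idx = CatalogIndex()
snap1 = {"/vol/a": 1, "/vol/b": 2}
print(idx.catalog(snap1))  # 2  (first catalog walks everything)
print(idx.catalog(snap1))  # 0  (a second "Full" of unchanged data: no work)
```

That would explain why only the very first catalog of a volume is painful, and every run after it — Full or Incremental — is cheap.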
So maybe I'm overthinking this... if I get through one catalog of a volume with a big number of files, presumably each subsequent catalog will just process the changed files. That makes me happy, but I'm still confused.