2011-06-23 03:32 AM
I'm thinking of implementing dedupe on our existing CIFS shares. I've had a couple of runs at this on our DR filer, using mirrored and then FlexCloned copies of the volumes I'm planning to do. From what I can work out, because the volume already contains non-deduplicated data, it seems you have to run 'sis start -s vol_name' before the deduplication schedule kicks in. That's fine, but is there a way to schedule that initial scan as well?
Basically I enabled sis on the volume and set the schedule with 'sis config -s day@hr vol_name'. But I'm worried that if I run 'sis start -s' on the LIVE filer (I've run it on the DR filer to watch the process it takes) it will have an impact on the production CIFS shares, as doesn't this do a full scan and then run the deduplication process?
Is there a command that lets me schedule the 'sis start -s' run as well?
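For reference, the 7-Mode command sequence being discussed here looks roughly like this (volume path is a placeholder; 'sis status' is an extra command for watching progress, not something mentioned above):

```
filer> sis on /vol/cifs_vol                    # enable dedupe on the volume
filer> sis config -s sun-sat@2 /vol/cifs_vol   # scheduled runs, every day at 02:00
filer> sis start -s /vol/cifs_vol              # one-off scan of the existing data
filer> sis status /vol/cifs_vol                # check progress of the running scan
```

The schedule string takes the day_list@hour_list form, e.g. mon-fri@23 or sun-sat@2.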
Hope this makes sense.
2011-07-20 07:28 AM
If I understand your question correctly, the main thing is to schedule your deduplication jobs for times of day when the performance of your production volume will not be impacted. In practice there is very little to worry about, but it is not a bad idea to schedule them for off hours. So essentially: set up the sis schedule, then give it the green flag with 'sis on vol_name' (as opposed to 'sis start', which runs a dedupe operation immediately). You can then run 'sis start -s' once during off hours to make sure all of the existing data is covered. Hope this helps.
2011-07-20 07:39 AM
You can connect to the system at a time when the load is lower and dedupe the existing data in the volume (where you turned sis on).
If you want to start it at a time when you are not onsite, you could use the job scheduler (aka AT) on one of your Windows systems to kick off a script (batch or PowerShell) that runs 'sis start -s' for you, if that is what you meant.
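A minimal sketch of that approach, assuming SSH access is enabled on the filer and plink.exe is available; the host name, volume path, and script path are all hypothetical, and embedding a password like this is for illustration only:

```
rem run_sis.bat - kick off the initial dedupe scan on the filer
plink.exe -ssh root@filer01 -pw yourpassword "sis start -s /vol/cifs_vol"
```

Then schedule it once for the quiet window, e.g. with AT:

```
at 02:00 C:\scripts\run_sis.bat
```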
2011-07-20 12:02 PM
Dedupe is a low-priority job; running it on one volume will not really impact performance on your live filer.
But if you want to be 100% safe, just do as Peter says.
Question: How do I know if my system can tolerate the performance overhead of dedupe?
Answer: This one is a little more difficult to answer, since we don't know what other processing your system is doing while it's deduplicating, and how critical that processing is. As a general rule, deduplication runs as a low-priority background process and should not place significant load on the system. However, if this is a concern, we recommend a phased deduplication approach. Start by implementing dedupe on a single volume or LUN, and observe system behavior. Repeat this step on other volumes and LUNs and observe the results, remembering that you can stop or undo the deduplication process at any time.
Source: Dr Dedupe
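To illustrate the "stop or undo at any time" point, the 7-Mode commands are along these lines (from memory; 'sis undo' requires elevated privilege, advanced or diag depending on ONTAP version, and rehydrating needs enough free space in the volume):

```
filer> sis stop /vol/cifs_vol      # abort a dedupe operation that is running
filer> sis off /vol/cifs_vol      # disable further scheduled runs
filer> priv set diag               # sis undo is not a normal-privilege command
filer*> sis undo /vol/cifs_vol    # rehydrate (re-duplicate) the shared blocks
```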
2011-08-09 01:21 AM
All this has been useful, thank you. The problem is that our company runs 24/7 and the quiet times are the early hours of the morning, and with the volumes already containing a lot of data prior to deduplication I was just looking for the simplest way to set it off. With all the testing I'd done on our FlexClones I didn't think there was a way of setting a schedule for the initial scan within the filer itself; I just needed the confirmation. I did it the manual way of logging in out of hours, as I'd jumped in before I got to read your reply, Peter. The Windows script option would have been a good one to try.
Thanks again. Dedupe saved around 16% on each CIFS share, which was well worth the early get-up.
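For anyone wanting to check their own savings, 'df -s' reports the saved space per volume; output along these lines (illustrative figures matching the roughly 16% above, actual column widths and values will differ):

```
filer> df -s /vol/cifs_vol
Filesystem                used      saved    %saved
/vol/cifs_vol/       841520128  160482304       16%
```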