synchronize versus delete and rebuild


I have a picture folder with 40'000 pics, around 20 GB in size.

Twice a day I need to create a partial copy of the folder (about 5'000 pics, 2.5 GB, on the same NetApp system), which is in turn used to synchronize a number of external stores. Each new partial copy probably shares more than 80% of its files with the previous one, but it will contain some new pics and drop some that are not used any more.

I have thought about two paths:

a) Include logic into my copy procedure to remove and add files and leave the unchanged ones in place

b) delete and rebuild the partial folder each time
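
Option a) can be sketched roughly as below. This is only a minimal illustration, assuming a flat folder and that the selection query yields a set of file names; `sync_partial`, `source`, `target` and `wanted` are hypothetical names, not anything that exists on the filer. Because the files to remove are computed from what is actually in the folder, leftovers from earlier runs get cleaned up as well. It does not notice a file whose content changed while its name stayed the same.

```python
import shutil
from pathlib import Path


def sync_partial(source: Path, target: Path, wanted: set[str]) -> tuple[set[str], set[str]]:
    """Make `target` contain exactly the files named in `wanted`.

    `wanted` holds file names relative to `source` (hypothetical: the
    result of the selection query). Returns (added, removed).
    """
    target.mkdir(parents=True, exist_ok=True)
    present = {p.name for p in target.iterdir() if p.is_file()}

    to_remove = present - wanted   # pics that are not used any more
    to_add = wanted - present      # pics that are new in this run

    for name in to_remove:
        (target / name).unlink()
    for name in to_add:
        shutil.copy2(source / name, target / name)
    return to_add, to_remove
```

With more than 80% overlap between runs, only the two difference sets are touched and the unchanged files stay in place.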

My assumption: a copy will not create a physical copy but just links, so a delete and rebuild should not cost too much?
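
Whether that assumption holds is not settled here. An ordinary client-side copy reads every byte and writes it again, so links should not be taken for granted. If links are what is wanted, one way to make that explicit is to request a hard link and fall back to a real copy. The sketch below assumes source and partial folder sit on the same volume and that the access protocol supports hard links; `link_or_copy` is a hypothetical helper name.

```python
import os
import shutil
from pathlib import Path


def link_or_copy(src: Path, dst: Path) -> str:
    """Try to hard-link `src` to `dst`; fall back to a real copy.

    Returns "linked" or "copied" so the caller can see how much real
    data movement took place.
    """
    try:
        os.link(src, dst)          # no file data is read or written
        return "linked"
    except OSError:
        # Hard links unsupported, different volume, or dst already exists.
        shutil.copy2(src, dst)     # physical copy, overwrites an existing dst
        return "copied"
```

A hard link shares its content with the original, so a pic that is later overwritten in place in the main folder would also change in the partial folder.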

The synchronizer version will be more difficult to build and might leave "leftovers" which nobody will remove.

The problem I see with solution b): how can I avoid ending up with an incomplete or empty partial folder when my copy gets interrupted?



Re: synchronize versus delete and rebuild

Maybe I'm misunderstanding what you're trying to do - but can you create periodic snapshots on the schedule you're interested in, and then mount the .snapshot directory elsewhere and do your copies from there? Won't that achieve what you're looking to do?

eg, snap sched <your volume> 0 0 2@8,20

mount /vol/<your volume>/.snapshot /mnt/mymountpoint

man snap will give more info.

Re: synchronize versus delete and rebuild


I can't think of a way a snapshot can help. Every time I run the extract, the list of files can change, and determining which files I need to extract is a rather lengthy query with many conditions.

I thought about my issue a bit over the weekend and think I could do something like:

Define a temp target folder

Check whether the temp target folder already exists

If it exists, clear its content

Copy the files into the temp folder

If all went well, rename the main target folder out of the way

Rename the temp target folder to be the main target folder

If anything goes wrong, try to make the previous main target folder the target folder again

This should give me either the newest or the best fitting target folder even if something breaks. Ending up with an empty target folder is the worst scenario for me.
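
The steps above could look roughly like this. It is a sketch under assumptions, not a finished procedure: the `.tmp` and `.old` suffixes and the names `rebuild_and_swap`, `source`, `target` and `wanted` are made up for illustration, and the folder is assumed to be flat.

```python
import os
import shutil
from pathlib import Path


def rebuild_and_swap(source: Path, target: Path, wanted: list[str]) -> None:
    """Build the partial folder next to `target`, then swap it in."""
    tmp = target.with_name(target.name + ".tmp")   # hypothetical naming
    old = target.with_name(target.name + ".old")   # hypothetical naming

    # Start from an empty temp folder, clearing what a failed run left behind.
    if tmp.exists():
        shutil.rmtree(tmp)
    tmp.mkdir(parents=True)

    # An exception in this loop leaves the current target untouched.
    for name in wanted:
        shutil.copy2(source / name, tmp / name)

    # Swap: move the current target aside, then promote the temp folder.
    if old.exists():
        shutil.rmtree(old)
    had_target = target.exists()
    if had_target:
        os.rename(target, old)
    try:
        os.rename(tmp, target)
    except OSError:
        if had_target:
            os.rename(old, target)   # roll back to the previous folder
        raise
    if had_target:
        shutil.rmtree(old)           # previous version is no longer needed
```

The slow part (the copying) happens entirely in the temp folder, so an interruption there costs nothing. There is still a short moment between the two renames in which the main target folder does not exist; anything reading it at exactly that time would have to retry.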

Thanks for trying to help.

P.S. If anyone could answer my question about the I/O effort needed to create full file copies, compared with syncing only the files that changed, that would be welcome.