2015-05-24 03:49 PM
For technical reasons in other parts of our company, we end up with "orphaned" files on an Infinite Volume. To find these files and archive or purge them, I have to build a list of files on a regular basis. The files are stored in an MD5 tree (/[0-9a-f]/[0-9a-f]/[0-9a-f]), and each bottom-level directory currently holds more than 50,000 files. With a cold cache, our 2-node 8040 cluster takes at least 30 seconds to read a single directory (ls) on a good day, and I have seen it take as long as 5 minutes. The tree has 16 × 16 × 16 = 4,096 leaf directories, so even at 30 seconds per directory, building the list takes about 34 hours in the best case. Usually it takes several days.
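To make the best-case estimate concrete, here is a small Python sketch that enumerates the leaf directories of a three-level hex tree and computes the serial listing time. The root path `/vol/orphans` and the helper names are hypothetical. The 30-second and 5-minute figures are the per-directory timings quoted above, and the 5-minute line assumes, pessimistically, that every directory is that slow.

```python
from itertools import product

HEX_DIGITS = "0123456789abcdef"


def leaf_dirs(root: str, depth: int = 3) -> list[str]:
    """Return every leaf directory path of a hex fan-out tree like /a/b/c."""
    return [
        "/".join([root.rstrip("/"), *parts])
        for parts in product(HEX_DIGITS, repeat=depth)
    ]


def serial_listing_hours(n_dirs: int, seconds_per_dir: float) -> float:
    """Wall-clock hours if the directories are listed one after another."""
    return n_dirs * seconds_per_dir / 3600


dirs = leaf_dirs("/vol/orphans")  # hypothetical mount point
print(len(dirs))  # 4096
print(dirs[0], dirs[-1])  # /vol/orphans/0/0/0 /vol/orphans/f/f/f
print(round(serial_listing_hours(len(dirs), 30), 1))  # 34.1 hours at 30 s/dir
# Days, if every directory took the worst observed 5 minutes:
print(round(serial_listing_hours(len(dirs), 300) / 24, 1))  # 14.2
```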
A Perl script that runs 16 threads, each doing an ls on a single directory, would still take a long time.
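A minimal sketch of the same threaded approach in Python, assuming the volume is mounted on the listing host. The mount point and helper names are hypothetical, not part of any NetApp tool. Each leaf directory is read with `os.scandir`, and only `entry.name` is used, so this code never calls `stat()` on individual files. Plain `ls` sorts its output and, depending on options or aliases such as `-l`, `-F` or `--color`, may stat every entry, which adds work on directories with 50,000+ files. Whether the NFS or SMB client fetches attributes along with the names anyway depends on the client, so any speedup should be measured rather than assumed.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import product

HEX_DIGITS = "0123456789abcdef"


def list_names(path: str) -> tuple[str, list[str]]:
    """Return (path, entry names) for one directory, without stat-ing files."""
    try:
        # os.scandir yields raw directory entries; reading only entry.name
        # means this function makes no per-file stat() call.
        with os.scandir(path) as it:
            return path, [entry.name for entry in it]
    except FileNotFoundError:
        # A leaf directory that does not exist is treated as empty.
        return path, []


def list_tree(root: str, depth: int = 3, workers: int = 16):
    """Yield the path (relative to root) of every entry in the leaf directories."""
    leaves = [
        os.path.join(root, *parts)
        for parts in product(HEX_DIGITS, repeat=depth)
    ]
    # Up to `workers` directories are read concurrently. Results come back in
    # leaf-directory order; entries within a directory keep whatever order
    # the filesystem returns them in (no sorting is done).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, names in pool.map(list_names, leaves):
            rel = os.path.relpath(path, root)
            for name in names:
                yield os.path.join(rel, name)
```

For example, iterating over `list_tree("/vol/orphans")` and printing each item streams relative paths such as `0/0/0/<name>` that can be written to a file and compared between runs. Sixteen workers only help if the per-directory time is dominated by round-trip latency rather than by the cluster itself; if the filer is the bottleneck, more threads may not shorten the 34-hour estimate.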
Can anyone think of a better way to extract the list of file names? We don't need any other attributes, such as size or last-modified time.