2008-10-18 07:32 AM
I have a FAS3020 with a 1 TB volume used as storage for an application that generates millions of small files, 30-40 KB each.
We have been getting an error message saying files are too large, which we assume means there are too many.
We have increased the maximum number of files allowed, since the volume is not full.
1. How does ONTAP calculate the max number of files for a given volume size?
2. How many files can be added per volume above the default setting before you start to see an impact on performance?
3. How does having so many files affect performance when trying to do a backup?
4. Is there a best practice for data set containing millions of small files?
2008-10-18 12:07 PM
Q. How does ONTAP calculate the max number of files for a given volume size?
A. By default, a volume gets 1 inode for every 32 KB of disk space, and the number of files the volume can hold is derived from that ratio. The maximum number of inodes is limited to one inode per filesystem block (1 inode for every 4 KB). It is generally recommended NOT to go that low.
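To make the arithmetic above concrete, here is a rough sketch in Python. It assumes binary units (1 TB = 1024^4 bytes) and applies only the two ratios quoted above; the function names are made up for illustration, and the numbers a real filer reports may differ because of reserves and other overhead.

```python
KIB = 1024
TIB = 1024 ** 4

def default_inodes(volume_bytes, bytes_per_inode=32 * KIB):
    """Default file limit: one inode per 32 KB of volume space."""
    return volume_bytes // bytes_per_inode

def max_inodes(volume_bytes, block_bytes=4 * KIB):
    """Upper bound: one inode per 4 KB filesystem block."""
    return volume_bytes // block_bytes
```

For a 1 TB volume, default_inodes(1 * TIB) gives 33,554,432 and max_inodes(1 * TIB) gives 268,435,456. With files averaging 30-40 KB, the default limit and the free space run out at roughly the same point, which fits the error described in the question.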
Q. How many files can be added per volume above the default setting before you start to see an impact on performance?
A. I do not know of any study that measured file system performance with respect to the number of files. Also note that the performance of a volume depends not only on the number of files it holds but also on how those files are laid out. For example, you'll get much better performance if you distribute your files across multiple directories at the same level, as opposed to having all of them in one directory.
Q. How does having so many files affect performance when trying to do a backup?
A. Very negatively. Your backup software will spend a LOT of time getting the file list and then actually doing the backup. Kick off a backup and examine the time difference between the time when it starts reading the file system and the time when it actually starts reading any bytes from the file system... that will tell you the story.
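If you want to see that gap without running a full backup job, a small script can approximate it. The sketch below is only an illustration, not how any particular backup product works: it first walks the tree to build the file list, then reads every file, and times the two phases separately. The function name and its parameters are hypothetical.

```python
import os
import time

def time_scan_and_read(root, chunk_size=1024 * 1024):
    """Return (file_count, scan_seconds, read_seconds, bytes_read) for a tree."""
    # Phase 1: build the file list, as a backup tool does before copying data.
    start = time.perf_counter()
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    scan_seconds = time.perf_counter() - start

    # Phase 2: read the contents of every file found in phase 1.
    start = time.perf_counter()
    bytes_read = 0
    for path in paths:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                bytes_read += len(chunk)
    read_seconds = time.perf_counter() - start

    return len(paths), scan_seconds, read_seconds, bytes_read
```

Run it against a representative subdirectory from a client that mounts the volume. With millions of small files, expect scan_seconds to be a large share of the total.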
Q. Is there a best practice for data set containing millions of small files?
A. I would try to organize the files so that they are distributed among multiple directories. Also, try not to make the directory structure too deep.
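One common way to get a wide but shallow layout is to derive the directory from a hash of the file name. This is just a sketch of that idea, assuming the application is free to choose where it writes; sharded_path and its parameters are hypothetical names, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from pathlib import PurePosixPath

def sharded_path(root, filename, levels=2, width=2):
    """Place filename under `levels` subdirectories named from its hash."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # Each level uses the next `width` hex characters of the digest.
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *shards, filename)
```

With the defaults (two levels of two hex characters) there are 256 * 256 = 65,536 leaf directories, so 30 million files would average roughly 460 files per directory while the tree stays only two levels deep.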
Some other reading
2008-10-20 01:49 AM
I agree with everything that has been said, and would add that our backup application, HP Data Protector, has a limit of 20,000,000 files per backup job. We found this out the hard way, once we were no longer able to back up the volume.