2012-09-06 03:29 AM
Hi. I have a question about performance when accessing a large number of small files on NetApp storage.
I'm considering the best way to store about 3 million files, most of them just over 1 KB, that are accessed by virtual machines. This will be a low-write, high-read dataset. The VMs run on VMware ESX datastores that sit on NetApp storage and are accessed over NFS. My alternatives are either to store all the small files in a VMDK (on a filesystem with a 1 KB block size), or to put them on a dedicated NetApp volume that the VM accesses directly over NFS.
Because WAFL uses 4 KB blocks, there will be some overhead in both storage size and read speed if the files are accessed directly, but presumably the same overhead applies if they're in a VMDK. Are there any NetApp recommendations for how best to handle this type of dataset?
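The space side of that overhead is simple arithmetic. The sketch below assumes every file is exactly 1,100 bytes (a hypothetical stand-in for "just over 1k") and that each file is rounded up to whole blocks. It ignores inodes, directory entries, and any small-file handling the filesystem may do, so treat the numbers as rough estimates, not NetApp sizing guidance.

```python
KIB = 1024


def allocated_bytes(file_size: int, block_size: int) -> int:
    """Bytes consumed when a file is rounded up to whole blocks.

    Simplification: ignores inodes, directory entries, metadata and any
    small-file optimisations the filesystem may apply.
    """
    blocks = (file_size + block_size - 1) // block_size  # integer ceiling
    return blocks * block_size


def dataset_footprint(n_files: int, file_size: int, block_size: int) -> int:
    """Total allocated bytes for n_files files of identical size."""
    return n_files * allocated_bytes(file_size, block_size)


# Hypothetical figures: 3 million files of 1,100 bytes each.
N_FILES = 3_000_000
FILE_SIZE = 1_100

logical = N_FILES * FILE_SIZE
for label, block in (("1 KB blocks", 1 * KIB), ("4 KB blocks", 4 * KIB)):
    used = dataset_footprint(N_FILES, FILE_SIZE, block)
    print(f"{label}: {used / 2**30:.1f} GiB allocated, "
          f"{used / logical:.2f}x the logical size")
```

With those inputs it prints 5.7 GiB (1.86x the logical size) for 1 KB blocks and 11.4 GiB (3.72x) for 4 KB blocks. A file just over 1 KB still occupies two 1 KB blocks, so the smaller block size roughly halves the allocation here rather than quartering it. Read speed is not modelled at all.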
Thanks in advance for any advice.
2012-09-06 06:17 AM
Hi, thanks for responding. The existing dataset is flat, but the data splits naturally into 10 "layers" (equally sized, I'm told), so yes, this could be reduced to e.g. 10 x 300,000 files instead of 1 x 3 million files, if I could justify it in performance terms.
2012-09-06 06:37 AM
Are humans accessing the files, or is a software program accessing them? It sounds like you have some discretion. If it is a program, you may want more than 10 directories: you could use a matrix of two-character directories (i.e. AA through 00, which gives 36 x 36 = 1296 folders, so for 3 million files roughly 2,315 files per folder), provided your program can look up each file's location in an index.
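The suggestion above has the program look each file's location up in an index. A common alternative, swapped in here and not something proposed in the thread, is to derive the directory from a stable hash of the file name, so every web server can compute the path without an index. This is a minimal sketch under those assumptions; `bucket_for`, `path_for`, the A-Z/0-9 alphabet, and the use of MD5 are all hypothetical choices for illustration.

```python
import hashlib
import string

# 36 possible characters per position (A-Z and 0-9), two positions.
ALPHABET = string.ascii_uppercase + string.digits
N_BUCKETS = len(ALPHABET) ** 2  # 36 * 36 = 1296 directories


def bucket_for(name: str) -> str:
    """Map a file name to a two-character directory name such as 'K7'.

    Uses MD5 of the UTF-8 name (for spread, not security) because Python's
    built-in hash() of a str is randomised per process and would give
    different answers on different web servers.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % N_BUCKETS
    first, second = divmod(index, len(ALPHABET))
    return ALPHABET[first] + ALPHABET[second]


def path_for(root: str, name: str) -> str:
    """Relative path of a file: <root>/<bucket>/<name>."""
    return f"{root}/{bucket_for(name)}/{name}"


if __name__ == "__main__":
    print(path_for("files", "example.html"))
    print(f"{3_000_000 / N_BUCKETS:.0f} files per directory on average")
```

Because the mapping is a pure function of the name, the application only ever calls `path_for`, which keeps the directory matrix invisible to the rest of the code. The trade-off against an index is that files cannot be moved between directories without renaming them.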
I am not a high-file-count expert; I just have some familiarity with the problem domain from previous lifetimes when NTFS was my file system.
NetApp has TR-3537, Best Practices for High File Count Environments, but it appears to be an NDA-only document (not a committee I was on ;-) ). Your NetApp account team should be able to help you get a copy.
2012-09-06 07:26 AM
Thank you, Matt, much appreciated. The files will be accessed exclusively by web servers, not humans. I may have some influence on the directory structure, but would prefer to keep it simple from an application perspective.