Based on your description, you are not suffering from a *storage array* performance issue. You're likely suffering from a well-known but little-understood host-based filesystem performance issue. On all major operating systems, reading large numbers of small files incurs a large amount of O/S overhead, because much of the time is spent at the O/S level performing seek(), open(), and close() operations on each and every file you're processing. While these operations don't necessarily take a lot of time for a single file, they quickly add up when processing hundreds or thousands of small files. These problems show up during activities like backups, virus scans, and content indexing. With really small files, such as 4K, you can literally spend more time finding, opening, and closing the file than you do reading the data. For one customer who had over 1 million small files, we calculated that they were losing multiple *hours* to filesystem processing during backup windows. The fact that this VM resides on top of VMFS probably adds even more latency, but I don't know how much.
Only after the O/S has found the file in the file system and opened it does it actually get around to reading the contents of the file and communicating with the storage array. That's why everything looks OK from the storage array performance perspective. The array is responding very quickly to the read() request from the O/S, especially with your high cache hit rate.
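If you want to see this split on your own host, here is a minimal Python sketch that times the open, read, and close phases separately across a batch of small files. The helper names (make_small_files, time_phases), the file count, and the 4K size are my own assumptions for illustration. The files are freshly written, so reads will mostly come from the O/S cache, which is roughly the situation you have with a high cache hit rate. Treat the numbers as a rough indication only; a system call tracer will give a more precise picture.

```python
import os
import tempfile
import time

# O_BINARY only exists on Windows; fall back to 0 elsewhere.
READ_FLAGS = os.O_RDONLY | getattr(os, "O_BINARY", 0)


def make_small_files(directory, count=200, size=4096):
    """Create `count` files of `size` bytes each and return their paths."""
    payload = b"x" * size
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"file_{i:05d}.dat")
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    return paths


def time_phases(paths):
    """Return (seconds spent per phase, total bytes read) over all paths."""
    totals = {"open": 0.0, "read": 0.0, "close": 0.0}
    bytes_read = 0
    for path in paths:
        t0 = time.perf_counter()
        fd = os.open(path, READ_FLAGS)   # path lookup + open
        t1 = time.perf_counter()
        data = os.read(fd, 1 << 20)      # one read is enough for a small file
        t2 = time.perf_counter()
        os.close(fd)
        t3 = time.perf_counter()
        totals["open"] += t1 - t0
        totals["read"] += t2 - t1
        totals["close"] += t3 - t2
        bytes_read += len(data)
    return totals, bytes_read


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = make_small_files(tmp)
        totals, bytes_read = time_phases(paths)
        print(f"{len(paths)} files, {bytes_read} bytes read")
        for phase, seconds in totals.items():
            print(f"{phase:>5}: {seconds * 1000:.2f} ms total")
```

Pointing time_phases at a list of paths on the real filesystem (instead of the temporary directory) gives a closer approximation of what your backup or scan job experiences.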
I created a document a few years ago which explained this issue. It included a few examples that actually showed the O/S system calls as it processed each small file and how much time each operation took. I'll see if I can dig that up and I'll post it here.
To answer your original question: In these scenarios, faster CPUs and the lowest-latency disk help, but you can never truly eliminate the issue. Even if you stored all the files on SSD or in a RAM-based disk, the O/S still has to perform all of the system calls on each and every file. Some suggestions:
Get the file system that houses the small files off of VMFS and consider using NFS or an RDM LUN directly mapped to the VM's O/S. VMFS undoubtedly adds some amount of latency to each I/O operation, so removing it from the I/O path would help.
See if you can get faster CPUs for the VM. Consider guaranteeing the CPU for the VM. This will help the O/S process the system calls faster.
Allocate more RAM to the VM to increase the size of the O/S buffer cache. This will cache more files in the VM's RAM, which has dramatically lower latency than the fastest disk. Consider guaranteeing the RAM for the VM.
Research your particular O/S filesystem. Some file systems have known methods of improving performance with large numbers of small files. For example, in BSD a good recommendation is a hierarchy of sub-directories with a limited number of files in each (rather than dumping them all in one big directory).
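As an illustration of the sub-directory hierarchy idea in the last suggestion, here is a minimal Python sketch that spreads files across nested directories based on a hash of the file name. This is one common way to do it, not something specific to BSD, and the function names (sharded_path, store) and the two-level, two-hex-digit layout are assumptions of mine. With that layout there are 65,536 leaf directories, so 1 million files would average roughly 15 files per directory.

```python
import hashlib
import os


def sharded_path(root, filename, levels=2, width=2):
    """Map a file name to root/ab/cd/filename using leading hex digits of its hash."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


def store(root, filename, data):
    """Write `data` under the sharded location for `filename` and return the path."""
    path = sharded_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Because the location is computed from the name alone, a reader can call sharded_path with the same root and name to find the file again without scanning any directory.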