I have a general question regarding NetApp storage for a VMware environment.
In short: we would like to virtualize all our servers, including the ones with high disk I/O utilization. For this we need a storage solution that performs roughly as well (latency, throughput) as a DAS RAID 10 of 10 SAS drives. What can you recommend: NFS NAS, iSCSI/FC SAN, or external DAS shared by multiple ESX servers via SATA (like a Dell PowerVault)?
Additional information: on some servers, our workflow writes between tens of thousands and millions of files, mostly smaller than 2 KB, sequentially into a folder, reads some of them back, calculates new data, and writes it out again. We had to learn that a simple NFS share does not perform well enough, seemingly because of NFS protocol latency. Virtualization itself does not seem to add significant overhead: when the data is written to and read from a VMDK located on the internal HDDs of the ESX server, performance is fine. So the question is which storage solution, protocol, and connection medium (1GbE, 10GbE, FC, DAS) can replace our internal RAID setup.
Basically, there is probably too little information here to give a definitive answer. You haven't really stated what sort of servers you are going to virtualize, either.
I guess I also disagree with the blanket statement that NFS wouldn't perform well enough. See the SPECsfs results for the previous line of NetApp products here: http://www.spec.org/sfs2008/results/sfs2008nfs.html If you can live with 2-3 ms response times (easily comparable to SAN), it could be a good choice. You can also run your VMware datastores on NFS. IIRC, iSCSI and FC are also offered as "affordable" bundles.
It all depends on a more exact specification of the performance characteristics, some idea of the backup methodology, etc. A NetApp sales agent/partner will be able to help you with sizing and pricing.
We will analyze our setup in more detail with a NetApp representative in the coming weeks; I just want to adjust my expectations, because I fear they might be too high.
We tested in the past with an Isilon cluster that provided an NFS share either directly to a server/VM, or to the ESX server, which stored the VMDKs on that NFS share.
We developed an internal benchmark that reproduces some of our workflows, consisting of fwrite operations.
It became clear that, because our workflow is strictly sequential and works with so many small files, latency is a real issue. At the moment it looks as if even 2-3 ms on an NFS share multiplies the running time of some programs by a factor of 10, compared to a server with an internal RAID array or a VM whose VMDK sits on the internal RAID of the ESX server.
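For reference, a workload like this is easy to approximate in a few lines; this is a minimal sketch (not the internal benchmark mentioned above, and the payload size and file count are assumptions), writing each small file synchronously so per-file latency shows up in the total:

```python
import os
import tempfile
import time

def write_small_files(directory, n_files, payload=b"x" * 2048):
    """Sequentially write n_files small files, fsyncing each one so the
    write actually reaches stable storage instead of the page cache."""
    start = time.perf_counter()
    for i in range(n_files):
        path = os.path.join(directory, f"file_{i:07d}.dat")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
    return time.perf_counter() - start

# Example: time 1,000 files on whatever filesystem holds the temp dir.
with tempfile.TemporaryDirectory() as d:
    elapsed = write_small_files(d, 1000)
    print(f"1000 files in {elapsed:.2f} s "
          f"({elapsed / 1000 * 1000:.2f} ms per file)")
```

Running this once against the internal RAID and once against an NFS mount makes the per-file latency difference directly visible.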
So basically my question is: is there a shared storage solution with a latency of less than 1 ms?
In vSphere, the DAS of the ESX server is shown with a latency of 0.08 ms, while the Isilon NFS store is shown with a latency of about 4 ms.
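Those two numbers alone explain the runtime gap: with strictly sequential access, per-operation latency translates directly into wall-clock time. A back-of-envelope calculation (using the latencies above and an assumed one million sequential synchronous operations):

```python
ops = 1_000_000   # assumed number of sequential small-file operations
das_ms = 0.08     # DAS latency as reported by vSphere
nfs_ms = 4.0      # Isilon NFS latency as reported by vSphere

# Sequential access: total time is simply ops * per-op latency.
das_total = ops * das_ms / 1000   # seconds
nfs_total = ops * nfs_ms / 1000   # seconds

print(f"DAS: {das_total:.0f} s, NFS: {nfs_total:.0f} s, "
      f"ratio: {nfs_total / das_total:.0f}x")
# DAS: 80 s, NFS: 4000 s, ratio: 50x
```

The observed factor of 10 is in the same ballpark; the difference presumably comes from the compute portion of the workflow, which is latency-independent.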
I guess that is more a question of how much money you have. Given enough PAM cards in a filer head to hold most of your working data set, you should be able to get consistently short response times.
I guess I would still collect as many facts as you can and see what NetApp comes up with. NetApp NAS isn't always the clear speed winner, but it has a huge range of other functions/options that make it a better solution for situations where you need more than "dumb, fast disk".
I have given all of this a bit more thought, although it probably isn't directly related to the choice of a storage solution. The reason is that something here seems to stand out on a design level. I don't really have any idea what sort of application or data you are working with (just the sizes and some basic structure), but basically, I think an I/O-bound application with these sorts of response-time requirements is eventually going to run into the physical constraints of disk-based storage.
I can't really comment on what your benchmark results mean, but I have a suspicion that something in the DAS test is lying. There is no disk-based storage alive that can achieve 0.08 ms, except perhaps SSD. Basically, the filesystem cache or the underlying FS is telling you stories about the data actually being written to disk. Perhaps the DAS NVRAM is big enough to absorb the benchmark I/O and that is where the data sits, but that doesn't scale terribly well for larger operations. VMFS is not known as a terribly fast FS either.
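That suspicion is easy to demonstrate: a buffered write returns as soon as the data is in the page cache, while fsync() forces it to stable storage. A small sketch (actual timings will obviously vary with hardware and OS):

```python
import os
import tempfile
import time

def timed_write(path, data, sync):
    """Write data to path, optionally fsyncing it to disk, and
    return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        if sync:
            f.flush()
            os.fsync(f.fileno())
    return time.perf_counter() - start

data = b"x" * 2048
with tempfile.TemporaryDirectory() as d:
    buffered = timed_write(os.path.join(d, "a.dat"), data, sync=False)
    synced = timed_write(os.path.join(d, "b.dat"), data, sync=True)
    print(f"buffered: {buffered * 1000:.3f} ms, "
          f"fsynced: {synced * 1000:.3f} ms")
```

On most systems the buffered number looks like the 0.08 ms above; the fsynced number is much closer to what the disk can actually do.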
So basically, because today's hardware is probably going to disappoint expectations of < 1 ms, there are a few widely used options to work around this:
1. For highly parallelized, real-time analysis, things like global file systems and NUMA clusters are the key. They cost tons of money and are incredibly complex because of the high level of message passing and locking involved. They also require complex software.
2. Databases. A relational database basically provides all the buffering and indexing needed to assure low response times and scalability. The write locking and buffering are already in the DB, so you don't have to design your own. Several of them also work well over NFS (Oracle, Postgres, Sybase to some extent, etc.). Transaction logging also gives you a consistent picture of where you were when things went wrong. Active-active database clustering also exists, but it is complex and doesn't scale linearly.
3. Batch I/O. Basically, I/O daemons and message passing to achieve roughly the same structure as a relational database, but some additional thought should be given to optimizing the directory structure to decrease lookup/search times. If your processing is somewhat predictable, I/O daemons can prefetch and flush data without stopping processing and give the worker daemons access to data sets in memory, which would meet your fast-access requirement.
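To illustrate the database option (2): instead of a million 2 KB files, the same records can go into one table, with a whole batch committed in a single transaction so the sync cost is paid once rather than once per record. A minimal sketch with SQLite, using a made-up table name:

```python
import sqlite3

def store_records(db_path, records):
    """Insert many small records in one transaction; a single commit
    amortizes the sync cost over the whole batch."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (id INTEGER, payload BLOB)")
    with conn:  # one transaction for the entire batch
        conn.executemany("INSERT INTO results VALUES (?, ?)", records)
    return conn

# 10,000 records of ~2 KB each, committed as one transaction.
records = [(i, b"x" * 2048) for i in range(10_000)]
conn = store_records(":memory:", records)
count, = conn.execute("SELECT COUNT(*) FROM results").fetchone()
print(count)  # 10000
```

The same pattern applies to a "real" database server; the point is only that one commit per batch replaces one sync per file.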
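And the batch-I/O option (3) can be sketched as a prefetch thread that loads the next data sets into memory while the worker chews on the current one; all names here are illustrative, not an existing framework:

```python
import queue
import threading

def prefetcher(load, keys, q):
    """Read data sets ahead of the worker and hand them over in memory."""
    for key in keys:
        q.put((key, load(key)))   # blocks when the queue is full
    q.put(None)                   # sentinel: nothing more to prefetch

def process_all(load, compute, keys, depth=4):
    """Overlap I/O (load) and computation (compute) via a bounded queue."""
    q = queue.Queue(maxsize=depth)
    t = threading.Thread(target=prefetcher, args=(load, keys, q))
    t.start()
    results = {}
    while (item := q.get()) is not None:
        key, data = item
        results[key] = compute(data)   # worker never waits on cold I/O
    t.join()
    return results

# Toy example: "loading" key k yields [k, k]; "compute" sums it.
out = process_all(lambda k: [k, k], sum, range(5))
print(out)  # {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}
```

In a real setup `load` would read a batch of the small files (or a packed archive of them) from the shared storage, so the storage latency is hidden behind the computation instead of being paid sequentially.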
The latter two options (and partially the first) don't necessarily include data protection in the form of hardware redundancy or FS-level protection, and backup is a whole different issue. Depending on how disposable (or, conversely, valuable) your data is, preventing its loss needs to be considered both technically and financially.
I've worked at the edge of HPC setups (ca. 800 machines) for seismic data processing, and one of the "low-tech" ways of getting the job done without the complexity of NUMA systems is batch I/O. It also limits cost, because you don't need to scale for hundreds or thousands of machines beating up the storage all of the time. It's not as "cool" as having all that InfiniBand equipment to play with, but it doesn't break as badly either.
Anyway, all of this might be totally useless rambling, but I just had the feeling that something about the access-time requirements didn't feel right, and that there might be a sensible way to work around it: a method that embraces the KISS principle and could be financially sounder at the same time. I hope I haven't wasted your time.