Storage for VDI

We are currently designing a VDI solution to support a maximum of 300 concurrent users.  We will be using dedicated VMs for most users, with maybe 20% linked clones.

We are trying to decide between 3220s with 48 SAS spindles and 512GB Flash Cache or 2240s with 48 SAS spindles and flash pools.

Does anyone have any experience with flash pools and VDI?  We have been using flash cache on our 3240s and it works very well.  My concern is that VDI is 80/20 write/read and flash cache does not help with writes (except to free the spindles by offloading reads).


Re: Storage for VDI

Go with the bigger controller and flash pools, if you can.  It's easy to get CPU bound if you are going to do much of anything these days (compression of your CIFS data, for example).

I don't have the max limits for each controller in my head, but I think the 2240 hits limits pretty quick wrt number of SSDs.


Re: Storage for VDI

VDI is much worse than 80/20 write/read; it's more like 95/5 write/read. So the issue is not reads, it's writes, and for that I would recommend the Flash Pool solution, as that will provide much better write performance.
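
To put rough numbers on the two ratios mentioned in this thread, here is a minimal sketch of the arithmetic. The 300 users come from the original post; the 10 IOPS per user figure is purely a placeholder assumption and should be replaced with measured data from an assessment.

```python
def split_iops(users, iops_per_user, write_pct):
    """Split a total steady-state IOPS estimate into write and read IOPS.

    write_pct is the write share as a whole-number percentage (e.g. 80).
    """
    total = users * iops_per_user
    writes = total * write_pct / 100
    reads = total - writes
    return total, writes, reads


# 300 users is from the original post; 10 IOPS per user is a
# hypothetical placeholder, not a measured or recommended value.
for label, write_pct in (("80/20", 80), ("95/5", 95)):
    total, writes, reads = split_iops(300, 10, write_pct)
    print(f"{label}: total={total} writes={writes:.0f} reads={reads:.0f}")
```

Under these assumed inputs the read share drops from 600 to 150 IOPS when moving from 80/20 to 95/5, which is why a read-only cache has less to offload at the higher write ratio.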

I don't see SSD disk listed in your BOM. That is a needed component to be able to run Flash Pools.

Then there is the issue of controller resources and the 2240 comes out the poorer on this. I would not recommend any VDI solution on a branch office SAN solution, which is where the FAS-2240 is placed.

A better config would be the 3220 (the 3240 would be an even better choice) with 48x600GB SAS 15k RPM (if these are even still available) and 5x200GB SSD at a minimum for Flash Pool use.

Re: Storage for VDI

Data ONTAP and WAFL are already optimized for random writes with all writes acknowledged to NVRAM, parity calculated in-memory, and then the random IO coalesced and laid down sequentially to disk during a CP operation. The FAS3220 is actually faster than the FAS3240 for most workloads due to the increased main system memory. I would strongly suggest looking to a Clustered Data ONTAP solution using the FAS3220. The flash (whether it is FlashPool or FlashCache) will most likely only offset boot/login storms, and do little to accelerate VDI steady-state. Make certain to work with your local NetApp SE to get a free VDI assessment done using a VDI performance tool like Stratusphere. Also make note that using hardware based clones (FlexClone) as opposed to linked clones will greatly reduce the strain on the storage subsystem.
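
The write path described above (acknowledge from a journal, then lay the coalesced writes down sequentially at a CP) can be illustrated with a toy model. This is only a sketch of the general idea and not how Data ONTAP or WAFL is actually implemented; the class and method names are made up for illustration.

```python
class ToyWriteAnywhereLog:
    """Toy model: journal random writes, then place them sequentially."""

    def __init__(self):
        self.journal = []     # stands in for NVRAM: acknowledged, not yet on disk
        self.block_map = {}   # logical block -> physical block
        self.next_free = 0    # next free physical block; new data is always appended

    def write(self, logical_block, data):
        # The client is acknowledged as soon as the write is journaled.
        self.journal.append((logical_block, data))
        return "ack"

    def consistency_point(self):
        # Coalesce: only the latest write per logical block needs to reach disk.
        pending = {}
        for logical_block, data in self.journal:
            pending[logical_block] = data
        start = self.next_free
        # Lay the blocks down in one contiguous run of physical blocks.
        for logical_block in pending:
            self.block_map[logical_block] = self.next_free
            self.next_free += 1
        self.journal.clear()
        return list(range(start, self.next_free))


log = ToyWriteAnywhereLog()
for block in (907, 12, 455, 12):      # random logical addresses, one repeated
    log.write(block, b"x")
print(log.consistency_point())         # contiguous physical blocks written
print(log.block_map)
```

In this toy, four random writes (one of them a repeat) turn into three sequential physical block writes, which is the effect the post attributes to coalescing at CP time.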

Re: Storage for VDI

Hi Erick. Interesting post. So, you're saying that utilizing Flash Pools (instead of Flash Cache) for VDI is probably not a good way to go? Even though a lot of time is spent talking about VDI boot/login storms and the strain they put on VDI, steady state is more write IO intensive. I know that WAFL does a great job at write optimization, but VDI tasks such as recompose, refresh and rebalance are much more write intensive as well. Are you saying that there will be little to no benefit from hosting linked clones on Flash Pools to get better write IO optimization for these tasks?

I would rather include FlashPools over FlashCache since VDI is more than just being able to optimize reads. If WAFL is sufficient for writes, why does NetApp even have FlashPools? Why not just stick with FlashCache and WAFL? I'm not trying to be difficult, just would like to know the correct approach when sizing NetApp storage and VDI linked clones. Do we go FlashCache or FlashPools? A lot of us understand the differences, but NetApp hasn't been clear as to which technology to use for linked clones (or hardware based clones (VCAI)).

Lastly, I've heard that linked clones introduce a 3X penalty when sizing for IOPS over hardware based clones like FlexClone. Do you know of a TR that specifically explains this?


Re: Storage for VDI

FlashPool or FlashCache are both great fits for VDI. FlashPool does have the advantage of caching random overwrites, and it has cache persistence during failover events. My personal preference is almost always FlashPool. You are correct that tasks like recompose are very IO heavy, but unfortunately without knowing the workload breakdown of a desktop composition I cannot tell you if that particular task would benefit from SSD overwrite caching. I suspect you would see some benefit, but how much is hard to say.

So the big question is how do you speed up writes when they are already being acknowledged at SSD speeds in the form of NVRAM? If all we are doing is substituting NVRAM for SSD then we haven't really accomplished much, and all that would happen is a deferred data movement to disk. You still need to have the disk subsystem in place to handle the de-stage from flash to hard drive, and that is something we do already with ONTAP and WAFL. So the IO we go after on FlashPool are the expensive operations like random reads and random overwrites. We expect random writes to be fast on our systems, and they are, but when a newly written block is actively being manipulated we want to do that in flash. FlashPools have that capability.

The 3X IOP penalty is still there for linked clones due to the alignment offset. If you use our sizers it is striking just how much more SSD and disk you need when large systems are deployed using linked clones. Now, I know in a future release of View there is a new disk type that should correct the IO penalty that linked clones introduced. As of this writing I don't believe any vendor has officially supported VCAI integration, but I know we are going through the process now with VMware. Once that is done you will always want to use hardware based cloning (VCAI) in order to free up resources. I hope this helps.


Re: Storage for VDI

Thanks for the quick response.

What would be an example of a random overwrite as opposed to a random write? I agree that WAFL does a great job of coalescing writes and striping them to disk. Are you saying that random overwrites are not handled in the same way as regular writes, meaning they are not handled in NVRAM and destaged to disk (like regular writes)? So, without the benefit of a FlashPool, how would a random overwrite normally be handled?

From what I read yesterday the 3X IOP penalty is there because for every write, the VMkernel has to read metadata from disk, potentially write metadata to disk and finally write the actual block of data to disk. Alignment offset is something different and causes performance issues whether we're dealing with linked clones or any other data that isn't properly aligned. This really isn't an issue so much anymore if you're running a Windows filesystem > XP.
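
The 3X figure as described in this post (metadata read, possible metadata write, data write) works out as follows. This is only a sketch of that arithmetic, with a hypothetical parameter for how often the metadata write actually happens; it is not taken from a NetApp or VMware sizer.

```python
def backend_ios(guest_writes, metadata_update_fraction=1.0):
    """Estimate backend IOs generated by guest writes to a linked clone.

    metadata_update_fraction is a hypothetical knob: the share of guest
    writes that also require a metadata write (1.0 is the worst case).
    """
    metadata_reads = guest_writes
    metadata_writes = guest_writes * metadata_update_fraction
    data_writes = guest_writes
    return metadata_reads + metadata_writes + data_writes


print(backend_ios(1000))        # worst case: every write also updates metadata
print(backend_ios(1000, 0.5))   # assumed: half the writes update metadata
```

In the worst case every guest write becomes three backend IOs, which matches the 3X penalty; when the metadata write is only sometimes needed the multiplier falls somewhere between 2X and 3X.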

It's more a matter of VMware not yet officially supporting VCAI in a production environment than of any particular vendor supporting it. I think VCAI is great because it addresses both the management and operational issues of deploying clones into a View environment. VSC is great, but I don't like the fact that View doesn't have complete control over those clones.

Hopefully, NetApp addresses how FlashPools can benefit a View environment more clearly in an upcoming TR. The NetApp VDI sizer is OK, but it really doesn't go far enough to properly address how FlashPools should be sized for the VDI workload.

Re: Storage for VDI

So there was an issue with VMware View where, even if your base image is aligned, the linked clones themselves were misaligned when deployed. The SE Sparse disk format introduced in View 5.2, along with the new grain size, will help alleviate that. Maybe I phrased it wrong when I said no vendor has officially supported VCAI integration. That was in regards to VMware support, and you are correct, they have not certified anyone's implementation yet. I think you will see a TR and sizer enhancements soon. For now the best thing to do is "tweak" the config with FlashPool, without, and with FlashCache to see the differences.
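
As a generic illustration of why misalignment costs extra IO, here is a small sketch that counts how many storage blocks a single guest IO touches. The 4 KiB block size and the example offsets are assumptions for illustration (a 63-sector start is the classic XP-era partition offset); this does not model the linked clone disk format itself.

```python
def blocks_touched(offset, length, block_size=4096):
    """Count the storage blocks covered by an IO of `length` bytes at `offset`."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


misaligned_start = 63 * 512      # 32256 bytes: not a multiple of 4096
aligned_start = 1024 * 1024      # 1 MiB: a multiple of 4096

print(blocks_touched(misaligned_start, 4096))
print(blocks_touched(aligned_start, 4096))
```

A 4 KiB IO at the misaligned offset straddles two storage blocks, while the same IO at the aligned offset touches exactly one.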

Re: Storage for VDI

Thanks. Any comment on my questions around random overwrite vs normal write and how each is handled differently?

Re: Storage for VDI

Without FlashPool they are handled from disk. With FlashPool the hot blocks will be written and modified on SSD. As far as the incoming write, everything gets written to main system memory, journaled to NVRAM, and then flushed to disk during a CP. In the case of an overwrite that CP operation would hit SSD if that block was part of a FlashPool.
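
A very rough way to picture the distinction: the first write to a block lands on disk at the CP, while a later overwrite of a block that has already been written can be directed to SSD. The sketch below is a toy model under that simplifying assumption only. The "already written means hot" rule is hypothetical and is not the actual FlashPool caching policy.

```python
def destage(writes):
    """Return (logical_block, tier) decisions for a stream of block writes.

    Toy rule (an assumption for illustration): a block written for the
    first time goes to "hdd"; any later overwrite of it goes to "ssd".
    """
    seen = set()
    decisions = []
    for block in writes:
        tier = "ssd" if block in seen else "hdd"
        decisions.append((block, tier))
        seen.add(block)
    return decisions


for block, tier in destage([5, 9, 5, 5, 2]):
    print(block, tier)
```

Here block 5 is written to disk once and its two later overwrites are absorbed by SSD, while blocks 9 and 2, written only once, never touch flash.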