Flash Accel question

Now that version 1.2 is out with support for vSphere 5.1 and vMotion, I'm preparing to deploy it, but before I do, there's one thing that I haven't been able to find an answer to in the documentation: how does it handle cache device failures? That is, if I give it just one SSD (rather than RAID1 or RAID10), and that SSD fails, will I simply get performance degradation, or will my cached VMs (or worse, the entire host, including non-cached VMs) crash? The same goes for PCIe devices, which can't be RAIDed in the first place.

Re: Flash Accel question

Howdy,

I can't see the benefit of RAID1/10 for a read cache. If the SSD / PCIe device were to die, your data will still be safe, but your reads will now come from FlashCache or, failing that, it's back to spindles.
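
To make that fallback concrete, here is a toy Python sketch of a tiered read path. It is only an illustration under assumed names (`TieredReader`, `fail_host_cache` and the tier labels are made up for this example), not how Flash Accel is actually implemented, and it says nothing about how gracefully a real driver copes with a device that is failing:

```python
class TieredReader:
    """Toy model of a read path: host flash cache -> array cache -> disk."""

    def __init__(self, disk):
        self.disk = dict(disk)     # authoritative copy (the "spindles")
        self.array_cache = {}      # stands in for an array-side cache such as FlashCache
        self.host_cache = {}       # stands in for the host SSD / PCIe read cache
        self.host_cache_ok = True

    def fail_host_cache(self):
        # Simulate the SSD / PCIe device dying: its cached copies are simply gone.
        self.host_cache_ok = False
        self.host_cache.clear()

    def read(self, block):
        # Fastest tier first, but only while the host cache device is healthy.
        if self.host_cache_ok and block in self.host_cache:
            return self.host_cache[block], "host cache"
        if block in self.array_cache:
            data, source = self.array_cache[block], "array cache"
        else:
            data, source = self.disk[block], "disk"
            self.array_cache[block] = data
        if self.host_cache_ok:
            self.host_cache[block] = data  # warm the host cache for next time
        return data, source


reader = TieredReader({0: b"a", 1: b"b"})
first = reader.read(0)     # cold read, served from disk
second = reader.read(0)    # warm read, served from the host cache
reader.fail_host_cache()
third = reader.read(0)     # host cache lost, served from the array cache
```

Because the cache only ever holds copies of data that also lives on the array, losing it changes where a read is served from, not whether the data is still there.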

C

Re: Flash Accel question

I know the reads will come from spindles. My question is: how gracefully is the failure itself handled when the entire cache device disappears from the host, or starts throwing weird errors? Will the VMs bluescreen? Will the host crash? Will either the VMs or the host require a reboot? Can Flash Accel trigger an automatic vMotion to hosts where the cache is still alive? When I replace the faulty device, will I need a reboot to get the cache back online?

Re: Flash Accel question

That's a good question...

I just kicked off a SQLIO run on a test VM with the Flash Accel cache enabled and, after a few minutes, removed the datastore that holds our RDMps for Flash Accel on the host where this VM lives, to simulate the loss of a cache device...

The VM stayed alive running the SQLIO test and FlashCache started to kick in... the VM was fine.

[Screenshot: Flash Accel hit ratio %]

[Screenshot: Flash Cache]

When I look at the Flash Accel homepage it now also says:

To repair this, I migrated the VM to another host that had access to the existing RDMp datastore, disabled the cache for the VM, re-enabled it, and now it's working again... no restart required!

If you ask me... that's pretty cool!

Hope that helps you.

C

Re: Flash Accel question

Thank you very much, this helps indeed - now I know I can safely deploy single SSDs, or RAID0 if I need more capacity.

Re: Flash Accel question

Hi everyone,

I'm also doing tests with Flash Accel and followed your example with the RDMp datastore failure test. When I offline the LUN/datastore, my test VM shuts down: VMware HA kicks in and tries to restart the VM on another host.

Re: Flash Accel question

Interesting...

My environment:

  • Cisco UCS
    • B200M3 Blade, 256 GB of RAM, LSI 400GB SLC WarpDrive
    • ESXi 5.0 - current patchset
      • Windows 2008R2 - all current patches, including additional patches required for MPIO, SnapDrive, etc.

  • NetApp 3240AE
    • Clustered ONTAP 8.1.3
    • FlashCache
    • SAS

I presented the datastore to each ESXi host via iSCSI, and I did my test against an iSCSI LUN presented within a Windows host configured with clustered file services (this was a test HA SQL environment).

I'm not in a position to do any re-testing against the operating system drive hosted on a VMDK, which may be the difference here?

Cheers,

Chris

Message was edited by: Chris Anders Added LSI Card to B200M3 spec.

Re: Flash Accel question

Wait one - I was under the impression that current version of Flash Accel doesn't support MSCS. I have a few environments similar to what you tested (Windows Server 2008 R2 on top of vSphere with in-guest iSCSI LUNs used for SQL Server 2008 R2 on MSCS) which could benefit from Flash Accel (the filers are FAS2040/2220/2240, so no option of FlashCache), but when I asked whether or not MSCS is supported in a recent NetApp/LSI webcast about Flash Accel, I was told that it's not supported in 1.2, and may be added in 1.3. Was that incorrect?

Re: Flash Accel question

Interesting...

So from the Flash Accel GUI I was able to see the mapped LUNs on both hosts; however, only one of the hosts had the LUNs mounted and was writing to them.

[Screenshot: Active Node]

[Screenshot: Passive Node]

10 GB of cache was given to both hosts and migration was enabled, which meant I burnt 20 GB of cache on both blades (I had each SQL host on a separate blade).

I did some simple testing whereby I ran some IO and watched the cache do its job; I then failed over to the other node, re-ran some tests and watched the second cache do its job.

(The screenshots above don't represent that test. I just pulled them now and the server has since been restarted.)

The cache was cold as I migrated between SQL hosts, but that was to be expected. To be honest, I didn't even check if this configuration was supported; I just tested it since 1.2 supported iSCSI within the host, and to my surprise it did the job!

*shrug*

I'm not saying it's "supported", but it certainly passed the "wow, this is cool... let's try this in UAT!" test.

Cheers,

Chris

Re: Flash Accel question

My environment looks like this:

Dell R720

  • ESXi 5.1 latest patches
    • Fusion-io ioDrive2 - 750GB - SCSI driver for ESX 5.1 latest

NetApp 2240AA

  • NFS exports for VMware datastores

Test Machine

  • Windows 2008 R2 x64 - no MPIO or any special apps installed. Just Iometer for testing.

Only one ESXi host is involved in the test but is part of a cluster configured with HA and DRS.

To make Flash Accel work I presented an iSCSI LUN to the ESXi host to store the pRDM file. All other VM disks are on an NFS datastore.

Everything works OK until I offline the LUN presented over iSCSI. At that moment ESXi throws an error that it cannot find the raw disk, and shuts down the VM to restart it on another host. No other host has (for the moment) the iSCSI datastore, so it remains powered off.

I think it's expected behavior for ESX HA to try to restart the VM on another host in the cluster when it loses connection to the LUN, but that is not the way I'd wish it to react.

Anyway, losing the iSCSI datastore is not a realistic scenario, as the NetApp is AA, so there's no problem making iSCSI redundant. I will do some more tests, but this time I will actually fail the Fusion-io card to see the result.