VMware Solutions Discussions

fc + fabricache vs nfs

numbernine

We have a 2040 HA pair which is now at the maximum spindle count (all 15K) that still allows for failover.

We have 4 ESXi hosts with approx. 120 VMs: a good mix of DBs, app servers, web servers and general services.

We have an NFS datastore per aggr; my aim this year is to segregate the datastores from other forms of storage.

We do have a second, single-controller 2040 in a DR/backup server room next door with larger SATA disks.

We have FC set up; we had an RDM for a (now defunct) SQL cluster, but it's not being used right now.

The challenge this year is to produce a plan for next year's storage expansion and performance improvements.

I have been thinking about swapping a larger, slower shelf from next door with a smaller, faster shelf here - that would give us capacity. I've yet to find out whether NetApp will 'allow' us to do this, or whether it meets our sizing requirements for next year - early days.

On the performance side, I have recently seen that QLogic have released 'VMware-aware' SSD-caching FC SAN adapters (FabricCache). This looks interesting as it caches locally on the host and retains the cache after a vMotion - all the cache-rebuilding traffic happens over the FC side.

With these adapters and the shelf swap we could also stay 100% NetApp for our storage (which makes things easier to manage).

The alternative is to start looking at products from vendors like Tintri (which would answer the performance AND capacity questions, but then it's another management layer...).

Thoughts? Has anyone looked into the QLogic cards?

Any gotchas for swapping NFS for FC?

Andy

7 REPLIES

aborzenkov

NetApp offers host SSD cache adapters too.

numbernine

For a 2040?

radek_kubka

I think we are talking about Fusion-IO cards (resold by NetApp), so the storage back-end is irrelevant in this case (unless we are talking about yet-to-be-developed integration between host-side cache and a filer).

Fusion-IO is fundamentally different from QLogic FabricCache: the former is a PCI card with a lot of cache ("pretending" to be main memory with the right driver), while the latter caches FC traffic on the HBA.

Regards,
Radek

numbernine

OK, I didn't know NetApp resold Fusion-io, so I assumed we were talking about Flash Cache (which isn't available for our filer, of course).

I'm not mad keen on Fusion-io as there are some restrictions on vMotion/HA/DRS etc. We're not doing VDI, which is where it seems more appropriate. In other forums Fusion-io have talked about vSphere 5.1 supporting vMotion across non-shared storage (which would work for planned downtime), but that's not too good for us. There are also options with 'helper VMs' or guest-side drivers.

We'd like to avoid the complexity.

These QLogic units require only a normal HBA driver and no guest-side drivers, don't restrict any of VMware's native functionality, seem not to impact availability, and require no virtual appliances.

AND it looks like you can do a vMotion whilst retaining the cache.

I guess as they are so new they haven't made it into anyone's test rigs yet. I'm not sure why the community is so quiet about them...

aborzenkov

These QLogic units require only a normal HBA driver and no guest-side drivers, don't restrict any of VMware's native functionality, seem not to impact availability, and require no virtual appliances. AND it looks like you can do a vMotion whilst retaining the cache.

You still believe in silver bullets?

Quoting the data sheet:

Unlike standard Fibre Channel adapters, the QLogic FabricCache Adapters communicate with one another using the Fibre Channel infrastructure

This

  • does not work in the case of a direct storage connection
  • directly violates the primary fabric zoning rule the industry has established so far: single-initiator zoning

Clustering creates a logical group of adapters that provides a single point of management, which cooperates to maintain cache coherence and high availability, while optimally allocating cache resources.

So we have a coherency protocol between the adapters which is so far unknown. What latency does it add?
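
Nothing public describes the protocol, so the sketch below is pure speculation on my part about what a clustered host-side read cache has to do on each request - the function, names and latency figures are all made up, not from QLogic:

    # Toy model of a clustered host-side read cache - speculative, not
    # QLogic's actual protocol; latency figures are invented for illustration.
    LOCAL_CACHE_US = 200     # assumed local SSD cache hit (microseconds)
    PEER_HOP_US = 100        # assumed extra FC round trip to a peer adapter
    ARRAY_READ_US = 6000     # assumed random read from 15K spindles

    def read_block(block, local_cache, peer_caches):
        """Return (source, latency_us) for one read request."""
        if block in local_cache:                  # best case: local flash hit
            return "local cache", LOCAL_CACHE_US
        for peer in peer_caches:                  # peer lookup over the fabric
            if block in peer:
                return "peer cache", LOCAL_CACHE_US + PEER_HOP_US
        return "array", ARRAY_READ_US             # miss everywhere: back to disk

Even the local-hit path may have to check ownership with its peers before trusting the cached copy - that per-request coherency traffic is the overhead I am worried about.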

The idea sounds interesting, but I would wait for real-life usage reports.

numbernine

Thanks for the feedback - that's really interesting.

You still believe in silver bullets?

My inner child is still waiting...

Not having spent much time with FC, I didn't know about the recommendation for single-initiator zoning.

I'm going to do some research on this, but perhaps you could help: why is it best practice to do single-initiator zoning?

I can't find where I read it, but the QLogic gumph states somewhere that there's a 15-20% hit on cache performance when accessing the cache from a different host via FC. It would be nice to think that we're still talking about an order of magnitude faster than disk...
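
Quick back-of-envelope (the latencies here are my guesses, not measured or vendor figures):

    # Back-of-envelope only - assumed latencies, not vendor numbers.
    local_cache_ms = 0.2    # guessed local SSD cache hit
    disk_ms = 6.0           # guessed random read from a 15K spindle
    remote_penalty = 0.20   # the 15-20% hit quoted for remote cache access

    remote_cache_ms = local_cache_ms * (1 + remote_penalty)   # ~0.24 ms
    print(f"remote cache hit ~{remote_cache_ms:.2f} ms vs disk ~{disk_ms:.1f} ms "
          f"({disk_ms / remote_cache_ms:.0f}x faster)")

If those guesses are anywhere near reality, even a 'slow' remote cache hit is still well over an order of magnitude quicker than going to spindle.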

The idea sounds interesting, but I would wait for real-life usage reports.

That's kinda why I'm here - to see if anyone has used one yet.

aborzenkov

I guess the primary reason is that you should not define more than is required for your communication. Normally there is no need for an HBA to talk to another HBA, so we never define a zone that includes two HBAs. But if the vendor says it is OK - fine. Also consider the security PoV: this is yet another channel which can be used to gain unauthorized access, and yet another thing to manage.
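
To make it concrete, single-initiator zoning means each zone contains exactly one host HBA port plus the array target port(s) it needs, and nothing else. A sketch in Brocade-style syntax (the aliases and WWPNs are made up - check your own switch's docs rather than trusting my syntax):

    alicreate "esx01_hba0", "21:00:00:24:ff:00:00:01"
    alicreate "fas2040_0c", "50:0a:09:80:00:00:00:0c"
    zonecreate "z_esx01_hba0_fas2040_0c", "esx01_hba0; fas2040_0c"
    cfgcreate "prod_cfg", "z_esx01_hba0_fas2040_0c"
    cfgenable "prod_cfg"

One such zone per initiator port. FabricCache adapters talking to each other would need zones with more than one initiator in them, which is exactly where it departs from that practice.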

But I have met enough customers who just plugged all their devices into a virgin FC switch and it worked. I have also seen horror stories about fabric reconfiguration causing major downtime. So I tend to define only what I need, just to be on the safe side.

As for cache consistency - I'm primarily concerned with the overhead of the cache coherency protocol on every request. So it looks like a promising idea for a local cache, but I'm not holding my breath for the distributed cache implementation.
