2011-03-02 01:26 AM
I’m looking for best practices for MetroCluster. I found some information on the setup of the Fibre Channel network, but nothing about TCP/IP networking. If I missed something, please direct me to the documentation.
Otherwise, here’s my question:
We have a MetroCluster distributed over 2 sites that are about 10 km apart. There are 2 links between the sites, each carrying a 10GE and an 8 Gbit/s FC connection. The FC connection is dedicated to the cluster; the 10GE connection is shared by all sorts of traffic. The NetApp hosts different kinds of data (Windows desktop clients, ESX, databases), so we create a trunk and forward the different VLANs to the NetApp. Each node has a dual-port 10GE card. And now the question:
How do I connect the 2 ports to the network?
a) I connect both ports to 1 switch and create a LACP link
b) I connect each port to a different switch and create a failover configuration
LACP brings more performance and failover brings more availability.
Because there are 2 clustered nodes, LACP should be OK: in case of a network failure, the cluster will fail over and all services will be available again. But a failover might affect the clients, so local redundancy would save us from a failover. But if the PCI card fails, the node is unavailable again, so I need a second 10GE card to make sure that all paths are redundant. What about SFPs? Are they stable, or do they fail as often as disks? What about layer 8 problems? Are there any technical measures against that?
So, how are other companies doing it?
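To make the redundancy reasoning concrete, here is a minimal Python sketch that lists, for each cabling option, the components that every network path of one node depends on. It is only an illustration: the component names (`card1`, `c1p1`, `switchA`, and so on) and the three option names are hypothetical, and the model ignores cluster failover to the partner node.

```python
def single_points_of_failure(paths):
    """Return the components shared by every path.

    If any one of these components fails, no path is left.
    """
    return set.intersection(*(set(p) for p in paths))


# Hypothetical component names for one node; each inner list is one
# network path from the controller to the switch.
OPTIONS = {
    # a) both ports of one card connected to one switch (LACP)
    "a_lacp_one_switch": [
        ["card1", "c1p1", "sfp1", "switchA"],
        ["card1", "c1p2", "sfp2", "switchA"],
    ],
    # b) both ports of one card, one port per switch (failover)
    "b_failover_two_switches": [
        ["card1", "c1p1", "sfp1", "switchA"],
        ["card1", "c1p2", "sfp2", "switchB"],
    ],
    # b) with a second 10GE card: one port per card, one card per switch
    "b_two_cards_two_switches": [
        ["card1", "c1p1", "sfp1", "switchA"],
        ["card2", "c2p1", "sfp3", "switchB"],
    ],
}


def report(options):
    """Map each option name to its sorted single points of failure."""
    return {name: sorted(single_points_of_failure(paths))
            for name, paths in options.items()}


if __name__ == "__main__":
    for name, spofs in report(OPTIONS).items():
        print(name, "->", spofs or "none")
```

In this simplified model, option a) depends on both the card and the switch, option b) still depends on the single card, and only the two-card variant has no shared component.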
Thank you very much for your feedback. All input is appreciated.
2011-03-03 07:39 AM
To broadly answer some of your MetroCluster questions:
- Please take a look at NetApp best practices TR-3548 http://www.netapp.com/us/library/technical-reports/tr-3548.html
- Please use the Active-Active Config guide and the FabricMetroCluster Brocade Switch Config guide for configuration and installation information as much as possible. Both can be found on the NOW site.
- For a business overview of MetroCluster and how it compares to other DR solutions at NetApp, see http://communities.netapp.com/docs/DOC-9648
- We make no MetroCluster recommendations for the host side. MetroCluster does not care what data resides on the volumes or LUNs, or how those volumes and LUNs are connected to the front-end network.
Whether you set up your front-end network for better performance or higher availability really depends on your customer's requirements for that network and does not directly influence the MetroCluster.
In a MetroCluster configuration, failover occurs when components of the MetroCluster fail, such as the controller, the storage, the back-end fabric (for Fabric MetroCluster), or shelf connectivity (controller to shelf), or when a site disaster occurs.
Failover does not occur if front-end network components fail.
Hope this helps
2011-03-03 08:58 AM
Thank you very much for your feedback.
For clarification: from the following document (http://now.netapp.com/NOW/knowledge/docs/ontap/rel734/html/ontap/aaconfig/GUID-A373C1BD-CC36-4532-AEA7-626F4F36714A.html) I had the impression that the MetroCluster does fail over when the front end fails. Is my impression wrong?
2011-03-03 11:20 AM
That is correct: failover would occur if you configure it that way, but the default setting will not cause a failover.
So, if I remember right, the default setting is:
2011-03-04 12:39 AM
I hope I’m not the only MetroCluster customer. Other companies must have faced the same decision and, for one reason or another, chosen a specific solution.
So please tell me what your reasoning was and what you decided.
As Tim said, there is no right or wrong.
Thank you very much for your feedback.
2011-03-10 01:31 PM
Hi Bernhard, you are certainly not the only MetroCluster customer. There are thousands of such clusters running worldwide.
You don't mention the controller model you are using. From my experience, a single VIF gives you more availability and more stability, as no special switch configuration is necessary (and to be honest, that is where we see a lot of issues). On top of that, depending on your hosts and network, LACP might not give you the load balancing you expect. Also, in most cases one 10 Gbit pipe is enough in terms of performance. Controller failover on front-end network failure works quite reliably. However, I usually don't recommend it to customers. Most of them don't want to fail over a storage cluster "just" because of a broken network link. That should be handled by redundancy on the network side (e.g. a single VIF).
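As an illustration of why LACP might not balance load the way you expect: link aggregation assigns each flow to one member link based on a hash of its addresses, so a single flow never uses more than one link, and a small set of hosts can end up on the same link. The Python sketch below assumes a simplified hash (XOR of source and destination IP, modulo the number of links); real switches and controllers use their own hash functions, and the addresses are hypothetical.

```python
from collections import Counter
from ipaddress import ip_address


def pick_link(src_ip, dst_ip, n_links):
    """Pick a member link for a flow with a simplified IP-based hash.

    Assumption: XOR of both addresses modulo the link count. This is an
    illustrative stand-in, not the hash of any specific device.
    """
    return (int(ip_address(src_ip)) ^ int(ip_address(dst_ip))) % n_links


# Hypothetical addresses: one filer interface and four ESX hosts.
FILER = "10.0.0.10"
HOSTS = ["10.0.0.21", "10.0.0.23", "10.0.0.25", "10.0.0.27"]


def link_usage(hosts, filer, n_links=2):
    """Count how many host-to-filer flows land on each member link."""
    return Counter(pick_link(host, filer, n_links) for host in hosts)


if __name__ == "__main__":
    # With these addresses all four flows hash to link 1; link 0 stays idle.
    print(link_usage(HOSTS, FILER))
```

With this hash and these addresses, all four hosts share one 10GE link while the other carries nothing, so the aggregate gives no more throughput than a single link.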
2011-08-25 06:41 AM
Regarding MetroCluster, I have read that you need exactly the same hardware configuration at both sites, e.g. 3040-3040 and 20 disk shelves, 10 at each site.
My question: if I have 10 shelves at the primary site, but the volumes I need to sync via MetroCluster reside on only 5 of those shelves, can I specify which volumes are synced to the DR site and buy only 5 shelves, or do I need to buy another 10 shelves for the DR site?
2011-08-25 09:13 AM
You have to mirror the root aggregate; otherwise disaster recovery failover won't be possible at all. Beyond that, it is more or less up to you, as long as the configuration is still supported by NetApp.