2010-11-24 11:33 AM
I am a newbie with NetApp and I am looking for a disaster recovery solution.
We have a DR site already connected using Cisco antennas.
The site is just a few miles away, and I am looking at MetroCluster and SyncMirror. We have some critical data (a few hours of downtime at most) as well as important data (up to 1 week).
Which one would be the best to use?
Would that work with our current infrastructure? (just a regular Cisco switch with Cat5 cables, plus Cisco Aironet)
2010-11-25 12:09 AM
MetroCluster requires direct SAN connectivity between sites, so it is probably out of the question here. SyncMirror by itself is not a disaster recovery solution (only when used as part of MetroCluster) and has the same limitation: direct SAN connectivity.
You could probably look at SnapMirror; depending on your LAN bandwidth and latency, you could try using sync or semi-sync mode, which would give you close to zero RPO. As you are using VMware, you could consider integrating it with SRM for failover automation.
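To get a first feel for whether the link can keep up with a given replication interval, here is a minimal Python sketch of the arithmetic. All names and numbers are hypothetical: the change rate has to be measured on your own volumes, and `usable_fraction` is just an assumed allowance for the fact that a wireless link rarely delivers its nominal rate. It only covers average throughput; latency, which matters most for sync and semi-sync mode, is not modelled.

```python
def required_mbps(changed_gb: float, interval_hours: float) -> float:
    """Average throughput (megabits/s) needed to ship changed_gb within interval_hours."""
    megabits = changed_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return megabits / (interval_hours * 3600)


def fits_link(changed_gb: float, interval_hours: float,
              link_mbps: float, usable_fraction: float = 0.5) -> bool:
    """True if the changed data fits into the assumed usable share of the link."""
    return required_mbps(changed_gb, interval_hours) <= link_mbps * usable_fraction


# Hypothetical example: 10 GB of changes per hour over a nominal 54 Mbit/s link.
hourly_need = required_mbps(changed_gb=10, interval_hours=1)
hourly_ok = fits_link(changed_gb=10, interval_hours=1, link_mbps=54)
```

If the hourly change rate does not fit, a longer interval or a smaller set of replicated volumes is the usual lever, at the cost of a larger RPO.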
2010-11-25 07:26 AM
Thank you for your reply!
I am looking at SnapMirror and it looks really good to me.
With asynchronous replication, do you think it is possible to replicate different data on different schedules (critical data hourly, important data daily)?
Or can we just mix synchronous replication for critical data and asynchronous for less important data?
2010-11-25 09:27 AM
My two cents:
Do not confuse RPO (how much data you may lose) with RTO (how much time you need to restore services after the disaster).
The former can be fairly easily defined (& controlled) by the replication schedule, whilst the latter may vary, to say the least. MetroCluster cannot be beaten here, as it actually stretches the NetApp / VMware HA cluster across both sites, so we are talking minutes. SRM on the other hand may take longer:
- SnapMirror replication must be broken, target volumes made read/write accessible & mounted to ESX hosts
- remaining VMs at the primary site switched off
- required VMs at DR (all, or just a subset) started in a predefined order
- DNS servers updated with new IP addresses (unless VLANs span both sites)
I tend to set expectations for SRM fail-over time at 2 hours (yes, a bit exaggerated, but it takes into account someone's fat fingers, starting services gradually, etc.)
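To put rough numbers on the RPO vs. RTO distinction, here is a minimal Python sketch. It assumes scheduled (asynchronous) replication, where the worst-case data loss is one full interval plus the time a transfer takes, and it models RTO as the sum of the fail-over steps listed above times a safety margin. The step durations and the margin are made-up placeholders, not measured SRM figures; substitute timings from your own fail-over tests.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    minutes: float


def worst_case_rpo_minutes(interval_min: float, transfer_min: float) -> float:
    """Worst-case data loss for scheduled replication.

    If the site fails just before a transfer completes, the DR copy still
    holds the previous snapshot, taken one interval plus one transfer ago.
    """
    return interval_min + transfer_min


def estimated_rto_minutes(steps: list[Step], margin: float = 1.5) -> float:
    """Sum of the fail-over steps, padded by a margin for human error."""
    return sum(s.minutes for s in steps) * margin


# Hypothetical durations for the fail-over steps in the list above.
SRM_STEPS = [
    Step("break SnapMirror, mount target volumes on ESX", 15),
    Step("switch off remaining VMs at primary", 10),
    Step("start required VMs at DR in order", 40),
    Step("update DNS", 15),
]

hourly_rpo = worst_case_rpo_minutes(interval_min=60, transfer_min=10)
srm_rto = estimated_rto_minutes(SRM_STEPS)
```

The point of keeping the two functions separate is that tightening the replication schedule only changes the first number; the second one depends on how well rehearsed the fail-over procedure is.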