Exchange 2007 Local and DR Failovers

KillerKhan

Which NetApp product(s) can I use to make Exchange highly available locally and also fail over to a remote DR site in case of disaster?

I am assuming local failover will involve Exchange 2007 SP1 CCR. The remote DR failover seems a little cloudy. Do I use SCR and Microsoft's log shipping, or do I use a single-node cluster in DR in standby mode, use SnapMirror to replicate the database and log LUNs to the remote site, and at the time of disaster make the remote-site LUNs read/write and point the single-node cluster at the database and log LUNs?

The articles from NetApp give two solutions, but I don't think either of them is practical. Is anyone out there using NetApp with Exchange 2007 who has dealt with this kind of scenario? If so, what setup did you go with: an all-Microsoft solution, an all-NetApp solution, or a mix of both? Thanks in advance.

fjohn

Within a given geographical location, you have two options: CCR or SCC. In a DAS environment, you would certainly want an extra copy of the data. In a NetApp SAN environment, you need to ask yourself what value the extra copy of the data brings, and at what cost.

Across geography, your choices are SCR or SnapMirror. You've already outlined some of the shortcomings of SCR, namely that activation is a manual process. With SnapMirror, you can use the Business Continuity features of SnapManager for Exchange 5.0 to automate the process.

KillerKhan

John, thank you for your response.

So for local failover, I am set on CCR. Extra hardware is not a problem, and CCR is a very straightforward setup with NetApp. No issues there.

The grey area I am looking at is what is required at the DR site if I go the SnapMirror route. Do I need a standby CCR there, meaning a two-node cluster, or can I get away with one server, as with SCR? Back in the day, NetApp with Exchange 2003 would allow you to set up a single-node cluster, and there were some steps to take to bring the DR single-node cluster online with the copied database in DR: you just had to create the DR Exchange virtual server with the same name as the production virtual server, and as long as the database and log file paths were the same, everything came up fine. But now, with 2007 and Hub Transport etc., I am not sure which components of Exchange I will need in DR, or which licenses from NetApp I need to achieve SnapMirror failover in DR. I know I will need an AD/GC/DNS server in DR, but I am not sure what other Exchange 2007 pieces I will need there. Hub? CCR? SCR? Single-node cluster?

NetApp's solution was to boot the Hub Transport server in production from SAN, replicate that boot LUN to DR, and in case of disaster boot a server in DR from the Hub boot LUN. That would be impossible to do if the server is physical and you want to test DR or bring it online remotely. Unless the DR Hub is a VMware machine, bringing a physical machine up from a SAN LUN requires someone physically present to hit the power switch. Not a good design, in my opinion.

So I just need some of the minor details filled in. Thanks again, John, for your answer.

KillerKhan

Anyone?

fjohn

There are two paths for the DR target of a source CCR Mailbox cluster. The first is to leverage database portability, which requires only one server. The activation is a bit more complex because you have to rehome the mailboxes. The second is a standby cluster, which can be a two-node or single-node cluster. The activation is simpler with recoverCMS.

If you're using CCR for your Mailboxes, the Hub role will be on a separate server. You can, however, combine the Hub and CAS roles on the same server. For the Hub/CAS, you are correct that there is no Exchange VSS Writer for the transport database. This is why you could boot from SAN for the Hub and then mirror the boot LUN.

ianaforbes

Hi. I've got a customer that will be implementing Exchange 2007 on NetApp. They've hired a Microsoft consultant to do the Exchange piece. I initially sold them on SnapMirror for DR, before the Microsoft consultant had discussions around CCR. I haven't done much work with Exchange 2007, so I was unaware that SME supported CCR.

CCR looks like an HA solution within a datacenter, yet this consultant explained that it could also be a DR solution spanning datacenters. After some investigation I see that this is in fact true. As long as the link between datacenters is sufficient (low latency, adequate bandwidth), CCR across datacenters looks like it will work.

My question is: how do CCR and SnapMirror work together? From my reading, CCR is a log shipping solution between the active and passive nodes. As soon as logs have been committed on the active node, they're shipped to the passive node.

SnapMirror replicates the databases and logs to the DR site. If CCR ships the logs and SnapMirror also replicates the logs, is there a need to do both? I know both solutions can work together. I'm just confused as to how they integrate.

Cheers

lamarca

Technically speaking, the customer could think there is no need for SnapMirror if they are using CCR. This will come down to looking at the full end-to-end solution for them. With CCR being free with Exchange 2007, it seems like an easy fit for customers, but what else is the NetApp box supporting, and are there performance concerns on the Exchange server, along with other things like backup and archiving?

CCR is a true DR-only solution: if corruption comes in, it is also replicated to the DR side (and, as you stated, it all depends on latency and meeting the Microsoft cluster requirements). From a performance perspective, the customer will need to factor that in when purchasing the servers to run the Exchange environment, and they will also have those bandwidth and latency concerns (SnapMirror, by contrast, is just IP replication that the controllers can throttle, with no distance limitations).

One last thing to think about is backup. With CCR, the database on the DR side is active, and you have more things to worry about to spin it off to tape. This is an easy option with the NetApp SnapMirror destination, so if the customer was thinking about backup, you should make sure they are thinking about the full solution. Also, if the customer has an e-mail archiving system, CCR will only replicate what is within Exchange, not the archived e-mail, so if that is being stored on NetApp then SnapMirror will be the way to move that data to the DR side.

Just some food for thought...

Ray LaMarca

Technical Partner Manager

 

Raymond.Lamarca@netapp.com

http://www.netapp.com

W: 847-430-6547

C:847-529-8931

 

Got questions? Get answers in the Partner Network.

http://communities.netapp.com/community/netapp_partners_network

ianaforbes

Hi Raymond

Thanks for the response. I'm actually more interested in how SnapMirror and CCR work together. Let's say that SnapMirror wasn't even in this solution and they were just doing CCR across datacenters. I assume that I would have to procure volumes for the databases and logs at the passive site. This would be the same as with SnapMirror (identical volumes at source and destination). From what I understand, with CCR there is an initial seeding of the database at the passive node. Transaction logs get shipped to the passive node after they have been committed at the active node. By all accounts this is how the two nodes stay in sync (at least up to the lag of the log shipping).

Now, let's take a look at SnapMirror. I've procured my storage at the DR site to accept the replicated database and log volumes. After the initial baseline, my DR site has been "seeded". Subsequent mirrors send delta blocks across the wire.

So, my question is: do the two solutions interfere with each other? If CCR is log shipping on a regular basis to the same volume (LUN) as my SnapMirror DR volume (LUN), won't that cause issues? Does CCR transport the logs over SnapMirror replication, or by some other mechanism?

If corruption occurs at the source, it's going to be replicated to the destination whether I'm using CCR or SnapMirror. That's where SnapManager for Exchange comes into play. I should have mentioned that SnapManager for Exchange 5.0 is part of this solution for the client. It specifically states in the docs that SnapManager for Exchange, SnapMirror and CCR can all work together. I'm just having problems wrapping my head around how SnapMirror and CCR work together. If you could give me a real-world example of how they would work together in the case of the primary site becoming a smoking hole in the ground, I'd appreciate it.

Cheers

lamarca

The NetApp comparison for CCR is SnapMirror in synchronous mode (well, almost; more on that below), so latency may be an issue in getting a true apples-to-apples SnapMirror vs. CCR comparison. The SnapMirror schedule is probably tied to the SME job itself to get consistent database replication, so your latest replicated Exchange database may be 15 minutes or an hour behind; it all depends on the SME schedule you have running.

Robert Quimbey wrote a great blog post on this (http://partners.netapp.com/go/techontap/matl/exchange2007.html), but here are a few highlights, starting right away with the #1 question in a DR scenario: what are the customer's Recovery Time Objective and Recovery Point Objective?

Recovery objectives. How much loss of e-mail data is acceptable? Remember that SnapMirror replications are triggered as a result of a SnapManager for Exchange backup, while CCR replication occurs as a result of a filled 1MB log file.

Ability to support the extra I/O from CCR. CCR imposes additional I/O overhead to that of normal Exchange 2007 I/O. Microsoft recommends that CCR clusters use isolated storage from other servers; see Continuous Replication LUNs. In an isolated environment, the additional I/O overhead of the target LUN would keep up with the source and would not require additional disk performance, although at higher latencies. CCR clusters that use shared storage (shared between clusters) require that the storage (both source and target) be over-provisioned to handle 100% more I/O than the source LUN for CCR deployments. For example, if the source LUN requires 1,000 I/Os, in a shared storage environment the source and target LUNs would need to be provisioned with storage to handle 2,000 I/Os each.

Distance to the disaster recovery site. Remember that Microsoft Geographically Dispersed Clusters require network latency below 500ms, and Exchange 2007 requires a disk latency below 20ms, which typically restricts distances to less than 100 miles between systems in the cluster. If the DR site is more than 100 miles away, then a solution other than or in addition to CCR is required.
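
The sizing and latency rules in the highlights above can be written out as a rough sketch. The function names and the sample figures are illustrative assumptions; only the 100% over-provisioning rule for shared storage, the 500ms network limit and the 20ms disk limit come from the highlights themselves.

```python
# Thresholds quoted in the highlights above.
MAX_NETWORK_LATENCY_MS = 500.0  # Microsoft geographically dispersed cluster limit
MAX_DISK_LATENCY_MS = 20.0      # Exchange 2007 disk latency limit


def ccr_required_iops(source_iops: float, shared_storage: bool) -> float:
    """I/O that each LUN (source and target) should be provisioned for under CCR.

    Shared storage: over-provision by 100%, i.e. double the source figure.
    Isolated storage: no extra provisioning, per the quoted guidance.
    """
    if source_iops < 0:
        raise ValueError("source_iops must be non-negative")
    return source_iops * 2 if shared_storage else source_iops


def ccr_latency_ok(network_ms: float, disk_ms: float) -> bool:
    """True when both measured latencies are below the quoted limits."""
    return network_ms < MAX_NETWORK_LATENCY_MS and disk_ms < MAX_DISK_LATENCY_MS


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    print(ccr_required_iops(1000, shared_storage=True))
    print(ccr_latency_ok(network_ms=40, disk_ms=12))
```

With the 1,000 I/O example from the text, the shared-storage case comes out at 2,000 I/Os per LUN. The latency check only tells you whether the quoted limits are met; it says nothing about whether replication keeps up over a given link.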

Also, we have a best practice guide written around using CCR and NetApp as well: http://www.netapp.com/us/library/technical-reports/tr-3600.html

Please let us know if you need anything else.

Ray LaMarca

ianaforbes

Hi Raymond

Once again, thank you for the great information. Unfortunately, I'm still looking to get my main concern answered. I understand the similarities and differences between CCR and SnapMirror. I now especially understand the latency issues of CCR when placing the nodes across datacenters. I fully understand how SnapMirror is implemented and its benefits.

My question is: if the decision is made to support BOTH solutions, how do they work together? Our plan was to create a CCR cluster across two datacenters. So, in the case of a site failure, the target datacenter with the passive CCR node would take over. In this architecture, is there even a need for SnapMirror? And if SnapMirror were to be used in such an architecture, how would it be used?

Robert Quimbey mentions that customers who have combined CCR and SnapMirror use a CCR cluster in the local datacenter and then use SnapMirror to replicate the databases/logs to the DR datacenter. You achieve both HA in the local datacenter and DR across datacenters. Correct me if I'm wrong, but it seems to me that's the only architecture in which CCR and SnapMirror work together. In the architecture I originally discussed, it seems that CCR and SnapMirror would be walking over each other. I just don't see how having a CCR cluster across datacenters AND SnapMirror replicating across datacenters achieves anything. They're both doing the same thing. Also, CCR would be log shipping at a shorter RPO than SnapMirror.

Let's say I had the CCR cluster nodes across datacenters and SnapMirror replicating across datacenters (from datacenter A to B). A failure of the active CCR node occurs, but datacenter A is fine. This would trigger a CCR failover to the CCR failover node at datacenter B. As far as the SnapMirror replication is concerned, it would happily continue to replicate from datacenter A, even though that CCR node is now the passive node. It just doesn't make sense to me to have both CCR and SnapMirror happening in the configuration I have mentioned.

If any of this seems incorrect to you, please let me know. As it stands now, the customer is more interested in an extremely low RPO AND an automated failover of their Exchange solution at the DR site. While I think SnapMirror is great, when used with SnapManager for Exchange the replication can only be asynchronous. Even with rolling snapshots/SnapMirrors, I don't think we could achieve the same RPO as CCR gives. Plus, SnapMirror is application agnostic. So, while the data gets replicated just fine, manual steps must be taken at the DR site to break the mirror and mount the databases. This is all automagic with CCR.
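
To put rough numbers on that RPO comparison, here is a back-of-the-envelope sketch. It assumes the worst-case data loss for SnapMirror under SnapManager for Exchange is about one backup interval plus the transfer time, and that CCR lags by roughly the time it takes to fill one 1MB log file, as quoted earlier in the thread. The function names and the sample rates are hypothetical, not taken from any NetApp or Microsoft tool.

```python
LOG_FILE_MB = 1.0  # Exchange 2007 log file size, as quoted earlier in the thread


def snapmirror_worst_case_rpo_min(backup_interval_min: float,
                                  transfer_min: float = 0.0) -> float:
    """Assumed worst case: the site fails just before the next update lands."""
    if backup_interval_min <= 0 or transfer_min < 0:
        raise ValueError("interval must be positive, transfer time non-negative")
    return backup_interval_min + transfer_min


def ccr_log_fill_min(log_mb_per_hour: float) -> float:
    """Minutes needed to fill one log file at a given log generation rate."""
    if log_mb_per_hour <= 0:
        raise ValueError("log_mb_per_hour must be positive")
    return 60.0 * LOG_FILE_MB / log_mb_per_hour


if __name__ == "__main__":
    # Hypothetical inputs: 15-minute SME backups with a 2-minute transfer,
    # and an Exchange server generating 120 MB of logs per hour.
    print(snapmirror_worst_case_rpo_min(15, 2))
    print(ccr_log_fill_min(120))
```

On those made-up inputs the SnapMirror exposure is in the tens of minutes while a CCR log fills in well under a minute, which is the gap described above. Real figures depend on the SME schedule and the actual log generation rate.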

Cheers

lamarca

Ian,

As you've seen from my e-mails and Robert's blog, SnapMirror has its place, as does CCR. But in the design you are describing, CCR and SnapMirror are basically doing the same thing.

For automated failover, CCR is the solution the customer would want in this case to meet their needs. SnapMirror will require an automated script or manual intervention.

So in this case CCR meets the customer's needs, though I would still question whether the latency numbers for CCR meet the Microsoft requirements.

Ray LaMarca

ianaforbes

Raymond

I really appreciated your feedback. I learned some useful stuff along the way 🙂 I've decided to go with CCR in this situation to satisfy the client's concerns. As for the latency issue, they've purchased Riverbed Steelhead appliances:

http://www.riverbed.com/results/solutions/accelerate/microsoft_exchange.php

So, latency won't be an issue.

Cheers

vichaupari

Dear All,

I would like to know how the Business Continuity features of SnapManager for Exchange 5.0 work. Could anyone please explain them to me?

Thanks.

Regards,