Tech ONTAP Blogs
NetApp® SnapMirror® is the technology of choice for replicating volumes between NetApp ONTAP®-based storage systems. And now SnapMirror can also be used to replicate volumes between ONTAP and Google Cloud NetApp Volumes.
Seasoned ONTAP admins deploy SnapMirror primarily for two use cases:

- Migration: moving volumes from one storage system to another
- Disaster recovery (DR): keeping an up-to-date copy of volumes at a second site
For the migration use case, Google Cloud NetApp Volumes already offers the volume migration feature, which uses SnapMirror underneath. NetApp has now added bidirectional SnapMirror support, calling the feature “external replication.” In this blog post, I discuss this feature in detail.
Our modern-technology world can be confusing. I use simple mental models to keep my thoughts sorted. If a model is good, I can derive the properties and behaviors of a system from it, without having to remember all the details. NetApp Volumes uses SnapMirror for a few different features that are somehow related, but that solve different use cases. Let me share my NetApp Volumes mental SnapMirror model.
As my Venn diagram shows, the following three features are related:

- Volume migration: a one-time move of an ONTAP volume into NetApp Volumes
- Volume replication (cross-region replication, or CRR): DR replication between NetApp Volumes volumes in different regions
- External replication: ongoing, bidirectional replication between an ONTAP volume and a NetApp Volumes volume
All three features overlap considerably in their APIs and in gcloud CLI usage. The UI workflows are optimized to support an individual use case while reusing as much common functionality as possible. And the underlying replication technology is always SnapMirror.
Now that we have established a mental model of the commonalities of the SnapMirror-backed features of NetApp Volumes, let’s dive into the details of external replication. (Remember: it’s replication between an ONTAP-based volume and NetApp Volumes.) The lifecycle of every replication goes through multiple phases, which I explain next.
As in the volume migration workflow, you first have to establish a connection between the external ONTAP volume and NetApp Volumes.
In this phase, administrators of the source ONTAP system need to grant NetApp Volumes permission to fetch volumes from a storage VM (SVM) by setting up cluster and SVM peering.
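If you prefer to script this step instead of using the UI, the peering setup maps onto the replication resource in the NetApp Volumes API. Here’s a minimal sketch using the google-cloud-netapp Python client; all resource names are placeholders, and you should verify the EstablishPeeringRequest fields against the current client library before relying on them:

```python
from google.cloud import netapp_v1

client = netapp_v1.NetAppClient()

# Hypothetical resource name; replace with your project, region, and volume.
replication_name = (
    "projects/my-project/locations/us-east4"
    "/volumes/dest-vol/replications/ext-repl"
)

# Ask NetApp Volumes to peer with the source ONTAP cluster and SVM.
request = netapp_v1.EstablishPeeringRequest(
    name=replication_name,
    peer_cluster_name="onprem-cluster",            # source ONTAP cluster
    peer_svm_name="svm1",                          # source storage VM (SVM)
    peer_volume_name="src_vol",                    # source volume
    peer_ip_addresses=["10.0.0.11", "10.0.0.12"],  # intercluster LIF IPs
)
operation = client.establish_peering(request=request)
print(operation.result())  # blocks until peering is established
```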
A baseline transfer creates a NetApp SnapMirror Snapshot™ copy on your source system and replicates all the used data—including all prior Snapshot copies—to the destination volume. Depending on the network speed between the source and the destination and the amount of data, this process can take hours or days. But in the meantime, your source volume is available and can be used to read and write data.
After the baseline transfer is finished, the destination volume becomes accessible as read-only, containing the data of the SnapMirror Snapshot copy.
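You can track when the baseline completes by polling the replication’s mirror state. Another sketch with the Python client; MIRRORED as the “baseline done, incrementals running” state matches the CRR API, which I assume external replication shares:

```python
import time

from google.cloud import netapp_v1

client = netapp_v1.NetAppClient()
name = ("projects/my-project/locations/us-east4"
        "/volumes/dest-vol/replications/ext-repl")

# Poll until the baseline transfer is done and the mirror is in sync.
while True:
    replication = client.get_replication(name=name)
    print("mirror state:", replication.mirror_state.name)
    if replication.mirror_state == netapp_v1.Replication.MirrorState.MIRRORED:
        break
    time.sleep(60)
```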
While the baseline transfer is in progress, a lot of time may pass, and a lot of data may be modified on your source system. But don’t worry, SnapMirror enables incremental transfers. Based on your specified replication schedule, a new SnapMirror Snapshot copy is created on the source system, and the changes between the new and the previous Snapshot copy are calculated. Then only the changed data is transferred during an incremental transfer. Depending on the amount of data that has changed since the baseline Snapshot copy was created, an incremental transfer is typically considerably faster than the baseline transfer.
After an incremental transfer is complete, the read-only destination volume reflects the data of the last replicated Snapshot copy. The replication then sits idle until the next scheduled replication event triggers the next incremental transfer.
This cycle continues until the replication is stopped by the operator.
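The schedule itself is a property of the replication and can be changed later. A sketch of patching it with the Python client; HOURLY is one of the schedule values the CRR API exposes (EVERY_10_MINUTES, HOURLY, DAILY), and I assume external replication uses the same set:

```python
from google.cloud import netapp_v1
from google.protobuf import field_mask_pb2

client = netapp_v1.NetAppClient()

replication = netapp_v1.Replication(
    name=("projects/my-project/locations/us-east4"
          "/volumes/dest-vol/replications/ext-repl"),
    replication_schedule=netapp_v1.Replication.ReplicationSchedule.HOURLY,
)

# Update only the schedule field, leaving the rest of the resource alone.
operation = client.update_replication(
    replication=replication,
    update_mask=field_mask_pb2.FieldMask(paths=["replication_schedule"]),
)
operation.result()
```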
As mentioned, the destination volume reflects the data of the last successful source Snapshot transfer and makes it available as read-only for clients. All is well.
But let’s say disaster strikes. Your source site is no longer available, and there’s a high likelihood that it won’t come back in the next few minutes or hours. Production is down, and your company is at risk of going out of business. This is the day you have been preparing for. Now you can put your carefully crafted DR plans into action.
Your data is already sitting in the destination region, waiting to be used. The first action is to stop the replication at the destination site. This step makes your destination volume read/write and ready to use. You can now start your VMs, containers, and applications on the destination side, using the destination volume as the source of truth.
Depending on how well you prepared your deployment procedures for your workloads, this process can take minutes or hours.
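In API terms, that first action is a single stop call on the replication. Here’s a sketch; the force flag exists in the CRR API to stop even while a transfer is in flight, and I assume external replication behaves the same way:

```python
from google.cloud import netapp_v1

client = netapp_v1.NetAppClient()

# Break the mirror: the destination volume becomes read/write.
client.stop_replication(
    request=netapp_v1.StopReplicationRequest(
        name=("projects/my-project/locations/us-east4"
              "/volumes/dest-vol/replications/ext-repl"),
        force=True,  # don't wait for an in-flight transfer in a real disaster
    )
).result()
```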
Also note that external replication is asynchronous. The destination always lags behind the source, so in an unplanned failover you lose the most recent writes on the source volume that haven't been replicated to the destination yet.
If you can plan for your disaster, like carrying out a DR test to confirm that your procedures work, there’s a better approach. First, stop all workloads on your source volume, perform a manual synchronization operation on the replication, and then stop it and start workloads on the destination volume. With this approach, you know that the latest data is on the destination volume.
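For such a planned switchover, the sequence is sync, then stop. A sketch, assuming the sync_replication call the Python client exposes for volume migration also applies to external replication:

```python
from google.cloud import netapp_v1

client = netapp_v1.NetAppClient()
name = ("projects/my-project/locations/us-east4"
        "/volumes/dest-vol/replications/ext-repl")

# 1. With workloads on the source quiesced, transfer the final changes.
client.sync_replication(
    request=netapp_v1.SyncReplicationRequest(name=name)
).result()

# 2. Break the mirror; no force needed, since nothing is in flight.
client.stop_replication(
    request=netapp_v1.StopReplicationRequest(name=name)
).result()
```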
You are now running on the destination volume, and your source volume is dormant.
How you clean up after a disaster depends on what kind of disaster it was.
If you just stopped the replication to see whether it works, but production on the source side continued, you can simply resume the replication. The destination volume discards all the changes that you made to it, and the source volume starts incremental transfers again.
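A sketch of that resume call; as described above, resuming discards any changes made to the destination while the mirror was broken:

```python
from google.cloud import netapp_v1

client = netapp_v1.NetAppClient()

# Re-establish the mirror; changes made to the destination are discarded.
client.resume_replication(
    request=netapp_v1.ResumeReplicationRequest(
        name=("projects/my-project/locations/us-east4"
              "/volumes/dest-vol/replications/ext-repl"),
    )
).result()
```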
If your production moved to the destination side, the data on the destination volume is now more current, and the source volume is outdated. You can reverse the replication direction to make the destination the new source and vice versa; the replication then runs in the opposite direction.
If you want to reestablish the original direction, do another switchover after you make sure that all your latest data was replicated.
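The reversal is again a single call. A sketch, with the same placeholder names as before:

```python
from google.cloud import netapp_v1

client = netapp_v1.NetAppClient()

# Swap roles: the former destination becomes the new replication source.
client.reverse_replication_direction(
    request=netapp_v1.ReverseReplicationDirectionRequest(
        name=("projects/my-project/locations/us-east4"
              "/volumes/dest-vol/replications/ext-repl"),
    )
).result()
```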
If your former source is now a big crater where a structure used to be, it will not come back. To protect your valuable data, you may want to establish a new replication to a NetApp Volumes volume in a different region. For that, you need to delete the old replication and create a new one with your production volume as the source. If your production volume is on ONTAP (but, oh no, the Google Cloud data center is now a big crater), use external replication to replicate to a different Google Cloud region. If your production volume is on NetApp Volumes (let’s hope nobody was injured when your data center transformed into a crater), use volume replication (CRR) to replicate to a different Google Cloud region.
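For the NetApp Volumes case, deleting the old replication and creating a new CRR replication toward a different region could look like this sketch; DestinationVolumeParameters is the CRR type for creating the destination volume, and all names are placeholders:

```python
from google.cloud import netapp_v1

client = netapp_v1.NetAppClient()

# Remove the replication that pointed at the destroyed site.
client.delete_replication(
    request=netapp_v1.DeleteReplicationRequest(
        name=("projects/my-project/locations/us-east4"
              "/volumes/prod-vol/replications/old-repl"),
    )
).result()

# Create a new CRR replication from the surviving production volume
# to a new destination volume in a different region.
operation = client.create_replication(
    parent="projects/my-project/locations/us-east4/volumes/prod-vol",
    replication=netapp_v1.Replication(
        replication_schedule=netapp_v1.Replication.ReplicationSchedule.HOURLY,
        destination_volume_parameters=netapp_v1.DestinationVolumeParameters(
            storage_pool=("projects/my-project/locations/europe-west1"
                          "/storagePools/dr-pool"),
            volume_id="dr-vol",
        ),
    ),
    replication_id="dr-repl",
)
operation.result()
```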
[Edit 2025-08-28]: External replication is now available in allow-listed GA. Find out how this feature can enhance your DR strategy to help you stay competitive. To learn more, read the documentation. To test it, contact our Google Cloud specialists.