Tech ONTAP Blogs
Volume replication is an easy-to-use, cross-region replication feature of Google Cloud NetApp Volumes. Although it leverages powerful NetApp® SnapMirror® technology, its operational model has subtle differences that make it more user friendly and less prone to administrative errors.
This article dives into the differences and discusses the implications for Terraform-based management.
If you have used SnapMirror in NetApp ONTAP® before, you know that it is a powerful, robust, and efficient replication technology. It’s used to solve all kinds of data replication problems, like building disaster recovery concepts, distributing data globally, or migrating data from one ONTAP system to another without having to worry about files, permissions, file locks, and more. Everyone who knows it, loves it.
But one aspect can be a bit annoying. SnapMirror takes care of everything within the volume, but it doesn’t manage the settings of the volume itself. Simple tasks like resizing the source volume or changing volume settings require an administrator to manually make the same changes on the destination volume. If the changes are not made thoroughly, the settings of the source and destination volumes diverge. That can cause problems during normal operation, or at the moment when you switch your workload over to the destination after a disaster has taken out your source. Really, that’s the worst time to discover a configuration drift.
When building NetApp Volumes, we wondered how we could simplify an operator's life and reduce configuration drift. We came up with an approach that replicates the data of a volume, and also “replicates” the settings of a source volume to the destination. Here’s how it works.
Volumes that are in a volume replication are in a relationship. The relationship can be in one of two modes:
This simple but powerful approach eliminates configuration drift. We went even further: In ONTAP, you must create a destination volume manually before setting up a replication. In NetApp Volumes, we wrapped the creation of the destination volume into the replication setup process. All settings for the destination volume are inherited from the source. Just specify a destination storage pool, replication details, destination share and volume name, and NetApp Volumes takes care of all the other volume settings for you. This approach simplifies creating a replication considerably.
NetApp Volumes simplifies volume replication lifecycle management, but it is still a powerful and complex feature. When building the netapp_volume_replication resource for the google Terraform provider, we had to add some additional controls. In addition to the obvious input parameters like name, volume_name, location, replication_schedule, description, and labels, the resource includes a few other input parameters that are worth discussing.
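Before turning to those parameters, here is a minimal sketch of a replication resource that uses only the basic inputs. It uses the resource name under which the google provider exposes it, google_netapp_volume_replication. The project, regions, pool, volume, and share names are placeholders chosen for illustration, and the provider documentation remains the authoritative reference for the schema.

```hcl
# Sketch only: every project, location, pool, volume, and share name below
# is a placeholder, not a value taken from this article.
resource "google_netapp_volume_replication" "example" {
  project     = "my-project"
  location    = "us-central1"   # region of the source volume
  volume_name = "source-volume" # name of the existing source volume
  name        = "example-replication"

  replication_schedule = "EVERY_10_MINUTES"
  description          = "Replicates source-volume to us-east4"

  labels = {
    environment = "test"
  }

  # Describes where the destination volume is created. All other volume
  # settings are inherited from the source volume.
  destination_volume_parameters {
    storage_pool = "projects/my-project/locations/us-east4/storagePools/destination-pool"
    volume_id    = "destination-volume"
    share_name   = "destination-share"
    description  = "Destination of example-replication"
  }
}
```

The additional parameters discussed next are left at their defaults in this sketch.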
The replication_enabled parameter controls the mode of the relationship.
If it is set to true, the desired state of the relationship is active. If the relationship is inactive, a RESUME operation is triggered. Note that a RESUME operation overwrites all changes made to the destination volume with source volume information. Be sure that this is your intention before enabling the replication.
If it is set to false, the desired state of the relationship is inactive. If the relationship is active, a STOP operation is triggered.
When wait_for_mirror is set to true, the provider waits for ongoing transfers to finish before stopping a replication. This is desirable, but it can take a long time for large transfers.
When set to false, the provider does not wait for transfers to finish.
An active relationship can have one of two mirror_states. A mirror is either TRANSFERRING an update, or it is waiting for the next scheduled transfer to start (mirror_state == MIRRORED).
Ongoing transfers cannot be stopped except by using a force stop.
Set the force_stopping parameter to true if you can’t wait for a long-running replication transfer to finish. The default is false.
Setting the delete_destination_volume parameter to true deletes the destination volume automatically if a replication relationship is deleted or destroyed. Stopping or resuming a mirror doesn’t delete the relationship. Take care: This setting is great for testing, but using it in production might lead to unintended loss of the destination volume.
The destination_volume_parameters block is used to specify the destination storage_pool, the name of the destination volume (volume_id), the share_name, and an optional description. This block is used only while creating the resource. It is ignored for all other operations. This fact has multiple implications:
For normal operation without time pressure, NetApp recommends letting ongoing transfers finish before stopping a replication. This is done by setting the parameters to:
force_stopping = false
wait_for_mirror = true
delete_destination_volume = false
With this setting, the provider waits for an ongoing transfer to finish before stopping the replication when you set replication_enabled = false.
When your priority is to get the destination volume into production as fast as possible, change the parameters to:
force_stopping = true
wait_for_mirror = false
delete_destination_volume = false
This setting stops the replication quickly and makes the destination volume read-write. Any ongoing transfer is aborted, and your destination has the content of the latest successful transfer.
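One way to keep both parameter sets in a single configuration is to drive them from input variables. The following is a sketch under assumptions, not a recommended pattern from NetApp: the variables replication_enabled and emergency_failover are hypothetical names introduced here, and all project, location, pool, volume, and share names are placeholders.

```hcl
# Hypothetical variable: false = planned stop (wait for the transfer),
# true = urgent failover (abort the transfer).
variable "emergency_failover" {
  type    = bool
  default = false
}

# Hypothetical variable: desired mode of the relationship.
variable "replication_enabled" {
  type    = bool
  default = true
}

resource "google_netapp_volume_replication" "example" {
  project              = "my-project"
  location             = "us-central1"
  volume_name          = "source-volume"
  name                 = "example-replication"
  replication_schedule = "EVERY_10_MINUTES"

  replication_enabled = var.replication_enabled

  # Planned stop:    force_stopping = false, wait_for_mirror = true
  # Urgent failover: force_stopping = true,  wait_for_mirror = false
  force_stopping  = var.emergency_failover
  wait_for_mirror = !var.emergency_failover

  # Keep the destination volume if the replication is ever destroyed.
  delete_destination_volume = false

  destination_volume_parameters {
    storage_pool = "projects/my-project/locations/us-east4/storagePools/destination-pool"
    volume_id    = "destination-volume"
    share_name   = "destination-share"
  }
}

# Exposes whether the mirror is currently transferring or idle.
output "mirror_state" {
  value = google_netapp_volume_replication.example.mirror_state
}
```

With this layout, a planned stop would be `terraform apply -var="replication_enabled=false"`, and an urgent failover would add `-var="emergency_failover=true"`. Review the plan before applying either one.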
A common question is how to handle the destination volume, which gets created automatically by the replication. Should you import it into Terraform to manage it?
The answer depends on whether the replication is active. In an active replication, any change made to one volume is made to both, which confuses Terraform's state tracking. It’s better not to put the destination volume under Terraform management while the replication is active.
When the replication is inactive, the destination volume becomes independent and you can manage it using Terraform by importing it. The drawback is that if you enable the replication again, you may need to drop the destination volume from your HCL code and the Terraform state manually.
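As an illustration, importing the destination volume of an inactive replication could look like the sketch below. It assumes Terraform 1.5 or later for the import block. The resource address, project, location, pool, and sizing values are placeholders, and the arguments of the resource block must match the real volume, otherwise the plan will propose changes.

```hcl
# Bring the existing destination volume under Terraform management.
import {
  to = google_netapp_volume.destination
  id = "projects/my-project/locations/us-east4/volumes/destination-volume"
}

# Placeholder values: adjust them to match the actual destination volume.
resource "google_netapp_volume" "destination" {
  project      = "my-project"
  location     = "us-east4"
  name         = "destination-volume"
  capacity_gib = 100
  share_name   = "destination-share"
  storage_pool = "destination-pool"
  protocols    = ["NFSV3"]
}
```

Before the replication is enabled again, the volume can be dropped from Terraform without deleting it. One option, assuming Terraform 1.7 or later, is to delete the import and resource blocks above and replace them with a removed block:

```hcl
# Forget the destination volume in the Terraform state but keep the
# volume itself.
removed {
  from = google_netapp_volume.destination

  lifecycle {
    destroy = false
  }
}
```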
Reverse and resume allows you to swap source and destination volume roles for an existing replication relationship and activates the replication. All data and settings of the former-source-but-now-new-destination volume are overwritten by the new source. Make sure that this is what you intend before triggering it.
The provider doesn’t support this operation. It needs to be triggered manually by using Cloud Console, gcloud, or the API. In addition, running this operation “confuses” the existing Terraform state. After running a reverse and resume, NetApp recommends manually dropping Terraform HCL code and state for the replication and the former source volume and reimporting the replication and the new source volume.
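The reimport could be sketched with import blocks, assuming Terraform 1.5 or later. The resource addresses and IDs below are placeholders: look up the actual volume and replication names in Cloud Console or with gcloud after the reverse and resume has completed, and remove the old replication and former source volume from your code and state first.

```hcl
# Placeholder IDs: replace them with the real names of the new source
# volume and of the replication as they exist after the reverse and resume.
import {
  to = google_netapp_volume.new_source
  id = "projects/my-project/locations/us-east4/volumes/destination-volume"
}

import {
  to = google_netapp_volume_replication.reversed
  id = "projects/my-project/locations/us-east4/volumes/destination-volume/replications/example-replication"
}
```

Each `to` address needs a matching resource block. You can write these by hand, or let Terraform draft them with `terraform plan -generate-config-out=generated.tf` and review the result before applying.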
If you reverse and resume twice to re-establish the initial replication direction, you can leave the Terraform code and state untouched. State problems will resolve after the second reverse and resume.
Volume replication is a powerful feature that is easy to use. The google Terraform provider allows you to manage all NetApp Volumes resources, including volume replication. Day 1 operations like setting up a replication are very simple. Day 2 operations like changing the properties of the replication are also easy. Day X operations like stopping, resyncing, and reversing replications can cause data loss if not done carefully. Before applying your Terraform execution plans, make sure that they contain the results that you expect.