Improve data replication strategy and recovery point for large-scale databases on ONTAP 9.13.1

KManohar · 2023-06-26

What is a consistency group, and why do you need one?

As a simple example, imagine you run a grocery store, and you’re still using Excel. You’ve got a spreadsheet for tracking orders, another for deliveries, and another for inventory. Your PC crashes while these spreadsheets are open. Did you lose data on one of them? All of them?

You have a consistency problem. You’re running low on pasta. Do you already have pasta on order, or was that entry lost from that spreadsheet? If you did order it, did it already get received and sold, or is it still in transit? If the delivery spreadsheet has lost data, you don’t know. Are you low on pasta at all? Maybe the inventory spreadsheet is out of date.

Similar situations can occur if you’re a database administrator managing large datasets that span databases. In this situation, data from various applications might need to be protected with cross-dataset consistency.

This requirement also occurs within a single dataset. Databases consist of multiple components, such as transaction logs, primary and secondary data files, and other accessory files. These files tend to be distributed across multiple file systems or LUNs. Data protection often requires preservation of consistency.
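The following toy Python model may help make this concrete. It is not ONTAP code: the write sequence, the volume names `log` and `data`, and the `is_recoverable` rule are all illustrative assumptions. It imagines a database that writes each transaction's log record before the matching data page, and compares copies of the two volumes taken at different moments with copies taken at the same point in the write order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    seq: int     # global order in which the database issued the write
    volume: str  # hypothetical volume name: "log" or "data"
    txn: int     # transaction the write belongs to


# Write-ahead logging: each transaction's log record lands before its data page.
WRITES = [
    Write(1, "log", 101), Write(2, "data", 101),
    Write(3, "log", 102), Write(4, "data", 102),
]


def snapshot(volume: str, cutoff: int) -> set[int]:
    """Transactions visible on `volume` in a copy taken after write `cutoff`."""
    return {w.txn for w in WRITES if w.volume == volume and w.seq <= cutoff}


def is_recoverable(log_cutoff: int, data_cutoff: int) -> bool:
    """Toy rule: every data page in the copy needs its log record in the copy."""
    return snapshot("data", data_cutoff) <= snapshot("log", log_cutoff)


# Volumes captured independently: the data copy already holds transaction 102,
# but the log copy was taken before 102 was logged.
print(is_recoverable(log_cutoff=2, data_cutoff=4))  # False

# Both volumes captured at the same point in the write order.
print(is_recoverable(log_cutoff=3, data_cutoff=3))  # True
```

Real databases apply far more elaborate recovery rules, but the shape of the problem is the same: the copies of the individual volumes must all reflect one point in the write order.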

Consistency is especially important if a disaster occurs. If you try to perform an emergency failover, and you realize that data across all volumes isn’t in a consistent state, the failover attempt could result in an error while mounting and opening the database.

You don’t want to see a mount error during an emergency while trying to bring a Microsoft SQL Server database online at the disaster recovery site.

In such a case, you would be forced to manually restore the database from available backups or storage snapshots (assuming they exist on the disaster recovery site!), which could be very time-consuming.

That’s why data across multiple storage volumes needs to be managed in a consistent order as a single unit. In NetApp® technology, that unit is called a “consistency group.”

How does a consistency group solve the problem?

NetApp ONTAP® 9.13.1 data management software introduces consistency group support with our asynchronous SnapMirror replication technology. With this support, data is replicated to secondary storage in a consistent state across all the volumes or LUNs. As a result, a database can be mounted after failover at a consistent point in time and brought online. No additional steps should be required; you just break the mirror and go.

Note: This approach gives you a very low recovery time objective (RTO), but basic SnapMirror isn’t a synchronous mirroring technology. The replicated datasets would be consistent, but might not be current. For zero-RPO data replication, check out the NetApp SnapMirror® Business Continuity feature, which is also available in ONTAP.

Here’s a basic example of using a consistency group to perform SnapMirror replication of a single Microsoft SQL Server database.

First, set up a consistency group for all volumes that host database files. Then, protect it to secondary storage. After replication is set up, check the status from ONTAP System Manager.
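The same three steps can also be driven through the ONTAP REST API. The Python sketch below only builds the request bodies; it does not send anything. The endpoint paths and field names reflect one reading of the ONTAP REST API reference and should be checked against the reference for your release. The SVM, consistency group, volume, and policy names are hypothetical placeholders.

```python
import json


def consistency_group_body(name: str, svm: str, volumes: list[str]) -> dict:
    """Body for POST /api/application/consistency-groups over existing volumes."""
    return {
        "name": name,
        "svm": {"name": svm},
        # Assumption: action "add" adopts existing volumes rather than
        # provisioning new ones.
        "volumes": [
            {"name": v, "provisioning_options": {"action": "add"}}
            for v in volumes
        ],
    }


def snapmirror_body(src_svm: str, src_cg: str, dst_svm: str, dst_cg: str,
                    volume_pairs: list[tuple[str, str]], policy: str) -> dict:
    """Body for POST /api/snapmirror/relationships on a consistency group."""
    return {
        "source": {
            "path": f"{src_svm}:/cg/{src_cg}",
            "consistency_group_volumes": [{"name": s} for s, _ in volume_pairs],
        },
        "destination": {
            "path": f"{dst_svm}:/cg/{dst_cg}",
            "consistency_group_volumes": [{"name": d} for _, d in volume_pairs],
        },
        "policy": {"name": policy},
    }


# GET path for checking replication state, health, and lag.
STATUS_PATH = "/api/snapmirror/relationships?fields=state,healthy,lag_time"

if __name__ == "__main__":
    # All names below are hypothetical.
    cg = consistency_group_body("sqldb_cg", "svm_sql", ["sql_data", "sql_log"])
    sm = snapmirror_body(
        "svm_sql", "sqldb_cg", "svm_dr", "sqldb_cg_dr",
        [("sql_data", "sql_data_dr"), ("sql_log", "sql_log_dr")],
        policy="Asynchronous",  # placeholder; use a policy that exists on your cluster
    )
    print(json.dumps(cg, indent=2))
    print(json.dumps(sm, indent=2))
    print(STATUS_PATH)
```

Keeping the payload builders separate from the HTTP calls makes it easy to review exactly what would be sent before pointing a client at a production cluster.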

Application-consistent Snapshot copies

Microsoft SQL Server 2022 introduced a new feature: a database administrator can quiesce and unquiesce a database by using a T-SQL command. Between these commands, administrators can use the ONTAP REST API to trigger a volume Snapshot copy. Whether the database is gigabytes or terabytes in size, a volume Snapshot copy completes in a few seconds and delivers an application-consistent database backup.

These NetApp Snapshot™ copies are then replicated to destination storage in case data needs to be recovered using a Snapshot copy at the secondary site.

A database is quiesced with a single T-SQL command, and multiple databases can be quiesced in one command as well. After the I/O is frozen, you can quickly trigger consistency group Snapshot copies by using a Windows PowerShell script that calls the ONTAP REST API. You then complete the operation by running the backup command, which records the backup in the SQL Server system tables and unfreezes the database.
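The sequence can be sketched as follows. This uses Python rather than the PowerShell script mentioned above, and it is a minimal outline under stated assumptions rather than a production procedure. The T-SQL follows the SQL Server 2022 snapshot backup syntax (`SUSPEND_FOR_SNAPSHOT_BACKUP`, `METADATA_ONLY`), and the Snapshot endpoint and its `consistency_type` field reflect one reading of the ONTAP REST API reference. The `run_sql` and `post_rest` callables, the database name, and the file path are hypothetical stand-ins for whatever SQL and HTTP clients you use.

```python
from typing import Callable


def _quote(name: str) -> str:
    """Bracket-quote a SQL Server identifier."""
    return "[" + name.replace("]", "]]") + "]"


def suspend_statement(databases: list[str], on: bool = True) -> str:
    """T-SQL that freezes (or releases) write I/O for one or more databases."""
    state = "ON" if on else "OFF"
    if len(databases) == 1:
        return (f"ALTER DATABASE {_quote(databases[0])} "
                f"SET SUSPEND_FOR_SNAPSHOT_BACKUP = {state};")
    group = ", ".join(_quote(d) for d in databases)
    return ("ALTER SERVER CONFIGURATION "
            f"SET SUSPEND_FOR_SNAPSHOT_BACKUP = {state} (GROUP = ({group}));")


def metadata_backup_statement(databases: list[str], backup_file: str) -> str:
    """T-SQL that records the snapshot backup and unfreezes the database(s)."""
    path = backup_file.replace("'", "''")
    if len(databases) == 1:
        target = f"BACKUP DATABASE {_quote(databases[0])}"
    else:
        target = "BACKUP GROUP " + ", ".join(_quote(d) for d in databases)
    return f"{target} TO DISK = N'{path}' WITH METADATA_ONLY;"


def snapshot_request(cg_uuid: str, snapshot_name: str) -> tuple[str, dict]:
    """Path and body for a consistency group Snapshot copy (assumed endpoint)."""
    path = f"/api/application/consistency-groups/{cg_uuid}/snapshots"
    return path, {"name": snapshot_name, "consistency_type": "application"}


def snapshot_backup(databases: list[str], cg_uuid: str, snapshot_name: str,
                    backup_file: str,
                    run_sql: Callable[[str], None],
                    post_rest: Callable[[str, dict], None]) -> None:
    """Freeze, take the Snapshot copy, then record the backup and unfreeze."""
    run_sql(suspend_statement(databases))
    try:
        post_rest(*snapshot_request(cg_uuid, snapshot_name))
    except Exception:
        # Do not leave the databases frozen if the Snapshot call fails.
        run_sql(suspend_statement(databases, on=False))
        raise
    run_sql(metadata_backup_statement(databases, backup_file))


if __name__ == "__main__":
    # Dry run: print each step instead of executing it.
    snapshot_backup(
        ["SalesDB"], "<cg-uuid>", "salesdb_snap", r"D:\backup\SalesDB.bkm",
        run_sql=print,
        post_rest=lambda path, body: print("POST", path, body),
    )
```

Passing the SQL and REST clients in as callables keeps the ordering logic testable without a database or a cluster, and the `except` branch ensures that a failed Snapshot request does not leave I/O suspended.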

Creating a clone to test disaster recovery or to spin up copies for development or reporting purposes is as quick and easy as a snap of the fingers. Everything can be easily managed from ONTAP System Manager or the REST API. Just select the consistency group at the source or destination where you want to create a clone, map it to the desired host, and mount the database. You then have a copy of even the largest databases, even if they’re spread across multiple LUNs on different volumes.

With that procedure, you can implement an easy and scalable disaster recovery strategy. If you just need to create backups on primary storage, the NetApp SnapCenter® capability serves as a single pane of glass to manage data for backup, restore, and cloning without the need for an additional script.

Conclusion

Data keeps growing in terms of scale and the number of datasets. Three mission-critical databases have turned into 30, or 300, or 3,000. Protecting this data is also getting more and more complicated, especially for disaster recovery protection.

NetApp understands these business use cases. In recent releases of ONTAP, we’ve introduced multiple innovative features such as granular RPO=0 data protection (SnapMirror Business Continuity), ransomware and malicious insider protection (tamperproof Snapshot copies), multi-admin verification (MFA integration with multiple products), and now SnapMirror consistency groups.

Check out more about consistency groups in our documentation, and learn about all the latest and greatest features introduced in ONTAP 9.13.1.