Tech ONTAP Blogs

Application-Consistent Snapshots for Oracle on GCNV in ONTAP Mode with Consistency Groups

Dinesh-Gajendran
NetApp

Why does this matter?

 

For many Oracle environments, the biggest challenge is not creating backups—it is recovering fast enough when a failure occurs. Traditional RMAN backups are copy-based, meaning backup and restore times grow with database size. As databases scale into tens or hundreds of terabytes, backup windows expand, infrastructure costs rise, and recovery can take hours or even days. NetApp Snapshot technology takes a fundamentally different approach: instead of copying data, it preserves a point-in-time image of the storage in seconds, regardless of database size. This enables near-instant backups, space-efficient retention, higher backup frequency, and dramatically faster recovery to help meet aggressive RPO and RTO requirements.

 

Oracle databases typically span multiple volumes containing datafiles, redo logs, archive logs, and control files. Capturing these volumes independently can create write-order inconsistencies that jeopardize recovery. Google Cloud NetApp Volumes (GCNV) solves this with ONTAP Consistency Groups, which capture all database volumes at the exact same instant, preserving transactional consistency across the entire Oracle estate. When combined with Oracle backup coordination (hot-backup mode or Oracle snapshot-optimized recovery), the result is an application-consistent recovery point that can be restored in minutes rather than hours, while also enabling rapid cloning for dev/test, DR validation, and operational agility.

 

Choosing a snapshot model: "rewind everything" vs. "lose almost nothing"

Suppose corruption strikes at 14:00. You have whole snapshots from 12:00 and 13:00, plus a more frequent log snapshot from 13:55.

  • Traditional / whole restore: the newest complete image is 13:00. You restore DATA + REDO + control + FRA together → the database is back, but rewound to 13:00 = ~60 minutes of committed business lost.
  • Selective datafile restore: restore only the datafiles to the 13:00 baseline, keep the redo/control/archive (snapshotted at 13:55 and live), and roll forward → recovery point = 13:55 (and onward), ~5 minutes lost instead of 60.

Same storage, same snapshots — a 12× better RPO purely from what you choose to restore.
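The arithmetic behind that comparison can be checked with a few lines of shell. This is only an illustrative sketch: the helper names `to_minutes` and `data_loss_minutes` are made up for this example, and it assumes both times fall on the same day.

```shell
#!/usr/bin/env bash
# Convert HH:MM to minutes since midnight.
# The 10# prefix forces base 10 so values like "08" are not parsed as octal.
to_minutes() {
  local h=${1%%:*} m=${1##*:}
  echo $(( 10#$h * 60 + 10#$m ))
}

# Minutes of committed work lost = failure time - recovery point.
data_loss_minutes() {
  local recovery_point=$1 failure=$2
  echo $(( $(to_minutes "$failure") - $(to_minutes "$recovery_point") ))
}

whole=$(data_loss_minutes 13:00 14:00)      # whole restore rewinds to 13:00
selective=$(data_loss_minutes 13:55 14:00)  # datafile restore rolls forward to 13:55
echo "whole=${whole}m selective=${selective}m ratio=$(( whole / selective ))x"
# prints: whole=60m selective=5m ratio=12x
```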

 

[Figure: DineshGajendran_0-1783028030841.png]

 

| Model | What it snapshots | Restore options | Script / playbook |
| --- | --- | --- | --- |
| Whole-CG | One atomic CG snapshot of DATA + LOG together | 50-restore-from-snapshot.sh [name\|latest] [scope]; Scope=whole (CG revert to that exact point) | 30-app-consistent-snapshot.sh |
| Volume-split | Separate per-volume snapshots of DATA and LOG | Scope=datafiles (roll-forward); Scope=split (point-in-time) | 32-volume-split-snapshot.sh |
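To make the three scope values concrete, here is a rough sketch of how a restore wrapper might map each scope to an action. It is not taken from 50-restore-from-snapshot.sh: the function name `restore_plan` and the wording of each plan are assumptions for illustration, and the real script may behave differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe what each restore scope would revert.
# It only prints a plan; it does not touch any volume.
restore_plan() {
  local scope=${1:-}
  case "$scope" in
    whole)     echo "revert the CG snapshot (DATA + LOG together); database returns to the snapshot time" ;;
    datafiles) echo "revert only the DATA volume snapshot; roll forward using the live LOG volume" ;;
    split)     echo "revert the DATA and LOG volume snapshots; recover to the LOG snapshot time" ;;
    *)         echo "unknown scope: '$scope' (expected whole, datafiles, or split)" >&2
               return 2 ;;
  esac
}

restore_plan datafiles
```

Rejecting unknown scopes with a non-zero status, rather than silently defaulting, is a deliberate choice in this sketch, since a mistyped scope on a restore should stop the run.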

 

 

A CG snapshot is all-or-nothing across every member volume, so it can’t bracket the datafile snapshot inside BEGIN/END BACKUP separately from the logs. The volume-split model uses native per-volume snapshots (which coexist with the CG) to do exactly that, following the classic NetApp/Oracle hot-backup sequence:

  1. ALTER DATABASE BEGIN BACKUP            (freeze datafile headers; DB stays open)
  2. snapshot the DATA volume(s)            (fuzzy datafiles)
  3. ALTER DATABASE END BACKUP              (always runs — shell trap / always block)
  4. ALTER SYSTEM ARCHIVE LOG CURRENT       (flush the backup-window redo to archive)
  5. ALTER DATABASE BACKUP CONTROLFILE      (binary + trace, written onto the LOG volume)
  6. snapshot the LOG volume(s)             (control + redo + archive + spfile)

Because END BACKUP precedes ARCHIVE LOG CURRENT, the LOG snapshot independently contains the end-backup marker and the archived backup-window redo — so the DATA snapshot can be recovered with just the LOG snapshot, or rolled forward against the live logs.
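The six steps above can be sketched in shell as follows. This is a dry-run illustration, not the contents of 32-volume-split-snapshot.sh: `run_sql` and `snapshot_volume` are hypothetical stand-ins that only echo what they would do, and the control file paths are placeholders. The part worth noting is the EXIT trap, which is one way to make sure END BACKUP runs even when a snapshot call fails midway.

```shell
#!/usr/bin/env bash
# Dry-run stand-ins (hypothetical). In a real script, run_sql would send the
# statement to sqlplus and snapshot_volume would call your snapshot tooling.
run_sql()         { echo "SQL: $1;"; }
snapshot_volume() { echo "SNAPSHOT: $1"; }

# Placeholder locations on the LOG volume for the control file copies.
CTL_BIN=${CTL_BIN:-/u02/log/backup/control.bkp}
CTL_TRC=${CTL_TRC:-/u02/log/backup/control.trc}

# The body runs in a subshell "( ... )" so the EXIT trap fires whenever the
# function ends, whether it completed or bailed out early.
volume_split_snapshot() (
  in_backup=0
  end_backup() {
    [ "$in_backup" -eq 1 ] || return 0    # not in backup mode: nothing to undo
    in_backup=0
    run_sql "ALTER DATABASE END BACKUP"
  }
  trap end_backup EXIT

  run_sql "ALTER DATABASE BEGIN BACKUP" || exit 1                          # step 1
  in_backup=1
  snapshot_volume DATA || exit 1                                            # step 2
  end_backup || exit 1                                                      # step 3
  run_sql "ALTER SYSTEM ARCHIVE LOG CURRENT" || exit 1                      # step 4
  run_sql "ALTER DATABASE BACKUP CONTROLFILE TO '$CTL_BIN' REUSE" || exit 1 # step 5 (binary)
  run_sql "ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS '$CTL_TRC' REUSE" || exit 1  # step 5 (trace)
  snapshot_volume LOG || exit 1                                             # step 6
)

volume_split_snapshot
```

If the DATA snapshot fails at step 2, the function exits non-zero and the trap still issues END BACKUP, so the database is not left in hot-backup mode. On the success path the trap is a no-op because step 3 has already cleared the flag.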

 

The workflow at a glance

[Figure: workflow diagram, DineshGajendran_1-1783028030843.png]

 

Use the sample scripts: https://github.com/NetApp/oracle-gcnv-consistent-snapshots/tree/main/scripts

Execution guide: https://github.com/NetApp/oracle-gcnv-consistent-snapshots/blob/main/README.md

 

Meeting tight SLAs

| SLA dimension | How this playbook delivers |
| --- | --- |
| RPO | A datafile-only restore (50-restore-from-snapshot.sh [name] datafiles) rolls forward through the current archive + online redo on the intact LOG volume to the last commit, so RPO under normal conditions is effectively zero, independent of snapshot frequency. Snapshot cadence only bounds RPO in the degraded case where the logs themselves are lost. |
| RTO | A CG revert is a metadata operation: the array does not copy data back. Recovery time is dominated by the Oracle stack bounce + media recovery of a small amount of redo, typically minutes. |
| Consistency | The Consistency Group guarantees write-order consistency across +DATA and +FRA in a single atomic snapshot, with no torn, cross-volume images. |
| Auditability | Self-describing snapshot names (env/db/type/timestamp/change-id) plus a captured SCN per snapshot give a clean recovery and compliance trail. |
| Safety | Hot-backup mode is trap-protected (never left on); restore is pre-flight-gated and post-validated. |
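As an illustration of the self-describing names mentioned under Auditability, the sketch below assembles the five fields into one snapshot name. The underscore separator, the UTC timestamp format, the function name `build_snapshot_name`, and the sample values are all assumptions for this example; the sample scripts may use a different layout.

```shell
#!/usr/bin/env bash
# Hypothetical naming helper: env, db, type, timestamp, change-id joined by "_".
# The timestamp defaults to the current UTC time; pass a fifth argument to fix it.
build_snapshot_name() {
  local env=$1 db=$2 type=$3 change_id=$4
  local ts=${5:-$(date -u +%Y%m%dT%H%M%SZ)}
  printf '%s_%s_%s_%s_%s\n' "$env" "$db" "$type" "$ts" "$change_id"
}

build_snapshot_name prod orcl whole CHG0012345 20250101T120000Z
# prints: prod_orcl_whole_20250101T120000Z_CHG0012345
```

Keeping the timestamp in a sortable format means a plain lexical sort of names within one env/db/type lists snapshots in chronological order, which helps when picking the "latest" restore point.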

 

Summary

Running Google Cloud NetApp Volumes in ONTAP mode gives Oracle the full ONTAP snapshot engine plus Consistency Groups, all driven through the Google control plane with Application Default Credentials (ADC), so there are no LIFs to manage and no stored credentials. Combine the CG with Oracle hot-backup mode and you get application-consistent, near-instant, space-efficient snapshots that restore to a clean, predictable database in minutes, which is exactly what tight RPO/RTO SLAs demand.
