Data Backup and Recovery

DB2, Snapcreator and DRP

pierrek

Hi,

I installed SC 3.5 at a customer site yesterday to back up DB2 databases running on AIX (over FCP). The customer has followed our best practices and created 2 separate volume groups per DB2 database: one VG for the db2 datafiles and another VG for the db2 active and archived logs. Disaster recovery is a key element in the solution, so Volume SnapMirror has been used to replicate all db2 volumes to a remote datacenter:

(source db2 datafile volume on spso002)     -->     VSM     -->     (destination db2 datafile volume on spso003)

(source db2 log volume on spso002)           -->     VSM     -->     (destination db2 log volume on spso003)

The SC 3.5 db2 module has been used to schedule consistent backups (via write suspend) of the databases (snapshots are created by SDU at the VG level), and to trigger VSM updates after each hourly backup. Here is my config file as stored on the SC server (an SC agent is running on the AIX DB2 server):

CONFIG_TYPE=STANDARD

SNAME=hourly_db2_snap

SNAP_TIMESTAMP_ONLY=N

VOLUMES=spso002:vol_bemanp01_vgsap01,vol_bemanp01_vgsap01log

NTAP_SNAPSHOT_RETENTIONS=hourly:72

NTAP_USERS=spso003:root/53616c7465645f5f9d95c3815e32e476060ad0cbac5eb9b6548415fbf2b35b9b;spso002:root/53616c7465645f5fa95cfeb8b793cb263004841e394942c5b9e6d38bfc147cac

NTAP_PWD_PROTECTION=Y

TRANSPORT=HTTPS

PORT=443

LOG_NUM=30

LOG_TRACE_ENABLE=Y

SC_TMP_DIR=

NTAP_SNAPSHOT_RETENTION_AGE=3

SNAPDRIVE=Y

SNAPDRIVE_DISCOVERY=N

NTAP_SNAPSHOT_DISABLE=N

NTAP_SNAPSHOT_NODELETE=N

NTAP_CONSISTENCY_GROUP_SNAPSHOT=Y

NTAP_CONSISTENCY_GROUP_TIMEOUT=medium

NTAP_CONSISTENCY_GROUP_WAFL_SYNC=N

NTAP_SNAPSHOT_DELETE_BY_AGE_ONLY=N

NTAP_SNAPSHOT_DEPENDENCY_IGNORE=N

NTAP_SNAPSHOT_RESTORE_AUTO_DETECT=N

NTAP_SNAPMIRROR_UPDATE=Y

NTAP_SNAPMIRROR_CASCADING_UPDATE=N

SNAPMIRROR_VOLUMES=spso002:vol_bemanp01_vgsap01,vol_bemanp01_vgsap01log

SNAPMIRROR_CASCADING_VOLUMES=

NTAP_SNAPMIRROR_WAIT=60

NTAP_SNAPMIRROR_USE_SNAPSHOT=Y

NTAP_SNAPMIRROR_MAX_TRANSFER=

SC_AGENT=10.45.7.21:9090

SC_AGENT_TIMEOUT=900

SC_AGENT_UNQUIESCE_TIMEOUT=900

SC_CLONE_TARGET=

SC_AGENT_WATCHDOG_ENABLE=Y

SC_AGENT_LOG_ENABLE=Y

APP_NAME=db2

DB2_DATABASES=XD1:db2xd1

DB2_CMD=db2

NTAP_SNAPSHOT_CREATE_CMD01=/usr/bin/snapdrive snap create -vg vgsap01 vgsap01log -snapname %SNAME-%SNAP_TYPE_recent

######################################################################################################################

I have 2 questions about the above setup.

1) First, can I use the generic archive log management functionality built into SC to clean up the db2 archived logs older than my backup retention period, and could I get an example of a db2 config file where this functionality is used? Is it tested with DB2 and supported?

##########################################################################################
# ARCHIVE_LOG_ENABLE - (Y|N) Setting which enables Archive Log Management
#                      (deletion of old archive logs)
#
# ARCHIVE_LOG_RETENTION - Retention in days for how long archive logs should be kept
#
# ARCHIVE_LOG_DIR - Path to where the archive logs are stored
#
# ARCHIVE_LOG_EXT - File extension for the archive logs, must be <something>.<extension>
#                   i.e. 109209011.log, in which case you would enter "log"
#
# ARCHIVE_LOG_RECURSIVE_SEARCH - (Y|N) Setting which allows for managing archive logs
#                                within subdirs. If archive logs are located under
#                                subdirs this option should be used.
##########################################################################################
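To make the question concrete, here is what such a fragment might look like for this setup. The path is a placeholder and the retention of 3 days is just matched to the 72-hourly snapshot retention above; DB2 names its archived log files S0000001.LOG and so on, hence the "LOG" extension:

```
ARCHIVE_LOG_ENABLE=Y
ARCHIVE_LOG_RETENTION=3
ARCHIVE_LOG_DIR=/db2/XD1/log_archive
ARCHIVE_LOG_EXT=LOG
ARCHIVE_LOG_RECURSIVE_SEARCH=Y
```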

2) My second question is about the db2 active and archived log snapmirror replication to the remote site. Say that I schedule one snapshot of the data and log volume groups every hour using SC.

This means that SC will trigger a VSM update to the remote site and replicate the consistent snapshots of my datafiles and logs every hour. Now, if I want to reduce my RPO in case of disaster to, say, 10 minutes, is it valid to run a crontab script every 10 minutes (from the DB2 AIX host) that issues a couple of sync calls (to flush filesystem buffers to disk) and then remotely (over ssh to the SnapMirror secondary filer) executes a snapmirror update of the db2 log volume?
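A minimal sketch of what I have in mind, assuming root ssh access to the secondary filer and that the VSM destination log volume carries the same name as the source; the script name and helper are hypothetical:

```shell
#!/bin/sh
# Hypothetical /usr/local/bin/sm_log_update.sh, run from cron on the DB2 AIX host.
# Filer and volume names are from the config above; adjust DST_LOG_VOL to the
# actual VSM destination volume on spso003.

DST_FILER="spso003"
DST_LOG_VOL="vol_bemanp01_vgsap01log"

# Build the ONTAP 7-Mode command that triggers a VSM update on the destination
build_update_cmd() {
    echo "snapmirror update $1"
}

main() {
    # Flush AIX filesystem buffers to disk before the transfer
    sync; sync
    # Run the update on the secondary filer (assumes root ssh is enabled there)
    ssh "root@$DST_FILER" "$(build_update_cmd "$DST_LOG_VOL")"
}

# Only run when invoked under the expected script name, so the file can be
# sourced without triggering an update.
case "$0" in
    */sm_log_update.sh) main ;;
esac
```

The matching crontab entry on AIX (classic cron, no `*/10` syntax) would be: `0,10,20,30,40,50 * * * * /usr/local/bin/sm_log_update.sh`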

In case of a DB2 disaster, I would basically do the following operations on the remote site:

1) snapmirror break + vol snap restore of the db2 datafile volume to the latest (1 hour old max) SC created (database consistent) snapshot

2) snapmirror break of the db2 log volume (10 minutes old max)

3) import the VGs, mount the filesystems, ...

4) start the db2 database at the remote site in a roll-forward pending state: "db2inidb [database] as mirror"

5) reapply the archived (and active?) logs and perform roll-forward recovery: db2 "rollforward database [database] to end of logs and complete"
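For what it's worth, the five steps above could be scripted roughly as follows. Filer, database, VG and volume names come from this post, but the snapshot name, the destination volume names, the hdisk devices and mount points are placeholders to adapt:

```shell
#!/bin/sh
# Hypothetical DR activation runbook (db2_dr_activate.sh), run at the remote site.
# SNAP, hdiskX/hdiskY and the mounts are placeholders, not validated values.

DST_FILER="spso003"
DATA_VOL="vol_bemanp01_vgsap01"       # assumed destination volume names
LOG_VOL="vol_bemanp01_vgsap01log"
DB="XD1"
SNAP="name_of_latest_sc_snapshot"     # fill in the latest SC-created snapshot

filer() {
    # Run an ONTAP 7-Mode command on the destination filer as root
    ssh "root@$DST_FILER" "$@"
}

main() {
    # 1) Break the data mirror and restore the last DB-consistent snapshot
    filer snapmirror break "$DATA_VOL"
    filer snap restore -s "$SNAP" "$DATA_VOL"

    # 2) Break the log mirror (at most ~10 minutes behind)
    filer snapmirror break "$LOG_VOL"

    # 3) Import and activate the volume groups, then mount the filesystems
    importvg -y vgsap01 hdiskX        # placeholder hdisks
    importvg -y vgsap01log hdiskY
    varyonvg vgsap01; varyonvg vgsap01log
    # mount the db2 filesystems as appropriate

    # 4) Put the database into rollforward-pending state
    su - db2xd1 -c "db2inidb $DB as mirror"

    # 5) Roll forward through the replicated logs
    su - db2xd1 -c "db2 rollforward database $DB to end of logs and complete"
}

# Only run when invoked under the expected script name
case "$0" in
    */db2_dr_activate.sh) main ;;
esac
```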

Is this scenario valid? If I compare with Oracle, the online redo logs should not be part of a DR scenario (they should not be replicated or reapplied), and the DR Oracle database should be opened with the "resetlogs" option, which creates new online redo logs. What about DB2 here? Will the db2 rollforward command try to apply transactions from the replicated active logs as well? Is that recommended? Should the active logs be deleted before the rollforward is started?

Thanks a lot in advance for any advice or comment.

Pierre

1 ACCEPTED SOLUTION

oommen

Yep, I see that as a problem, as "write suspend" is a dead stop of I/O. But SC with CG (Consistency Group, which does the fencing of volumes during the snap operation; type medium) will ensure the snapshot gets done within seven seconds or times out, and then puts the database back into a normal state. In the end it depends on how busy the database is. In the past we have asked IBM about doing the snaps (with CG) without "write suspend" for the LOGS if the LOGS are stored on just one controller; IBM doesn't recommend it.

You might run into a snapshot limit on the LOGS volume, which will force you to go for a lower retention just for the LOGS.

Regards,

Bobby
