SMSAP/SMO 3.3 on AIX: Clone Database does not work

helmut

Hi, 

I installed SMSAP 3.3.0 in an AIX environment and discovered the following problem which looks very much like a bug to me.

When cloning a DB from an existing Snapshot backup and selecting the new 3.3 function "Recover Database" in the GUI wizard (or, when using the CLI, running smsap clone create WITHOUT the "-no-resetlogs" option), the operation fails with these messages:

--[ERROR] SMSAP-04083: Error Recovering cloned database

.....

--[ERROR] SMSAP-13032: Cannot perform operation: Clone Create.  Root cause: ORACLE-00001: Error executing SQL: [ALTER DATABASE OPEN RESETLOGS;]

We use FCP LUNs and volume groups (AIX standard).

If you take a closer look at the logs, you can see that soon after it mounts the archive logs from the backup at the temporary location, and while it prepares for the database recovery, a disconnect of the archive logs is suddenly issued. After that, it is clear why the recovery fails.

This part of the sequence looks like this in the log file:

........

ORACLE-00000: Executing SQL command: SELECT STATUS FROM V$INSTANCE;

2013-03-06 10:09:10,116 [RMI TCP Connection(2)-10.10.11.133] [DEBUG]: ORACLE-20007: Database instance ABC is in state MOUNTED.

2013-03-06 10:09:10,116 [RMI TCP Connection(2)-10.10.11.133] [DEBUG]: Opening connection for JDBC descriptor jdbc:oracle:thin:sys/XXXXXXXX@(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.11.133)(PORT = 1527)))(CONNECT_DATA = (SID = ABC)))

2013-03-06 10:09:10,122 [RMI TCP Connection(2)-10.10.11.133] [DEBUG]: Adding logrequest null already tried from backup Snap1_logs

2013-03-06 10:09:10,123 [RMI TCP Connection(2)-10.10.11.133] [INFO ]: SMSAP-03055: Disconnecting backup Snap1_logs.

  --> It suddenly starts disconnecting the archive log filesystem!

2013-03-06 10:09:10,520 [default0 aed43035b10a261cacf8a6b4f6185072] [INFO ]: SD-00016: Discovering storage resources for oraarchvg_0.

2013-03-06 10:09:10,559 [default0 aed43035b10a261cacf8a6b4f6185072] [DEBUG]: EXE-00000: Executing shell command:

0:/bin/sh -c "/usr/sbin/snapdrive" "storage" "show" "-vg" "oraarchvg_0"

  ........

2013-03-06 10:09:13,798 [default0 aed43035b10a261cacf8a6b4f6185072] [DEBUG]: EXE-00000: Executing shell command:

0:/bin/sh -c "/usr/sbin/snapdrive" "snap" "disconnect" "-vg" "oraarchvg_0" "-full"

1:/usr/sbin/snapdrive snap disconnect -vg oraarchvg_0 -full

2013-03-06 10:09:19,508 [Execution Monitor Thread [/usr/sbin/snapdrive snap disconnect -vg oraarchvg_0 -full]] [DEBUG]: EXE-0000 1: Shell result [0:00:05.710] (Exit Value: 0):

deleting disk group oraarchvg_0

  - fs /opt/NetApp/smsap/mnt/-oracle-KL1-oraarch-20130306100853455_0 ... deleted

  - hostvol oraarchvg_0/kl1oraarchlv_0 ... deleted

  - dg oraarchvg_0 ... deleted

  - LUN janis:/vol/SnapManager_20130306100853461_clone_prodclu_oraarchvg/clone_prodclu_oraarchvg_01.lun ... disconnected

  - deleting volume clone ... janis:/vol/SnapManager_20130306100853461_clone_prodclu_oraarchvg  done

2013-03-06 10:09:19,509 [default0 aed43035b10a261cacf8a6b4f6185072] [INFO ]: SD-00038: Finished disconnecting volume groups [oraarchvg_0].

2013-03-06 10:09:19,565 [RMI TCP Connection(2)-10.10.11.133] [ERROR]: SMSAP-04083: Error Recovering cloned database
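
For anyone who wants to check their own operation log for the same pattern, here is a minimal Python sketch. It is only an illustration, not something that ships with SMSAP: the function name find_premature_disconnects is hypothetical, and the script assumes that log lines start with a timestamp in the format shown above and that the two message codes (SMSAP-03055 and SMSAP-04083) appear exactly as in this excerpt.

```python
import re

# Message codes taken from the log excerpt in this post.
DISCONNECT_CODE = "SMSAP-03055"      # "Disconnecting backup ..."
RECOVERY_ERROR_CODE = "SMSAP-04083"  # "Error Recovering cloned database"

# Assumed timestamp format, as seen in the excerpt: 2013-03-06 10:09:10,123
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def find_premature_disconnects(lines):
    """Return (disconnect_line, error_line) pairs where a backup disconnect
    is logged before a later clone recovery error."""
    pairs = []
    pending = None  # most recent disconnect line not yet paired with an error
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, command output, and anything without a timestamp.
        if not TIMESTAMP.match(line):
            continue
        if DISCONNECT_CODE in line:
            pending = line
        elif RECOVERY_ERROR_CODE in line and pending is not None:
            pairs.append((pending, line))
            pending = None
    return pairs
```

To use it, open the operation log, pass the file object (or a list of lines) to find_premature_disconnects, and print the returned pairs. A non-empty result only shows that a disconnect was logged before the recovery error; it does not prove that the disconnect caused it.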

I reproduced this on two AIX Systems with different SAP Instances.

It happened when cloning to an alternate host as well as when cloning to the same host.

I downgraded to SMSAP 3.2P3 and the problem disappeared: cloning to the same host or to another host worked without any error with SMSAP 3.2.

I think it would be a good idea for NetApp to set up a test in their qualification lab so the issue can be fixed.

It should be easily reproducible, at least with AIX and FCP LUNs, and maybe even with other Unix operating systems.

Helmut
