Data Backup and Recovery

SMSAP/SMO 3.3 on AIX: Clone Database does not work

helmut
9,000 Views

Hi, 

I installed SMSAP 3.3.0 in an AIX environment and discovered the following problem which looks very much like a bug to me.

When cloning a DB form an existing Snapshot-Backup and selecting the new 3.3 function "Recover Database" in the GUI Wizard (or when using CLI run the  smsap clone create WITHOUT the "-no-resetlogs" option then it fails with the messages:

--[ERROR] SMSAP-04083: Error Recovering cloned database

.....

--[ERROR] SMSAP-13032: Cannot perform operation: Clone Create.  Root cause: ORACLE-00001: Error executing SQL: [ALTER DATABASE OPEN RESETLOGS;]

we use FCP Luns and Volume Groups (AIX Standard)

if you have a closer look in the logs the you can see that soon after it mounted the archivlogs from the Backup to the temporary location, and when it prepares for the database recovery , suddenly a disconnect of the archivlogs  is issued - and after that its clear that the Recovery fails.

This part of the sequence looks e.g. like that in the log-file:

........

ORACLE-00000: Executing SQL command: SELECT STATUS FROM V$INSTANCE;

2013-03-06 10:09:10,116 [RMI TCP Connection(2)-10.10.11.133] [DEBUG]: ORACLE-20007: Database instance ABC is in state MOUNTED.

2013-03-06 10:09:10,116 [RMI TCP Connection(2)-10.10.11.133] [DEBUG]: Opening connection for JDBC descriptor jdbc:oracle:thin:sys/XXXXXXXX@(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.11.133)(PORT = 1527)))(CONNECT_DATA = (SID = ABC)))

2013-03-06 10:09:10,122 [RMI TCP Connection(2)-10.10.11.133] [DEBUG]: Adding logrequest null already tried from backup Snap1_logs

2013-03-06 10:09:10,123 [RMI TCP Connection(2)-10.10.11.133] [INFO ]: SMSAP-03055: Disconnecting backup Snap1_logs.

  --> suddenly starts disconnecting the archlog Filesystem!

013-03-06 10:09:10,520 [default0 aed43035b10a261cacf8a6b4f6185072] [INFO ]: SD-00016: Discovering storage resources for oraarchvg_0.

2013-03-06 10:09:10,559 [default0 aed43035b10a261cacf8a6b4f6185072] [DEBUG]: EXE-00000: Executing shell command:

0:/bin/sh -c "/usr/sbin/snapdrive" "storage" "show" "-vg" "oraarchvg_0"

  ........

2013-03-06 10:09:13,798 [default0 aed43035b10a261cacf8a6b4f6185072] [DEBUG]: EXE-00000: Executing shell command:

0:/bin/sh -c "/usr/sbin/snapdrive" "snap" "disconnect" "-vg" "oraarchvg_0" "-full"

1:/usr/sbin/snapdrive snap disconnect -vg oraarchvg_0 -full

2013-03-06 10:09:19,508 [Execution Monitor Thread [/usr/sbin/snapdrive snap disconnect -vg oraarchvg_0 -full]] [DEBUG]: EXE-0000 1: Shell result [0:00:05.710] (Exit Value: 0):

deleting disk group oraarchvg_0

  - fs /opt/NetApp/smsap/mnt/-oracle-KL1-oraarch-20130306100853455_0 ... deleted

  - hostvol oraarchvg_0/kl1oraarchlv_0 ... deleted

  - dg oraarchvg_0 ... deleted

  - LUN janis:/vol/SnapManager_20130306100853461_clone_prodclu_oraarchvg/clone_prodclu_oraarchvg_01.lun ... disconnected

  - deleting volume clone ... janis:/vol/SnapManager_20130306100853461_clone_prodclu_oraarchvg  done

2013-03-06 10:09:19,509 [default0 aed43035b10a261cacf8a6b4f6185072] [INFO ]: SD-00038: Finished disconnecting volume groups [oraarchvg_0].

2013-03-06 10:09:19,565 [RMI TCP Connection(2)-10.10.11.133] [ERROR]: SMSAP-04083: Error Recovering cloned database

I reproduced this on two AIX Systems with different SAP Instances.

It happend when cloning to an alternate host as well as when cloning to the same host

I downgraded to SMSAP 3.2P3 - and the problem disappeared - Clone to the same host or to another host worked without any error with SMSAP 3.2

I think it would be a good idea if NetApp would setup a test in their qualification lab to fix the issue.

It should be easily reproducable, at least with AIX and FCP Luns, but maybe even with other Unix OS.

Helmut

14 REPLIES 14

ashwath
8,898 Views

Hi,

      I am currently looking into the issue you reported and will revert soon with an update.

Thanks,

Ashwath

ashwath
8,898 Views

Hi Helmut,

      Can you please provide me the complete logs of Oracle and SMO?

Thanks,

Ashwath

helmut
8,898 Views

Hi Ashwath,

I can provide you the log file from /var/log/SMSAP where should I send it to or upload?

Helmut

ashwath
8,898 Views

Hi Helmut,

       Please zip the logs and upload them here itself. ("Use advanced editor" to attach the files)

Thanks,

Ashwath

helmut
8,898 Views

Ah - thanks! - here it is

ashwath
8,898 Views

Thanks Helmut. Could you please upload the alert.log for Oracle DB as well?

Thanks,

Ashwath

helmut
8,898 Views

Unfortunatly I did not bring it with me from the customer - we had a look at it but there was nothing that we can find inside it - but I will ask the customer to send me the alter log as well

Helmut

ashwath
8,898 Views

Hi Helmut,

        I will get started to look into the issues with the current logs you have sent. However, it will be helpful if you can send the alert.log as well.

Thanks,

Ashwath

helmut
8,016 Views

Hi,

Sorry for the late response. I had the customer to send me the alert log but he has no longer logs before the 7.March - so not the one with our events, unfortunately.

It would be interesting if you can reproduce the error in your labs - I guess it should be possible - maybe even with other OS then ours.

regards,

Helmut

ashwath
8,016 Views

After running through the logs, it appears that the listener does not identify the cloned database. Could you please check the listener status and revert?

Thanks,

Ashwath

helmut
8,016 Views

Sorry for the late answer:

No - on our customer site we can not check anything. The whole environment does no longer exist (since we downgraded the whole environment to SMSAP 3.2 when we discovered the problem - The downgrade solved the issue immediatly)

Do you have the possibility to create a Test-environment with the same Versions, and test it internally?

Have you ever seen a successful clone with Recovery in Version 3.3 ?

SHEETZDBA
8,898 Views

I think I have a similar issue on RedHat 5 using SMO 3.3. Cloning used to work in 3.2, but now it doesn't function. The archivelog ASM disk is removed before the operation is complete.

--[ INFO] SMO-03054: Mounting backup Backup1 on host dcvmtestdb.xxx.com to feed archivelogs.

--[ INFO] SD-00028: Beginning to connect device(s) [/dev/sdf] from snapshot smo_dcvmtestdb_test2db_f_h_2_8aa08ae33d7ec35b013d7ec362300001_1.

--[ INFO] SD-00029: Finished connecting device(s) [/dev/sdf] from snapshot smo_dcvmtestdb_test2db_f_h_2_8aa08ae33d7ec35b013d7ec362300001_1.

--[ INFO] ASM-10002: Assigning ASMLib disk identifier SM_DSK_1363631733567 to device /dev/sdm1.

--[ INFO] ASM-00500: Changing ASM Disk Group name in header of disk /dev/sdm1 from REC to REC_20130318143533654_0.

--[ INFO] ASM-00000: Mounting ASM Disk Group REC_20130318143533654_0 on host dcvmtestdb.xxx.com.

--[ INFO] SMO-03061: Perform recovery using backup Backup1

--[ INFO] SMO-03081: Perform recovery using all connected backups Backup1

--[ INFO] ORACLE-30023: Beginning batch recovery process for database Clone1.

--[ INFO] ORACLE-30008: Beginning recovery process for database Clone1.

--[ INFO] ORACLE-30010: Database recovery point objective: earliest consistent point

--[ INFO] ORACLE-30017: Recovery point objective of earliest consistent point reached after applying archived/online redo logs [+REC_20130318143533654_0/test2db/archivelog/2013_03_18/thread_1_seq_1.256.810397791].

--[ INFO] ORACLE-30024: Finished batch recovery process for database Clone1.

--[ INFO] SMO-03066: Recovery succeeded.

-->REMOVES THE ARCHIVELOG ASM DISK

--[ INFO] SMO-03055: Disconnecting backup Backup1.

--[ INFO] ASM-00002: Dropping ASM Disk Group REC_20130318143533654_0.

--[ INFO] ASM-10003: Deleting ASMLib disk SM_DSK_1363631733567.

--[ INFO] SD-00019: Discovering storage resources for all system devices.

--[ INFO] SD-00020: Finished storage discovery for all system devices.

--[ INFO] SD-00034: Beginning to disconnect device(s) [/dev/sdm].

--[ INFO] SD-00035: Finished disconnecting device(s) [/dev/sdm].

--[ INFO] SMO-07320: Opening database "Clone1" with RESETLOGS option.

--[ INFO] SMO-07316: Shutting down database "Clone1".

--[ INFO] SMO-07321: Starting up database "Clone1" in MOUNT mode.

--[ INFO] SMO-07307: Generating new Database ID for "Clone1".

--[ERROR] SMO-07315: Error running DBNEWID utility: .

--[ERROR] SMO-13032: Cannot perform operation: Clone Create.  Root cause: SMO-07315: Error running DBNEWID utility: .

-->ERROR DUE TO QUERYING ASM DISK IT ALREADY REMOVED

--[ERROR] FLOW-11019: Failure in Disconnect: ASM-10000: Error executing ASMLib command "/etc/init.d/oracleasm querydisk -d SM_DSK_1363631733567": Disk "SM_DSK_1363631733567" does not exist or is not instantiated

--[ERROR] FLOW-11008: Operation failed: ASM-10000: Error executing ASMLib command "/etc/init.d/oracleasm querydisk -d SM_DSK_1363631733567": Disk "SM_DSK_1363631733567" does not exist or is not instantiated

--[ERROR] SMO-08002: Error while aborting operation: SMO-11008: Error destroying cloned storage: FLOW-11019: Failure in Disconnect: ASM-10000: Error executing ASMLib command "/etc/init.d/oracleasm querydisk -d SM_DSK_1363631733567": Disk "SM_DSK_1363631733567" does not exist or is not instantiated

ashwath
8,016 Views

Can you please upload the SMO log file?

Thanks,

Ashwath

ANDREAS_JANKOWIAK
8,016 Views

Hello,

I also have the problem that after upgrading the SMSAP to Version 3.3 the cloning process tries to connect to the Oracle Database via port 1500.  The listener port in our system is configured on port 1502. The whole process was working perfectly with version 3.1.

Is this a bug in version 3.3 ? Is there a solution from NetApp?

Thanks you!

Best Regards,

Andreas

Public