Data Backup and Recovery
Data Backup and Recovery
I installed SMSAP 3.3.0 in an AIX environment and discovered the following problem which looks very much like a bug to me.
When cloning a DB form an existing Snapshot-Backup and selecting the new 3.3 function "Recover Database" in the GUI Wizard (or when using CLI run the smsap clone create WITHOUT the "-no-resetlogs" option then it fails with the messages:
--[ERROR] SMSAP-04083: Error Recovering cloned database
--[ERROR] SMSAP-13032: Cannot perform operation: Clone Create. Root cause: ORACLE-00001: Error executing SQL: [ALTER DATABASE OPEN RESETLOGS;]
we use FCP Luns and Volume Groups (AIX Standard)
if you have a closer look in the logs the you can see that soon after it mounted the archivlogs from the Backup to the temporary location, and when it prepares for the database recovery , suddenly a disconnect of the archivlogs is issued - and after that its clear that the Recovery fails.
This part of the sequence looks e.g. like that in the log-file:
ORACLE-00000: Executing SQL command: SELECT STATUS FROM V$INSTANCE;
2013-03-06 10:09:10,116 [RMI TCP Connection(2)-10.10.11.133] [DEBUG]: ORACLE-20007: Database instance ABC is in state MOUNTED.
2013-03-06 10:09:10,116 [RMI TCP Connection(2)-10.10.11.133] [DEBUG]: Opening connection for JDBC descriptor jdbc:oracle:thin:sys/XXXXXXXX@(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.11.133)(PORT = 1527)))(CONNECT_DATA = (SID = ABC)))
2013-03-06 10:09:10,122 [RMI TCP Connection(2)-10.10.11.133] [DEBUG]: Adding logrequest null already tried from backup Snap1_logs
2013-03-06 10:09:10,123 [RMI TCP Connection(2)-10.10.11.133] [INFO ]: SMSAP-03055: Disconnecting backup Snap1_logs.
--> suddenly starts disconnecting the archlog Filesystem!
013-03-06 10:09:10,520 [default0 aed43035b10a261cacf8a6b4f6185072] [INFO ]: SD-00016: Discovering storage resources for oraarchvg_0.
2013-03-06 10:09:10,559 [default0 aed43035b10a261cacf8a6b4f6185072] [DEBUG]: EXE-00000: Executing shell command:
0:/bin/sh -c "/usr/sbin/snapdrive" "storage" "show" "-vg" "oraarchvg_0"
2013-03-06 10:09:13,798 [default0 aed43035b10a261cacf8a6b4f6185072] [DEBUG]: EXE-00000: Executing shell command:
0:/bin/sh -c "/usr/sbin/snapdrive" "snap" "disconnect" "-vg" "oraarchvg_0" "-full"
1:/usr/sbin/snapdrive snap disconnect -vg oraarchvg_0 -full
2013-03-06 10:09:19,508 [Execution Monitor Thread [/usr/sbin/snapdrive snap disconnect -vg oraarchvg_0 -full]] [DEBUG]: EXE-0000 1: Shell result [0:00:05.710] (Exit Value: 0):
deleting disk group oraarchvg_0
- fs /opt/NetApp/smsap/mnt/-oracle-KL1-oraarch-20130306100853455_0 ... deleted
- hostvol oraarchvg_0/kl1oraarchlv_0 ... deleted
- dg oraarchvg_0 ... deleted
- LUN janis:/vol/SnapManager_20130306100853461_clone_prodclu_oraarchvg/clone_prodclu_oraarchvg_01.lun ... disconnected
- deleting volume clone ... janis:/vol/SnapManager_20130306100853461_clone_prodclu_oraarchvg done
2013-03-06 10:09:19,509 [default0 aed43035b10a261cacf8a6b4f6185072] [INFO ]: SD-00038: Finished disconnecting volume groups [oraarchvg_0].
2013-03-06 10:09:19,565 [RMI TCP Connection(2)-10.10.11.133] [ERROR]: SMSAP-04083: Error Recovering cloned database
I reproduced this on two AIX Systems with different SAP Instances.
It happend when cloning to an alternate host as well as when cloning to the same host
I downgraded to SMSAP 3.2P3 - and the problem disappeared - Clone to the same host or to another host worked without any error with SMSAP 3.2
I think it would be a good idea if NetApp would setup a test in their qualification lab to fix the issue.
It should be easily reproducable, at least with AIX and FCP Luns, but maybe even with other Unix OS.
I am currently looking into the issue you reported and will revert soon with an update.
Can you please provide me the complete logs of Oracle and SMO?
I can provide you the log file from /var/log/SMSAP where should I send it to or upload?
Please zip the logs and upload them here itself. ("Use advanced editor" to attach the files)
Thanks Helmut. Could you please upload the alert.log for Oracle DB as well?
Unfortunatly I did not bring it with me from the customer - we had a look at it but there was nothing that we can find inside it - but I will ask the customer to send me the alter log as well
I will get started to look into the issues with the current logs you have sent. However, it will be helpful if you can send the alert.log as well.
Sorry for the late response. I had the customer to send me the alert log but he has no longer logs before the 7.March - so not the one with our events, unfortunately.
It would be interesting if you can reproduce the error in your labs - I guess it should be possible - maybe even with other OS then ours.
After running through the logs, it appears that the listener does not identify the cloned database. Could you please check the listener status and revert?
Sorry for the late answer:
No - on our customer site we can not check anything. The whole environment does no longer exist (since we downgraded the whole environment to SMSAP 3.2 when we discovered the problem - The downgrade solved the issue immediatly)
Do you have the possibility to create a Test-environment with the same Versions, and test it internally?
Have you ever seen a successful clone with Recovery in Version 3.3 ?
I think I have a similar issue on RedHat 5 using SMO 3.3. Cloning used to work in 3.2, but now it doesn't function. The archivelog ASM disk is removed before the operation is complete.
--[ INFO] SMO-03054: Mounting backup Backup1 on host dcvmtestdb.xxx.com to feed archivelogs.
--[ INFO] SD-00028: Beginning to connect device(s) [/dev/sdf] from snapshot smo_dcvmtestdb_test2db_f_h_2_8aa08ae33d7ec35b013d7ec362300001_1.
--[ INFO] SD-00029: Finished connecting device(s) [/dev/sdf] from snapshot smo_dcvmtestdb_test2db_f_h_2_8aa08ae33d7ec35b013d7ec362300001_1.
--[ INFO] ASM-10002: Assigning ASMLib disk identifier SM_DSK_1363631733567 to device /dev/sdm1.
--[ INFO] ASM-00500: Changing ASM Disk Group name in header of disk /dev/sdm1 from REC to REC_20130318143533654_0.
--[ INFO] ASM-00000: Mounting ASM Disk Group REC_20130318143533654_0 on host dcvmtestdb.xxx.com.
--[ INFO] SMO-03061: Perform recovery using backup Backup1
--[ INFO] SMO-03081: Perform recovery using all connected backups Backup1
--[ INFO] ORACLE-30023: Beginning batch recovery process for database Clone1.
--[ INFO] ORACLE-30008: Beginning recovery process for database Clone1.
--[ INFO] ORACLE-30010: Database recovery point objective: earliest consistent point
--[ INFO] ORACLE-30017: Recovery point objective of earliest consistent point reached after applying archived/online redo logs [+REC_20130318143533654_0/test2db/archivelog/2013_03_18/thread_1_seq_1.256.810397791].
--[ INFO] ORACLE-30024: Finished batch recovery process for database Clone1.
--[ INFO] SMO-03066: Recovery succeeded.
-->REMOVES THE ARCHIVELOG ASM DISK
--[ INFO] SMO-03055: Disconnecting backup Backup1.
--[ INFO] ASM-00002: Dropping ASM Disk Group REC_20130318143533654_0.
--[ INFO] ASM-10003: Deleting ASMLib disk SM_DSK_1363631733567.
--[ INFO] SD-00019: Discovering storage resources for all system devices.
--[ INFO] SD-00020: Finished storage discovery for all system devices.
--[ INFO] SD-00034: Beginning to disconnect device(s) [/dev/sdm].
--[ INFO] SD-00035: Finished disconnecting device(s) [/dev/sdm].
--[ INFO] SMO-07320: Opening database "Clone1" with RESETLOGS option.
--[ INFO] SMO-07316: Shutting down database "Clone1".
--[ INFO] SMO-07321: Starting up database "Clone1" in MOUNT mode.
--[ INFO] SMO-07307: Generating new Database ID for "Clone1".
--[ERROR] SMO-07315: Error running DBNEWID utility: .
--[ERROR] SMO-13032: Cannot perform operation: Clone Create. Root cause: SMO-07315: Error running DBNEWID utility: .
-->ERROR DUE TO QUERYING ASM DISK IT ALREADY REMOVED
--[ERROR] FLOW-11019: Failure in Disconnect: ASM-10000: Error executing ASMLib command "/etc/init.d/oracleasm querydisk -d SM_DSK_1363631733567": Disk "SM_DSK_1363631733567" does not exist or is not instantiated
--[ERROR] FLOW-11008: Operation failed: ASM-10000: Error executing ASMLib command "/etc/init.d/oracleasm querydisk -d SM_DSK_1363631733567": Disk "SM_DSK_1363631733567" does not exist or is not instantiated
--[ERROR] SMO-08002: Error while aborting operation: SMO-11008: Error destroying cloned storage: FLOW-11019: Failure in Disconnect: ASM-10000: Error executing ASMLib command "/etc/init.d/oracleasm querydisk -d SM_DSK_1363631733567": Disk "SM_DSK_1363631733567" does not exist or is not instantiated
Can you please upload the SMO log file?
I also have the problem that after upgrading the SMSAP to Version 3.3 the cloning process tries to connect to the Oracle Database via port 1500. The listener port in our system is configured on port 1502. The whole process was working perfectly with version 3.1.
Is this a bug in version 3.3 ? Is there a solution from NetApp?