
Difficulties during a restore

gasparuben
8,606 Views

Hi there,

 

I have configured this tool and it works for taking backups and deleting them. However, I am having difficulties with a restore.

Here is what I have done for a full restore.

 

My db layout is like:


sys@BD2:SQL> select name from v$datafile;

NAME
------------------------------------------------------------------------------------------------------------------------------------------------------
/ORA/dbs03/BD2/datafile/o1_mf_system__1358191725125556_.dbf
/ORA/dbs03/BD2/datafile/o1_mf_sysaux__1348933194894589_.dbf
/ORA/dbs03/BD2/datafile/o1_mf_undo01__1348933199657938_.dbf
/ORA/dbs03/BD2/datafile/o1_mf_dbod__1348933228205065_.dbf
/ORA/dbs03/BD2/datafile/o1_mf_xdb__1348940086272189_.dbf
/ORA/dbs03/BD2/datafile/o1_mf_tpcctab__1357049580539768_.dbf

 

 

These are my three NFS volumes (ONTAP 8.1P1, 7-Mode):

dbnasg404:/vol/bdisktest200 on /ORA/dbs00/BD2 type nfs (rw,bg,hard,nointr,tcp,nfsvers=3,timeo=600,rsize=32768,wsize=32768,addr=172.30.1.4)

dbnasg404:/vol/bdisktest202 on /ORA/dbs02/BD2 type nfs (rw,bg,hard,nointr,tcp,nfsvers=3,actimeo=0,timeo=600,rsize=32768,wsize=32768,addr=172.30.1.4)

dbnasg404:/vol/bdisktest203 on /ORA/dbs03/BD2 type nfs (rw,bg,hard,nointr,tcp,nfsvers=3,actimeo=0,timeo=600,rsize=32768,wsize=32768,addr=172.30.1.4)

 

/ORA/dbs03/BD2 contains only datafiles. The controlfile, block change tracking file, and redo logs are on different volumes.

 

Initially:

dbnasg404> snap list bdisktest203
Volume bdisktest203
working...

No snapshots exist.

 

Then I take a full backup with tag test_full02:

 

RMAN>

backup proxy only incremental level 0 tag 'test_full02' check logical filesperset 32
    database force format '%d_%T_%U_lvl0A';

Starting backup at 14-JAN-2013 20:05:30
configuration for SBT_TAPE channel 2 is ignored
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=129 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: WARNING: Oracle Test Disk API
channel ORA_SBT_TAPE_1: starting incremental level 0 proxy datafile backup at 14-JAN-2013 20:05:31
channel ORA_SBT_TAPE_1: specifying datafile(s) for proxy backup
input datafile file number=00006 name=/ORA/dbs03/BD2/datafile/o1_mf_tpcctab__1357049580539768_.dbf
proxy file handle=BD2_20130114_2onvdv0b_1_1_lvl0A
input datafile file number=00003 name=/ORA/dbs03/BD2/datafile/o1_mf_undo01__1348933199657938_.dbf
proxy file handle=BD2_20130114_2onvdv0b_2_1_lvl0A
input datafile file number=00002 name=/ORA/dbs03/BD2/datafile/o1_mf_sysaux__1348933194894589_.dbf
proxy file handle=BD2_20130114_2onvdv0b_3_1_lvl0A
input datafile file number=00001 name=/ORA/dbs03/BD2/datafile/o1_mf_system__1348933191464435_.dbf
proxy file handle=BD2_20130114_2onvdv0b_4_1_lvl0A
input datafile file number=00004 name=/ORA/dbs03/BD2/datafile/o1_mf_dbod__1348933228205065_.dbf
proxy file handle=BD2_20130114_2onvdv0b_5_1_lvl0A
input datafile file number=00005 name=/ORA/dbs03/BD2/datafile/o1_mf_xdb__1348940086272189_.dbf
proxy file handle=BD2_20130114_2onvdv0b_6_1_lvl0A
channel ORA_SBT_TAPE_1: proxy copy complete, elapsed time: 00:00:03
Finished backup at 14-JAN-2013 20:05:34

Starting Control File and SPFILE Autobackup at 14-JAN-2013 20:05:34
piece handle=c-963935198-20130114-08 comment=API Version 2.0,MMS Version 8.1.3.0
Finished Control File and SPFILE Autobackup at 14-JAN-2013 20:05:38

 

Check the snapshot:
dbnasg404> snap list bdisktest203
Volume bdisktest203
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
  0% ( 0%)    0% ( 0%)  Jan 14 20:04  BD2_20130114_2onvdv0b_6_1_lvl0A

 

Second full backup (I would like to see this one disappear when restoring to test_full02):


RMAN> backup proxy only incremental level 0 tag 'test_full03' check logical filesperset 32
    database force format '%d_%T_%U_lvl0A';2>

Starting backup at 14-JAN-2013 20:07:32
using target database control file instead of recovery catalog
configuration for SBT_TAPE channel 2 is ignored
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=210 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: WARNING: Oracle Test Disk API
channel ORA_SBT_TAPE_1: starting incremental level 0 proxy datafile backup at 14-JAN-2013 20:07:33
channel ORA_SBT_TAPE_1: specifying datafile(s) for proxy backup
input datafile file number=00006 name=/ORA/dbs03/BD2/datafile/o1_mf_tpcctab__1357049580539768_.dbf
proxy file handle=BD2_20130114_2qnvdv45_1_1_lvl0A
input datafile file number=00003 name=/ORA/dbs03/BD2/datafile/o1_mf_undo01__1348933199657938_.dbf
proxy file handle=BD2_20130114_2qnvdv45_2_1_lvl0A
input datafile file number=00002 name=/ORA/dbs03/BD2/datafile/o1_mf_sysaux__1348933194894589_.dbf
proxy file handle=BD2_20130114_2qnvdv45_3_1_lvl0A
input datafile file number=00001 name=/ORA/dbs03/BD2/datafile/o1_mf_system__1348933191464435_.dbf
proxy file handle=BD2_20130114_2qnvdv45_4_1_lvl0A
input datafile file number=00004 name=/ORA/dbs03/BD2/datafile/o1_mf_dbod__1348933228205065_.dbf
proxy file handle=BD2_20130114_2qnvdv45_5_1_lvl0A
input datafile file number=00005 name=/ORA/dbs03/BD2/datafile/o1_mf_xdb__1348940086272189_.dbf
proxy file handle=BD2_20130114_2qnvdv45_6_1_lvl0A
channel ORA_SBT_TAPE_1: proxy copy complete, elapsed time: 00:00:03
Finished backup at 14-JAN-2013 20:07:36

Starting Control File and SPFILE Autobackup at 14-JAN-2013 20:07:37
piece handle=c-963935198-20130114-09 comment=API Version 2.0,MMS Version 8.1.3.0
Finished Control File and SPFILE Autobackup at 14-JAN-2013 20:07:40

 

Check snap:


dbnasg404> snap list bdisktest203
Volume bdisktest203
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
  0% ( 0%)    0% ( 0%)  Jan 14 20:06  BD2_20130114_2qnvdv45_6_1_lvl0A
  0% ( 0%)    0% ( 0%)  Jan 14 20:04  BD2_20130114_2onvdv0b_6_1_lvl0A

 

Corrupt the system datafile:

 

dd if=/dev/zero of=/ORA/dbs03/BD2/datafile/o1_mf_system__1348933191464435_.dbf bs=1024 count=8

ERROR:
ORA-01578: ORACLE data block corrupted (file # 1, block # 1072)
ORA-01110: data file 1: '/ORA/dbs03/BD2/datafile/o1_mf_system__1348933191464435_.dbf'


sys@BD2:SQL> shutdown immediate;
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/ORA/dbs03/BD2/datafile/o1_mf_system__1348933191464435_.dbf'
ORA-01210: data file header is media corrupt

 

Perform a restore:

 

RMAN> run {
2> allocate channel EFGH device type sbt PARMS='SBT_LIBRARY=/ORA/dbs01/oracle/product/rdbms/lib/libobk.so ENV=(BACKUP_DIR=/ORA/dbs01/oracle/home/netapp_mml_config,LD_LIBRARY_PATH=/ORA/dbs01/oracle/product/rdbms/lib,CONF=netap_bd2.conf,RESTORETYPE=volume)' debug 1 trace 99;
restore database from tag 'test_full02';
}3> 4>

RMAN-06009: using target database control file instead of recovery catalog
RMAN-08030: allocated channel: EFGH
RMAN-08500: channel EFGH: SID=126 device type=SBT_TAPE
RMAN-08526: channel EFGH: WARNING: Oracle Test Disk API

RMAN-03090: Starting restore at 14-JAN-2013 20:28:42

RMAN-08090: channel EFGH: starting proxy restore
RMAN-08094: channel EFGH: specifying datafile(s) for proxy restore
RMAN-08610: channel EFGH: restoring datafile 00001 to /ORA/dbs03/BD2/datafile/o1_mf_system__1348933191464435_.dbf
RMAN-08529: proxy file handle=BD2_20130114_2onvdv0b_4_1_lvl0A
RMAN-08610: channel EFGH: restoring datafile 00002 to /ORA/dbs03/BD2/datafile/o1_mf_sysaux__1348933194894589_.dbf
RMAN-08529: proxy file handle=BD2_20130114_2onvdv0b_3_1_lvl0A
RMAN-08610: channel EFGH: restoring datafile 00003 to /ORA/dbs03/BD2/datafile/o1_mf_undo01__1348933199657938_.dbf
RMAN-08529: proxy file handle=BD2_20130114_2onvdv0b_2_1_lvl0A
RMAN-08610: channel EFGH: restoring datafile 00004 to /ORA/dbs03/BD2/datafile/o1_mf_dbod__1348933228205065_.dbf
RMAN-08529: proxy file handle=BD2_20130114_2onvdv0b_5_1_lvl0A
RMAN-08610: channel EFGH: restoring datafile 00005 to /ORA/dbs03/BD2/datafile/o1_mf_xdb__1348940086272189_.dbf
RMAN-08529: proxy file handle=BD2_20130114_2onvdv0b_6_1_lvl0A
RMAN-08610: channel EFGH: restoring datafile 00006 to /ORA/dbs03/BD2/datafile/o1_mf_tpcctab__1357049580539768_.dbf
RMAN-08529: proxy file handle=BD2_20130114_2onvdv0b_1_1_lvl0A
RMAN-08031: released channel: EFGH
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 01/14/2013 20:28:46
RMAN-03009: failure of IRESTORE command on EFGH channel at 01/14/2013 20:28:46
ORA-19563: datafile copy header validation failed for file /ORA/dbs03/BD2/datafile/o1_mf_system__1358191725125556_.dbf
ORA-01251: Unknown File Header Version read for file number 0

 

This doesn't work: the file has not been restored, and the snapshots are still there.

 

My config file looks like:

dbsrvg406>-RDBMS>-BD2:~/netapp_mml_config$ cat netap_bd2.conf

FILER=172.30.1.4:root/XXXXXXXX

FILERPASS_ENCRYPTED=YES

VOLUMES=172.30.1.4:bdisktest203

PROTOCOL=nfs

DB_LUN=

DB_MOUNTPOINT=10.61.173.169:bdisktest203:/ORA/dbs03/BD2,/ORA/dbs03/BD2

 

Please let me know,

Thanks for your time,
Ruben

8 REPLIES

nkarthik

Change "10.61.173.169" to "172.30.1.4" in DB_MOUNTPOINT and try again.
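
For anyone hitting the same error, a quick consistency check over the config file might catch this kind of mismatch before a restore is attempted. The sketch below is hypothetical and not part of the MML plug-in. It assumes the `KEY=value` layout shown in the config above and that the first colon-separated field of `FILER`, `VOLUMES` and `DB_MOUNTPOINT` is the filer address; the function names `parse_conf` and `mismatched_addresses` are made up for illustration.

```python
def parse_conf(text):
    """Parse KEY=value lines into a dict, skipping blank or malformed lines."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def mismatched_addresses(conf):
    """Return (key, address) pairs whose address differs from FILER's.

    Assumption: the address is the text before the first ':' in each value.
    """
    filer_ip = conf.get("FILER", "").split(":", 1)[0]
    problems = []
    for key in ("VOLUMES", "DB_MOUNTPOINT"):
        value = conf.get(key, "")
        if not value:
            continue
        ip = value.split(":", 1)[0]
        if ip != filer_ip:
            problems.append((key, ip))
    return problems


# Values copied from the config posted above.
sample = """
FILER=172.30.1.4:root/XXXXXXXX
VOLUMES=172.30.1.4:bdisktest203
DB_MOUNTPOINT=10.61.173.169:bdisktest203:/ORA/dbs03/BD2,/ORA/dbs03/BD2
"""

print(mismatched_addresses(parse_conf(sample)))
```

With the posted config this reports `DB_MOUNTPOINT` as the only entry whose address differs from `FILER`.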

gasparuben

Indeed ...

I will check later. Thanks for having a look!

gasparuben

So indeed it works after I corrected the misconfiguration. Thank you again.

Nevertheless, looking at the trace file, it seems RMAN is fine with the fact that the restore was not properly executed (the IP in DB_MOUNTPOINT pointing to nowhere); it is the recover command that throws the above error. In the trace file I can see:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 01/14/2013 20:31:01
RMAN-03009: failure of IRESTORE command on EFGH channel at 01/14/2013 20:31:01
RMAN-10032: unhandled exception during execution of job step 1:
ORA-06512: at line 92
RMAN-10035: exception raised in RPC:
ORA-19563: datafile copy header validation failed for file /ORA/dbs03/BD2/datafile/o1_mf_system__1358191725125556_.dbf
ORA-01251: Unknown File Header Version read for file number 0
ORA-06512: at "SYS.X$DBMS_BACKUP_RESTORE", line 4504
RMAN-10031: RPC Error: ORA-19563  occurred during call to DBMS_BACKUP_RESTORE.PROXYGO
DBGMISC:          ENTERED krmkursr [20:31:07.105]

In my opinion this is not properly handled. The error should be obvious in this case and be raised during the restore.

One other comment: once I had done the restore, the snapshots deleted by it were not properly cleaned up in RMAN. In my case I had:

dbnasg404> snap list bdisktest203
Volume bdisktest203
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
13% (13%)    0% ( 0%)  Jan 19 23:17  BD2_20130119_0lnvrg5a_6_1_lvl0A
16% ( 3%)    0% ( 0%)  Jan 19 23:15  BD2_20130119_0dnvrg25_6_1_lvl0A
25% (13%)    0% ( 0%)  Jan 19 17:23  BD2_20130119_panvqrer_6_1_lvl0A

I restored to BD2_20130119_panvqrer_6_1_lvl0A, so I was expecting the other two to disappear, which indeed happened. Nevertheless, they were still marked as available in RMAN, and after a crosscheck everything, including BD2_20130119_panvqrer_6_1_lvl0A, was marked expired, which makes no sense. A validate also failed:

run {

allocate channel EFGH device type sbt PARMS='SBT_LIBRARY=/ORA/dbs01/oracle/product/rdbms/lib/libobk.so ENV=(BACKUP_DIR=/ORA/dbs01/oracle/home/netapp_mml_config,LD_LIBRARY_PATH=/ORA/dbs01/oracle/product/rdbms/lib,CONF=netap_bd2.conf)' debug 1 trace 99;

allocate channel c1 device type  'SBT_TAPE' PARMS  'SBT_LIBRARY=/opt/tivoli/tsm/client/oracle/bin64/libobk.so,ENV=(TDPO_OPTFILE=/opt/tivoli/tsm/client/oracle/bin64/tdpo.opt)';

crosscheck backup of database;

}

RMAN> run {
allocate channel EFGH device type sbt PARMS='SBT_LIBRARY=/ORA/dbs01/oracle/product/rdbms/lib/libobk.so ENV=(BACKUP_DIR=/ORA/dbs01/oracle/home/netapp_mml_config,LD_LIBRARY_PATH=/ORA/dbs01/oracle/product/rdbms/lib,CONF=netap_bd2.conf)' debug 1 trace 99;
2> 3> restore database from tag 'TEST_FULL01' validate;
4> }

released channel: ORA_SBT_TAPE_1
released channel: ORA_DISK_1
allocated channel: EFGH
channel EFGH: SID=208 device type=SBT_TAPE
channel EFGH: WARNING: Oracle Test Disk API

Starting restore at 20-JAN-2013 14:48:30

released channel: EFGH
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 01/20/2013 14:48:34
RMAN-06026: some targets not found - aborting restore
RMAN-06023: no backup or copy of datafile 6 found to restore
RMAN-06023: no backup or copy of datafile 5 found to restore
RMAN-06023: no backup or copy of datafile 4 found to restore
RMAN-06023: no backup or copy of datafile 3 found to restore
RMAN-06023: no backup or copy of datafile 2 found to restore
RMAN-06023: no backup or copy of datafile 1 found to restore

But it's there:

dbnasg404>snap list bdisktest203
Volume bdisktest203
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
14% (14%)    0% ( 0%)  Jan 19 17:23  BD2_20130119_panvqrer_6_1_lvl0A

Should the crosscheck command work with MML proxy copies? Even if there is no MML channel allocated, it should not expire the proxy copies, in my opinion. A validate does not seem able to change the state of a proxy copy.

gasparuben

Just a brief comment on the second part of my previous reply.

The RMAN crosscheck command seems to expire all proxy copies if no proxy channel is configured. In my opinion, this shouldn't be the case.

In a context of TAPE and DISK backupsets, I wouldn't expect DISK backupsets to be expired if crosscheck was run with only TAPE channels.

In the use case I mentioned above, it may be because disk backupsets are not distinguished from proxy ones: both are treated as disk type. In my opinion this shouldn't be so.

I have also realised that the delete obsolete command does not do a full clean-up as expected. I don't want to be too verbose in this post, but although the retention policy is honoured, the deletion removes only a few snapshots, not all of them. The proxy records are fully removed from RMAN, though, leaving some orphaned snapshots. I can send you my use case if requested.
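
To make the orphaned-snapshot problem concrete, here is a minimal sketch of how the filer's `snap list` output could be compared with the proxy handles RMAN still knows about. It is only an illustration: it assumes the 7-mode `snap list` layout pasted earlier in this thread, it relies on the observation from those listings that each snapshot is named after one of the proxy file handles, and the handle list is supplied by hand (how you collect it from RMAN is up to you). The function names are hypothetical.

```python
def snapshot_names(snap_list_output):
    """Extract snapshot names (last column) from 7-mode `snap list` output."""
    names = []
    for line in snap_list_output.splitlines():
        parts = line.split()
        # Data rows start with a percentage such as "13%"; the name is last.
        if parts and parts[0].endswith("%"):
            names.append(parts[-1])
    return names


def orphaned_snapshots(snap_list_output, rman_handles):
    """Return snapshots on the filer that match no handle known to RMAN."""
    known = set(rman_handles)
    return [name for name in snapshot_names(snap_list_output) if name not in known]


# Listing copied from my earlier reply.
listing = """\
Volume bdisktest203
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
13% (13%)    0% ( 0%)  Jan 19 23:17  BD2_20130119_0lnvrg5a_6_1_lvl0A
16% ( 3%)    0% ( 0%)  Jan 19 23:15  BD2_20130119_0dnvrg25_6_1_lvl0A
25% (13%)    0% ( 0%)  Jan 19 17:23  BD2_20130119_panvqrer_6_1_lvl0A
"""

# Hypothetical: pretend RMAN only has a record for the oldest backup.
handles_in_rman = ["BD2_20130119_panvqrer_6_1_lvl0A"]

for name in orphaned_snapshots(listing, handles_in_rman):
    print("orphaned:", name)
```

Swapping the two arguments' roles (handles with no matching snapshot) would give the opposite check, which is the situation I hit after the restore.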

Thanks for your time.

nkarthik

Hello,

"delete obsolete" is included in v2.0, which will be uploaded in a couple of weeks; it is in the QA phase. I hope you can check "crosscheck" along with "delete obsolete" in v2.0.

Did you have a chance to test database backup and restore, as well as datafile and tablespace backup and restore? Please check https://communities.netapp.com/docs/DOC-21582 for the presentation and demo. Please let me know if you need any more help, and also check sbtio.log, because all MML-related errors are reported there.

Regards,

Karthikeyan.N

gasparuben

Hi Karthikeyan,

In fact the error message is more useful in the sbtio.log file:

SBT-3330 01/14/13 20:31:00 print_DB_MNTPT: IP : 10.61.173.169
SBT-3330 01/14/13 20:31:00 print_DB_MNTPT: volume : bdisktest203
SBT-3330 01/14/13 20:31:00 counter before return:0
SBT-3330 01/14/13 20:31:00 EXIT: umount_mount_fn
SBT-3330 01/14/13 20:31:00 ip_vol_chk_counter: 0 [1 - pass/0 - fail]
SBT-3330 01/14/13 20:31:00 StartRestoreVolume_SAN: volname and ip not matches with DB_MOUNTPOINT or CONTROL_MOUNTPOINT
SBT-3330 01/14/13 20:31:00 StartRestoreVolume_SAN: umount not possible before snaprestore(SAN)
SBT-3330 01/14/13 20:31:00 EXIT: StartRestoreVolume_SAN
SBT-3330 01/14/13 20:31:00 InitiateRestore: ret_counter after StartRestoreVolume_SAN : 0
SBT-3330 01/14/13 20:31:00 InitiateRestore: Volume Restore DONE for bdisktest203 from BD2_20130114_2onvdv0b_6_1_lvl0A
SBT-3330 01/14/13 20:31:01 EXIT: InitiateRestore
SBT-3330 01/14/13 20:31:01 EXIT: restore_rman
SBT-3330 01/14/13 20:31:01 sbtpvt_ntap_sr: restore_rman result:[0 - fail, 1 - pass(SAN[file],NAS(volume),SAN(volume)), 2 - pass(NAS[file])] : 0
SBT-3330 01/14/13 20:31:01 sbtpvt_ntap_sr: Restore not success for file:  in volname : bdisktest203 snapshotname BD2_20130114_2onvdv0b_6_1_lvl0A :
SBT-3330 01/14/13 20:31:01

Here the information is more useful, and such a simple configuration error is spotted immediately. Nevertheless, my point was that the RMAN 'restore' command didn't finish with an error, as I would have expected; it was the next RMAN 'recover' command that spotted a problem and failed.
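
Since RMAN itself did not surface the problem clearly, a small scan of sbtio.log after each restore could help. This is only a sketch: it assumes every line has the `SBT-<n> MM/DD/YY HH:MM:SS message` shape seen in the excerpt above, and the failure markers are simply substrings taken from that excerpt, not an official list of MML error strings. The function name is made up.

```python
import re

# One log line: "SBT-3330 01/14/13 20:31:00 <message>" (message may be empty).
LINE_RE = re.compile(
    r"^(SBT-\d+)\s+(\d{2}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s*(.*)$"
)

# Substrings from the excerpt above that point at a failed restore.
FAILURE_MARKERS = (
    "not matches with DB_MOUNTPOINT",
    "umount not possible",
    "Restore not success",
)


def failed_restore_lines(log_text):
    """Return (date, time, message) for each line containing a failure marker."""
    hits = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        _code, date, time, message = match.groups()
        if any(marker in message for marker in FAILURE_MARKERS):
            hits.append((date, time, message))
    return hits


# A few lines copied from the excerpt above.
excerpt = """\
SBT-3330 01/14/13 20:31:00 EXIT: umount_mount_fn
SBT-3330 01/14/13 20:31:00 StartRestoreVolume_SAN: volname and ip not matches with DB_MOUNTPOINT or CONTROL_MOUNTPOINT
SBT-3330 01/14/13 20:31:00 InitiateRestore: Volume Restore DONE for bdisktest203 from BD2_20130114_2onvdv0b_6_1_lvl0A
SBT-3330 01/14/13 20:31:01 sbtpvt_ntap_sr: Restore not success for file:  in volname : bdisktest203 snapshotname BD2_20130114_2onvdv0b_6_1_lvl0A :
SBT-3330 01/14/13 20:31:01
"""

for date, time, message in failed_restore_lines(excerpt):
    print(date, time, message)
```

Note that the "Volume Restore DONE" line is deliberately not treated as success here, because in my log it appears even though the restore failed.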

I haven't repeated my datafile or tablespace tests, but I believe they will work well. I did see that while backing up a tablespace a full volume snapshot was taken, but only the affected file was registered in the proxy catalog. Unless I missed a configuration variable, I think it's a pity not to use the rest of the volume snapshot, as space in the snap reserve will be taken for the whole volume anyway.

I will test version 2. In fact we are very interested in MML for cluster mode, as we are moving to that version. Thanks for the presentation and demo.

BR,

Ruben

nkarthik

Thanks, Ruben, for testing v1.0. Version 2.0 development is done and it is in the QA phase. We will upload it as quickly as possible.

Regards,

Karthikeyan.N

gasparuben

Hi Karthikeyan,

I see v2 is online.

I wanted to know: if we decide to deploy this solution in production (we have Oracle and NetApp support on our system), is it free?

Thanks for your reply,

Ruben
