Data Backup and Recovery

Single file restore problem on Domino with SnapCreator 4.0p1

CHRISTIANMINIBEK
5,792 Views

Hello,

I am using SnapCreator 4.0p1 for my Domino backup.

Environment Facts:
- Domino 8.5.3FP4 64bit on Windows Server 2008R2
- SnapCreatorAgent 4.0p1 64bit runs as Service
- Domino Data is stored on iSCSI LUN

- SnapCreatorServer 4.0p1 installed on Windows Server 2008R2, Running on Port 9080 because DFM is also installed on the same machine

I followed this restore operation steps => https://library.netapp.com/ecmdocs/ECMP1149421/html/html/GUID-334E9282-6824-4226-8ABC-A4F92ECA5BA3.html
My restore command was on SnapCreatorServer was =>
c:\Program Files\NetApp\Snap_Creator_Framework\scServer4.0p1>snapcreator.exe --server localhost --port 9080 --user sc_user --passwd xxxxxx --verbose --debug --action custom --profile Domino --config ATSMSVLD02 --params snapname=ATSMSVLD02-weekly_20131102230000 datapath=D:\IBM\Lotus\Domino\data\restore\archive_it\helpdesk.nsf restoretype=su2m restoretime="11/06/2013 14:00:00" disablereplication=Y

The only entry in the ATSMSVLD02.debug.20131106150032.log is "########## Custom plug-in action ##########". After a while the Domino Server crashes with an NSD or I have to kill the snapcreator process manually an then the server crashes. Then the debug log has some more infos:

########## Custom plug-in action ##########

[2013-11-06 15:15:49,060] ERROR: SCF-00050: Action custom for plugin [domino] failed with error [java.net.ConnectException: ConnectException invoking http://atsmsvld02.itc.global.mahle:9090/SnapCreator/Daemon/Agent: Connection refused: connect] and exit code [-1], Exiting!

########## Snap Creator Framework 4.0p1 failed ##########

[2013-11-06 15:15:49,076] INFO: Pre Exit commands are not defined. Skipping !

[2013-11-06 15:15:49,091] INFO: STORAGE-05016: Creating OM Event [script:critical-event] on [atsmsvfsmgmt01.itc.global.mahle].

[2013-11-06 15:15:49,091] DEBUG: <event-generate>

          <event-name>script:critical-event</event-name>

          <source>1</source>

          <event-message>SNAPCREATOR SCF-00050: Action custom for plugin [domino] failed with error [java.net.ConnectException: ConnectException invoking http://atsmsvld02.itc.global.mahle:9090/SnapCreator/Daemon/Agent: Connection refused: connect] and exit code [-1], Exiting!(Config: ATSMSVLD02 Name: ATSMSVLD02)</event-message>

</event-generate>

[2013-11-06 15:15:49,091] INFO: STORAGE-05017: OM Event [script:critical-event] on [atsmsvfsmgmt01.itc.global.mahle] created successfully.

I found this bug report => http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=695053

Tryed it and saw for a few minutes this output on the command window where I started snapcreator.exe -start-agent

0 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt

= 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc =

82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp

= 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D

000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tra

cePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, f

nc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D0

This was very fast scolling, after I killes the snapcreator process, domino crashed again.

File restore from snapshot allways works perfect, after a fixup of the restored file from snapshot, the database is readable. The problem only exists when working with selectable point in time.

On another Domino Server where I have the same configuration the restore works perfect!

Has anyone an idea where I have to look to solve the problem?

Is additional information needed?

9 REPLIES 9

spinks
5,792 Views

Christian,

You say the problem is only when working with selectable point in time - do you mean it works perfectly with up to the minute (restoretype=u2m)?

I would expect the answer is no, but I wanted to ask.

Can you confirm that you are running the Snap Creator Agent through the CLI when you are performing the restore operation?

This is an issue with Snap Creator 4.0 as you pointed out with the BURT.

The workaround in the BURT details this.

If you are running the Snap Creator Agent as a service you will need to stop the service and start the agent though the CLI.

You can do this by changing directories to the scAgent install directory and running the command snapcreator.exe --start-agent

You can then re-run the command in a separate command line window.

If you run with the restore action there is a message on screen indication that this is required and you must confirm that the scAgent is running as a service before continuing - I cannot recall if there is a message with the custom action.

This will not be a problem with Snap Creator 4.1 due to the redesigned agent.

Snap Creator 4.1 is expected to be released early January.

Please let me know if this resolves the situation.

Sorry for the trouble!

John

CHRISTIANMINIBEK
5,792 Views

Good Morning John!

Thanks for quick reply!

I confirm that I was running the SC-Agent in CLI.

To be honest, I have never tried u2m-restore. But I did it now:

Below the results from agent console and debug-logfile:

c:\Program Files\NetApp\Snap_Creator_Framework\scAgent4.0p1>snapcreator.exe -start-agent -verbose

[Thu Nov  7 07:07:46 2013] INFO: Starting NetApp Snap Creator Framework 4.0p1 Agent [single-threaded]

[Thu Nov  7 07:07:46 2013] DEBUG: Listening on built-in default port: 9090

[Thu Nov  7 07:07:49 2013] INFO: Snap Creator Framework Agent [single-threaded], running with pid 6828, is listening on port 9090 of all configured network interfaces

[Thu Nov  7 07:11:01 2013] INFO: Authorized request from [scServer@*]

[Thu Nov  7 07:11:01 2013] INFO: Authorized request from [scServer@*]

[1AAC:0002-1640] Media Recovery Replay (6142 MB): 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

[1AAC:0002-1640] 07.11.2013 07:14:47   Recovery Manager: Media Recovery complete for D:\IBM\Lotus\Domino\data\restore\archive_it\helpdesk.nsf, last update applied 06.11.2013 22:14:16.

[Thu Nov  7 07:14:48 2013] INFO: Restoring databases finished successfully

########## Custom plug-in action ##########

[2013-11-07 07:14:48,705] DEBUG: [atsmsvld02.itc.global.mahle:9090 (4.0.0.1)] domino::setENV finished successfully

[2013-11-07 07:14:48,705] DEBUG: [atsmsvld02.itc.global.mahle:9090 (4.0.0.1)] Restoring to current time.

[2013-11-07 07:14:48,705] DEBUG: [atsmsvld02.itc.global.mahle:9090 (4.0.0.1)] Calling dominocore::postRestoreToTime with arguments(C:\Program Files\IBM\Lotus\Domino\notes.ini,D:\IBM\Lotus\Domino\data\restore,N:\ChangeInfo\ATSMSVLD02-daily_20131105230000,-1,D:\IBM\Lotus\Domino\data\restore\archive_it\helpdesk.nsf,UP-TO-THE-MINUTE,N:\ChangeInfo\logs\)

[2013-11-07 07:14:48,705] TRACE: Notes API initialized

Applying Change info for:D:\IBM\Lotus\Domino\data\restore\archive_it\helpdesk.nsf

NSFRecoverDatabases returned successfully.

NSFDbReplicaInfoGet called successfully.

NSFDbReplicaInfoSet called successfully.

Replication disabled.

ERROR_CODE:0

[2013-11-07 07:14:48,705] INFO: [atsmsvld02.itc.global.mahle:9090 (4.0.0.1)] Restoring databases finished successfully

########## Custom handling for plugin domino completed successfully ##########

########## Snap Creator Framework 4.0p1 finished successfully ##########

[2013-11-07 07:14:48,721] INFO: INFO: Snap Creator Framework finished successfully (Action: custom)

[2013-11-07 07:14:50,297] INFO: STORAGE-05016: Creating OM Event [script:information-event] on [atsmsvfsmgmt01.itc.global.mahle].

[2013-11-07 07:14:50,297] DEBUG: <event-generate>

  <event-name>script:information-event</event-name>

  <source>1</source>

  <event-message>SNAPCREATOR INFO: Snap Creator Framework finished successfully (Action: custom)(Config: ATSMSVLD02 Name: ATSMSVLD02)</event-message>

</event-generate>

[2013-11-07 07:14:50,297] INFO: STORAGE-05017: OM Event [script:information-event] on [atsmsvfsmgmt01.itc.global.mahle] created successfully.

Because this was working well, I tried a su2m-restore with the same snapshot. It was also working.

Both cases where the latest existing snapshot of course.

After that I tried a su2m-restore from an older snapshot (5 day old snapshot, so that the logs are available) and had the same error as I described yesterday.

c:\Program Files\NetApp\Snap_Creator_Framework\scAgent4.0p1>snapcreator.exe -start-agent -verbose

[Thu Nov  7 08:06:51 2013] INFO: Starting NetApp Snap Creator Framework 4.0p1 Agent [single-threaded]

[Thu Nov  7 08:06:51 2013] DEBUG: Listening on built-in default port: 9090

[Thu Nov  7 08:06:54 2013] INFO: Snap Creator Framework Agent [single-threaded], running with pid 9632, is listening on port 9090 of all configured network interfaces

[Thu Nov  7 08:07:38 2013] INFO: Authorized request from [scServer@*]

[Thu Nov  7 08:07:38 2013] INFO: Authorized request from [scServer@*]

[25A0:0002-2604] comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc

= 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 co

mp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 10

23D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82,

tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11

, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D0000

00 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt

= 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc =

82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp = 11, fnc = 82, tracePt = 1023D000000 comp

So from my point of view there are only two options:

1. the older snapshot has a problem => I don't think this is the reason.

2. su2m-restore has a problem with recovering the archived logs from the ChangeInfo path.

Logging style from Domino is set to "Archived" in the directory t:\log.

Here is some config information regarding the archive-log-settings:

########################################################################################################################

#     Archive Options                                                                                                  #

########################################################################################################################

ARCHIVE_LOG_RECURSIVE_SEARCH=N

ARCHIVE_LOG_DIR=N:\ChangeInfo\logs

ARCHIVE_LOG_EXT=TXN

ARCHIVE_LOG_RETENTION=28.0

ARCHIVE_LOG_ENABLE=Y

########################################################################################################################

#     Plug-In Parameter                                                                                                #

########################################################################################################################

DOMINO_DATA_PATH=D:\IBM\Lotus\Domino\data

VALIDATE_DOMINO_DATA_PATH=N

Notes_ExecDirectory=C:\Program Files\IBM\Lotus\Domino

DOMINO_DATABASE_TYPE=0

DOMINO_CHANGE_INFO_PATH=N:\ChangeInfo

DOMINO_INI_PATH=C:\Program Files\IBM\Lotus\Domino\notes.ini

DOMINO_RESTORE_DATA_PATH=D:\IBM\Lotus\Domino\data\restore

LOTUS=C:\Program Files\IBM\Lotus

spinks
5,792 Views

Christian,

What style logging are you using on your Domino server?

It seems as if a necessary log file either is not present or doesn't contain the needed information.

This can happen if a log style other than archive is in use.

Thanks!

John

CHRISTIANMINIBEK
5,792 Views

Hello!

Logging style is set to archived of course. Retention for archived logs in SC is set to 21 days, whis is working well as far I cann see.

See information about logging settings at the end of my last message.

thanks

spinks
5,792 Views

Thanks - I just wanted to verify the Domino server settings.

I figured it would be archive, but I didn't want to assume.

You stated that on newest snapshot both u2m and su2m were successful.

But on a 5 day old snapshot su2m failed.

Did you also try u2m on the 5 day old snapshot?

More or less the same API calls are being used, so it shouldn't be a problem with the plugin itself, but something is certainly going on here.

Can you email me a scdump? (from the GUI click the action button and then scdump.  This will generate a zip file of logs and config files you can email).

I'd like to look into this a little deeper and see if I can't figure out what is going on.

If you don't want to send the scdump at the least I need to see a debug log with the error message and the engine.log file from the scServer.

my email is: spinks at netapp dot com.

If you happen to have a support case open as well please let me know (sometimes people post here and open a case).

Thanks,

John

CHRISTIANMINIBEK
5,792 Views

Hello John!

Tried now an u2m-restore with the older snapshot. Is was working on my test server. But had the same issue on productive server.

The difference between this two servers is, that the first one is a testserver without user activity, so the transaction log is not archived for a very long time - on this server only an agent is running in one application which creates a document every two minutes (just for restore testing). The other server, where restore is not working is a busy productive machine and logs are archived daily to ChangeInfo\logs folder.

I have mailed you the dumps.

Thanks!

CHMOELLER
5,792 Views

Hey guys,

sorry to wake the dead but I am experiencing the exact same problem with Snap Creator 4.1P1.

SnapCreator Server: Windows 2012

Domino Server: Windows 2008 R2

Domino Version: 9.0.1FP1

SnapCreator Server and Client Version: 4.1P1

When I choose the latest available snapshot, a su2m restore works just fine. However if I chose a snapshot that is older, the process fails. Or rather it hangs, SC reports nothing, some logfiles on the Domino server will fill up

very quickly but it never finishes. It will fill up a "logasio" Logfile under notes\data\IBM_TECHNICAL_SUPPORT\ very quickly with lines like these:

24.06.2014 07:32:023D000000
24.06.2014 07:32:02comp = 11, fnc = 81, probeid = 77463653, errcode = 5010, extsympt = 0065
24.06.2014 07:32:023A200000
24.06.2014 07:32:02comp = 11, fnc = 48, probeid = 31, errcode = 5010, extsympt = 006B
24.06.2014 07:32:023D000000
24.06.2014 07:32:02comp = 11, fnc = 81, probeid = 77463653, errcode = 5010, extsympt = 0065
24.06.2014 07:32:023A200000
24.06.2014 07:32:02comp = 11, fnc = 48, probeid = 31, errcode = 5010, extsympt = 006B
24.06.2014 07:32:023D000000
24.06.2014 07:32:02comp = 11, fnc = 81, probeid = 77463653, errcode = 5010, extsympt = 0065

and a "console" logfile is filled with lines similar as to what CHRISTIANMINIBEK posted.

These seem to be something produced by the utility to roll forward the logs. This goes on indefinately (or until the server crashes). During this time just as CHRISTIANMINIBEK describes

nothing happens in the SC CLI.

Now the interesting thing is, it does not necessarily have to be the latest snapshot, one from a day ago might work aswell. But then one from four, five, six days ago does not. This is a bit

confusing to me, the logfiles are still in the changeinfo and all.


Has anyone ever found out something about this?

spinks
5,792 Views

There is a BURT and case open by Christian.

This is being worked on by engineering at this time - We are hoping to resolve and deliver in a patch or maintenance release.

The fix is still in the works, so we don't have all the details.

Please feel free to open a case.  The open BURT is: 824863.

Sorry I don't have more details at this time, but it is actively being worked on.

Thanks,

John

CHMOELLER
5,792 Views

That is good to know, thanks for that information. Is a BURT a bug report? Is it correct that under the "Bug Tools" in NetApp Support I am getting no information for this ID (all fields in the table are empty) or am I looking in the wrong place?

Thanks again

Public