Snapvault transfers not working after upgrade from 3.6 to 4.1

pierrek
5,701 Views

Good morning All,

We recently upgraded Snap Creator (SC) from 3.6P1 to 4.1P1 following the documented procedure. We ran into a number of small issues that we could fix ourselves, but we could not get the SnapVault update to work as it did in the previous version.

Let me explain and show the config file before and after the upgrade:

-bash-4.2$ egrep "NTAP_SNAPVAULT|NTAP_SNAPSHOT" casewise-before.conf | grep -v "^#"

NTAP_SNAPSHOT_RETENTIONS=hourly:6,daily:3

NTAP_SNAPSHOT_RETENTION_AGE=

NTAP_SNAPSHOT_CLEANUP=N

NTAP_SNAPSHOT_DISABLE=N

NTAP_SNAPSHOT_NODELETE=N

NTAP_SNAPSHOT_DELETE_BY_AGE_ONLY=N

NTAP_SNAPSHOT_DEPENDENCY_IGNORE=N

NTAP_SNAPSHOT_RESTORE_AUTO_DETECT=N

NTAP_SNAPVAULT_UPDATE=Y

NTAP_SNAPVAULT_RETENTIONS=daily:15

NTAP_SNAPVAULT_RETENTION_AGE=15

NTAP_SNAPVAULT_SNAPSHOT=N

NTAP_SNAPVAULT_NODELETE=N

NTAP_SNAPVAULT_RESTORE_WAIT=N

NTAP_SNAPVAULT_WAIT=30

NTAP_SNAPVAULT_MAX_TRANSFER=

In the GUI, this config looked this way:

As you can see, in 3.6 we only archived the daily snapshots to the secondary controller, keeping 3 daily snapshots locally and 15 remotely.

After the update to 4.1P1, the config file was modified to use UPPERCASE retention policies and looked like this:

-bash-4.2$ egrep "NTAP_SNAPVAULT|NTAP_SNAPSHOT" casewise_WITHOUT_HOURLY_CONFIGURE.conf | grep -v "^#"

NTAP_SNAPSHOT_POLICIES=

NTAP_SNAPSHOT_NODELETE=N

NTAP_SNAPSHOT_RETENTION_AGE=

NTAP_SNAPSHOT_DELETE_BY_AGE_ONLY=N

NTAP_SNAPSHOT_RESTORE_AUTO_DETECT=N

NTAP_SNAPSHOT_LABEL=

NTAP_SNAPSHOT_CLEANUP=N

NTAP_SNAPSHOT_RETENTIONS=HOURLY:6,DAILY:3

NTAP_SNAPSHOT_CREATE_CMD01=

NTAP_SNAPSHOT_DEPENDENCY_IGNORE=N

NTAP_SNAPSHOT_DISABLE=N

NTAP_SNAPVAULT_UPDATE=Y

NTAP_SNAPVAULT_NODELETE=N

NTAP_SNAPVAULT_RESTORE_WAIT=N

NTAP_SNAPVAULT_RETENTION_AGE=15

NTAP_SNAPVAULT_SNAPSHOT=N

NTAP_SNAPVAULT_MAX_TRANSFER=

NTAP_SNAPVAULT_WAIT=30

NTAP_SNAPVAULT_RETENTIONS=DAILY:15
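
To see exactly what the upgrade changed, the two filtered listings can be compared key by key. The following is only a minimal sketch: the helper names parse_conf and diff_conf are made up for illustration, and the sample strings are a shortened copy of the settings shown above, not the full config files.

```python
def parse_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_conf(before, after):
    """Return (added keys, removed keys, changed values) between two configs."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = {
        key: (before[key], after[key])
        for key in sorted(set(before) & set(after))
        if before[key] != after[key]
    }
    return added, removed, changed


# Shortened excerpts of the 3.6P1 and 4.1P1 listings from this post.
BEFORE = """
NTAP_SNAPSHOT_RETENTIONS=hourly:6,daily:3
NTAP_SNAPVAULT_UPDATE=Y
NTAP_SNAPVAULT_RETENTIONS=daily:15
NTAP_SNAPVAULT_RETENTION_AGE=15
"""

AFTER = """
NTAP_SNAPSHOT_POLICIES=
NTAP_SNAPSHOT_LABEL=
NTAP_SNAPSHOT_RETENTIONS=HOURLY:6,DAILY:3
NTAP_SNAPSHOT_CREATE_CMD01=
NTAP_SNAPVAULT_UPDATE=Y
NTAP_SNAPVAULT_RETENTION_AGE=15
NTAP_SNAPVAULT_RETENTIONS=DAILY:15
"""

added, removed, changed = diff_conf(parse_conf(BEFORE), parse_conf(AFTER))
print("added:", added)
print("removed:", removed)
for key, (old, new) in changed.items():
    print(f"{key}: {old} -> {new}")
```

On these excerpts the only differences are three new, empty keys and the uppercase policy names; the SnapVault retention list itself still contains only the daily policy.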

When launched, the job failed with the message below:

[2014-04-29 09:03:00,229] DEBUG: Workflow : backup started with workflow id : 1511

[2014-04-29 09:03:00,229] DEBUG: Version: Snap Creator Framework 4.1P1

[2014-04-29 09:03:00,229] DEBUG: Profile: Oracle

[2014-04-29 09:03:00,229] DEBUG: Config: casewise

[2014-04-29 09:03:00,229] DEBUG: Action: backup

[2014-04-29 09:03:00,229] DEBUG: Plugin: oracle

[2014-04-29 09:03:00,229] DEBUG: Policy: hourly

[2014-04-29 09:03:00,229] DEBUG: Volume Name: O11G_CASEWISE_DATA,O11G_CASEWISE_RECO

[2014-04-29 09:03:00,229] DEBUG: Snapshot Name: CASEWISE-hourly_recent

[2014-04-29 09:03:00,304] INFO: Validating policy: hourly finished successfully

[2014-04-29 09:03:00,305] ERROR: SCF-00073: Policy [hourly] is not a defined secondary Snapshot copy retention policy in the configuration, Exiting!

[2014-04-29 09:03:00,310] DEBUG: Workflow : backup_OnFailure started with workflow id : 1512

########## Snap Creator Framework 4.1P1 failed ##########

[2014-04-29 09:03:00,334] INFO: Pre Exit commands are not defined. Skipping !
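
Judging only from the SCF-00073 message, 4.1P1 seems to require that the policy being run is listed in NTAP_SNAPVAULT_RETENTIONS whenever NTAP_SNAPVAULT_UPDATE=Y. The sketch below reproduces that apparent rule so a config can be checked before a job runs. It is a guess at the behavior, not Snap Creator's own code, and the function names are hypothetical.

```python
def parse_retentions(value):
    """Turn 'DAILY:15,HOURLY:0' into {'daily': 15, 'hourly': 0} (case-insensitive)."""
    retentions = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, count = item.partition(":")
        retentions[name.strip().lower()] = int(count)
    return retentions


def check_policy(policy, settings):
    """Return a list of problems for running `policy` with these settings.

    Assumed rule (inferred from the SCF-00073 error text): the policy must be
    a local retention policy and, if SnapVault updates are enabled, a
    secondary retention policy as well.
    """
    problems = []
    local = parse_retentions(settings.get("NTAP_SNAPSHOT_RETENTIONS", ""))
    if policy.lower() not in local:
        problems.append(f"[{policy}] missing from NTAP_SNAPSHOT_RETENTIONS")
    if settings.get("NTAP_SNAPVAULT_UPDATE", "N").upper() == "Y":
        secondary = parse_retentions(settings.get("NTAP_SNAPVAULT_RETENTIONS", ""))
        if policy.lower() not in secondary:
            problems.append(f"[{policy}] missing from NTAP_SNAPVAULT_RETENTIONS")
    return problems


upgraded = {
    "NTAP_SNAPSHOT_RETENTIONS": "HOURLY:6,DAILY:3",
    "NTAP_SNAPVAULT_UPDATE": "Y",
    "NTAP_SNAPVAULT_RETENTIONS": "DAILY:15",
}
print(check_policy("hourly", upgraded))  # one problem: missing secondary policy
print(check_policy("daily", upgraded))   # no problems
```

Under this assumed rule the daily job passes and the hourly job is rejected, which matches the log above.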

The only workaround we found was to select an HOURLY policy in the SNAPVAULT Policies tab while leaving its retention at 0:

The corresponding change in the config file is included here:

-bash-4.2$ egrep "NTAP_SNAPVAULT|NTAP_SNAPSHOT" casewise_WITH_HOURLY_CONFIGURE.conf | grep -v "^#"

NTAP_SNAPSHOT_POLICIES=

NTAP_SNAPSHOT_NODELETE=N

NTAP_SNAPSHOT_RETENTION_AGE=

NTAP_SNAPSHOT_DELETE_BY_AGE_ONLY=N

NTAP_SNAPSHOT_RESTORE_AUTO_DETECT=N

NTAP_SNAPSHOT_LABEL=

NTAP_SNAPSHOT_CLEANUP=N

NTAP_SNAPSHOT_RETENTIONS=HOURLY:6,DAILY:3

NTAP_SNAPSHOT_CREATE_CMD01=

NTAP_SNAPSHOT_DEPENDENCY_IGNORE=N

NTAP_SNAPSHOT_DISABLE=N

NTAP_SNAPVAULT_UPDATE=Y

NTAP_SNAPVAULT_NODELETE=N

NTAP_SNAPVAULT_RESTORE_WAIT=N

NTAP_SNAPVAULT_RETENTION_AGE=15

NTAP_SNAPVAULT_SNAPSHOT=N

NTAP_SNAPVAULT_MAX_TRANSFER=

NTAP_SNAPVAULT_WAIT=30

NTAP_SNAPVAULT_RETENTIONS=DAILY:15,HOURLY:0

-bash-4.2$

This way the job succeeds, but an unnecessary transfer of the latest hourly snapshot is executed every hour before the job completes.

The previous version did not exhibit this behavior.

Can someone assist me with suggestions or show me what I did not understand?

Thanks

Pierre

spinks

Pierre,

If I understand correctly, your problems are:

  • During the upgrade, the policies changed from lowercase to uppercase.
  • SnapVault now appears to force an update during the hourly Snapshot when it did not previously.
    • As a result, you now have to set the SnapVault retention to hourly=0 or Snap Creator will fail.

Does that cover it?

I confirmed that on my 4.1P1 system the policies are also in uppercase in the config file, while the GUI still shows them in lowercase. That seems to be the correct behavior.

I'll have to ask about the SnapVault behavior...

Can you please let me know:

  • Did I miss any of the issues?
  • Did you upgrade straight from 3.6 to 4.1P1?  I assume yes, but want to verify.

Thanks!

John

ANDRIANAIVO

Hi John,

We also have the same issue after upgrading straight from 3.6 to 4.1P1, and you described the problem correctly.

We set the SnapVault retention hourly=1 as a workaround.

Kr, Fidy

pierrek

Hi John,

We made the upgrade directly from 3.6 to 4.1, using the procedure described in the install guide.

As I said, we encountered a number of smaller issues (passwords had to be reset for the users configured in our configs, DNS aliases for hosts where agents are deployed no longer worked, ...), but we could fix all of those ourselves.

So the description of the remaining issue is, I hope, complete:

After the execution of the java -jar snapcreator.jar -upgrade command, it appears that the config files were modified as shown here:

NTAP_SNAPSHOT_RETENTIONS=hourly:6,daily:3

NTAP_SNAPVAULT_RETENTIONS=daily:15

NTAP_SNAPVAULT_RETENTION_AGE=15

becomes:

NTAP_SNAPSHOT_RETENTIONS=HOURLY:6,DAILY:3

NTAP_SNAPVAULT_RETENTION_AGE=15

NTAP_SNAPVAULT_RETENTIONS=DAILY:15

That is fine, but when the job was started with the settings above, it failed in the validation phase, stating that "Policy [hourly] is not a defined secondary Snapshot copy retention policy in the configuration".

The only workaround we found was to add HOURLY:0 (through the GUI), as shown here:

NTAP_SNAPVAULT_RETENTIONS=DAILY:15,HOURLY:0

Now we have a SnapVault update every hour, but at least the job no longer fails. We want to avoid this unnecessary transfer and return to the behavior of 3.6.

I am sure we are missing something here, but the documentation for the upgrade procedure led us to believe that the upgrade would be transparent for existing backup jobs, which is not the case.

Thanks in advance

Pierre

spinks

Thanks Pierre and Fidy for the information.

I have sent a note to the development team to see if we can't find out what happened.

Will update when I get more.

John

pierrek

Hi,

As I have not received any update on this, I will open a case, because we have been hitting other issues since the update to 4.1, as reported by the customer:

The new SnapCreator agent creates a directory /tmp/pdk-root-process every time it does something, and the directory is never removed. Currently /tmp on oradbprod3 is using 2.5 GB of disk, which is mostly due to this problem. I'll put in a job to clean up old directories, but could you please log this as a bug with NetApp?


They already seem to know about it (https://communities.netapp.com/thread/33890). Despite what is said in the thread, it is a problem due to the sheer number of directories created.


Best regards,


Dave
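
The cleanup job mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: the glob pattern pdk-root-* and the one-day age threshold are guesses (the exact directory names the agent creates are not shown in this thread), and the function names are hypothetical. It defaults to a dry run so nothing is deleted until the pattern has been checked on the host.

```python
import shutil
import time
from pathlib import Path


def find_stale_dirs(root, pattern, max_age_seconds, now=None):
    """Return directories under `root` matching `pattern` older than the threshold."""
    now = time.time() if now is None else now
    stale = []
    for path in Path(root).glob(pattern):
        # Skip symlinks and plain files; only real directories are candidates.
        if path.is_dir() and not path.is_symlink():
            if now - path.stat().st_mtime > max_age_seconds:
                stale.append(path)
    return sorted(stale)


def remove_dirs(paths, dry_run=True):
    """Delete the given directories (or only list them when dry_run is True)."""
    handled = []
    for path in paths:
        if not dry_run:
            shutil.rmtree(path, ignore_errors=True)
        handled.append(path)
    return handled


if __name__ == "__main__":
    # Assumed pattern and age; adjust both after inspecting /tmp on the host.
    stale = find_stale_dirs("/tmp", "pdk-root-*", max_age_seconds=24 * 3600)
    for path in remove_dirs(stale, dry_run=True):
        print("would remove", path)
```

Filtering by age rather than removing every match is deliberate, since a running agent may still be using its most recent directories.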


spinks

Dave,

Please feel free to open a case - that should increase the urgency on this issue.

Our QA team reached out yesterday evening to confirm the SnapVault issue in 4.1 - I haven't heard the plan to address this yet. 

It should either be fixed in an upcoming patch or the maintenance release depending on the estimate to fix.

I don't have these details yet.

Regarding the pdk-root issue: this is a problem, and it is being fixed in our next patch release, which is expected at the end of this month, last I heard.

There is already a BURT on the pdk-root issue - 807980

Hope this helps,

John

pierrek

John,

The issue is actually getting worse: the customer contacted me to say that SC does NOT delete the hourly snapshots transferred to the secondary system (even though the hourly SnapVault retention is set to 0), so the 255-snapshot limit was hit today.

SC 3.6 had been running for almost 2 years without a glitch, so we now believe we should not have upgraded. But we had to because of compatibility requirements with newer ONTAP versions.

Pierre

spinks

Pierre,

I just sent this to the QA team and asked them to escalate the priority of this fix.

Please mention this in your case as well.

I haven't got an answer on what release this will be fixed in yet. 

John

pierrek

Thanks John,

I'll keep support informed of this.

Regards

Pierre
