2014-04-29 02:45 AM - edited 2015-12-18 12:23 AM
Good morning All,
We recently upgraded Snap Creator from 3.6P1 to 4.1P1 following the documented procedure, but we encountered a number of small issues. We could fix all of them ourselves, except for the SnapVault update part, which we could not make work as it did in the previous version.
Let me explain and show the config file before and after the upgrade:
-bash-4.2$ egrep "NTAP_SNAPVAULT|NTAP_SNAPSHOT" casewise-before.conf | grep -v "^#"
In the GUI, this config looked this way:
As you can see, in 3.6 we simply archived daily snapshots to the secondary controller, keeping 3 daily copies locally and 15 daily copies remotely.
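In other words, the relevant 3.6 settings were roughly of this form (a sketch only, reconstructed from the retentions described above; the actual config contains more parameters than shown here):

```
NTAP_SNAPSHOT_RETENTIONS=daily:3
NTAP_SNAPVAULT_RETENTIONS=daily:15
NTAP_SNAPVAULT_UPDATE=Y
```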
After the update to 4.1P1, the config file was modified to use UPPERCASE retention policies and looked like this:
-bash-4.2$ egrep "NTAP_SNAPVAULT|NTAP_SNAPSHOT" casewise_WITHOUT_HOURLY_CONFIGURE.conf | grep -v "^#"
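The upgraded config carried the same retentions, only with the policy names switched to uppercase, roughly like this (a sketch only; the real grep output may include additional parameters):

```
NTAP_SNAPSHOT_RETENTIONS=DAILY:3
NTAP_SNAPVAULT_RETENTIONS=DAILY:15
NTAP_SNAPVAULT_UPDATE=Y
```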
When launched, the job failed with the below message:
[2014-04-29 09:03:00,229] DEBUG: Workflow : backup started with workflow id : 1511
[2014-04-29 09:03:00,229] DEBUG: Version: Snap Creator Framework 4.1P1
[2014-04-29 09:03:00,229] DEBUG: Profile: Oracle
[2014-04-29 09:03:00,229] DEBUG: Config: casewise
[2014-04-29 09:03:00,229] DEBUG: Action: backup
[2014-04-29 09:03:00,229] DEBUG: Plugin: oracle
[2014-04-29 09:03:00,229] DEBUG: Policy: hourly
[2014-04-29 09:03:00,229] DEBUG: Volume Name: O11G_CASEWISE_DATA,O11G_CASEWISE_RECO
[2014-04-29 09:03:00,229] DEBUG: Snapshot Name: CASEWISE-hourly_recent
[2014-04-29 09:03:00,304] INFO: Validating policy: hourly finished successfully
[2014-04-29 09:03:00,305] ERROR: SCF-00073: Policy [hourly] is not a defined secondary Snapshot copy retention policy in the configuration, Exiting!
[2014-04-29 09:03:00,310] DEBUG: Workflow : backup_OnFailure started with workflow id : 1512
########## Snap Creator Framework 4.1P1 failed ##########
[2014-04-29 09:03:00,334] INFO: Pre Exit commands are not defined. Skipping !
The only workaround we found was to select an HOURLY policy in the SnapVault Policies tab while leaving the retention at 0:
The corresponding change in the config file is included here:
-bash-4.2$ egrep "NTAP_SNAPVAULT|NTAP_SNAPSHOT" casewise_WITH_HOURLY_CONFIGURE.conf | grep -v "^#"
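Roughly, the change amounts to adding an HOURLY entry with zero retention to the SnapVault retentions (a sketch; only the changed SnapVault line really matters here):

```
NTAP_SNAPSHOT_RETENTIONS=DAILY:3
NTAP_SNAPVAULT_RETENTIONS=DAILY:15,HOURLY:0
NTAP_SNAPVAULT_UPDATE=Y
```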
This way the job succeeds, but a useless transfer of the latest hourly snapshot is executed every hour before the job completes.
The previous version did not exhibit this behavior.
Can anyone offer suggestions or point out what I have misunderstood?
2014-04-29 04:30 AM
If I understand correctly, your problems are:
Does that cover it?
I confirmed that on my 4.1P1 system the policies are also in uppercase, while the GUI still shows them in lowercase. That seems to be the intended behavior.
I'll have to ask about the SnapVault behavior...
Can you please let me know:
2014-04-29 04:41 AM
We also have the same issue after upgrading straight from 3.6 to 4.1P1, and you have described the problem correctly.
We set the SnapVault retention hourly=1 as a workaround.
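In config-file terms, that workaround presumably amounts to something like the following (a sketch; the DAILY:15 value is taken from the earlier posts and may differ in your setup):

```
NTAP_SNAPVAULT_RETENTIONS=DAILY:15,HOURLY:1
```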
2014-04-29 04:49 AM
We upgraded directly from 3.6 to 4.1, using the procedure described in the install guide.
As I said, we encountered a number of smaller issues (passwords had to be reset for the users configured in our configs, DNS aliases for hosts where agents are deployed no longer worked, ...), but we could fix all of those ourselves.
So here is, I hope, a complete description of the remaining issue:
After running the java -jar snapcreator.jar -upgrade command, the config files were modified as shown here:
Fine, but when the job was started with the settings above, it failed in the validation phase, stating that "Policy [hourly] is not a defined secondary Snapshot copy retention policy in the configuration".
So the only workaround we found was to add HOURLY:0 (through the GUI), as shown here:
Now we have a SnapVault update every hour, but at least the job no longer fails. We want to avoid this useless transfer and return to the behavior of 3.6.
I am sure we are missing something here, but the documentation for the upgrade procedure led us to believe it would be completely transparent for existing backup jobs, which is not the case.
Thanks in advance
2014-04-29 04:54 AM
Thanks Pierre and Fidy for the information.
I have sent a note to the development team to see if we can't find out what happened.
Will update when I get more.
2014-05-08 05:39 AM
As I have not received any update on this, I will open a case, because we have been hitting other issues since the upgrade to 4.1, as reported by the customer:
The new Snap Creator agent creates /tmp/pdk-root-process directories every time it does something, and they are never removed. Currently /tmp on oradbprod3 is using 2.5 GB of disk space, mostly due to this problem. I'll put in a job to clean up old directories, but could you please log this as a bug with NetApp.
They already seem to know about it: https://communities.netapp.com/thread/33890 - despite what is said in that thread, it is a problem, due to the sheer number of directories created.
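A cleanup job along these lines should keep /tmp under control until the fix ships. This is a hypothetical sketch (the directory name pattern is taken from the report above, and the 7-day age threshold is an arbitrary choice), demonstrated here against a throwaway directory rather than the real /tmp:

```shell
# Demonstrate the cleanup in a scratch directory instead of /tmp.
scratch=$(mktemp -d)
mkdir "$scratch/pdk-root-old" "$scratch/pdk-root-new"
# Age one directory past the retention threshold (GNU touch).
touch -d '10 days ago' "$scratch/pdk-root-old"
# The real cron job would point this at /tmp:
find "$scratch" -maxdepth 1 -type d -name 'pdk-root-*' -mtime +7 -exec rm -rf {} +
```

After the run, only the directory older than 7 days is gone; recently created scratch directories (possibly belonging to a running agent process) are left alone.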
2014-05-08 05:58 AM
Please feel free to open a case - that should increase the urgency on this issue.
Our QA team reached out yesterday evening to confirm the SnapVault issue in 4.1 - I haven't heard the plan to address it yet.
It should be fixed either in an upcoming patch or in the maintenance release, depending on the estimated effort to fix.
I don't have these details yet.
Regarding the pdk-root issue: this is a known problem, and it is being fixed in our next patch release, expected at the end of this month, last I heard.
There is already a BURT on the pdk-root issue - 807980
Hope this helps,
2014-05-08 06:13 AM
The issue is actually getting worse: the customer contacted me to say that SC does NOT delete the hourly snapshots transferred to the secondary system (even though the hourly SnapVault retention is set to 0), and the 255-snapshot limit was hit today.
SC 3.6 had been running for almost 2 years without a glitch, so we now believe we should not have upgraded. But we had to, because of compatibility requirements with newer Data ONTAP versions.
2014-05-08 06:48 AM
I just sent this to the QA team and asked them to escalate the priority of this fix.
Please mention this in your case as well.
I haven't got an answer on what release this will be fixed in yet.