SME 7.2 Fails to Back Up After LUN Issue

sbmmiller · 2017-04-28

Hello, we had a LUN "disappear" (it was used as the log drive, L:), and after that SME stopped working. Even with NetApp support involved, we could never find the LUN, and they told us to recreate it. We created a new LUN with the same drive letter but a different LUN name. NetApp won't help us with SME because of our support contract. Thanks. Anyway, a Windows Task Scheduler task runs the SME jobs. The task itself completes fine and starts the jobs, but the jobs fail to back up the databases.

This is the email we get:

“HA Group Notification from NETAPP (CLIENT APP ERROR Backup: SME Version 7.1: (111) on MBX: SnapManager for Exchange online backup failed. (Exchange 14.3.123.4) Error code: 0x80042306) WARNING”

The only thing that stands out is that we have SME 7.2 on the Exchange server, so why is it reporting 7.1 here?

The error I find in the log is:

[10:23:43.740] Error in calling VSS API: Error code = 0x80042306
Error description: VSS_E_PROVIDER_VETO

I also attached a log file from a failed job.

Any ideas?

matte · 2017-05-01

Hi

VSS_E_PROVIDER_VETO is a generic error, so you need to dig further into the possible cause.
At a minimum, check the Application event log and the SnapDrive logs.
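For reference, 0x80042306 (the code in both the alert email and the log) is a standard Windows HRESULT. The Python sketch below is illustrative only: it splits an HRESULT into its severity, facility, and code fields, and its lookup tables list just the one VSS name that appears in this thread. Decoding confirms a failure in facility 4 (FACILITY_ITF), but the name only says that a provider vetoed the snapshot, not why, which is why the logs still have to be checked.

```python
# Decode a Windows HRESULT into its standard bit fields:
#   bit 31      severity (1 = failure)
#   bits 16-26  facility (11 bits)
#   bits 0-15   code
FACILITY_NAMES = {4: "FACILITY_ITF"}  # interface-specific errors; VSS codes use this facility

# Illustrative table: only the VSS code seen in this thread is listed.
KNOWN_CODES = {0x80042306: "VSS_E_PROVIDER_VETO"}


def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into severity, facility and code fields."""
    hr &= 0xFFFFFFFF  # normalise signed values (some tools print HRESULTs as negative ints)
    facility = (hr >> 16) & 0x7FF
    return {
        "failed": bool(hr >> 31),
        "facility": facility,
        "facility_name": FACILITY_NAMES.get(facility, "unknown"),
        "code": hr & 0xFFFF,
        "name": KNOWN_CODES.get(hr, "unknown"),
    }


if __name__ == "__main__":
    print(decode_hresult(0x80042306))
```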

- Did you create the new LUN using SnapDrive? If not, make sure the LUN contains all the required partitions (the primary partition and the MSR partition).
You can use diskpart to check that: https://technet.microsoft.com/en-us/library/cc766465(v=ws.10).aspx

- Temporarily disable the antivirus engine, to rule out that it is interfering and causing the veto during the backup.

- When VSS is involved in a backup, all VSS components need to be in a clean state, otherwise the backup will fail.
  To verify this, check the output of the command "vssadmin list writers".

- Check the SnapDrive logs and the Windows event logs from around the time of the VSS error for more detail on the failure.
  Also try running a backup that excludes the "affected" LUN.

If that doesn't help, I think the next step is to enable and collect a VSS trace.
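As a rough aid for the "vssadmin list writers" check, the Python sketch below flags any writer whose state is not "Stable" or whose last error is not "No error". It assumes the usual vssadmin text layout; the SAMPLE text is an abridged, made-up example rather than output from this server, and the function name unhealthy_writers is hypothetical.

```python
import re

# Abridged, made-up sample in the usual "vssadmin list writers" layout.
# It is NOT output captured from the server in this thread.
SAMPLE = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Exchange Writer'
   State: [8] Failed
   Last error: Retryable error
"""

WRITER_RE = re.compile(r"^Writer name: '(?P<name>[^']*)'", re.MULTILINE)
STATE_RE = re.compile(r"^\s*State: \[(?P<code>\d+)\] (?P<state>.+)$", re.MULTILINE)
ERROR_RE = re.compile(r"^\s*Last error: (?P<error>.+)$", re.MULTILINE)


def unhealthy_writers(text: str) -> list[tuple[str, str, str]]:
    """Return (name, state, last_error) for writers not 'Stable' with 'No error'."""
    problems = []
    # Cut the text at each "Writer name:" header so every chunk describes one writer.
    starts = [m.start() for m in WRITER_RE.finditer(text)]
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        block = text[begin:end]
        name = WRITER_RE.search(block).group("name")
        state = STATE_RE.search(block)
        error = ERROR_RE.search(block)
        # strip() also removes a trailing "\r" from Windows line endings.
        state_text = state.group("state").strip() if state else "unknown"
        error_text = error.group("error").strip() if error else "unknown"
        if state_text != "Stable" or error_text != "No error":
            problems.append((name, state_text, error_text))
    return problems


if __name__ == "__main__":
    for name, state, error in unhealthy_writers(SAMPLE):
        print(f"{name}: state={state}, last error={error}")
```

On the Exchange server you could feed it the real output instead, for example `subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True).stdout` from an elevated prompt. The filter is deliberately strict, so a writer in a transient state such as "Waiting for completion" is reported too.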

sbmmiller · 2017-05-01

- Yes, the LUN was created with SnapDrive.

- Disabling AV didn't seem to change anything.

- vssadmin list writers: no errors (after the change described below).

We deleted some old snapshots that might have been causing the issue. The weird thing is that now I'm not seeing any new job reports or errors, but it's showing that the last run was today. The last report is from last month.

Then, suddenly, the old volume showed up again, but of course there is no LUN connected to it.

matte · 2017-05-02

What is "showing that the last run was today"? The Windows Task Scheduler?

Did you try running a manual backup to see whether it creates a new report in the installation folder?

Did you find any information in Event Viewer?

sbmmiller · 2017-05-02

I found this event after kicking off a job. Does this mean it doesn't back up because this mailbox server is not the active mailbox server?

Job : new-backup -Server 'USDAG' -ClusterAware -GenericNaming -ManagementGroup 'Daily' -RetainDays 4 -RetainUtmDays 2 -UseMountPoint -MountPointDir 'C:\SnapMgrMountPoint' -ActiveDatabaseOnly -BackupTargetServer USMAILBOX -RemoteAdditionalCopyBackup $True -RetainRemoteAdditionalCopyBackupDays 4 -AdditionalCopyBackupDAGNode USMAILBOX

The operation executed with the following results.
Details: new-backup cmdlet will exit as it is not running in the Active node : USMAILBOX
Stack Trace: at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(Object input, Hashtable errorResults, Boolean enumerate)
at System.Management.Automation.PipelineOps.InvokePipeline(Object input, Boolean ignoreInput, CommandParameterInternal[][] pipeElements, CommandBaseAst[] pipeElementAsts, CommandRedirection[][] commandRedirections, FunctionContext funcContext)
at System.Management.Automation.Interpreter.ActionCallInstruction`6.Run(InterpretedFrame frame)
at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)

I would expect it to still produce some kind of report in SME, but it doesn't.

matte · 2017-05-02

Hi

In a DAG, if a job is scheduled with -ClusterAware, the job runs only if the host on which it is scheduled is the active node of the DAG.
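To make the -ClusterAware behaviour concrete, here is a toy Python model, not SME code: each DAG member has its own copy of the scheduled job, and only the copy on the active node goes ahead, while the others exit as in the event above. OTHERNODE is a hypothetical second DAG member, and ScheduledJob, job_runs, and hosts_that_run are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ScheduledJob:
    """Hypothetical model: one copy of a -ClusterAware SME job on a DAG member."""
    host: str


def job_runs(job: ScheduledJob, active_node: str) -> bool:
    # With -ClusterAware, the job proceeds only if its host is the active node;
    # on any other host, new-backup exits without taking a backup.
    return job.host == active_node


def hosts_that_run(jobs: list[ScheduledJob], active_node: str) -> list[str]:
    """Return the hosts whose copy of the job actually runs."""
    return [job.host for job in jobs if job_runs(job, active_node)]


if __name__ == "__main__":
    # "OTHERNODE" is a hypothetical second DAG member; USMAILBOX is from the thread.
    jobs = [ScheduledJob("USMAILBOX"), ScheduledJob("OTHERNODE")]
    print(hosts_that_run(jobs, active_node="OTHERNODE"))  # ['OTHERNODE']
```

This is consistent with what was seen above: on USMAILBOX, which was not the active node, the cmdlet exited instead of backing up, and no new SME report appeared.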

sbmmiller · 2017-05-31

Hello, the problem is back. If the job runs only on the host on which it is scheduled, how are the logs being truncated on the server that is not active? Right now the jobs on the active node are running fine, but on the inactive node the log drive is filling up again and SME is failing.

NetApp Release 8.2.4P4 7-Mode

Error code: 0xc00413c

sbmmiller · 2017-06-01

Never mind, it wasn't an issue with SME; rather, the Exchange services were stopped on one of the servers in the DAG.

sbmmiller · 2017-05-03

So it appears the snapshots from the disappearing LUN were causing the issue. Not sure why, but after we created the new volume and new LUN, the old volume showed up again in NetApp. We took it offline.

The LUN with the issue was the LOGS drive. We used robocopy to copy the contents of a snapshot of the missing logs LUN over to the new LUN we created, then deleted the snapshots of the old LUN. At that point we stopped getting backup failure alerts. We finally confirmed successful backups when we switched the primary Exchange server over to the failover Exchange server, which confirmed that SME was now backing up OK.