I have a client who has a fairly large Exchange environment. They have 6 different databases with 3 mailbox database servers, 2 CAS, and 2 HUB servers in each location. Log directories and EDB directories are all on different volumes for each database.
Now, my higher-level administrators have it set up so that backups are done via snapshots and stored in a different folder on the log volume. That is not an issue; I know it will get rather large, but it seems to be working. My problem is that they have created a separate backup job for truncating the Exchange transaction logs. This backup job runs, but the log files are never removed from the directory. There are log files still there from 11/9/12, and there are over 112,000 log files. These log volumes keep filling up, and the other admins seem to think it is because of the snapshots. While that may be true, it also does not help to have 350GB of Exchange log files. Their solution is to keep expanding the volume. This has been done more than 4 times now, and a single database has a log volume of 1TB (which is a little ridiculous if you ask me); the snapshots are taking up 350GB of this space.
The SME says the truncation job runs, but no log files are ever removed (or they aren't being marked as backed up). The "Return Code" for the truncation job is 0xffffffff, and I don't know what that means.
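For what it is worth, 0xffffffff is simply the 32-bit pattern for -1 when read as a signed integer, and -1 is a value many programs return as a generic "something failed" exit code. A minimal Python sketch of that conversion follows; the helper name is made up for illustration, and it says nothing about what SME itself ran into:

```python
import struct

def as_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    # Pack as unsigned little-endian, then unpack the same bytes as signed.
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]

print(as_signed_32(0xFFFFFFFF))  # -1
```

So the code by itself only says that the job failed; the actual reason has to come from the job report or the event log.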
I'm not sure why they would want separate jobs for backing up and truncating logs. That doesn't make much sense to me, but maybe there's something I'm missing. We have a very similar Exchange environment that we back up with SME. We have one job that backs up all of the databases and truncates the logs, then a second job for verification. Using a different folder on the same volume as the logs makes sense. This should be the SnapInfo directory, and it is done so that Exchange can use hard links, which makes truncating logs faster, but they're not taking advantage of that. The 0xffffffff that you are seeing means that there was a failure of some sort; it is not very specific, though. I have a couple of questions and a few suggestions.
What version of Exchange are they running?
Is it necessary to have the truncation jobs and backup jobs separate?
Could you rerun the truncation job and post any errors from the Event Viewer and/or the report for that job within SME?
The first thing I would do is rerun the SME configuration wizard and make sure that there isn't any issue there. If there's nothing wrong there, I would try deleting and recreating the truncation job. I didn't see anything missing above that would stop the logs from truncating.
I'm ok with that as 7 days of logs won't fill up the volume as much as a month's worth would, but right now that's not even working. And yes, there is a SnapInfo directory on the log volume separate from the log directory.
All servers are running Exchange 2010.
The 2 different jobs are because the Truncate job only runs once a week on Fridays. Sat-Thur is a daily backup w/no Truncation.
I looked back through the event log and found the event in the application log that corresponds with last Friday's attempt to run the job. This is the error it gave:
new-backup -Server 'DAG-Exchange' -ClusterAware -GenericNaming -ManagementGroup 'Daily' -BackupTruncatedLogs $False -RetainDays 90 -RetainUtmDays 7 -StorageGroup 'Common\HWD-EXMB02V','Administration\HWD-EXMB02V','Ops\HWD-EXMB02V','Service\HWD-EXMB01V','FinancialServices\HWD-EXMB02V' -UseMountPoint -MountPointDir 'C:\MountPoints_Verify' -ActiveDatabaseOnly -RemoteAdditionalCopyBackup $True -RetainRemoteAdditionalCopyBackupDays 90
The operation executed with the following results.
Details: new-backup cmdlet will exit as it is not running in the Active node : HWD-EXMB02V
Looking at the mounted databases in Exchange, they all seem to be on the HWD-EXMB02V server, but this job is running from the HWD-EXMB01V server. I'm assuming that's the problem? The main problem database is the Service database, and I notice that it is the only one pointing to a different server. I don't know why it is....
I logged into MB02, and the Truncation job there doesn't have an error code in the last run field; the other job, however, does.
Should each server have its own jobs set up with SME, creating snapshots of its own volumes and doing its own truncation? So should the server names in each of the backup scripts be local to the server hosting that database?
1) When you say truncate job that runs on Friday: This is your full backup job?
2) Sat-Thurs daily backup w/o Truncation: Did you configure just a copy backup job?
Looking at the backup job and error that you have supplied:
HWD-EXMB01V is the DAG owner, so when the backup job kicks off on its schedule, it will start and automatically exit on the node that is not the owner. I.e., on the server HWD-EXMB02V the backup job will start and exit. This is what you are seeing in your environment, and it is normal behavior.
Now, your backup job is backing up the following databases: 'Common\HWD-EXMB02V','Administration\HWD-EXMB02V','Ops\HWD-EXMB02V','Service\HWD-EXMB01V','FinancialServices\HWD-EXMB02V'. When the backup job runs on the node "HWD-EXMB01V", as it's the DAG owner, the job will run successfully but will only back up the databases if they are "active" on the specified server. I.e., for 'Common\HWD-EXMB02V','Administration\HWD-EXMB02V': if the Common and Administration databases are active on MB02V, they will get backed up; if not, the backup will just skip them. If you look at your backup job, do you see a message saying "Skipping backup"?
In this example, your database 'Service\HWD-EXMB01V' seems to be mounted on EXMB02V, so the job will skip it, as your backup job expects it to be active on 'HWD-EXMB01V'.
The first thing I would do is remove the "-ActiveDatabaseOnly" switch from your backup job and then try running it again. Then check the backup logs; you should no longer see any skipped backup for the "Service" database.
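The skip behavior described above can be modeled in a few lines of Python. This is only a toy model of the logic as explained in this thread, not how SME is implemented; the function name and the `active_on` mapping are hypothetical, and the mapping just reflects what was reported (every database currently mounted on HWD-EXMB02V).

```python
def plan_backup(storage_groups, active_on, active_only=True):
    """Split 'Database\\Server' entries into backed-up and skipped lists."""
    backed_up, skipped = [], []
    for entry in storage_groups:
        database, expected_server = entry.split("\\", 1)
        # With the active-only behavior, an entry is skipped when the
        # database is not active on the server named in the job.
        if active_only and active_on.get(database) != expected_server:
            skipped.append(database)
        else:
            backed_up.append(database)
    return backed_up, skipped

# Entries copied from the -StorageGroup argument of the posted job.
storage_groups = [
    r"Common\HWD-EXMB02V",
    r"Administration\HWD-EXMB02V",
    r"Ops\HWD-EXMB02V",
    r"Service\HWD-EXMB01V",
    r"FinancialServices\HWD-EXMB02V",
]
# Assumption taken from the thread: everything is mounted on HWD-EXMB02V.
active_on = {entry.split("\\")[0]: "HWD-EXMB02V" for entry in storage_groups}

backed_up, skipped = plan_backup(storage_groups, active_on)
print("skipped:", skipped)  # skipped: ['Service']
```

Calling `plan_backup(storage_groups, active_on, active_only=False)` returns an empty skipped list, which is the effect the suggestion to drop the switch is aiming for.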
Ok, I think I got it now. I checked the other databases that don't have volumes filling up and they only have log files from the past 6 days so far, which is correct as the Truncation job only runs on Fridays.
I didn't configure these jobs so I'm not sure if the Sat-Thurs jobs are only Copy Jobs or not. I just know by the name and log files in existence that they aren't truncating the logs. The Service database is the only one I seem to be having a problem with. And that is the only one that is different between all these backup scripts.
So, is all I have to do change Service\HWD-EXMB01V to Service\HWD-EXMB02V in the backup job script on the HWD-EXMB02V server, since that is where the Service EDB is currently mounted?
Yes, if you change Service\HWD-EXMB01V to Service\HWD-EXMB02V it will fix your issue. You will also have to change the job on both servers to keep it consistent, as this job will start running on MB02 the moment it becomes the DAG owner.
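If the job command is kept as text somewhere, the change amounts to swapping one -StorageGroup entry. A small Python sketch of that edit, assuming the command is available as a string; the helper name `retarget` is hypothetical and only part of the -StorageGroup argument is shown:

```python
import re

def retarget(command, database, new_server):
    """Point the 'Database\\Server' entry for `database` at `new_server`."""
    # Match the quoted entry for exactly this database name, so that
    # 'FinancialServices\...' is not touched when retargeting 'Service'.
    pattern = re.compile(r"'" + re.escape(database) + r"\\[^']*'")
    replacement = "'" + database + "\\" + new_server + "'"
    # A function replacement avoids backslash-escape processing in re.sub.
    return pattern.sub(lambda _match: replacement, command)

command = r"-StorageGroup 'Common\HWD-EXMB02V','Service\HWD-EXMB01V','FinancialServices\HWD-EXMB02V'"
print(retarget(command, "Service", "HWD-EXMB02V"))
```

The same edited command would then need to be applied to the job on both servers, as noted above.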
But in the future, if any other databases fail over to other nodes, their backups will be skipped. I suggest removing the "-ActiveDatabaseOnly" switch so that it doesn't matter whether the database copy is active or passive.