Data Backup and Recovery
Data Backup and Recovery
Anyone ever come across an issue where SMSQL reports a Error Parsing Time when select the Restore option from the left hand pane in SMSQL ?
SMSQL was running for a couple of months, no problem. Then one day the Restore option stopped working.
See attached screenshot.
SMSQL5.0R1P2_x86.exe.
Ticket has been opened with support.
Scheduled snapshots work great. You just cannot restore anymore.
Have you had any luck in resolving this? I've been working with support to try to get this resolved for quite a while now. Of 4 SQL servers, 1 gives this error without fail. The other 3 servers work just fine and have never encountered this error. Over the past 2 months at various times, sometimes the error shows 4 consecutive times, sometimes 8, sometimes 40+ (always when attempting to restore/clone from local backup sets). It had been showing the exact same error dialog as you've attached, but since the .NET 3.5 SP1, I'm also getting a slightly different dialog (attached) in addition to the usual error. For example, earlier this week, I'd see the exact error dialog you posted 4 times, followed by the one I've attached 4 times. Right now, it just shows the error I've attached 4 times and then SMSQL crashes. Most of the time this crashes SMSQL, while other times, I am able to proceed after the last error dialog.
Restored .NET 3.5SP1 by way of repair installation as instructed by support, but that was no help. Also upgraded SMSQL several times - first to SMSQL 5.0R1, then SMSQL 5.0R1P2, then SMSQL 5.0R1P2D2; none of these have corrected the issue. Took it upon myself to delete all snapshots on the 3 volumes (LDF, MDF, Snapinfo), as support is not eager to recommend this course of action. I can somewhat understand why, but the snapshots are completely useless to me if I cannot restore/clone from them - how much worse off am I if I have no snaps versus if I have snaps but they're not usable? After deleting all snaps, I ran a scheduled backup and was able to restore/clone with no error. However, the next morning, I was getting that same error repeating once again.
Like you, our SMSQL scheduled backups on this server appear to work just fine (logs indicate all is successfully backed up). However, more often than not, we are unable to restore or clone from backup sets created on that server.
We're using OnTap 7.3.2RC1, SQL Server 2005, and SnapDrive 6.1 on Server 2003. LUNs are connected via iSCSI. Support is now recommending to upgrade to SMSQL 5.1, but I am not very confident this will resolve the problem (no dice upgrading the last 3 times...)
Not sure if this was mentioned or if this will fix it but there is a burt 145575 that looks similar to your problem.
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=145575
Thanks watan, but I think this is a separate issue. I've never seen this come up as a "SystemTimeToFileTime" error. Also, I usually am not even able to get to the point where I am able to double-click on a backup set - the mmc crashes before this. I'm also not seeing anything logged in the app log at the time I recreate this issue other than event ID 1000 (Faulting application mmc.exe, version 5.2.3790.3959, faulting module msvcr80.dll, version 8.0.50727.3053, fault address 0x0001c992).
Stephen, What is your support case#?
Typically that error is nothing to worry about. It has to do with metadata in a transaction-log only backup. My understanding is that it can also happen on native SQL backups. Either way, the error itself is not a show stopper.
The bug mentioned in 145575 does not take into account that the MMC is crashing. Like Shawn's case above, the MMC is getting an unhandled exception and crashing. Perhaps we could gather some data to prove/disprove that you are seeing the same thing.
I'm not convinced that deleting snapshots will buy you anything, as the issue does not appear to be snapshot related.
The case number for this issue is 2001600022.
We have had no luck resolving this issue. The latest attempt was to make sure everything is patched the same on all of the nodes.
Mpio 3.2r1 - upgraded to 3.3.1
Snapdrive 6.0.2 - upgraded to 6.2p1
host utilities kit 4.0 - upgraded to 5.1
smsql stayed at 5.0r1p2
Ontap stayed at 7.3.2P3
Physical Server
Windows 2003 Enterprise Edition SP2 32bit Physical Server
SQL Server 2005
LUNs connected via Fiber Channel
Still we get the same error parsing time. We have deleted all the snapshots like you did and still the error persists. In my case the restores fail on any server that is running smsql for just one particular sql instance. We have also seen sometimes that when pressing the OK button multiple times that a window will appear saying unhandled exception occured in mmc.exe [5636]. See attachment. We even tried to enable smsql logging by flipping a registry key, however the log file did not indicate the error message either. What is interesting is that there is 0 errors in any of the event logs.
I can browse the backup images fine using the smsql cli prompt:
get-backup -server 'DB00' -ServerInstance 'DB00\SQL101' -d 'databasename' -vb
I did not attempt to perform a restore as I do not want to overwrite any of my existing production databases. I am trying to find out how to restore from cli 'databasename' to 'databasename_test'
The next step requested by support is to get Adplus and Procmon logs ftp'd to Netapp for analysis to see if we can pinpoint where it is crashing.
It looks like we're both at about the same point. I'll be collecting some data to send over to support tonight - already enabled SMSQL debugging, and will be running procmon and adplus this evening and sending logs over with the ONTAPWinDC output and SQL Server error logs. Will let you know how the progress goes from here.
The root cause of this problem is BURT 145575 (http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=145575). You have log backup which has NULL value in one of those backup headers, most of time, "BackupFinishDate" column. When you click the restore node in SMSQL MMC, if the log backup which has this problem belongs to a database sharing same LUN with other databases, you will get this error message. This is because SMSQL was retrieving backup header information from all log backups using SQL-DMO API during the backup enumeration. If it found a log backup which has a NULL value in one of those headers, it reports the error. After you get this error message, you should be able to proceed to select any backup for a restore. But you may or may not get another similar error stated in BURT 145575 after you double click a full backup for selecting to do the restore. That is because the log backup which has the issue may not belong to database you selected.
If you do get error while selecting a backup, you should still be able to do the restore with that log backup. The only problem is that you may not be able to use this log backup for point in time restore, since it does not have "BackupFinishDate" value, so SMSQL cannot determine the time range you can select for the restore.
So the issue is caused by a log backup. If you can find out which log backup caused the problem by running T-SQL restore headeronly command on all your log backups of a database (that is the database you know you will get an error message when double clicking it), you can check to see if there is a 3226 error event in SQL log during the time of the log backup described in MSFT KB http://support.microsoft.com/default.aspx?scid=kb;en-us;290787. If no, it is possible that you can get "BackupFinishDate" as NULL value even in normal condition. In that case, I would think this is when you want to open a case with MSFT to ask why the column is NULL, as SMSQL log backup is created by using simple SQL-DMO API call, nothing different than any other log backup created by any other tools.
Here is an example of restore headeronly output to a log backup which has the NULL column value:
BackupName BackupDescription BackupType ExpirationDate Compressed Position DeviceType UserName ServerName DatabaseName DatabaseVersion DatabaseCreationDate BackupSize FirstLsn LastLsn CheckpointLsn DifferentialBaseLsn BackupStartDate BackupFinishDate SortOrder CodePage UnicodeLocaleId UnicodeComparisonStyle CompatibilityLevel SoftwareVendorId SoftwareVersionMajor SoftwareVersionMinor SoftwareVersionBuild MachineName Flags BindingId RecoveryForkId Collation
--------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------- ------------------------------------------------------ ---------- -------- ---------- --------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------- --------------- ------------------------------------------------------ ---------------------- --------------------------- --------------------------- --------------------------- --------------------------- ------------------------------------------------------ ------------------------------------------------------ --------- -------- --------------- ---------------------- ------------------ ---------------- -------------------- -------------------- -------------------- --------------------------------------------------------------------------------------------------------------------------------- ----------- -------------------------------------- -------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------
NetAppLogBackup NULL 2 NULL 0 1 2 AMIT2K\Administrator KRISHNA-SVR8 DB1 539 2004-12-02 15:32:29.000 379860992 1841000000295400001 1892000001263000001 1880000000637000001 1880000000637000001 2004-12-06 10:40:44.000 NULL 52 228 1033 196609 80 4608 8 0 760 KRISHNA-SVR8 0 {C3133F25-7345-4A2C-AEEB-669F59285BFD} {C3133F25-7345-4A2C-AEEB-669F59285BFD} SQL_Latin1_General_CP1_CI_AS
(1 row(s) affected)
Thanks,
Qing
Thanks for chiming in Qing!
Update. I have uploaded a ADPlus core dump as well as Process Monitor log capture of all events for analysis. Hopefully this will get us started to pinpoint why the restore jobs are not appearing in the mmc/time parse error/unhandled exception in mmc.
It appears after further review that we might be exceeding the recommended number of databases in a single LUN. Page 95 of the SMSQL Admin guide states the following:
Header 1 | Header 2 |
---|---|
Databases per LUN | 35 |
Databases per storage system volume | 35 |
In our environment we have 66 data .mdf in LUN A and 66 log .ldf in LUN B. I have submitted a request to our DBA team to get approval to move half our of databases over to two new LUNs C and D so that our databases per LUN will be at 33 and under the Netapp recommended value of 35.
Keep you posted on our continuing efforts to resolve this issue.
Please let us know how things go after the DB's get moved and if it resolves the issue. Thanks for the update Shawn.
Ok, I think I've got this figured out on my setup. Like shawnj, I also had > 35 databases per LUN. However, it does NOT seem that this is the cause of the problem in my case (though I am aware that it is not a recommended configuration). The production volumes in question are snapmirrored to a remote location; I then created another set of volumes and snapmirrors to replicate this to a new set of volumes so that I could test without interfering with the production snapmirror destination volumes.
In testing, I created an additional set of volumes/LUNs in order to split up the databases so that I had < 35 databases per LUN. The databases I moved to new LUNs backed up without issue and allowed me to restore without issue. However, the databases that I left on the original set of LUNs continued to give a time parsing error when attempting to restore from the backup set. Looking closer at the databases on these LUNs, I realized that when I removed backups of Sharepoint databases, the restore worked fine (as long as no prior snaps existed that contained the Sharepoint databases). Prior to this, it hadn't clicked for me that this server was indeed running Sharepoint (as a component of Team Foundation Server), as it's at a remote site and I don't frequently have my hands on it. We do not have SMMOSS.
Having no prior exposure to SMMOSS helped in overlooking this also. Well, this was something that had slipped by us and our consultants at the time we installed our filers. SMMOSS product documentation indicates that SMMOSS is extremely critical in order to successfully restore Sharepoint databases. As soon as I re-included these WSS databases in the SMSQL backups, the time parsing error/SMSQL crashing returned. As soon as I removed them and the associated snapshots again, the errors stopped.
So, in order to resolve this on the production end, I eliminated all WSS databases from the SMSQL backup job. Once I had a few days worth of snaps with this backup config, I purged all old snapshots that had selected the WSS databases for backup. Immediately after doing so, I was/am able to get into the SMSQL restore wizard without any time parsing error dialogs and without the SMSQL mmc crashing. As long as this holds up, I'll FINALLY be able to close the case that I've had open on this issue for the past 3 months.
So, in my case, this issue had nothing to do with exceeding the limit of the recommended databases per LUN (though restructuring the databases so that we keep to this limit is in the works). Turns out it was something much more straightforward (and something I should have picked up on earlier) - that SMMOSS needs to be used in order to properly restore databases where the LUN contains WSS databases. SMSQL will back up these databases without any problem, but the problems start when attempting to restore any databases that are included on this backup set. As we do not have a license for SMMOSS, I have not been able to test this as a resolution to the problem, but I believe that it is a safe assumption at this point. We'll be backing up the Sharepoint data via the stdadm tool for the time being until we see if we can evaluate the SMMOSS product.
I'm on leave from 30th Oct to 12th Nov. Please contact my manager
Senthil (senthilk@netapp.com) for anything urgent.
Thanks,
Vrishali