Thanks for the good info, Andy. I was coming from SnapDrive 6.4.2 and SMSQL 5.2, so this is our first jump into SMSQL 6.x. Your suggestion to go through support (in addition to the information you've provided) seems like a logical step to try to rein this in, so I will likely try that route. To answer your other questions: we do use VSC for VM/datastore backups, but not during the time frame in question, so I don't think that's a factor here. From everything I can tell, it's only the SMSQL backups that are occurring on the server at that time. I took a look through the SQL Agent jobs on the box and didn't see anything that looked like it'd be getting in the way. We're using SnapMirror replication upon SMSQL job completion. I hadn't realized the lack of testing between SnapDrive 6.5 and vSphere 5.5. I can't move to SDW 7.0 yet, as I believe the matrix says I need to be on Data ONTAP 8.2 for that to be supported (we're currently on 8.1.2). Thanks again for your feedback on this!
I recently upgraded SnapDrive/SnapManager on a Windows Server 2008 R2 SP1 host to SnapDrive 6.5 and SnapManager for SQL 6.0.1.1431. At the time of the upgrade, supposedly no other changes were being made to the database server. The server is a VMware virtual machine on vSphere 5.5, and the NetApp back end is a FAS3220 HA pair running Data ONTAP 8.1.2 7-Mode. The server is running SQL Server 2008 R2 SP2. I believe this configuration is supported according to the Interoperability Matrix.

Prior to updating this software, the hourly full SMSQL backups we were taking would complete in 3-5 minutes. Post upgrade, I'm seeing different behavior (most of the time, at least). The syntax of my job is shown below; I had to change it from what it was before the upgrade to get it to correctly pick up both SQL instances. From reading the documentation, it seemed that "new-backup -svr 'SERVERNAME' -d 'SERVERNAME' etc." should back up all instances on the server. That was not the case post-upgrade; only the default instance was being backed up. So I changed it to the following.

New job syntax:

new-backup -svr 'SERVERNAME' -d 'SERVERNAME', '0', 'SERVERNAME\NAMEDINSTANCE', '0' -RetainBackupDays 10 -RetainShareBackupDays 0 -cpylgbkshare NOTHING_TOSHARE -lb -bksif -RetainSnapofSnapInfoDays 10 -rudays 10 -updmir -mgmt standard

Old job syntax (pre software upgrade) for comparison; this job ran fine under the old software versions:

new-backup -svr 'SERVERNAME' -RetainBackupDays 10 -lb -bksif -RetainSnapofSnapInfoDays 10 -rudays 10 -updmir -mgmt standard

The new syntax works and backs up the databases successfully. However, the time the job takes to complete (backing up the same databases as with the old SnapDrive/SnapManager versions) has increased significantly, though not on every run. Now, when I run SMSQL backups via a scheduled task, jobs that run between 7 am and 1 am complete in 11-12 minutes, while jobs that run between 2 am and 6 am complete in 2-3 minutes. Yes, you'd be right to point out that the fast runs happen during off-peak hours. However, there's nothing I'm aware of going on between 10 pm and 1 am (just as a simple example) that should be hitting the server any harder than between 2 am and 6 am. And again, I never once saw this issue prior to upgrading the SnapDrive/SnapManager packages.

The excess time seems to be due to SMSQL not retrieving the SQL Server database information promptly. When the jobs take longer to run, the event timeline looks like this:

10:00:01 am - Event 308 logged - "SnapManager for SQL Server per-server license is licensed on server SERVERNAME"
10:10:22 am (the next SMSQL event logged) - Event 368 logged - "SQL Server database information was retrieved successfully."

From that point on, things move in the expected time frame. When the jobs run as I would have initially expected (from 2-6 am), the database information is retrieved much more quickly:

6:00:03 am - Event 308 logged - "SnapManager for SQL Server per-server license is licensed on server SERVERNAME"
6:00:21 am (the next SMSQL event logged) - Event 368 logged - "SQL Server database information was retrieved successfully."

The job then completes within a couple of minutes. I'm trying to figure out why I'm getting this 10-minute delay on the backups between 7 am and 1 am.
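To quantify the delay per run, this is roughly how I've been pulling the gap between event 308 and event 368 out of the Application log (a minimal sketch; it assumes both events land in the Application log, and the provider name is a guess - swap in whatever source the events show under in Event Viewer):

```powershell
# Minimal sketch: measure the gap between event 308 (license check) and the next
# event 368 ("database information was retrieved successfully") for each run.
# Assumptions: both events land in the Application log, and the provider name
# below is a placeholder - replace it with the source shown in Event Viewer.
$provider = 'SnapMgrServiceHost'   # assumed/placeholder source name

$events = Get-WinEvent -FilterHashtable @{
        LogName      = 'Application'
        ProviderName = $provider
        Id           = 308, 368
    } | Sort-Object TimeCreated

for ($i = 0; $i -lt $events.Count - 1; $i++) {
    if ($events[$i].Id -eq 308 -and $events[$i + 1].Id -eq 368) {
        $gap = $events[$i + 1].TimeCreated - $events[$i].TimeCreated
        '{0:yyyy-MM-dd HH:mm}  database info retrieval took {1}' -f $events[$i].TimeCreated, $gap
    }
}
```

Running that over the last few days shows the same split: roughly 10-minute gaps for the daytime/evening runs and 20-second gaps for the 2-6 am runs.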
Someone from our DBA group also sees that the queries coming from SMSQL during the delayed jobs hit the server harder in this time frame, but they haven't been able to pinpoint anything beyond saying the queries take a long time to complete. Because of the nature of the issue, other SMSQL tasks are impacted as well (database cloning, for example, as one might expect, since retrieving the database list is the slow step). Does anyone have any ideas on how I might further troubleshoot this and get the time it takes to retrieve the SQL Server database information back to a reasonable range, so that I don't hit a 10-minute delay on any SMSQL task I run between 7 am and 1 am? Any suggestions would be greatly appreciated, thanks!
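In case it helps anyone compare notes, this is the kind of capture I'm asking our DBA group to run while one of the delayed jobs is in flight, just to see exactly what SMSQL is executing at that moment (a rough sketch using plain System.Data.SqlClient; the instance name is a placeholder):

```powershell
# Rough sketch: while a delayed SMSQL job is running, dump the currently
# executing requests and their SQL text so the slow enumeration query can be
# identified. Instance name is a placeholder; integrated security assumed.
$instance = 'SERVERNAME'   # or 'SERVERNAME\NAMEDINSTANCE'
$conn = New-Object System.Data.SqlClient.SqlConnection("Server=$instance;Database=master;Integrated Security=SSPI")
$conn.Open()

$query = @"
SELECT r.session_id, r.status, r.wait_type, r.total_elapsed_time,
       DB_NAME(r.database_id) AS database_name, t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID
ORDER BY r.total_elapsed_time DESC;
"@

$cmd = $conn.CreateCommand()
$cmd.CommandText = $query
$reader = $cmd.ExecuteReader()
while ($reader.Read()) {
    '{0}  elapsed={1}ms  wait={2}  db={3}' -f $reader['session_id'], $reader['total_elapsed_time'], $reader['wait_type'], $reader['database_name']
    $reader['sql_text']
}
$reader.Close()
$conn.Close()
```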
I don't think you need Data Motion in order to do this. I was trying to run through the same process and couldn't figure out why the Migrate option was grayed out for me. It turns out that despite my thinking that the source volume (i.e. the original SnapMirror destination) met the volume migration requirements, it actually did not. The requirements are as follows: to enable volume migration as a task in a space management plan for an aggregate, the Secondary Space Management wizard must discover potential destination aggregates that meet the volume migration requirements.

Source volume requirements - migration source volumes must meet all of the following criteria:
- No export protocols are used; for example, no NFS, CIFS, iSCSI, or Fibre Channel protocol.
- The volume has no clone volumes.
- All protection relationships on the volume are managed by Protection Manager.

Destination aggregate requirements - migration destination aggregates must meet all of the following criteria:
- Reside on a storage system that meets the necessary license requirements to support the protection policy.
- Reside on a storage system that meets the secondary or tertiary storage provisioning policy requirements.
- Reside on the same storage system as the source volume if the source volume is attached to a vFiler unit.
- Have enough space to accommodate the migrated volume.

In my case, after pulling out some hair (which is already in low supply), I realized that my source volume had an NFS export, so it was not migration capable. As soon as I removed that NFS export on the original SnapMirror destination, I was able to proceed, and I have now used Secondary Space Management to migrate that volume to another aggregate on the same controller. Hope this helps.
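For anyone else staring at a grayed-out Migrate option, this is roughly how I'd sanity-check the first source-volume requirement from PowerShell before digging deeper (a sketch only - it assumes the Data ONTAP PowerShell Toolkit is installed, the controller/volume names are placeholders, and the cmdlet/property names are from memory; checking /etc/exports or exportfs on the controller console works just as well):

```powershell
# Sketch: check whether a prospective migration source volume is NFS-exported,
# since any export protocol in use disqualifies it from volume migration.
# Assumes the Data ONTAP PowerShell Toolkit; cmdlet/property names from memory,
# and the controller/volume names below are placeholders.
Import-Module DataONTAP
Connect-NaController filer01 -Credential (Get-Credential)

$volName = 'mirror_dest_vol'

# If anything comes back here, remove the export before re-running the
# Secondary Space Management wizard.
Get-NaNfsExport | Where-Object { $_.Pathname -like "*$volName*" }
```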
Looks like I got this resolved. I followed these steps:
1. Stopped the SnapDrive and SnapDrive Management services. I also stopped the SnapManager service (we use SMSQL on this box).
2. Uninstalled SnapDrive 6.3P2 from the vSphere console, as opposed to over RDP.
3. Renamed the c:\program files\netapp\snapdrive folder, since files still existed there after the uninstall completed.
4. Reinstalled SnapDrive 6.3P2. I did not enter any settings for the ESX connection.
5. After the reinstall finished, I was able to open SnapDrive and see my LUNs. At this point I still had not restarted the SnapManager service (the SnapDrive services started on their own when the installation completed).
6. Restarted the SnapManager service and found I was still able to get to my LUNs in SnapDrive.

These are essentially the same steps I went through with support (which didn't work); the differences are that this time I ran through them from the console (as opposed to RDP) and I stopped the SnapDrive services prior to uninstalling.
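In case it saves anyone some clicking, steps 1 and 3 can be scripted along these lines (a rough sketch; it matches services by display name since the exact service names can differ between SnapDrive/SnapManager versions, and it assumes the default install path):

```powershell
# Rough sketch of the pre-uninstall steps: stop the SnapDrive, SnapDrive
# Management, and SnapManager services, then set aside the leftover install
# folder. Display-name wildcards are used because exact service names can
# differ between versions; verify with Get-Service before running.
Get-Service -DisplayName '*SnapDrive*', '*SnapManager*' |
    Where-Object { $_.Status -eq 'Running' } |
    Stop-Service -Force

# After the uninstall completes, rename whatever is left of the install folder
# so the reinstall starts clean (default path assumed).
$sdPath = 'C:\Program Files\NetApp\SnapDrive'
if (Test-Path $sdPath) {
    Rename-Item -Path $sdPath -NewName 'SnapDrive.old'
}
```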
Hi Brent, just wondering how you wound up resolving this (hopefully you're not still waiting/praying!). I'm facing the same issue and have a ticket open to work through it, but so far I haven't gotten anywhere (I've uninstalled/reinstalled SnapDrive several times, going through different versions up to 6.3P2 - the same issue shows up across all of them). This is on Server 2008 R2. It had been working fine for months (I created/resized several LUNs with no problems at all), and now it just sits at "Establishing Connection". Any pointers you may be able to give would be GREATLY appreciated!

Interestingly, the SMSQL jobs that run on this machine (which leverage SnapDrive) are running okay, though unless I schedule a task to restart the SnapManager service, the job will hang. That behavior started about a month ago, and I'd venture to guess the SnapDrive issue started at the same time, though I'm not in it often enough to be sure. I'm also aware this is an old thread, but it's really the only thing I turn up online when trying to find a solution. Thanks!
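For what it's worth, the restart workaround is just its own scheduled task that runs a few minutes before each SMSQL backup window, roughly like this (it matches the service by display name, since the exact service name seems to vary between SnapManager versions):

```powershell
# Band-aid, not a fix: bounce the SnapManager service shortly before the SMSQL
# backup job so the job doesn't hang. Matched by display name because the
# exact service name varies between versions - verify with Get-Service first.
Get-Service -DisplayName '*SnapManager*' | Restart-Service -Force
```

Obviously that just works around the hang rather than fixing whatever SnapDrive is stuck on.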
Ok, I think I've got this figured out on my setup. Like shawnj, I also had more than 35 databases per LUN. However, that does NOT seem to be the cause of the problem in my case (though I am aware it is not a recommended configuration).

The production volumes in question are SnapMirrored to a remote location; I created another set of volumes and SnapMirror relationships to replicate the data to a new set of volumes so that I could test without interfering with the production SnapMirror destinations. In testing, I created an additional set of volumes/LUNs to split up the databases so that I had fewer than 35 databases per LUN. The databases I moved to the new LUNs backed up and restored without issue. However, the databases that I left on the original set of LUNs continued to give a time parsing error when attempting to restore from the backup set.

Looking closer at the databases on those LUNs, I realized that when I removed backups of the SharePoint databases, the restore worked fine (as long as no prior snapshots existed that contained the SharePoint databases). Prior to this, it hadn't clicked for me that this server was indeed running SharePoint (as a component of Team Foundation Server), as it's at a remote site and I don't frequently have my hands on it. We do not have SMMOSS, and having no prior exposure to it didn't help either. This was something that slipped by us and our consultants at the time we installed our filers. The SMMOSS product documentation indicates that SMMOSS is critical to successfully restoring SharePoint databases.

As soon as I re-included these WSS databases in the SMSQL backups, the time parsing error and SMSQL crashes returned. As soon as I removed them and the associated snapshots again, the errors stopped. So, to resolve this on the production end, I excluded all WSS databases from the SMSQL backup job. Once I had a few days' worth of snapshots with this backup config, I purged all old snapshots that had included the WSS databases. Immediately after doing so, I was/am able to get into the SMSQL restore wizard without any time parsing error dialogs and without the SMSQL MMC crashing. As long as this holds up, I'll FINALLY be able to close the case I've had open on this issue for the past 3 months.

So, in my case, this issue had nothing to do with exceeding the recommended number of databases per LUN (though restructuring the databases to keep within that limit is in the works). It turned out to be something much more straightforward (and something I should have picked up on earlier): SMMOSS needs to be used in order to properly restore databases when the LUN contains WSS databases. SMSQL will back up these databases without any problem, but the problems start when attempting to restore any databases included in that backup set. As we do not have a license for SMMOSS, I have not been able to test this as a resolution to the problem, but I believe it is a safe assumption at this point. We'll be backing up the SharePoint data via the stsadm tool for the time being, until we see whether we can evaluate the SMMOSS product.
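For anyone doing the same cleanup, the adjusted backup job looks roughly like this. Instead of passing '0' (all databases) for the instance, you pass a count and then list only the non-SharePoint databases by name. The database names below are placeholders, and if I'm misremembering the -d list format, the command the SMSQL backup wizard generates will show the exact syntax:

```powershell
# Sketch: back up only the non-SharePoint databases so the WSS databases never
# land in an SMSQL backup set. The '3' is the number of databases listed for
# the instance; the database names are placeholders for your user databases.
new-backup -svr 'SERVERNAME' `
    -d 'SERVERNAME', '3', 'UserDB1', 'UserDB2', 'UserDB3' `
    -RetainBackupDays 10 -lb -bksif -RetainSnapofSnapInfoDays 10 `
    -rudays 10 -updmir -mgmt standard
```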
It looks like we're both at about the same point. I'll be collecting some data to send over to support tonight - I've already enabled SMSQL debugging, and will be running Process Monitor and adplus this evening, then sending those logs over along with the ONTAPWinDC output and SQL Server error logs. I'll let you know how things progress from here.
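For anyone following along, this is roughly how I plan to kick off tonight's capture (paths are placeholders and the flags are from memory, so double-check them against the Process Monitor and Debugging Tools for Windows documentation):

```powershell
# Rough capture plan for tonight's repro (paths are placeholders; flags from
# memory - verify against the Procmon / Debugging Tools for Windows docs).

# Start Process Monitor logging to a backing file so the trace survives a crash.
& 'C:\Tools\Procmon.exe' /AcceptEula /Quiet /Minimized /BackingFile 'C:\Logs\smsql_trace.pml'

# Attach adplus in crash mode to the SnapManager MMC so a full dump is written
# when it falls over.
& 'C:\Debuggers\adplus.exe' -crash -pn mmc.exe -o 'C:\Logs\Dumps' -quiet
```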
Thanks watan, but I think this is a separate issue. I've never seen this come up as a "SystemTimeToFileTime" error. Also, I'm usually not even able to get to the point of double-clicking on a backup set - the MMC crashes before that. I'm also not seeing anything logged in the Application log at the time I reproduce the issue other than event ID 1000 (Faulting application mmc.exe, version 5.2.3790.3959, faulting module msvcr80.dll, version 8.0.50727.3053, fault address 0x0001c992).
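For comparing notes on timing, this is the sort of thing I've been using to pull those event ID 1000 entries for mmc.exe out of the Application log (a minimal sketch; it assumes PowerShell is installed on the Server 2003 box):

```powershell
# Minimal sketch: pull the Application Error (event ID 1000) entries for
# mmc.exe so the crash times can be lined up against the SMSQL restore/clone
# attempts. Assumes PowerShell is available on the Server 2003 host.
Get-EventLog -LogName Application -Source 'Application Error' |
    Where-Object { $_.EventID -eq 1000 -and $_.Message -match 'mmc\.exe' } |
    Select-Object TimeGenerated, Message |
    Format-List
```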
Have you had any luck resolving this? I've been working with support on it for quite a while now. Of our 4 SQL servers, 1 gives this error without fail; the other 3 work just fine and have never encountered it. Over the past 2 months, sometimes the error shows 4 consecutive times, sometimes 8, sometimes 40+ (always when attempting to restore/clone from local backup sets). It had been showing the exact same error dialog you attached, but since the .NET 3.5 SP1 repair I'm also getting a slightly different dialog (attached) in addition to the usual error. For example, earlier this week I'd see the exact error dialog you posted 4 times, followed by the one I've attached 4 times. Right now it just shows the error I've attached 4 times and then SMSQL crashes. Most of the time this crashes SMSQL; other times I'm able to proceed after the last error dialog.

Repairing .NET 3.5 SP1 as instructed by support was no help. I've also upgraded SMSQL several times - first to SMSQL 5.0R1, then 5.0R1P2, then 5.0R1P2D2 - and none of these corrected the issue. I took it upon myself to delete all snapshots on the 3 volumes (LDF, MDF, SnapInfo), since support is not eager to recommend that course of action. I can somewhat understand why, but the snapshots are completely useless to me if I cannot restore/clone from them - how much worse off am I with no snaps than with snaps that aren't usable? After deleting all the snapshots, I ran a scheduled backup and was able to restore/clone with no error. However, the next morning the same error was repeating once again.

Like you, our scheduled SMSQL backups on this server appear to work just fine (the logs indicate everything is backed up successfully), but more often than not we are unable to restore or clone from the backup sets created on that server. We're using Data ONTAP 7.3.2RC1, SQL Server 2005, and SnapDrive 6.1 on Server 2003, with LUNs connected via iSCSI. Support is now recommending an upgrade to SMSQL 5.1, but I'm not very confident that will resolve the problem (no dice upgrading the last 3 times...)