ONTAP Discussions
We have two MS Exchange 2003 clusters, V01 and V02. Each cluster has two nodes and runs Windows 2003 Enterprise Edition with SnapDrive 4.2.1 and SME 3.2. We are looking to upgrade soon, but not this week...
We use a 2nd server to verify the backups as the server load is high on the Exchange boxes.
The problem is that once the verification has finished, the flexclone volumes are sometimes left behind. It is normally just the transaction log volume or the database volume. The problem tends to happen more frequently on V01, and we have completed a bare metal re-install of the cluster nodes to try to resolve the issue, with no success.
There are no events in the filer logs (FAS3070 cluster) and I have not been able to work out a pattern as to when it happens. The workaround is to check the filer each morning for the flexclone volumes and offline / delete them.
Does anyone have any ideas for a possible solution? I have heard ESEUTIL throttling can cause this issue, but testing has shown the problem happens regardless.
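The morning check described above could be partly scripted. The following is a minimal sketch, assuming the list of volume names has already been fetched from the filer by some other means (not shown). The name pattern and the MMDDYYYY_HHMMSS timestamp layout are inferred from the single clone name quoted later in this thread, not from any documented SnapDrive naming rule, and `find_leftover_clones` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Pattern inferred from one clone name quoted in this thread (an assumption,
# not a documented SnapDrive naming convention).
CLONE_RE = re.compile(r"^SnapDrive_(?P<parent>.+?)_clone_of_(?P<snapshot>.+)$")
# Assumed timestamp layout inside the snapshot name: _MMDDYYYY_HHMMSS_
STAMP_RE = re.compile(r"_(\d{8})_(\d{6})_")


def find_leftover_clones(volume_names):
    """Return details for every volume whose name looks like a SnapDrive clone."""
    leftovers = []
    for name in volume_names:
        match = CLONE_RE.match(name)
        if not match:
            continue
        taken = None
        stamp = STAMP_RE.search(match.group("snapshot"))
        if stamp:
            try:
                taken = datetime.strptime(
                    stamp.group(1) + stamp.group(2), "%m%d%Y%H%M%S"
                )
            except ValueError:
                taken = None  # digits present but not a valid date/time
        leftovers.append(
            {
                "volume": name,
                "parent": match.group("parent"),
                "snapshot_time": taken,
            }
        )
    return leftovers
```

This only reports candidates; taking a volume offline and deleting it is deliberately left as a manual step, since a clone may still be in use by a running verification.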
Thanks for taking the time to read this and think about the issue. Any help / ideas welcome.
Anti virus? Do you have anti virus that is chewing on this and causing any problems? I have heard of this before
Worth a shot, remove any AV see if it makes a difference?
Hi Brendon...
I'm not sure why SME isn't cleaning up these volumes after the verification is complete. It should delete the works and leave everything clean.
Is there any info in the backup reports for a particular verification job that gives any clue as to what's happening? Or even the event logs on the verification server...those would be helpful to look at as well.
I'll follow up with a few folks to see if this is a known issue or not. I'll post what I find probably next week.
Thanks...
Shannon
As luck would have it, the issue happened over the weekend.
SnapDrive_gbdc01exmbf01db_clone_of_exchsnap__gbdc01exmbv01_09212008_190000__weekly_snapshot_0
The SME log is standard except for this:
Operation completed successfully in 45.735 seconds.
Dismounting LUN C:\Program Files\NetApp\SnapManager for Exchange\SnapMgrMountPoint\MPDisk001 of Snapshot ...
The virtual disk may not be connected, because its mount point cannot be found.
(SnapDrive Error Code: 0xc0040221)
Re-trying to force dismounting LUN...
SnapManager will pause 70 seconds after force dismount, please wait...
SnapDrive failed to dismount the snapshot.
The virtual disk may not be connected, because its mount point cannot be found.
Mounting Snapshot for LUN E of Computer GBDC01EXMBN02
Snapshot will be mounted on subdirectory
This Snapshot is mounted as the drive .
RUNNING TRANSACTION LOG INTEGRITY VERIFICATION
Transaction log directory is located at:
C:\Program Files\NetApp\SnapManager for Exchange\SnapMgrMountPoint\MPDisk001\EXCHSRVR\MDBDATA\SG1\
Command: ["C:\Program Files\Exchsrvr\bin\eseutil.exe" /ml E00]
Microsoft(R) Exchange Server Database Utilities
Version 6.5
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating FILE DUMP mode...
************
Nothing on the Windows server log
*************
Console Messages
Sun Sep 21 20:04:54 BST : LUN /vol/SnapDrive_gbdc01exmbf01db_clone_of_exchsnap__gbdc01exmbv01_09212008_190000__weekly_snapshot_0/sg1db.lun unmapped from initiator group viaRP...
Sun Sep 21 20:05:04 BST : LUN /vol/SnapDrive_gbdc01exmbf01db_clone_of_exchsnap__gbdc01exmbv01_09212008_190000__weekly_snapshot_0/sg1db.lun has been taken offline
Sun Sep 21 20:05:08 BST : Volume 'SnapDrive_gbdc01exmbf01db_clone_of_exchsnap__gbdc01exmbv01_09212008_190000__weekly_snapshot_0' has been set temporarily offline
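Since the failed dismount only shows up inside the SME report, a small script could flag the affected jobs. This is a rough sketch, assuming the report is available as plain text and that errors appear in the "(SnapDrive Error Code: 0x...)" form shown in the excerpt above; `dismount_failures` is a hypothetical helper name.

```python
import re

# Matches the error-code line format seen in the report excerpt above.
ERROR_RE = re.compile(r"\(SnapDrive Error Code: (0x[0-9a-fA-F]+)\)")


def dismount_failures(report_text):
    """Return (error_code, preceding_line) pairs found in an SME report."""
    lines = report_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        match = ERROR_RE.search(line)
        if match:
            # The line just before the code usually carries the message text.
            context = lines[index - 1].strip() if index > 0 else ""
            hits.append((match.group(1), context))
    return hits
```

Running this over each night's verification report would at least show which jobs hit the dismount error, which may help when looking for a pattern.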
Think I have found the solution.
I was getting a lot of Plug & Play errors in the System event log (Event ID 257); see the Microsoft article with hotfix:
http://support.microsoft.com/kb/924390
Will try this weekend and report back.
Did you ever find a resolution for this? It looks to be a bug because there are many people running into the same issue.
Long story short, no. Hotfix was not the solution and filer CPU @ 100% was not the cause. Still looking for the answer.
Bren
Hey Ian/Brendon...
Was there a case opened with NGS for this issue? I'm going to bring this up with the engineers and see what we can figure out. But a case number would be good for them to get more info on what's happening.
Thanks...
Shannon
I have 3 different case numbers from the last 12 months. Out of the office this week but should be able to dig them out next week.
Is there any resolution on this yet? Also, I was wondering if it was a 64bit issue. We are experiencing the same issue and it's quite frustrating.
Thanks,
Carl.
64bit shouldn't have anything to do with volumes on the storage being left behind.
As soon as I get a case number that I can reference, I can do more investigating and see what I can figure out.
Thanks...
Shannon
Sorry been off playing with my new VMWare toys.
NetApp LOG # 2494469
NETAPP LOG # 2687346
NETAPP LOG # 2705972
NETAPP LOG # 2962044
I can not find the original ticket but the above give a good insight into what has been going on.
I just opened a case 2 days ago, so I'll let the tech assigned work on this before I go public. I don't want anyone to think I do not have faith in the engineers. I was just curious whether anyone had seen this before and, if so, what the thoughts were. The reason I wondered about 64 bit is that we have an identical server, other than it being 32 bit, with all OS/SQL/NetApp products at the same version level, and we are not having issues with our 32 bit machine. As soon as I hear from the tech assigned to my case I'll post the solution/response.
Thanks,
Carl.
Just applied this hotfix. Will report back on the results.
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=223992
Merry Christmas all
I'm sorry to report that I have already installed that hotfix. It did nothing for this issue on my box.
Thanks,
Carl.
Merry Christmas
Anti virus? Do you have anti virus that is chewing on this and causing any problems? I have heard of this before
Worth a shot, remove any AV see if it makes a difference?
I can not believe that you would post my solution to my thread as your own work. You forum troll and points whore. But you are quite correct, it was the AV which was causing the problem, and you did make the coffee this morning, so have some points.
Bren
If antivirus is the issue, was removing antivirus the solution? What if we need A/V on the server? Is there a workaround?
That is just what our risk team said. "There MUST be av on the server..."
Answer - Only run white list apps on the server.
Will post back if we have any problems but early results are good.
I'm glad you solved your problem, however, our issue still exists and we have no AV software installed. In fact, the only thing on the box itself is the OS and SQL 2005. No other software is present. We have system hardened the box but only via known OS configuration changes. This problem has been escalated and I have been working with Netapp for 2 days on this. We worked on it quite a bit yesterday and I'm hoping a solution is on the horizon.
Thanks,
Carl.
Have you tried to create a 2nd verification server on a vanilla install? We have five different SQL verification servers and none of them give us problems; it was only ever SME. So SMSQL should work. When the verification fails, does SnapDrive show the internal SCSI HDD as a virtual disk? See pic.
Verification isn't the issue, or at least I don't believe it to be. Verification does not fail but completes successfully; the issue is when SD tries to un-mount the volume. It crashes on the volume delete. It gives an error stating it can't find the volume mount point. I believe it to be an issue with the initiator group (we run FC). It creates a temporary rpc initiator group to mount the snapshot. This temporary initiator group isn't unmapping from the LUN, which in turn causes SD to not be able to disconnect the mount.
Thanks,
Carl.