ONTAP Discussions

Volumes left behind after verification completes

BrendonHiggins

We have two MS Exchange 2003 clusters, V01 and V02. Each cluster has two nodes and runs Windows 2003 Enterprise Edition, with SnapDrive 4.2.1 and SME 3.2. We are looking to upgrade soon, but not this week...

We use a second server to verify the backups because the server load is high on the Exchange boxes.

The problem is that once the verification has finished, the FlexClone volumes are sometimes left behind. It is normally just the transaction log volume or the database volume. The problem tends to happen on V01 more frequently, and we have completed a bare-metal reinstall of the cluster nodes to try to resolve the issue, with no success.

There are no events in the filer logs (FAS3070 cluster), and I have not been able to work out a pattern as to when it happens. The workaround is to check the filer each morning for the FlexClone volumes and offline / delete them.
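The morning check described above could be partly scripted. The following is only a minimal sketch: the name pattern is inferred from the single clone name quoted later in this thread, the function names and the 12-hour threshold are hypothetical, and the list of volume names is assumed to be gathered separately (for example, copied from the filer's volume listing). It only flags candidates; it does not touch the filer.

```python
import re
from datetime import datetime, timedelta

# Assumption: leftover clones are named
#   SnapDrive_<volume>_clone_of_<snapshot>_<MMDDYYYY>_<HHMMSS>_...
# This is inferred from one example name in the thread, not from NetApp docs.
CLONE_RE = re.compile(r"^SnapDrive_.+_clone_of_.+?_(\d{8})_(\d{6})_")


def clone_timestamp(volume_name):
    """Return the snapshot time embedded in a clone name, or None if no match."""
    match = CLONE_RE.match(volume_name)
    if match is None:
        return None
    return datetime.strptime(match.group(1) + match.group(2), "%m%d%Y%H%M%S")


def leftover_clones(volume_names, now, min_age=timedelta(hours=12)):
    """Return clone volumes whose snapshot time is at least min_age before now.

    min_age is a hypothetical threshold so that clones still in use by a
    running verification job are not flagged.
    """
    leftovers = []
    for name in volume_names:
        stamp = clone_timestamp(name)
        if stamp is not None and now - stamp >= min_age:
            leftovers.append(name)
    return leftovers
```

Calling `leftover_clones(names, datetime.now())` each morning would give a short list to review by hand before taking anything offline or deleting it.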

Does anyone have any ideas for a possible solution? I have heard ESEUTIL throttling can cause this issue, but testing has shown the problem happens regardless.

Thanks for taking the time to read this and think about the issue. Any help / ideas welcome.

sflynn

Hi Brendon...

I'm not sure why SME isn't cleaning up these volumes after the verification is complete. It should delete the works and leave everything clean.

Is there any info in the backup reports for a particular verification job that gives any clue as to what's happening? Or even the event logs on the verification server...those would be helpful to look at as well.

I'll follow up with a few folks to see if this is a known issue or not. I'll post what I find probably next week.

Thanks...

Shannon

BrendonHiggins

As luck would have it, the issue happened over the weekend.

SnapDrive_gbdc01exmbf01db_clone_of_exchsnap__gbdc01exmbv01_09212008_190000__weekly_snapshot_0

The SME log is standard except for this:

Operation completed successfully in 45.735 seconds.

Dismounting LUN C:\Program Files\NetApp\SnapManager for Exchange\SnapMgrMountPoint\MPDisk001 of Snapshot ...

The virtual disk may not be connected, because its mount point cannot be found.

(SnapDrive Error Code: 0xc0040221)

Re-trying to force dismounting LUN...

SnapManager will pause 70 seconds after force dismount, please wait...

SnapDrive failed to dismount the snapshot.

Error Code: 0xc0040221

The virtual disk may not be connected, because its mount point cannot be found.

Mounting Snapshot for LUN E of Computer GBDC01EXMBN02

Mount point directory

Snapshot will be mounted on subdirectory

This Snapshot is mounted as the drive .

Mount Snapshot succeeded.

RUNNING TRANSACTION LOG INTEGRITY VERIFICATION

Transaction log directory is located at:

C:\Program Files\NetApp\SnapManager for Exchange\SnapMgrMountPoint\MPDisk001\EXCHSRVR\MDBDATA\SG1\

Start running ESEUTIL on "C:\Program Files\NetApp\SnapManager for Exchange\SnapMgrMountPoint\MPDisk001\EXCHSRVR\MDBDATA\SG1\"...

Command: ["C:\Program Files\Exchsrvr\bin\eseutil.exe" /ml E00]

Checking...

Microsoft(R) Exchange Server Database Utilities

Version 6.5

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode...

************

Nothing on the Windows server log

*************

Console Messages

Sun Sep 21 20:04:54 BST : LUN /vol/SnapDrive_gbdc01exmbf01db_clone_of_exchsnap__gbdc01exmbv01_09212008_190000__weekly_snapshot_0/sg1db.lun unmapped from initiator group viaRP...

Sun Sep 21 20:05:04 BST : LUN /vol/SnapDrive_gbdc01exmbf01db_clone_of_exchsnap__gbdc01exmbv01_09212008_190000__weekly_snapshot_0/sg1db.lun has been taken offline

Sun Sep 21 20:05:08 BST : Volume 'SnapDrive_gbdc01exmbf01db_clone_of_exchsnap__gbdc01exmbv01_09212008_190000__weekly_snapshot_0' has been set temporarily offline
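Since the only trace of the problem is the failed dismount in the SME report, one way to spot affected jobs without reading every report is to scan the report text for that message. This is a rough sketch only: it assumes the report is available as plain text and that the wording matches the excerpt above; the function name and the five-line look-ahead are hypothetical.

```python
import re

# Wording taken from the SME report excerpt in this post.
FAILURE_MARKER = "SnapDrive failed to dismount"
ERROR_CODE_RE = re.compile(r"Error Code:\s*(0x[0-9a-fA-F]+)")


def dismount_failures(report_text):
    """Return one entry per failed dismount found in an SME report.

    Each entry is the error code that follows the failure line, or None
    if no code appears within the next five lines.
    """
    lines = report_text.splitlines()
    codes = []
    for index, line in enumerate(lines):
        if FAILURE_MARKER not in line:
            continue
        code = None
        # The code is printed shortly after the failure line, possibly
        # separated from it by blank lines.
        for follow in lines[index + 1:index + 6]:
            match = ERROR_CODE_RE.search(follow)
            if match:
                code = match.group(1).lower()
                break
        codes.append(code)
    return codes
```

An empty result means the report contains no failed dismount; any other result points at a job whose clone volume is worth checking on the filer.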

BrendonHiggins

Think I have found the solution.

I was getting a lot of Plug & Play errors in the System event log, Event ID 257; see the Microsoft article with the hotfix:

http://support.microsoft.com/kb/924390

Will try this weekend and report back.

ianaforbes

Did you ever find a resolution for this? It looks to be a bug because there are many people running into the same issue.

BrendonHiggins

Long story short, no. The hotfix was not the solution, and filer CPU at 100% was not the cause. Still looking for the answer.

Bren

sflynn

Hey Ian/Brendon...

Was there a case opened with NGS for this issue? I'm going to bring this up with the engineers and see what we can figure out. But a case number would be good for them to get more info on what's happening.

Thanks...

Shannon

BrendonHiggins

I have three different case numbers from the last 12 months. I am out of the office this week but should be able to dig them out next week.

cdlovejoy

Is there any resolution on this yet? Also, I was wondering if it was a 64-bit issue. We are experiencing the same issue and it's quite frustrating.

Thanks,

Carl.

sflynn

Whether the host is 64-bit shouldn't have anything to do with volumes on the storage being left behind.

As soon as I get a case number that I can reference, I can do more investigating and see what I can figure out.

Thanks...

Shannon

BrendonHiggins

Sorry, I have been off playing with my new VMware toys.

NetApp LOG # 2494469

NETAPP LOG # 2687346

NETAPP LOG # 2705972

NETAPP LOG # 2962044

I cannot find the original ticket, but the cases above give good insight into what has been going on.

cdlovejoy

I just opened a case two days ago, so I'll let the tech assigned work on this before I go public. I don't want anyone to think I do not have faith in the engineers. I was just curious if anyone had seen this before and, if so, what the thoughts were. The reason I wondered about 64-bit is that we have an identical server, other than it being 32-bit, with all OS/SQL/NetApp products at the same version level, and we are not having issues with our 32-bit machine. As soon as I hear from the tech assigned to my case, I'll post the solution/response.

Thanks,

Carl.

BrendonHiggins

Just applied this hotfix. Will report back on the results.

http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=223992

Merry Christmas all

cdlovejoy

I'm sorry to report that I have already installed that hotfix. It did nothing for this issue on my box.

Thanks,

Carl.

Merry Christmas

stuartwatkins

Antivirus? Do you have antivirus software that is chewing on this and causing problems? I have heard of this before.

Worth a shot: remove any AV and see if it makes a difference.

BrendonHiggins

I cannot believe that you would post my solution to my thread as your own work, you forum troll and points whore. But you are quite correct: it was the AV that was causing the problem, and you did make the coffee this morning, so have some points.

Bren

ianaforbes

If antivirus is the issue, was removing antivirus the solution? What if we need A/V on the server? Is there a workaround?

BrendonHiggins

That is just what our risk team said: "There MUST be AV on the server..."

Answer: only run whitelisted apps on the server.

http://www.mcafee.com/uk/enterprise/products/host_intrusion_prevention/host_intrusion_prevention_server.html

Will post back if we have any problems but early results are good.

cdlovejoy

I'm glad you solved your problem; however, our issue still exists and we have no AV software installed. In fact, the only thing on the box itself is the OS and SQL 2005. No other software is present. We have hardened the box, but only via known OS configuration changes. This problem has been escalated and I have been working with NetApp for two days on this. We worked on it quite a bit yesterday and I'm hoping a solution is on the horizon.

Thanks,

Carl.

BrendonHiggins

Have you tried creating a second verification server on a vanilla install? We have five different SQL verification servers and none of them give us problems; it was only ever SME. So SMSQL should work. When the verification fails, does SnapDrive show the internal SCSI HDD as a virtual disk? See pic.

cdlovejoy

Verification isn't the issue, or at least I don't believe it to be. Verification does not fail; it completes successfully. The issue is when SD tries to unmount the volume: it crashes on the volume delete and gives an error stating that it can't find the volume mount point. I believe it to be an issue with the initiator group (we run FC). SD creates a temporary RPC initiator group to mount the snapshot. This temporary initiator group isn't being unmapped from the LUN, which in turn leaves SD unable to disconnect the mount.

Thanks,

Carl.
