ONTAP Discussions
Okay, I was with a customer today and encountered a bit of a problem with this. Versions of the software as follows...
SME 4.0
SnapDrive 4.2.1
OnTap 7.2.4
FlexClone licensed
Everything is FC connected. Exchange is a 3 node cluster with 2 active nodes. DR verification server is part of a 3 node cluster also, with Exchange binaries installed, but not configured into the cluster (waiting for failover).
I didn't set up the storage, and it's not necessarily the way I would lay out the data. Each Storage Group has a volume for the DB and a volume for the Logs. Each Storage Group has 2 mailstores, and each mailstore has its own LUN: 2 DB LUNs in the Storage Group's DB volume, 1 Logs LUN in the Logs volume.
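Per Storage Group, the layout looks roughly like this (paths and names here are just examples, not the customer's actual ones):

    /vol/sg1_db/mailstore1.lun      <- mailstore 1 database
    /vol/sg1_db/mailstore2.lun      <- mailstore 2 database
    /vol/sg1_logs/sg1_logs.lun      <- transaction logs for the Storage Group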
All volumes are initialised and the snapshot schedule is set to "- - - -".
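In other words, the native snapshot schedule is turned off on each volume so SME controls all the snapshots, something like this from the console (volume names are examples only):

    snap sched sg1_db 0 0 0
    snap sched sg1_logs 0 0 0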
When we configure SME to do a daily snap, with SnapMirror update and remote verification, the job simply fails.
Watching the filer, we see the SnapMirror update complete okay, and then SME clones the volume. But this is where things stop working. Only one LUN gets brought online and mapped to the host. The whole job then stalls, and 10 minutes later the clone is destroyed and recreated, with the same LUN problem. This repeats 3 or 4 times before the job finally fails.
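For what it's worth, my rough understanding of what should happen on the destination filer during verification is along these lines; these aren't the exact commands SME/SnapDrive runs, and all the names below are made up for illustration:

    snapmirror status sg1_db_mirror
    vol clone create sg1_db_verify -s none -b sg1_db_mirror exchsnap__backup
    lun online /vol/sg1_db_verify/mailstore1.lun
    lun online /vol/sg1_db_verify/mailstore2.lun
    lun map /vol/sg1_db_verify/mailstore1.lun verify_host_igroup
    lun map /vol/sg1_db_verify/mailstore2.lun verify_host_igroup

In our case only one of the two DB LUNs ever made it through the online/map stage.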
I've recommended the customer upgrade to OnTap 7.2.6.1, SME 5, and SnapDrive 5.0.1 (maybe even 6.0.1), as they're running fairly old versions anyway, but it'd be good to know whether this is a known issue with having multiple LUNs in a single Storage Group. According to the SME admin guide this is a valid config, and the guide actually uses it as the example config when it covers remote verification.
Anyone had much experience with full remote verification?
I think your problem is that the verification server is in the MS cluster. The cluster does not like mounting FlexClones as a shared resource, and it fails. We have the same setup, but our verification server is not in the cluster.
Brendon
But this 100% works if you are just verifying on the primary site against a passive cluster node.
We're not pointing at the cluster name (as that definitely doesn't work), but at a physical node. I can't see, technically, why this wouldn't work. A cluster disk is technically no different to a dedicated disk (in 2003 anyway), so there should be no issue with it trying to bring the clone into the cluster.
At least, there shouldn't be, anyway.
I'm going to be doing the exact same thing at a client soon. They want to implement a 3-node (active/active/passive) Exchange cluster with SME. Was there anything you implemented differently because this setup is a 3-node cluster? There's nothing specifically stated in the docs about setting SME up for an active/active/passive cluster.
Did you end up installing SME on both active nodes? The DR verification server - is it one of the nodes in the cluster? I take it you have each mailstore in its own LUN within a volume so you can back up and restore each database individually, if required?
Lastly, when sizing the volumes, did you use the old best practice of setting fractional reserve space equal to the size of the LUN(s), or did you set fractional reserve to some lower figure (e.g. 30%) and set up fractional reserve monitoring via SnapDrive?
I ask because it seems that people are starting to set fractional space reserve to 0% and using the volume auto grow and snapshot auto delete functions to take care of space concerns. I'd love to hear what people are doing when sizing their volumes.
Cheers
Hi Ian,
The issue I encountered was because the verification was done at the SnapMirror destination on a separate cluster.
If you are verifying using the passive node on the primary site, you should have no issues at all.
I wouldn't lay out the data like this; I have walked into this setup to do the second phase. I see no real benefit in each mailstore having its own LUN, other than possibly preventing both going offline if one runs out of storage. As both LUNs have to be in the same volume, you can't put different retention policies on them, or recover them to different points in time (well, not without getting complicated with qtrees anyway).
Fractional Reservation was set to 100%, as these are older versions of the software which don't support anything different. I'd definitely recommend reading up on these settings, as there are a few caveats and you still need to use a little caution, but you definitely need vol autosize (autogrow) turned on. Within SME you can tell it to auto-delete snapshots, although IMHO that can be pretty disastrous, so I tend to stay away from that option.
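For anyone on newer versions weighing up the lower fractional reserve approach, the 7-mode settings being discussed look something like this (volume name and sizes are just examples, and do read the caveats first):

    vol options sg1_db fractional_reserve 0
    vol autosize sg1_db -m 600g -i 20g on
    snap autodelete sg1_db on
    snap autodelete sg1_db trigger volume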
For reference, there was a hotfix (931300) that needed re-installing on the DR Exchange cluster, which then allowed the verification to work correctly. All working as it should now!