LUNs of TESTW_SAPDBVOL01 are presented to the igroup of the TESTW DB server, then:
vol clone split start TESTW_SAPDBVOL01
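For context, a minimal sketch of the surrounding 7-Mode clone workflow (the clone volume name, LUN path, base snapshot, and igroup name are illustrative; only TESTW_SAPDBVOL01 comes from the thread):

```
# Create a FlexClone off the SnapMirror destination volume,
# map its LUN to the DB host, then split the clone off its parent.
vol clone create TESTW_SAPDBVOL01_clone -b TESTW_SAPDBVOL01 base_snap
lun map /vol/TESTW_SAPDBVOL01_clone/lun0 TESTW_igroup
vol clone split start TESTW_SAPDBVOL01_clone

# Monitor how far the background split has progressed
vol clone split status TESTW_SAPDBVOL01_clone
```

Until `vol clone split status` reports completion, the base snapshot on the parent remains locked.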
We currently have 2 DBs (total volume size is around 6 TB) moved to this pair of NetApps, with 10 more scheduled. Eventually there will be over 30 TB on PROD that will need to be cloned.
Currently the 6 TB of volumes (3 TB on each controller) takes 2 days to finish.
If I run 2 VOL CLONE SPLIT operations on each controller at the same time on the current DBs, completion takes 4 days!
The solution works beautifully, but my problem is that as more and more databases are added to this NetApp solution, the VOL CLONE SPLIT process takes a very long time to complete; with 2 VOL CLONE SPLIT processes running at the same time on each controller, the completion time increases tremendously. We have to VOL CLONE SPLIT so we can delete the reference snapshot from PROD, otherwise our snapshot reserves there will fill up. We also cannot delete the reference snapshot of the clones on the PROD NetApp, as doing so would cause SnapMirror updates to fail while the reference snapshot still exists on the BACKUP/Non-Prod NetApp. And the reference snapshot on the Non-Prod NetApp cannot be deleted until my VOL CLONE SPLIT process is done.
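The dependency chain described above can be verified on the console; a sketch, with illustrative snapshot names and output (exact columns vary by release):

```
# On the BACKUP/Non-Prod controller: list snapshots in the parent volume
snap list TESTW_SAPDBVOL01

#   date          name
#   ------------  ----------------------------------------
#   Jan 10 02:00  clone_base.1 (busy,vclone,snapmirror)
#
# "busy,vclone" = locked by an unsplit FlexClone; cannot be deleted
# until the split completes. "snapmirror" = the mirror relationship
# still references it, so deleting it on PROD would break updates.
```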
Our PROD NetApp is SSD, so we can't allocate a very large snapshot reserve to capture all the snapshot deltas and allow continued SnapMirror updates while the vol clone split processes run on the BACKUP/Non-Prod NetApp.
So my question is: Is there a way to have the VOL CLONE SPLIT process done faster?
Could I use VOL COPY or NDMPCOPY instead? (I can probably tell the client that their DBs will now need to be down until a VOL COPY / NDMPCOPY (to a different aggregate???) is done. They will lose the luxury of having their DB clones immediately available.) We're also planning to have a 10GbE SnapMirror pipe between the PROD and BACKUP/Non-Prod NetApps and just use SnapMirror to do the above -- but then again, the clones will no longer be instantly mountable... There will now have to be a wait until after a SnapMirror update/break is done, or a VOL COPY of the SnapMirror volume to an independent FlexVol, likely on a different aggregate...
vol copy and ndmpcopy will be slower. Split will still be fastest.
Based upon the above it sounds like BACKUP/Non-Prod is all SATA and we are likely spindle bound. I would suggest gathering some perf data during your split operations. If you just want to start small, start a logged ssh session in advanced mode during a split operation and gather:
sysstat -x 1 (for 15 minutes)
From there I assume we'll see that there just aren't enough spindles to handle the request; if not, we can start to narrow it down a little and move on to perfstats with NGS assistance.
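The suggested capture could look like this on the console (a sketch; the column names are from typical `sysstat -x` output):

```
priv set advanced
sysstat -x 1
# Leave running ~15 minutes inside the logged ssh session, then Ctrl-C.
# Key columns to watch:
#   "Disk util" -- sustained values near 100% on the SATA aggregate
#                  point at a spindle bottleneck
#   "CP ty"     -- back-to-back consistency points (B) also indicate
#                  the disks cannot keep up
priv set admin
```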
The 10GbE SnapMirror upgrade will be nice in that it will allow you to start the split process much sooner than before, but until the split is complete you will still be limited and unable to remove the source snapshot.
I made the mistake of putting my personal information out here once, so send me a private message and I'll send my personal contact data if you would like.
We were told by our post-sales engineer that VOL CLONE SPLIT is meant to be slow and gingerly take its time becoming a FlexVol, so the mounted LUNs under the volume continue to perform optimally as if nothing is going on in the background. I have already tried prioritisation of the vol, but no dice. It is really slow. The larger the vol and the more concurrent vol clone splits one has, the slower the time to complete. So I thought vol copy/ndmpcopy, or even doing fresh snapmirrors/breaks, would be a whole lot faster (with sacrifices of course -- meaning the clones will not be instantly usable).
Well, vol copy might be faster, but I'd have to try it out first. ndmpcopy will not be faster. The reason for my earlier statements was just what you pointed out again: the more splits you have going on, the slower it is, leading me to think there is a spindle limit there.
But even with just 1 VOL CLONE SET being split, it still takes a fair amount of time -- doing a fresh snapmirror and breaking that relationship will still be considerably faster than a vol clone split.
I have not done this but do you think it is possible to vol copy the snapmirror volume to another aggregate?
Can you recommend other solutions/possibilities? It is definite now that we're getting a 10GbE link between the two 3270s.
One option for our weekly clone requirements would be to no longer do a vol clone split but to leave those DB copies as unsplit FlexClone volumes (and just deal with the snapshot reserves on the production NetApp). The other DB copies, which will not be re-cloned/refreshed for a month, could be done either via vol copy as proposed (if it works) or snapmirror/break using the 10GbE pipe -- "and just accept the fact" that there will now be WAIT times for some of the DB clones to be available...
So it is possible to perform a vol copy off of a snapmirror destination (once the init finishes). I would suggest you base your vol copy on a snapshot that is present in the destination snapmirror volume, since you are using LUNs. You can also use vol copy between aggregates or controllers (though I would suggest intra-controller for speed) if you need to.
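A sketch of that vol copy in 7-Mode (destination volume, aggregate, size, and snapshot name are illustrative; the destination must exist and be restricted before the copy):

```
# Create and restrict a destination volume at least as large as the source
vol create TESTW_SAPDBVOL01_copy aggr_dest 3t
vol restrict TESTW_SAPDBVOL01_copy

# Copy from a consistent snapshot in the snapmirror destination volume
vol copy start -s snapmirror_base_snap TESTW_SAPDBVOL01 TESTW_SAPDBVOL01_copy
vol copy status

# Bring the independent copy online once the copy completes
vol online TESTW_SAPDBVOL01_copy
```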
The snapmirror init/break cycle over 10GbE is a solution, but instead of having a single source and destination for each database, consider a one-to-many setup where DB1 is mirrored twice: once for your actual DR relationship and once for your clone process. The DR relationship would remain intact at all times, so that in case of an actual disaster you don't impact the RTO/RPO requirements. What will also work is to set up a cascade and then break/resync the tail of the cascade: DB1 on prod mirrors to DB1_DR on backup, and another mirror goes from DB1_DR on backup to DB1_clone on backup. DB1_DR would always remain intact, and DB1_clone would be the one broken off each time. Both of these solutions mean the clones will not be available until the init or resync completes, as you already understand. What worries me about both solutions is scalability as you add more 3 TB DBs. Since 3 TB is much more than a 10GbE link can move quickly, if you go down this road I might suggest the cascade solution, as it will likely be faster and reduce the impact on your 10GbE link.
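The cascade could be sketched like this (a hedged example; filer names and schedules are illustrative, only the DB1/DB1_DR/DB1_clone naming comes from the post):

```
# /etc/snapmirror.conf on the BACKUP controller
# Stage 1: DR mirror from prod -- kept intact at all times
prodfiler:DB1       backupfiler:DB1_DR     - 0 2 * *
# Stage 2: cascade from the DR copy to the clone volume
backupfiler:DB1_DR  backupfiler:DB1_clone  - 0 4 * *
```

```
# Weekly refresh cycle on the BACKUP controller
snapmirror break DB1_clone                    # clone becomes writable for the DB host
# ... clients use DB1_clone ...
snapmirror resync -S backupfiler:DB1_DR DB1_clone   # re-attach for the next refresh
```

The resync only transfers the changed blocks since the common snapshot, which is why the cascade should be lighter on the 10GbE link than repeated full inits.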
As you stated for the weekly clones, if you can absorb the change in the snap reserve/fractional reserve on the prod system, that would offer you the greatest storage efficiency and speed; otherwise I think you'll have to go back to the snapmirror or vol copy scenario.
I have a customer with the same problem: 3 TB SAP ERP on a FAS3240AE MetroCluster. Vol clone split takes almost a week. We tried "priv set diag; wafl scan speed 1000" but it won't go over a few megabytes per second.
Is there any hidden setting to speed up the process? We tried a filer internal vol copy/snapmirror which gave us way better speeds.
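The filer-internal snapmirror mentioned above could look like this (a sketch with illustrative volume and aggregate names; it yields an independent copy rather than a split clone):

```
# Intra-controller volume SnapMirror as a faster alternative to split
vol create DB1_copy aggr2 3t
vol restrict DB1_copy
snapmirror initialize -S filer:DB1 filer:DB1_copy
snapmirror status DB1_copy
snapmirror break DB1_copy   # DB1_copy is now an independent, writable FlexVol
```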
Sure, not a bad idea to do so, but as with snapmirror we have a throttle option, and for the usual WAFL scanners we can increase wafl scan speed, yet it seems we can do nothing for vol clone split. Imagine a vol clone split after a restore due to an inconsistent database: "hey, everything is fine now, it will just take a week to get everything back to 100% normal..."