Tech ONTAP Blogs
There is one certainty in all enterprise IT infrastructures: migration. It doesn’t matter whose product you buy; you’ll need to migrate data eventually. Sometimes it’s because your current storage array has reached end-of-life. Other times, you’ll find that a particular workload is in the wrong place. Sometimes it’s about real estate; for example, you might need to power down a data center for maintenance, which means a critical workload has to be relocated temporarily.
When this happens, you have lots of options. If you have an RPO=0 DR plan, you might be able to use your disaster recovery procedures to execute the migration. If it’s just a tech refresh, you might use OS features, such as a logical volume manager, to nondisruptively migrate data on a host-by-host basis. With ONTAP storage systems, you might choose to swap the controllers to get yourself onto a newer hardware platform. If you’re moving datasets around geographically, ONTAP’s SnapMirror is a convenient and highly scalable option for large-scale migration.
This post is about SVM Migrate, a feature that lets you transparently migrate a complete storage environment from one array to another.
Before I show the data from my testing, I need to explain the Storage Virtual Machine, or SVM. This is one of the most underrated and underutilized features of ONTAP.
ONTAP multitenancy is a little like ESX. To do anything useful, you have to create a virtual machine. In the case of ONTAP, we call it an SVM. The SVM is essentially a logical storage array, including security policies, replication policies, LUNs, NFS shares, SMB shares, and so forth. It’s a self-contained storage object, much like a guest on an ESX server is a self-contained operating system. ONTAP isn’t really a hypervisor, of course, but the result is still multitenancy.
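For a sense of what that looks like in practice, here’s a minimal sketch of creating a new SVM. The names are placeholders, and the exact options vary by ONTAP release, so treat this as illustrative rather than a recipe:

Cluster1::> vserver create -vserver svm_prod -aggregate aggr1 -rootvolume svm_prod_root -rootvolume-security-style unix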
Most customers seem to create just a single SVM on their ONTAP cluster, and that usually makes sense to me. Most customers want to share out LUNs and files to various clients, and a single team is in charge of the array. To them, it’s just a single array.
Sometimes, however, they're missing an opportunity. For example, they could have created two SVMs, one for production and one for development. This would allow them to safely give the developers more direct control over provisioning and management of their storage. They could have created a third SVM that contains sensitive file shares, and they could lock that SVM down to select users.
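As a sketch of how that delegation might work (the SVM and user names here are hypothetical, and exact options vary by ONTAP release), you could hand the developers the built-in vsadmin role scoped to their own SVM:

Cluster1::> security login create -vserver svm_dev -user-or-group-name devadmin -application ssh -authentication-method password -role vsadmin

The developers can then manage volumes and shares inside svm_dev without ever touching production.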
There’s no right or wrong answer; it depends on business needs. It’s really about the granularity of data management.
You can migrate an entire SVM nondisruptively. There are some restrictions, which you can read about in the documentation, but if you’re running a vanilla NFS configuration with workloads such as VMware or Oracle databases, it can be a great way to perform a nondisruptive migration. As mentioned above, there are many reasons you might want to do that, including moving select storage environments to new hardware, rebalancing workloads as performance needs evolve, or even shifting work around in an emergency.
The key difference between SVM Migrate and other options is that you are essentially migrating a storage array from one hardware platform to another. As mentioned above, an SVM is a complete logical storage array unto itself. Migrating an SVM means migrating all the storage, snapshots, security policies, logins, IP addresses, and other aspects of configuration from one hardware platform to another. It’s also designed to be used on a running system.
I’ll explain some of the internals below. It’s easier to understand if you look at the graph.
I usually work with complicated application environments, so to test SVM Migrate I picked the touchiest configuration I could think of: Oracle RAC. I built an Oracle RAC cluster using version 21c for both the Grid Infrastructure and database software.
A test with a database that is just sitting there inert proves nothing, so I added a load generator. I normally use Oracle SLOB, available here. It’s an incredibly powerful tool, and its main value is that it’s a real Oracle database doing real Oracle IO. It’s not synthetic like vdbench. It’s the real thing, and you can measure IO and response times at the database layer. Anything I do in a migration test affects a real database, with real timeouts and real error handling.
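To give a sense of the setup, here’s a sketch of how SLOB is driven. The parameter values are illustrative, not my exact test configuration, and the syntax varies between SLOB releases:

# slob.conf excerpt: a mixed read/write OLTP-style workload
UPDATE_PCT=25      # 25% of operations are updates, the rest are reads
RUN_TIME=3600      # run long enough for the migration to happen mid-workload
SCALE=10000        # working-set size per schema

# start the workload (argument syntax varies by SLOB release)
$ ./runit.sh 8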
My main interest was in the effect on cutover. At some point, the storage personality (the SVM) is going to have to cease operating on the old hardware platform and start operating on the new platform. That’s the point where IP addresses will be relocated and the location of IO processing will change.
What will cutover look like? After multiple migrations back and forth within my lab setup, I decided to graph it.
Here’s what it looks like:
[Figure: database IOPS and latency across the migration. Throughput holds steady through the background data transfer, with only a brief dip at the moment of cutover.]
Here’s what was happening: while the database kept running, ONTAP replicated the SVM’s data to the destination cluster in the background. Once the two sides were in sync, it performed the cutover, relocating the SVM’s IP addresses and shifting IO processing to the new hardware. From the database’s perspective, the cutover registered as a momentary slowdown, not an outage.
That’s it. It just works, and all it took was a single command.
Cluster1::> vserver migrate start -vserver jfs_svmmigrate -source-cluster rtp-a700s-c01
Info: To check the status of the migrate operation use the "vserver migrate show" command.
Cluster1::>
I’m impressed. I know ONTAP internals well enough to have predicted how this would work, and SVM Migrate really isn’t doing anything new. It’s orchestrating basic ONTAP capabilities, but whoever put all this together did a great job. I was able to monitor all the steps as they proceeded, I didn’t note any unexplained or problematic pauses, and the cutover should be almost undetectable to database users.
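Monitoring used the same command family. The status command the CLI suggests shows each phase as the migration proceeds, and, per the documentation (verify against your release), you can also pause and resume a transfer in flight:

Cluster1::> vserver migrate show -vserver jfs_svmmigrate
Cluster1::> vserver migrate pause -vserver jfs_svmmigrate
Cluster1::> vserver migrate resume -vserver jfs_svmmigrate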
I wouldn’t have hesitated to use SVM Migrate in the middle of the workday if I were still in my prior job. If the DBAs were really looking, they might have noticed a short and minor impact on performance, but as a practical matter this was a nondisruptive operation.
There’s more to the command “vserver migrate” than I showed here, too. For example, you might have a lot of data to move and want to kick off the initial copying but defer the cutover until later. You can read about it in the documentation, and there’s a sketch of that flow below.
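Here’s what that deferred-cutover flow looks like as a sketch (option names per the ONTAP documentation; double-check them against your release): start the transfer with automatic cutover disabled, then trigger the cutover yourself when you’re ready.

Cluster1::> vserver migrate start -vserver jfs_svmmigrate -source-cluster rtp-a700s-c01 -auto-cutover false
Cluster1::> vserver migrate cutover -vserver jfs_svmmigrate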