Addressing the challenges of SAN storage refresh

JoshM
NetApp

Storage refresh projects are among the most challenging tasks enterprise storage teams face: not only pulling the tablecloth out from under dozens or hundreds of workloads without disruption, but sliding a new one into place at the same time. SAN storage adds a further layer of complexity and risk through the management of host paths and interoperability. Meanwhile, business consumers have grown used to the on-demand infrastructure of cloud service providers and view the entire process as needlessly disruptive, putting real pressure on the storage team managing the refresh.

 

The first challenge is to ensure the replacement platform is fit for purpose. It’s a careful balancing act: provision enough performance and capacity that the new platform can offer consistent SLAs and headroom for growth, but not so much that it stretches ever-constrained IT budgets.

 

This is where NetApp’s SAN assessment program comes in: deploying Cloud Insights collects information on real workload performance and utilization – not just from NetApp devices but from any major vendor – so a suitable storage system can be designed around workload needs and predicted growth.

 

But this is the first in a long string of challenges to come – ONTAP’s data mobility technologies help storage teams migrate the data with minimal disruption to hosted workloads, but how do you minimize operational disruption for your storage teams as well? Effective, heterogeneous observability is crucial to success.

 

Fortunately, Cloud Insights addresses this too – and because it’s already deployed as part of the SAN storage assessment process, your storage team can reap the benefits throughout the migration and on into the future, regardless of the storage vendor selected.

 

So, what steps can storage teams take to minimize the pain?

 

1. Perform Housekeeping

We’ve all done it when moving house – unpacked boxes only to question why we even moved a bunch of old junk we should have thrown away in the first place. Cloud Insights makes it easy to avoid the same mistake with your storage refresh.

 

With a single dashboard highlighting potentially wasted capacity, storage teams can quickly identify candidates for cleanup (see the sketch after this list):

  • Volumes created, but unallocated or inaccessible
  • Volumes allocated to hosts, but with zero throughput over time
  • Capacity consumed by snapshots that may no longer be required
  • Multiple nested (and unnecessary) layers of capacity headroom
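
To make the idea concrete, here’s a minimal sketch of that triage logic in Python, assuming a volume inventory exported from Cloud Insights – the field names and thresholds here are hypothetical, and your own dashboard queries will differ:

```python
# A minimal sketch of the housekeeping triage, assuming a volume inventory
# exported from Cloud Insights (field names and thresholds are hypothetical).
from dataclasses import dataclass

@dataclass
class Volume:
    name: str
    capacity_gb: float
    mapped_to_host: bool     # volume is allocated/mapped to a host
    iops_30d_total: float    # total I/O observed over the last 30 days
    snapshot_used_gb: float  # capacity consumed by snapshots

def flag_waste(volumes):
    """Bucket volumes into the waste categories from the dashboard."""
    unmapped   = [v for v in volumes if not v.mapped_to_host]
    idle       = [v for v in volumes if v.mapped_to_host and v.iops_30d_total == 0]
    snap_heavy = [v for v in volumes if v.snapshot_used_gb > 0.5 * v.capacity_gb]
    return unmapped, idle, snap_heavy

# Example usage with made-up data:
inventory = [
    Volume("ora_data_01", 2048, True,  9.3e8, 120),
    Volume("old_test_07",  512, True,  0.0,    10),  # idle: mapped but no I/O
    Volume("orphan_lun_3", 256, False, 0.0,     0),  # created but never mapped
]
unmapped, idle, snap_heavy = flag_waste(inventory)
print(f"{len(unmapped)} unmapped, {len(idle)} idle, {len(snap_heavy)} snapshot-heavy")
```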


 

Of course, that doesn’t mean these volumes can simply be removed without question – but with Cloud Insights’ metadata model, teams can attribute this utilization to applications and their owners, opening a conversation about whether the storage should be included in the migration or can be decommissioned entirely.

 


 

And performing this step before the replacement platform is designed may even reduce how much capacity needs to be provisioned, cutting the cost of the new platform in the process.

 

2. Assure Host Connectivity

Effective storage path failover is critical to maintaining availability throughout the storage migration, and unexpected outages during this process seriously damage the reputation of the storage team, even if they were ultimately caused by elements outside the storage administrators’ control. Simply validating that all hosts have a pair of HBAs installed and connected to the fabric doesn’t necessarily mean that MPIO is properly configured end-to-end.

 

With a view of not only storage, but the SAN and attached hosts as well, Cloud Insights can validate this connectivity, and give storage administrators confidence prior to cutting over to new infrastructure. If there are any gaps, these will be identified and can be remediated prior to any migration activity taking place.
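
For illustration, here’s a minimal sketch of that end-to-end check, assuming a path inventory (host, initiator, fabric, target port, LUN) exported from the SAN view – the data shape is hypothetical, and Cloud Insights performs this correlation for you:

```python
# A minimal sketch of end-to-end path redundancy validation over a
# hypothetical exported path inventory.
from collections import defaultdict

paths = [
    # (host, initiator_wwpn, fabric, target_wwpn, lun)
    ("esx01", "10:00:aa", "A", "50:0a:01", "lun_db01"),
    ("esx01", "10:00:bb", "B", "50:0a:02", "lun_db01"),
    ("esx02", "10:00:cc", "A", "50:0a:01", "lun_db02"),  # single fabric only!
]

by_host_lun = defaultdict(list)
for host, init, fabric, target, lun in paths:
    by_host_lun[(host, lun)].append((init, fabric, target))

for (host, lun), plist in sorted(by_host_lun.items()):
    fabrics    = {fabric for _, fabric, _ in plist}
    initiators = {init for init, _, _ in plist}
    if len(fabrics) < 2 or len(initiators) < 2:
        print(f"WARN {host}/{lun}: {len(initiators)} initiator(s), "
              f"fabric(s) {sorted(fabrics)} -- no end-to-end redundancy")
```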

 

Of course, SAN path redundancy isn’t just for migration! Alerts can then be defined so that any SAN failures or configuration errors in the future are discovered and rectified before someone else finds out the hard way.

 

3. Plan the Migration

Planning the migration based on the storage layout, volume by volume, might seem straightforward, but it can lead to much greater disruption and frustration for the rest of the business – and, as a result, more pressure on the storage team. The business is migrating applications and workloads, not just data.

 

Use Cloud Insights annotations to organize storage into workloads. By tagging the hosted VM instances with their application and business unit, the automatic topology mapping allows storage teams to easily identify and group volumes and LUNs by end-user. This can be done manually for small environments; for larger environments, annotations can be driven by rules based on naming conventions, by integration with (or exports from) a CMDB, or by a combination of the two, depending on the level of detail needed and where the information lives.
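
As a concrete example of a naming-convention rule, here’s a minimal sketch in Python – the <bu>-<app>-<nn> convention is hypothetical, so substitute whatever pattern your environment actually follows:

```python
# A minimal sketch of rule-based annotation, assuming VM names follow a
# hypothetical <bu>-<app>-<nn> convention such as "fin-erp-01".
import re

NAME_RULE = re.compile(r"^(?P<bu>[a-z]+)-(?P<app>[a-z0-9]+)-\d+$")

def annotate(vm_name):
    """Return {business_unit, application} annotations, or None if no match."""
    m = NAME_RULE.match(vm_name)
    if not m:
        return None  # fall back to a CMDB lookup or manual tagging
    return {"business_unit": m.group("bu"), "application": m.group("app")}

for vm in ["fin-erp-01", "hr-payroll-02", "legacy_box"]:
    print(vm, "->", annotate(vm))
```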

 

Cloud Insights has a straightforward and flexible API that allows all manner of information to be brought in and associated with infrastructure, or pulled out for use in other tools and reports. Find out more about the API in the getting started guide, or if you have an active Cloud Insights subscription, try it out yourself by accessing the swagger documentation on the API admin page in your tenant.
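
As an illustration, here’s a minimal sketch of pulling asset data over the REST API with Python’s requests library. The tenant URL is made up, and the exact endpoint and fields shown are assumptions – confirm them against the swagger documentation in your own tenant:

```python
# A minimal sketch of querying the Cloud Insights REST API. The tenant URL,
# endpoint path, and response fields below are illustrative assumptions;
# check the swagger docs on your tenant's API admin page.
import requests

TENANT = "https://example123.cloudinsights.netapp.com"  # hypothetical tenant
API_KEY = "..."  # generate an API key from the admin pages in your tenant

resp = requests.get(
    f"{TENANT}/rest/v1/assets/storages",            # assumed endpoint
    headers={"X-CloudInsights-ApiKey": API_KEY},
    timeout=30,
)
resp.raise_for_status()
for storage in resp.json():
    print(storage.get("name"), storage.get("vendor"))
```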

 

Grouping the migration tranches by business owner can mean fewer migration windows to arrange and less disruption for users, which also means less pressure on the storage team.

 

4. Operational Continuity

A new storage platform can often mean new tooling and terminology for operations teams to contend with, especially if this is a migration between storage vendors. Getting to grips with how to monitor, operate, and manage the new platform while it’s already in production is far from ideal!

 

Fortunately, as Cloud Insights is deployed prior to any migration activity during the assessment stage, operations teams can familiarize themselves with the tooling before the pressure of the migration is on. Because there’s a consistent data model across all vendors, any monitors, alerts, dashboards, reports, and operational workflows for common tasks are maintained during and after the migration – whether that migration is from NetApp to NetApp or between any other storage vendors.

 

Not only does this ease the pressure on operations during and after the migration, but it also allows a consistent comparison of service levels throughout. A storage refresh inevitably brings intangible complaints from end-users that the new storage “feels” wrong, and if the manageability tooling changes as part of the refresh, it can be difficult to provide a like-for-like comparison between performance on the new and old platforms. With Cloud Insights, storage teams have the evidence to defend the SLAs and SLOs offered, with a consistent comparison over time.

 

5. Track and Validate the Migration

Given the potential impact of storage refresh projects, IT leadership usually wants to keep a close eye on progress. A storage-level report on which volumes and LUNs – or how many TB of data – have been migrated carries little meaning on its own, and it often falls to project management teams to translate these asset names into their business and application owners.

 

Fortunately, since Cloud Insights already understands the storage in terms of its application and LoB ownership from step 3 above, and is monitoring both the legacy and refresh platforms, tracking and reporting the migration in real time is straightforward – no manual reporting required!
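
Conceptually, the progress roll-up is as simple as the sketch below – the field names are hypothetical, and in practice the annotations and dashboards in Cloud Insights do this for you:

```python
# A minimal sketch of per-owner migration tracking, assuming each volume
# carries the annotations from step 3 plus a flag indicating whether it
# lives on the legacy or replacement array (all field names hypothetical).
from collections import defaultdict

volumes = [
    {"bu": "fin", "capacity_gb": 2048, "on_new_platform": True},
    {"bu": "fin", "capacity_gb": 1024, "on_new_platform": False},
    {"bu": "hr",  "capacity_gb":  512, "on_new_platform": True},
]

total = defaultdict(float)
moved = defaultdict(float)
for v in volumes:
    total[v["bu"]] += v["capacity_gb"]
    if v["on_new_platform"]:
        moved[v["bu"]] += v["capacity_gb"]

for bu in sorted(total):
    pct = 100 * moved[bu] / total[bu]
    print(f"{bu}: {moved[bu]:.0f}/{total[bu]:.0f} GB migrated ({pct:.0f}%)")
```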

 

Last but certainly not least, ensure that an additional validation step is performed before the legacy platform is shut down. “Well, obviously,” I hear you say – but it wouldn’t be the first time, nor will it be the last, that shutting down a storage system with “nothing running on it” caused a major outage because a workload had been missed.

 

As a final validation, monitoring metrics in Cloud Insights can confirm that there has been zero I/O and throughput over a sustained period before pulling the plug for good.
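
The logic of that last gate is simple enough to sketch – the data shape here is hypothetical, with the real time series coming from Cloud Insights:

```python
# A minimal sketch of the final "really nothing running here?" check,
# assuming a time series of total IOPS per legacy volume over a trailing
# window (fetched from Cloud Insights; the data shape is hypothetical).
iops_by_volume = {
    "old_test_07":  [0, 0, 0, 0, 0, 0, 0],
    "forgotten_db": [0, 0, 12, 0, 0, 3, 0],  # still has traffic!
}

safe, blocked = [], []
for vol, series in iops_by_volume.items():
    (safe if max(series) == 0 else blocked).append(vol)

print("safe to decommission:", safe)
print("DO NOT power off yet:", blocked)
```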

 

Summary

Effective, heterogeneous observability is critical not only to completing storage migration projects with minimal disruption, but also to offering consistent service levels to end-users on the new platform.

 

NetApp’s SAN assessment program is the first step towards planning your next storage refresh, regardless of your current storage vendor. And by leveraging Cloud Insights’ heterogeneous visibility, automatic topology mapping and business context, storage teams can reduce risk and address the biggest challenges of SAN storage refresh.

 

And because Cloud Insights is a SaaS observability service, the burden of managing your manageability tooling disappears, so teams can devote more time to improving the platform, automating tasks, and maintaining service levels.

 

If you have a storage platform approaching end of life, get in touch with your NetApp sales team. If you want to know more about Cloud Insights infrastructure observability, or additional use cases for Kubernetes platform manageability or ransomware protection, head here to find out more or request a trial.
