Self-Managing Storage: Part 4 – Understanding Capacity Lifecycle Management


Welcome to Part 4 of the “Self-Managing Storage” blog series. In this post, I discuss how NetApp Active IQ Unified Manager simplifies capacity management through lifecycle management.


Capacity Lifecycle Management

Active IQ Unified Manager has been widely used for monitoring the capacity of volumes and aggregates on ONTAP clusters. It notifies storage admins when set thresholds have been met or exceeded, and it is then the job of storage or IT administrators to resolve issues reactively.

Although reactive capacity management has been around for a while, Active IQ Unified Manager 9.8 introduces preventive and proactive management. The main goal of Capacity Lifecycle Management is to make sure that your storage resources never run out of space. To achieve this goal, Capacity Lifecycle Management is performed in three modes: preventive management, proactive management, and reactive management. These three modes use different activities to make sure that hot data is available locally, cold data is tiered to the cloud, and enough storage space is available locally for resources to grow in response to demand.


Figure 1: Capacity Lifecycle Management


Preventive Management

The goal of preventive management is to make sure that you never run into a situation where the workloads run out of space on your ONTAP cluster. To that end, Active IQ Unified Manager facilitates preventive measures such as balancing capacity across the nodes and tiering cold data to the cloud.

The main feature of preventive data capacity management is that it alerts you before problems arise. Active IQ Unified Manager uses deep analytics to check two months of performance and health metrics to determine the best course of action to maintain the best utilization of your storage resources.

Preventive Management Action 1

Now let us look at the first preventive action that Active IQ Unified Manager takes: tiering your cold data to your private or public cloud. Cloud tiering is an industry-standard practice for information lifecycle management. Previously, companies moved their cold data to tape backup and then shipped the tapes to off-site locations for disaster recovery. Cloud tiering is now the preferred way of storing cold data, and cloud solutions can be both private and public.

Active IQ Unified Manager provides deep analytics to help customers understand which resources would benefit from moving to cloud storage. To make it easier for customers to move resources to the cloud and to increase the utilization of their local storage, Active IQ Unified Manager 9.8 adds two new management actions. These two management actions are backed by performance and capacity analytics residing in both ONTAP and Unified Manager.

The first analysis looks at volumes on a cloud tier and determines whether their settings are correct. If it is determined that volumes could benefit from different tiering settings, then an action is created to configure the volume for the correct setting, thus helping to increase local tier utilization.

Figure 2 shows how this new management action appears on your dashboard.



Figure 2: Storage Tier Policy Mismatch


The second analysis involves volumes that do not reside on a cloud tier. It determines whether any volume that has a significant amount of cold data should be moved to the cloud. This analysis looks not only at the amount of cold data but also at several other key attributes, such as I/O trends and the physical configuration of the volumes.
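As a rough illustration of the first filtering step, the sketch below picks volumes that are not on a cloud tier and whose cold data makes up a large share of their used space. It is not Unified Manager's actual analysis: the field names (`on_cloud_tier`, `used_gb`, `cold_gb`) and the 50% cutoff are assumptions made for the example, and the real analysis also weighs I/O trends and physical configuration.

```python
def tiering_candidates(volumes, min_cold_fraction=0.5):
    """Return names of volumes that are off the cloud tier and mostly cold.

    `volumes` is a list of dicts with hypothetical keys:
    name, on_cloud_tier (bool), used_gb, cold_gb.
    The 0.5 cutoff is an illustrative assumption, not a product default.
    """
    picked = []
    for v in volumes:
        # Skip volumes already on a cloud tier and empty volumes.
        if v["on_cloud_tier"] or v["used_gb"] == 0:
            continue
        if v["cold_gb"] / v["used_gb"] >= min_cold_fraction:
            picked.append(v["name"])
    return picked


volumes = [
    {"name": "vol_archive", "on_cloud_tier": False, "used_gb": 800, "cold_gb": 640},
    {"name": "vol_db", "on_cloud_tier": False, "used_gb": 500, "cold_gb": 50},
    {"name": "vol_logs", "on_cloud_tier": True, "used_gb": 300, "cold_gb": 290},
]
print(tiering_candidates(volumes))
```

In this example only `vol_archive` qualifies: `vol_db` is mostly hot, and `vol_logs` is already on a cloud tier.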

After a volume is found to be an ideal candidate, Active IQ Unified Manager uses the ONTAP Volume Move engine to determine which cloud tier is the best fit based on the volume’s performance and capacity footprint. If a suitable fit is found, an action is created that lets you, in one click, set the volume to the correct tiering policy and move it to the cloud by using ONTAP no-downtime Volume Move technology.

Figure 3 contains an example of a storage tier mismatch and the action that Unified Manager takes to help you take full advantage of your cloud tier.



Figure 3: Storage Tier Mismatch Detected


Cloud Tiering Debugging

The Active IQ Unified Manager team understands that enabling actions on your cluster requires a certain level of trust. To build this trust and to help diagnose cloud activity, some new features were added to the Workload Analyzer in Active IQ Unified Manager 9.8. Active IQ Unified Manager is known for its deep analytics and performance monitoring across ONTAP clusters. Unified Manager 9.8 collects cloud tier statistics so that you can review the throughput and capacity metrics of your workloads from the Workload Analyzer page. Using the Cloud Throughput view under the Throughput panel, you can see how much data is stored in, and transferred to and from, the volumes in the cloud (Figure 4).



Figure 4: Cloud Throughput Activity


In the cloud tiering architecture, hot data is stored in the local tier and cold data is stored in the cloud tier. Active IQ Unified Manager now lets you see how much data is stored in the local tier and how much in the cloud by using the Cloud Tier view under the capacity trend. It also shows how the data is trending for workloads in the cloud tier, so that you can judge whether the tiering policy should be adjusted (Figure 5).



Figure 5: Cloud Tier Views


Preventive Management Action 2

The second preventive action balances capacity across the cluster. You must make sure that your workloads never run out of capacity; otherwise, you will receive that dreaded 2 AM call. Every 24 hours, Active IQ Unified Manager 9.8 analyzes the clusters across your data center to determine whether they are balanced, so that no workload runs out of performance headroom or capacity.

Active IQ Unified Manager looks at all the storage pools and checks their long-term metrics to determine their trends. If any storage pool’s used capacity is over 70% or is imbalanced, Unified Manager analyzes the situation and determines which action would bring your cluster back into balance. While determining where a volume should be moved, it also makes sure that the move does not over-provision another storage pool in terms of performance or create a new capacity imbalance on the destination pool (Figure 6).



Figure 6: Capacity Imbalance Action
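The capacity side of this balancing check can be sketched as a simple search: find a storage pool above the 70% threshold, then look for a volume move whose destination stays at or below that threshold. This is a minimal sketch, not Unified Manager's algorithm. The `Pool` class, the `suggest_move` function, and the greedy largest-volume-first order are all assumptions for illustration, and the performance check described above is left out.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.70  # 70% used-capacity trigger mentioned in the post


@dataclass
class Pool:
    """Hypothetical storage pool: a name, a size, and its volumes."""
    name: str
    total_gb: float
    volumes: dict = field(default_factory=dict)  # volume name -> used GB

    @property
    def used_fraction(self) -> float:
        return sum(self.volumes.values()) / self.total_gb


def suggest_move(pools):
    """Return (volume, source, destination) for the first helpful move, or None."""
    # Visit the fullest pools first.
    for src in sorted(pools, key=lambda p: p.used_fraction, reverse=True):
        if src.used_fraction <= THRESHOLD:
            break  # every remaining pool is at or below the threshold
        # Try the largest volumes first so one move frees the most space.
        by_size = sorted(src.volumes.items(), key=lambda kv: kv[1], reverse=True)
        for vol, size in by_size:
            # Prefer the emptiest destination.
            for dst in sorted(pools, key=lambda p: p.used_fraction):
                if dst is src:
                    continue
                after = (sum(dst.volumes.values()) + size) / dst.total_gb
                # Reject moves that would push the destination over the threshold.
                if after <= THRESHOLD:
                    return vol, src.name, dst.name
    return None


pools = [
    Pool("aggr1", 1000, {"v1": 500, "v2": 300}),  # 80% used
    Pool("aggr2", 1000, {"v3": 100}),             # 10% used
]
print(suggest_move(pools))
```

Here `aggr1` is at 80%, and moving `v1` to `aggr2` leaves the destination at 60%, so that move is suggested. If no destination can absorb any volume without crossing the threshold, the function returns `None`.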


Proactive Management

Next up is proactive capacity management. Preventive management is triggered at the 70% utilization threshold, whereas proactive management applies in the 85% to 100% range. When the used capacity of a volume or aggregate reaches that range, proactive capacity management kicks in.
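The two thresholds can be expressed as a small classification function. This is only a sketch of the rule as stated in this post: the function name `capacity_mode` and the `"none"` label for utilization below 70% are assumptions, and preventive management also reacts to imbalance and cold data, which this sketch ignores.

```python
def capacity_mode(used_percent: float) -> str:
    """Map a used-capacity percentage to the management mode described in the post."""
    if not 0 <= used_percent <= 100:
        raise ValueError("used_percent must be between 0 and 100")
    if used_percent >= 85:
        return "proactive"   # 85% to 100% range
    if used_percent >= 70:
        return "preventive"  # 70% trigger point
    return "none"            # below both thresholds (label assumed for this sketch)


for pct in (55, 70, 84.9, 85, 100):
    print(pct, capacity_mode(pct))
```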

Active IQ Unified Manager has introduced management actions that help you address volume out-of-space issues, which are among the most common issues that customers face. There are multiple ways to fix a volume out-of-space event. Most of the time, customers choose to resolve the issue by enabling the volume autogrow option, which allows the volume to grow automatically if capacity is available. When this scenario is detected, Active IQ Unified Manager creates an action that shows up on your dashboard, making it easier for you to address the issue before the volume runs out of space or causes downtime (Figure 7).



Figure 7: Volume Space Full Action
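To make the autogrow idea concrete, the sketch below models a single growth check: if a volume is above a fullness threshold and the aggregate has free space, the volume grows by a fixed step. This is a simplified model, not ONTAP's autosize logic. The function name `autogrow`, the 85% threshold, the fixed 10 GB increment, and the optional size cap are all assumptions for the example.

```python
def autogrow(size_gb, used_gb, aggregate_free_gb,
             grow_threshold=0.85, increment_gb=10.0, max_size_gb=None):
    """Return the volume size after one autogrow check (hypothetical model)."""
    # Below the threshold: nothing to do.
    if used_gb / size_gb < grow_threshold:
        return size_gb
    # Grow by the increment, limited by the free space left in the aggregate.
    new_size = size_gb + min(increment_gb, aggregate_free_gb)
    # Respect an optional maximum size, but never shrink the volume.
    if max_size_gb is not None:
        new_size = min(new_size, max_size_gb)
    return max(new_size, size_gb)


print(autogrow(100, 90, 50))                  # grows by the full increment
print(autogrow(100, 90, 4))                   # limited by aggregate free space
print(autogrow(100, 95, 50, max_size_gb=105)) # limited by the size cap
```

The second call shows why the post qualifies autogrow with "if capacity is available": once the aggregate itself is nearly full, growing the volume no longer helps, and the capacity-balancing action becomes the relevant fix.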


There is More!


We hope that this has given you an overall understanding of Capacity Lifecycle Management and has enticed you to update and try Active IQ Unified Manager 9.8.


For an in-depth understanding of performance, capacity, and security lifecycle management, look for more of this blog series.

  • Self-Managing Storage: Part 1 – Understanding Active IQ Unified Manager LifeCycle Management
  • Self-Managing Storage: Part 2 – Understanding Storage Resource Performance LifeCycle Management
  • Self-Managing Storage: Part 3 – Understanding Workload Performance LifeCycle Management
  • Self-Managing Storage: Part 4 – Understanding Capacity LifeCycle Management
  • Self-Managing Storage: Part 5 – Understanding Security Manager LifeCycle Management


We know that you may have questions because we could not cover this entire topic here, so please connect with us, and we would be happy to answer any of your questions.