Tech ONTAP Blogs

How to improve Dell PowerMax operational resiliency with NetApp Data Infrastructure Insights

Miles_Kniep
NetApp

Wondering how a NetApp service like Data Infrastructure Insights can enhance the performance of your Dell PowerMax estate? You're not alone. Data Infrastructure Insights is a versatile IT observability platform explicitly designed for heterogeneous storage landscapes, including Dell systems. Whether you manage a handful of PowerMax systems or a multitude, this article is for you.

 

The Problem

 

As a storage admin, supporting business-critical applications can be challenging. This is especially true considering that the users you manage storage for change their demands unexpectedly, putting new pressure on your infrastructure and your team. You can deploy high-performance components to meet their needs, but those needs inevitably expand and change their workload profiles over time.

 

We’ve all been there at some point – stuck on a call late at night, handholding a SEV-1 performance degradation through to some semblance of stability, all because someone changed their requirements and didn’t tell the infrastructure teams what was about to happen. It’s not an easy existence on days (and nights) like that, and it has the unfortunate knack for sticking around until new kit arrives to handle the change in demand or until the offending business application gets demoted in priority to save the rest of the population. But what if you could break free from this cycle of constant change and uncertainty?

 

To get out of that cycle, you need a proactive strategy for adapting to and managing changing demands, so you can reach your ideal future state reliably (and avoid that next overnight SEV-1). NetApp Data Infrastructure Insights has a robust toolset that helps you achieve this – enabling greater flexibility, cutting resolution times for unexpected incidents, and reducing the noise and toil the StorageOps team must grapple with daily.

 

Because Data Infrastructure Insights is a platform provided by the developer of the world’s best Storage OS, one might think it is limited in scope to NetApp ONTAP, but this couldn’t be further from the truth. We realize that storage environments can be exceptionally diverse. Generations of investment carry their momentum into future technology choices, leaving teams splitting their time or over-specializing to meet demand. But…

 

  • What if you didn’t have to go so deep each time, and you could make decisions on provisioning and workload rebalancing more easily, with a sense of confidence in your choices?
  • What if you could understand and plan your entire estate from a single, robust playbook instead of constantly hopping between Unisphere instances (let alone the other device managers scattered throughout the estate)? Imagine the relief from this constant back and forth.
  • What if that playbook could speak the common tongue of ‘Storage’ while also understanding the regional dialects of ONTAP, PowerMaxOS, and others?

 

As highlighted in GigaOm's CxO Decision Brief, NetApp’s Data Infrastructure Insights is a game-changer for your team’s efficiency. It simplifies the complex world of data storage technologies, allowing you to communicate in the most suitable system dialect. It’s designed to manage a diverse estate, so when you choose to manage your Dell PowerMax arrays with Data Infrastructure Insights, you unlock a comprehensive toolkit that streamlines operations, enhances monitoring, and optimizes performance, making your job easier and more efficient.

 

Understanding the Language: StorageOps

 

Data Infrastructure Insights was built to provide unified observability that helps operations teams ensure their storage infrastructure's performance, availability, and security. It supports a diverse array of storage and computing platforms, connecting the dots between them and correlating behaviors so you don’t have to. A key design principle is that it brings those diverse assets into a ‘common model,’ so a byte is a byte (all converted to friendly base 2, even when that one system from that one vendor has always reported in base 10). This means less time spent on spreadsheet math for your capacity plan and more time getting things done. Other vendor-specific terms get normalized too: for example, Instances on AWS are represented alongside all other Virtual Machines, and ONTAP Aggregates and PowerMax SRPs are globally searchable with all other Storage Pools, giving you a unified view of everything across your data centers. As infrastructure complexity and storage workload demand grow, Data Infrastructure Insights leverages machine intelligence to understand the storage duty cycle, helping with planning and issue resolution and reducing overall operational toil. Let’s explore how this helps with day-to-day operations in more detail.
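To make the ‘common model’ idea concrete, here is a minimal Python sketch of the two kinds of normalization described above: mapping vendor terms to a shared name and converting base-10 capacity to base 2. The dictionary, function names, and vendor labels are hypothetical illustrations built from the examples in the text, not the platform's actual data model or API.

```python
# Hypothetical mapping from (vendor, term) to a common-model term.
# The pairs mirror the examples in the text; the structure is illustrative only.
COMMON_MODEL_TERMS = {
    ("AWS", "Instance"): "Virtual Machine",
    ("ONTAP", "Aggregate"): "Storage Pool",
    ("PowerMax", "SRP"): "Storage Pool",
}

BYTES_PER_TB = 10**12   # decimal terabyte (base 10)
BYTES_PER_TIB = 2**40   # binary tebibyte (base 2)


def normalize_term(vendor: str, term: str) -> str:
    """Return the common-model name, or the original term if it is unmapped."""
    return COMMON_MODEL_TERMS.get((vendor, term), term)


def tb_to_tib(capacity_tb: float) -> float:
    """Convert a capacity reported in base-10 TB into base-2 TiB."""
    return capacity_tb * BYTES_PER_TB / BYTES_PER_TIB
```

With this sketch, a pool reported as 100 TB by a base-10 system comes out at roughly 90.95 TiB, which is the kind of spreadsheet math the common model is meant to take off your plate.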

 

Intelligent Capacity Planning

 

Data Infrastructure Insights leverages machine learning to understand the typical duty cycle of your storage landscape, enabling reliable forecasting based on demand. This allows you to plan reclamation tasks on time and to avoid surprises when it’s time to expand capacity. In the example below, a few SRPs (Storage Resource Pools on PowerMax) are getting close to running out of capacity based on the observed growth rates of their hosted workloads. You can see how the forecast adjusts after we started some cleanup activities around January 17th, helping buy time before purchasing more hardware.

 SRP Time to Full by Array - Last 30 Days.png
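As a rough intuition for what a time-to-full forecast does, the sketch below fits a straight line through daily used-capacity samples and extrapolates to the pool's capacity. This is a deliberately simple stand-in: the platform's forecasting is described as machine-learning based, and the function name, daily sampling, and linear-growth assumption here are all hypothetical.

```python
from typing import Optional, Sequence


def days_to_full(used_tib: Sequence[float], capacity_tib: float) -> Optional[float]:
    """Estimate days until a pool fills, given one used-capacity sample per day.

    Fits a least-squares line through the samples and extrapolates from the
    most recent sample. Returns None when usage is flat or shrinking.
    """
    n = len(used_tib)
    if n < 2:
        raise ValueError("need at least two daily samples")
    mean_x = (n - 1) / 2
    mean_y = sum(used_tib) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_tib))
    growth_per_day = sxy / sxx  # slope of the fitted line, in TiB per day
    if growth_per_day <= 0:
        return None
    remaining = max(capacity_tib - used_tib[-1], 0.0)
    return remaining / growth_per_day
```

Cleanup activity shows up here as a lower slope in the most recent samples, which pushes the estimated fill date further out.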

 

Noise-Cancelling Alert Management

 

Effective alerting is essential for smooth operations, but in large and busy environments, alerting designed with the best intentions can quickly snowball into one big storm of noise. Minimizing that noise matters just as much as knowing about the next big issue: a team that isn’t buried in alerts can spot what will help avoid the next SEV-1 as quickly as possible.

 

Data Infrastructure Insights provides robust monitor management tools that help manage the noise generated by large environments with multiple management planes. The Monitor considers both the metric in breach and the duration in that state, minimizing noisy alerting from brief but often inconsequential spikes in activity.

monitor_conditions.png
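The ‘metric in breach plus duration’ idea can be sketched in a few lines of Python. This is a simplified illustration, not the product's monitor engine: the function name is hypothetical, and it assumes evenly spaced samples so that duration can be expressed as a count of consecutive samples.

```python
from typing import Sequence


def should_alert(samples: Sequence[float], threshold: float, min_consecutive: int) -> bool:
    """Return True once the metric stays above threshold for
    min_consecutive samples in a row; brief spikes reset the count."""
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak >= min_consecutive:
            return True
    return False
```

A single spike to 95% between otherwise healthy samples never reaches the required streak, so it raises no alert, while three breaching samples in a row do.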

 

 

Once an Alert has been broadcast, Data Infrastructure Insights provides one more safeguard to help keep the important in focus. Alert resolution criteria can be defined to consider the duration that a resource has returned to a healthy state, reducing alert storms from resources that tend to fluctuate between acceptable and at-risk states. This gives your team a more flexible window to respond appropriately and with the proper context.

monitor_resolutioncriteria.png
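Duration-based resolution works like hysteresis: the alert stays open until the resource has been healthy for long enough. The Python sketch below illustrates that behavior under stated assumptions; the function name is hypothetical, samples are assumed evenly spaced, and the alert opens on any single breach purely to keep the example short.

```python
from typing import List, Sequence


def alert_states(samples: Sequence[float], threshold: float, healthy_needed: int) -> List[bool]:
    """Return, for each sample, whether the alert is open.

    The alert closes only after healthy_needed consecutive healthy samples,
    so a resource that flaps around the threshold keeps one alert open
    instead of opening and resolving repeatedly.
    """
    open_alert = False
    healthy_streak = 0
    states = []
    for value in samples:
        if value > threshold:
            open_alert = True
            healthy_streak = 0
        elif open_alert:
            healthy_streak += 1
            if healthy_streak >= healthy_needed:
                open_alert = False
                healthy_streak = 0
        states.append(open_alert)
    return states
```

For a metric that breaches, dips, and breaches again, this yields a single continuous alert rather than two separate notifications.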

 

Speaking the Dialect: PowerMax Operations

 

Data normalization is a powerful capability, but there is still occasionally a need to examine specific system details more deeply. Data Infrastructure Insights facilitates this for two key reasons.

 

  • Any new user who doesn’t know Data Infrastructure Insights yet may only speak their platform’s dialect and not be versed in the broader world of StorageOps. They may think in Aggregates rather than Storage Pools. It helps, then, to have visibility into familiar terms and be able to solve more immediate tactical challenges before elevating to a more strategic view.
  • It enables more nuanced responsiveness to challenges likely to occur on a specific platform and lends flexibility where a holistic view is not enough.

 

To that end, when a new storage system is configured to be monitored by Data Infrastructure Insights, you are presented with a set of dashboards from the Gallery, aligned to recommended practices tailored for the infrastructure in focus. Additionally, example monitors relevant to that platform are made available, providing a baseline configuration for users to build on.

 

For Dell PowerMax, let’s examine a few cases where we can make life easier.

 

Global Array Summary

 

The Array Overview dashboard consolidates telemetry from every monitored PowerMax you’ve pulled into Data Infrastructure Insights, whether that is two arrays or 200. The first section provides high-level information about your arrays: common configuration details like OS version and failed capacity, plus an overall performance summary.

powermax_arraysummary.png

 

srpcapacity_anonymized.png

Moving down the dashboard, you will see fundamental questions about the health of your SRPs (Storage Resource Pools) by capacity and performance. Notably, this provides easy-to-understand traffic light conditional formatting for at-risk items while incorporating the capacity growth forecasting mentioned above.

 

powermax_servicelevels.png

Lastly, for this view, it’s essential to understand your overall ability to achieve defined service levels. This is automatically rolled up across all Volumes by Service Level and color-coded to the PowerMax native values. Keep in mind that dashboards are entirely customizable. In this case we included the Silver Service Level, but if we never intend to use it on our arrays, we can remove that widget tile or repurpose the space for something else of interest.

 

While the Array Summary dashboard provides a global view, we can also easily filter the dashboard down to single arrays on the fly using configurable dashboard Variables – thereby filtering the tables and Service Level summaries to that array in a simple click.

 

Director Performance Analysis

 

It’s also important to understand when and how the Array will encounter potential performance bottlenecks. The Director Performance Analysis dashboard helps with this by providing an Array-level performance summarization that can be drilled down to individual nodes. It also flags potential risks outside of standard utilization.

 

Do you see a low cache hit ratio? That will likely impact your data reduction ratio down the line or cause performance degradation due to overly high cache utilization, and it is worth investigating now instead of later.

 

Dashboards in Data Infrastructure Insights also enable you to leverage custom expressions to get more valuable answers faster. A good example is the calculated Headroom columns in the table below, which compare the observed Max Utilization against observed IOPS and Throughput and then measure the remaining performance headroom based on a user-defined Utilization Target that can be updated on the fly. This allows you to ask a question and get a reliable answer grounded in data without wasting more time in spreadsheets. How much workload could I feasibly add if I want my PowerMax Directors not to exceed 49% utilized? What if I want to know about 47% utilized? And 38%? It takes only a moment to find out.

anonymized_director_analysis.png
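One plausible way to compute a headroom figure like this is shown in the Python sketch below. The article does not give the dashboard's actual expression, so this is an assumption: it supposes that IOPS (or throughput) scales linearly with director utilization, and the function and parameter names are hypothetical.

```python
def headroom(observed_peak: float, max_utilization_pct: float,
             target_utilization_pct: float) -> float:
    """Estimate how much more IOPS (or throughput) fits under a utilization target.

    Assumes load scales linearly with utilization: if observed_peak drove the
    director to max_utilization_pct, the load at the target is scaled by
    target / max. A negative result means the director already exceeds the target.
    """
    if max_utilization_pct <= 0:
        raise ValueError("max utilization must be greater than zero")
    projected = observed_peak * target_utilization_pct / max_utilization_pct
    return projected - observed_peak
```

Under that assumption, a director that peaked at 20,000 IOPS and 35% utilization would have about 8,000 IOPS of headroom against a 49% target, and changing the target is just a matter of re-running the calculation.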

 

 

Further down, the dashboard reveals the duty cycle of the directors in a given array, helping you spot anomalies and imbalances in workload demand. Notably, this view also merges in port metrics from the attached fabric switches (Cisco or Brocade), saving one extra step in validating how things are running.

 director_workloadcycle.png

 

 

When we identify a group of directors under stress, the final section shows us individual Storage Groups (a group of Volumes scoped to a particular application/host) and their respective Service Levels.

anonymized_groups_servicelevels.png

This gives us a convenient jump point to analyze specific host behavior, check what other resources the compute is accessing, and see the current state of pathing over the fabric in the SAN Analyzer. That context shows which options exist for rebalancing the placement of this host’s workloads to free up headroom on the saturated directors.

 

Conclusion

 

Every storage environment is unique, and users need flexibility in how they consume and react to telemetry from their systems.

 

Data Infrastructure Insights facilitates this by enabling users to tailor and share their content to suit their needs. While the Gallery Dashboards and built-in Monitors are a solid starting point to help you manage Dell PowerMax, the opportunities to enhance this visibility further are boundless. This flexibility, paired with a hefty dose of machine intelligence, ensures that the monitoring and alerting flow are perfectly aligned to support tactical challenges where speaking the dialect of a given system is most important while also handling the broader language needed to enable strategic decision-making.

 

There is always more capability coming into the platform, too: check out the What’s New doc pages for Data Infrastructure Insights to see the latest enhancements. If you want to know more about Data Infrastructure Insights, head to the homepage here. Still curious about what else you can do with your PowerMax arrays on Data Infrastructure Insights? Check back next week and we’ll take a dive into Service Level attainment and improving the end-user experience.
