Storage engineers, as the experts in all things TB and IOPS, know their infrastructure inside out and have a variety of tools that let them dive deep and really understand what their systems are up to. Yet almost all of them also cite the time it takes to troubleshoot as one of their key challenges.
The reason is complexity, but this complexity is about more than just the technology side of things. A variety of platforms and technologies from different vendors adds to it, and a lack of architectural consistency caused by changes in best practices over time can of course cause headaches too. But the storage teams are generally the same people who built the infrastructure and maintain it every day, so they understand the relationships between platforms and know where to look first for potential bottlenecks and issues.
The complexity that gets in the way of troubleshooting is organizational complexity.
Because storage teams manage shared resources used by hundreds or even thousands of individual applications, their biggest challenge is understanding who is using capacity and performance, how they’re using it, and whether they’re using it appropriately.
Most storage teams have daily tasks to ensure their infrastructure as a whole is running smoothly, and reports of problems usually come in the form of complaints from the application teams using the storage.
The application teams don’t understand the underlying infrastructure to the same extent as the infrastructure teams, and why should they? They understand what level of capacity and performance they expect of the storage, and they know when the signals from their application monitoring tooling are telling them something is underperforming. That’s when they raise a ticket, send an email or make a call to the infrastructure team with one simple request: fix it!
The engineers in the storage team on the receiving end have one not-so-simple question… fix what? The application administrator almost certainly hasn’t specified a volume ID or a storage name, just that the storage seems to be running slowly and impacting application performance. This means that the biggest delay in the troubleshooting process is the storage admin simply figuring out what and where exactly they’re supposed to be troubleshooting. This is made even more difficult when the infrastructure tools are reporting that everything is generally performing as expected.
This is where Cloud Insights comes in, thanks to its integration with common CMDB and ITSM tools such as ServiceNow.
Cloud Insights understands the infrastructure, ServiceNow understands the business, and together they’re more than the sum of their parts.
So, say the service manager calls the infrastructure team and reports that the “ECOM” platform is running slowly. Instead of guessing at what the service manager might be talking about based on server or storage names, or going away and checking documentation or previous support requests to figure it out, the storage administrator can perform a simple search in the Cloud Insights UI. Since the business context in Cloud Insights also extends to who and what is using resources, inappropriate use of infrastructure can be more easily policed, too.
See how Cloud Insights is helping organizations claim back wasted storage and infrastructure troubleshooting time in this short demo.
Of course, Cloud Insights is about more than just storage and infrastructure. If Kubernetes is more your speed, we can save you time and toil there too. Find out how.