Assuring resiliency in CI/CD Pipelines with Kubernetes Change Analysis

JoshM · 2023-11-27

Last week you read @ronnyf's post about the new Kubernetes workload health view in Cloud Insights, and how it lets platform engineering teams determine the overall health of the platform and zoom in on risks and issues quickly. But what exactly happens when teams determine that troubleshooting is warranted? Let’s take a closer look.

If you’re a platform engineer managing a shared Kubernetes cluster, you’re probably all too familiar with receiving complaints about poor performance from your counterparts in the application teams, who, of course, put the blame on the infrastructure you’re managing.

For SRE teams, this can be a big time sink: the team has no choice but to spend hours checking several different tools and manually comparing and correlating metrics, changes and logs to work out what’s going on.

All the right tooling is there to tell the story, but you’re the team that has to somehow piece it all together, and that takes time that you don’t have.

Infrastructure tooling can highlight issues with saturation, throughput and latency, but it only reports on the infrastructure devices themselves, and doesn’t understand how these metrics correlate to actual workload behavior. Plus, the infrastructure is normally architected so that it should never be the bottleneck: effective capacity management should mean that the back-end devices never run into over-saturation issues (of course, NetApp Cloud Insights can help with that too!).

Since infrastructure doesn’t understand what’s running on top of it, teams must rely on architecture diagrams to identify workload dependencies, and hope that they’re up to date, too. That is another delay in identifying the root cause. Meanwhile, application performance is still degraded, the end-user experience is poor, and the platform engineering teams aren’t making any friends with the app teams relying on their infrastructure.

Cloud Insights provides a better solution: Kubernetes Change Analysis.

Cloud Insights tracks every configuration change in your environment and correlates each one with performance alerts. This adds to Cloud Insights’ already extensive capabilities for platform engineering teams, such as the Kubernetes Workload Map.
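
To make the idea of correlating changes with alerts concrete, here is a minimal sketch in Python. It is not how Cloud Insights implements Change Analysis; the class names, the 30-minute lookback window and the sample data are all hypothetical, chosen only to illustrate the general technique of matching an alert to recent changes on the same workload.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ConfigChange:
    # Hypothetical record of a configuration change to a workload.
    workload: str
    timestamp: datetime
    description: str


@dataclass(frozen=True)
class Alert:
    # Hypothetical record of a performance alert on a workload.
    workload: str
    timestamp: datetime
    message: str


def changes_preceding_alert(alert, changes, window=timedelta(minutes=30)):
    """Return changes to the alert's workload made at most `window`
    before the alert fired, most recent first."""
    candidates = [
        c for c in changes
        if c.workload == alert.workload
        and timedelta(0) <= alert.timestamp - c.timestamp <= window
    ]
    return sorted(candidates, key=lambda c: c.timestamp, reverse=True)


# Illustrative data only (assumed, not taken from any real environment).
changes = [
    ConfigChange("checkout", datetime(2023, 11, 27, 9, 50), "memory limit lowered"),
    ConfigChange("checkout", datetime(2023, 11, 27, 7, 0), "replica count raised"),
    ConfigChange("search", datetime(2023, 11, 27, 9, 55), "image tag updated"),
]
alert = Alert("checkout", datetime(2023, 11, 27, 10, 0), "p95 latency above threshold")

suspects = changes_preceding_alert(alert, changes)
for change in suspects:
    print(change.timestamp.isoformat(), change.description)
```

With this sample data, only the memory limit change on the checkout workload falls inside the window, so it is the one a troubleshooter would look at first. A real system would also need to account for dependencies between workloads, which a simple per-workload time window does not capture.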

The Cloud Insights workload map shows application dependencies, how data flows through systems, and the health and status of every workload, showing not only the application issues themselves but also clearly identifying the likely source of those issues.

Cloud Insights Kubernetes Change Analysis takes this capability a step further by correlating configuration changes to clearly identify not only the root cause of issues, but also the steps to remediate them.

And when the majority of performance issues in business applications are caused by factors external to the platform itself, this saves platform engineering teams hours of troubleshooting, time that can be better spent on maintaining, automating and improving their infrastructure platforms.

But don’t just take my word for it! See how Cloud Insights can solve your teams’ Kubernetes troubleshooting challenges in this demonstration video.

Find out more about how to get started with Cloud Insights for Kubernetes, or check to see what else is new.