Cloud Insights has the ability to ingest any metric, from any device, on any platform using open-source collectors like Telegraf. Having a tool that can collect metrics from thousands of different devices is a key part of solving the observability challenge in modern environments.
Cloud Insights uses a suite of powerful visualization tools to display metrics from across your environment. You can collect data from storage, virtualization, and cloud resources without agents, because Cloud Insights talks to their native APIs. To collect granular metrics directly from hosts and their applications, however, you can also ingest metrics from open agents such as Telegraf.
Cloud Insights makes it easy to get started with visualization and alerting on agent metrics, even at extreme scale. For the logistics of deploying agents at that scale, Ansible can help.
In this post, I’ll cover how to deploy Cloud Insights collector agents en masse using Ansible. If you’re not familiar with Cloud Insights already, you should sign up for a free trial to check it out in your own environment. You can refer to this article to help you get to grips with the basics. I’d recommend that you first install at least one Cloud Insights collector manually, just to see how it works. Previous experience with Ansible, though not strictly required, will also help.
After the deployment, you’ll have the ability to quickly create visualizations on dashboards like the examples below, showing CPU and memory usage across groups of hosts, applications or services.
How does it work?
In a nutshell, the process consists of the following three steps:
Extract the Telegraf configuration file for a specific collector (in this example, the CentOS agent)
Prepare and modify the configuration for Ansible
Deploy the configuration along with Telegraf through Ansible to the target hosts
Preparing the Telegraf configuration file
The goal here is to prepare a Telegraf configuration file that Ansible can fill in dynamically with values specific to each host we deploy it to. If you have already deployed a Cloud Insights agent manually, you could copy the Telegraf configuration file from that machine to save a few steps.
The most commonly monitored hosts and applications are represented by collector tiles in Cloud Insights, where the install and configuration strings are included for convenience. You can, however, use any Telegraf input plugin alongside the Cloud Insights output. You just need your Cloud Insights tenant ID and the appropriate integration token in the output section of your telegraf.conf. Be sure to check out Insight 2020 breakout SPD-1327-3 for more detail on DIY agents.
Log into Cloud Insights
Browse to Admin > Data Collectors
Click the Data Collectors button
Select RHEL & CentOS from the list (Use the filter input to search)
Click the button to copy the agent installer snippet to your clipboard
Paste into a text editor. It should look something like this:
The target inventory is given by “hosts”, which is set to localhost only for this example
We are enforcing the Telegraf role “sbaerlocher.telegraf”
A task is used to copy the Telegraf configuration file to the default location
Then we have three tasks that replace the variables the installer script would otherwise have replaced with host-specific values
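To make that structure concrete, here is a minimal sketch of a playbook with the same shape. It is reconstructed from the description above and is not the original playbook: the source file name, the destination path, and the three `__PLACEHOLDER__` tokens are all hypothetical, so substitute whatever your own installer snippet actually contains. The restart handler is my own addition, on the assumption that Telegraf needs a restart to pick up the new file, and the role is assumed to be installed already (for example, from Ansible Galaxy).

```yaml
---
# Sketch only. The file names, destination path, and __PLACEHOLDER__ tokens
# are assumptions for illustration; use the ones from your own snippet.
- name: Deploy Telegraf with a Cloud Insights configuration
  hosts: localhost                 # this example targets the local machine only
  become: true
  vars:
    telegraf_conf: /etc/telegraf/telegraf.conf   # assumed default location
  roles:
    - sbaerlocher.telegraf         # role named in this post; runs before the tasks
  tasks:
    - name: Copy the prepared Telegraf configuration file
      ansible.builtin.copy:
        src: telegraf.conf         # hypothetical file, looked up relative to the playbook
        dest: "{{ telegraf_conf }}"
        mode: "0644"
      notify: Restart telegraf after config change

    - name: Replace the host name placeholder
      ansible.builtin.replace:
        path: "{{ telegraf_conf }}"
        regexp: "__HOST_NAME__"    # hypothetical token
        replace: "{{ ansible_facts['hostname'] }}"

    - name: Replace the host IP placeholder
      ansible.builtin.replace:
        path: "{{ telegraf_conf }}"
        regexp: "__HOST_IP__"      # hypothetical token
        replace: "{{ ansible_facts['default_ipv4']['address'] }}"

    - name: Replace the host OS placeholder
      ansible.builtin.replace:
        path: "{{ telegraf_conf }}"
        regexp: "__HOST_OS__"      # hypothetical token
        replace: "{{ ansible_facts['distribution'] }}"

  handlers:
    - name: Restart telegraf after config change
      ansible.builtin.service:
        name: telegraf
        state: restarted
```

One trade-off of the copy-then-replace pattern is that every run rewrites the file and reports a change. If that matters to you, turning the configuration into a Jinja2 template and deploying it with `ansible.builtin.template` would likely achieve the same result in a single idempotent task.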
To deploy it, we run the playbook with the ansible-playbook command.
Using this method, you can easily deploy the Cloud Insights Telegraf agent to any number of systems and maintain it there. You can also adapt this process to bulk-deploy and manage agents for other Cloud Insights collectors such as Elasticsearch, Cassandra, Redis, nginx and so on. In addition, you can use Ansible to enforce this configuration, so that an agent is restored if it is removed from one of the hosts.
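Moving from the localhost example to a real fleet is mostly a matter of pointing the play at an inventory group. As a rough sketch, an inventory in Ansible's YAML format might look like the following; the group name, host names, and SSH user are all hypothetical.

```yaml
# inventory.yml (hypothetical file, group, and host names)
all:
  children:
    telegraf_hosts:
      hosts:
        web01.example.com:
        web02.example.com:
        db01.example.com:
      vars:
        ansible_user: deploy       # assumed SSH user with sudo rights
```

With an inventory like this, you would set `hosts: telegraf_hosts` in the play and run something like `ansible-playbook -i inventory.yml deploy-telegraf.yml`, where `deploy-telegraf.yml` is a placeholder name for your playbook file. Re-running the same command later is what enforces the configuration on hosts that have drifted.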