Why would you run a monitoring system like this? Normally I would say "because we can", but no, this time running such a stack is actually necessary: I run autonomous systems and an IT company, and Emile mostly runs it as a companion for most of his non-customer projects. After two years of maintaining a shitload of Icinga2 and check_mk based systems, I decided to migrate the whole monitoring to a shiny new system. After two weeks of evaluation I tried the setup we describe in this blog post, and I can recommend it!
Monitoring in a nutshell: have a master to which the workers report their status. Scalable, simple, efficient: the ETVGA stack (Exporter, Telegraf, VictoriaMetrics, Grafana, Alertmanager).
The individual nodes report their stats to the master. This makes it possible to dynamically add nodes without needing to adjust stuff on the master node.
In the example schema above, the worker nodes
node[1-n].company.com report their stats to the master node located at
All files needed are located in this git repository.
The setup works like this: the Ansible inventory is built from the data provided by NetBox. The Ansible runner then uses this inventory to create the exporter and the sidecar Telegraf service that export the data on the individual nodes.
Set up the main node
Install docker + docker-compose
- install docker
$ curl -s https://get.docker.com | sh
- install docker-compose
$ sudo curl -L "https://github.com/docker/compose/releases/download/1.25.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
Set up the directory structure
- create a docker directory
$ mkdir -p /docker/monitoring
$ cd /docker/monitoring
Insert the needed files
- insert docker-compose here
- insert the grafana.env here
Adjust the docker-compose to suit your needs
- adjust host rules (replace "yourdomain.com" with your domain)
$ sed -i 's/yourdomain\.com/newdomain.com/g' docker-compose.yml
- create passwords using htpasswd
- create passwords for the auth (traefik, victoria-metrics)
- create a password for grafana in the grafana.env
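A minimal sketch of the htpasswd step. It assumes the hashes end up directly in docker-compose.yml, where every "$" has to be doubled so that docker-compose does not try to interpolate it as a variable. The function names, the user and the password are placeholders and not taken from the repository.

```shell
# Double every "$" so docker-compose does not treat it as a variable.
escape_for_compose() {
  sed 's/\$/$$/g'
}

# Print a "user:hash" entry (bcrypt) that can be pasted into the compose file.
# htpasswd ships with the apache2-utils package on Debian/Ubuntu.
# $1: user name, $2: password
make_auth_entry() {
  htpasswd -nbB "$1" "$2" | escape_for_compose
}

# Example with a hypothetical user and password:
# make_auth_entry admin 'change-me'
```

If your hashes live in a separate file that is mounted into the container instead, skip the escaping and use the plain htpasswd output.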
Deploy the compose
$ docker-compose up -d
- log in to Grafana using the user admin and the password defined in the env file
- add the VictoriaMetrics endpoint
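Before adding the endpoint in Grafana, it can help to check that VictoriaMetrics answers behind the auth. VictoriaMetrics speaks the Prometheus query API, so in Grafana it is added as a data source of type Prometheus. The sketch below assumes basic auth in front of the instance; the host name and the credentials are placeholders.

```shell
# Build the Prometheus-compatible query URL for a VictoriaMetrics base URL.
# $1: base URL (a trailing slash is removed)
vm_query_url() {
  printf '%s/api/v1/query\n' "${1%/}"
}

# Run an instant query against VictoriaMetrics.
# $1: base URL, $2: user, $3: password, $4: query expression
vm_query() {
  curl -fsS --get -u "$2:$3" --data-urlencode "query=$4" "$(vm_query_url "$1")"
}

# Example with a hypothetical host and credentials:
# vm_query https://victoria.yourdomain.com admin 'change-me' 'up'
```

If the query returns JSON, the same base URL and credentials should work for the Grafana data source.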
Set up the worker nodes
We use Ansible to deploy Telegraf and the exporter onto the devices. The playbook does the following:
- Add influx repo
- Add influx repo gpg key
- Update apt cache
- Install Telegraf
- Build config from template
- Restart telegraf
- add the host to the ansible inventory
This is done like this
- adjust the telegraf config file
- run the ansible playbook
$ ansible-playbook -i <inventory> Playbooks/setup-telegraf.yml --limit "<ip>"
These are worker nodes that aggregate metrics that should be monitored. This happens in two steps:
- Aggregate the metrics using an exporter (such as bird_exporter)
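If you want to see what the playbook would change before touching a node, a dry run is a cheap safety net. This is only a sketch: the function name and the ANSIBLE_PLAYBOOK override are my own additions, and the inventory, playbook path and host in the example are placeholders.

```shell
# Dry-run a playbook against a single host and show the would-be changes.
# $1: inventory, $2: playbook path, $3: host or IP to limit the run to
# ANSIBLE_PLAYBOOK can be overridden, e.g. to point at a virtualenv binary.
deploy_check() {
  "${ANSIBLE_PLAYBOOK:-ansible-playbook}" -i "$1" "$2" --limit "$3" --check --diff
}

# Example with hypothetical values:
# deploy_check inventory.ini path/to/setup-telegraf.yml 192.0.2.10
```

Tasks that depend on the result of an earlier task may report errors in check mode, so read the output as a hint rather than a guarantee.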
- Scrape the exported data on the node using Telegraf. This periodically collects the results from the exporter and pushes the data to the Victoria Metrics instance on the master node.
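The two steps can be sketched as a tiny Telegraf config, written here from a shell function. This is not the template from the repository, just an illustration of the idea: the prometheus input scrapes the exporter, and the influxdb output pushes to VictoriaMetrics, which accepts the InfluxDB line protocol. The URLs are assumptions; 9324 is the default port of bird_exporter.

```shell
# Write a minimal Telegraf config that scrapes one exporter and pushes
# the result to VictoriaMetrics.
# $1: output file, $2: exporter metrics URL, $3: VictoriaMetrics base URL
write_telegraf_conf() {
  cat > "$1" <<EOF
[agent]
  interval = "10s"

# Scrape the Prometheus-format metrics exposed by the exporter on this node.
[[inputs.prometheus]]
  urls = ["$2"]

# Push to VictoriaMetrics via the InfluxDB line protocol.
[[outputs.influxdb]]
  urls = ["$3"]
  skip_database_creation = true
EOF
}

# Hypothetical values for illustration only.
write_telegraf_conf "${TMPDIR:-/tmp}/telegraf-sketch.conf" \
  "http://localhost:9324/metrics" \
  "https://victoria.yourdomain.com"
```

If the VictoriaMetrics endpoint sits behind basic auth, the influxdb output also takes username and password options.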
The alerting is currently done by Grafana. I want to attach the Prometheus Alertmanager, but it is currently not supported by VictoriaMetrics.