2025-05-26·2 min read

Building a Monitoring Stack with Grafana, Loki, and Prometheus (Part I)

The initial focus is on backup supervision

Introduction

As part of my learning journey in cybersecurity and systems administration, I began building a scalable monitoring stack. The project lets me explore open-source tools such as Grafana, Loki, and Prometheus while strengthening my skills in data visualization and infrastructure design.

The initial goal is to monitor backup processes (Veeam), with plans to extend the stack to include other critical systems such as Active Directory.

Project Objectives

Centralize logs for quick access to critical events.
Monitor system and application metrics to anticipate failures.
Provide a technically solid and understandable foundation for other technicians or students.

Key Features

Centralized logging with Loki: Distributed architecture with separate read/write nodes, MinIO storage, and log collection via Promtail and a custom PowerShell script.
Metrics collection with Prometheus: Scrapes metrics published through a custom .prom file (Prometheus text format), providing insight into backup status (success or failure, time elapsed, frequency).
Custom Grafana dashboard: Built from scratch to visualize backup states in real time.
NGINX load balancer: Reverse proxy configured to route requests to the appropriate nodes based on their role (read or write).
Authentication and security: All entry points require authentication to protect data confidentiality.
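
The post does not show the .prom file itself, so the following is only a sketch of what such a file could contain and how a script might write it. The metric names (backup_last_success, backup_last_duration_seconds, backup_last_finished_timestamp_seconds), the job_name label, and the file path are hypothetical; only the layout follows the Prometheus text exposition format. Python is used purely for illustration, whatever tool produces the file in the real setup.

```python
import os
import tempfile


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_backup_metrics(job_name: str, success: bool,
                          duration_seconds: float, finished_at: int) -> str:
    """Render one backup job's state as Prometheus text exposition lines.

    Metric and label names are hypothetical, chosen for this example only.
    """
    labels = f'job_name="{escape_label_value(job_name)}"'
    lines = [
        "# HELP backup_last_success 1 if the last run succeeded, 0 if it failed.",
        "# TYPE backup_last_success gauge",
        f"backup_last_success{{{labels}}} {1 if success else 0}",
        "# HELP backup_last_duration_seconds Duration of the last run in seconds.",
        "# TYPE backup_last_duration_seconds gauge",
        f"backup_last_duration_seconds{{{labels}}} {float(duration_seconds)}",
        "# HELP backup_last_finished_timestamp_seconds Unix time the last run ended.",
        "# TYPE backup_last_finished_timestamp_seconds gauge",
        f"backup_last_finished_timestamp_seconds{{{labels}}} {int(finished_at)}",
    ]
    return "\n".join(lines) + "\n"


def write_prom_file(path: str, content: str) -> None:
    """Write to a temporary file, then rename it over the target.

    The rename keeps a reader from ever seeing a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(content)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Hypothetical job name and values, printed instead of written to disk.
    print(render_backup_metrics("nightly", True, 312.5, 1748250000), end="")
```

A timestamp gauge like the last one lets a dashboard derive "time since last backup" as the current time minus the metric value, which may be one way to cover the elapsed-time and frequency views mentioned above.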

Tools and Technologies

Grafana for dashboards and visualization
Loki for log management
Prometheus for metrics scraping
NGINX for reverse proxy and load balancing
MinIO (local) as S3-compatible storage for logs
Proxmox as the hypervisor for hosting virtual machines
Docker and VirtualBox for the prototyping phase (an ELK stack was tested and discarded because of resource constraints)

Challenges & Learnings

Gained a deep understanding of the critical role of the Loki compactor in a distributed setup
Faced configuration challenges with NGINX, particularly with WebSockets and proper request routing
Learned to plan architectures around scalability and resource optimization
Gained practical experience with high availability, network security, and modular system design
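
To make the routing challenge above concrete, here is a small sketch of the decision the reverse proxy has to make for each request. It is not the actual NGINX configuration, which the post does not show. The upstream names loki-write and loki-read are hypothetical, and the path list covers only the common Loki HTTP API endpoints; a real setup may expose more.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical upstream names for the write and read node groups.
WRITE_UPSTREAM = "loki-write"
READ_UPSTREAM = "loki-read"

# Loki HTTP API paths, split by the role that is assumed to serve them.
WRITE_PATHS = {"/loki/api/v1/push"}
READ_PATHS = {
    "/loki/api/v1/query",
    "/loki/api/v1/query_range",
    "/loki/api/v1/labels",
    "/loki/api/v1/series",
    "/loki/api/v1/tail",
}
READ_PREFIXES = ("/loki/api/v1/label/",)  # e.g. /loki/api/v1/label/<name>/values

# Live tailing is served over a WebSocket connection.
WEBSOCKET_PATHS = {"/loki/api/v1/tail"}


def route_request(request_target: str) -> Tuple[Optional[str], bool]:
    """Return (upstream, needs_websocket_upgrade) for a request target.

    The upstream is None when the path is not recognized, in which case
    the proxy would reject the request instead of forwarding it.
    """
    path = urlsplit(request_target).path  # drop the query string
    if path in WRITE_PATHS:
        upstream = WRITE_UPSTREAM
    elif path in READ_PATHS or path.startswith(READ_PREFIXES):
        upstream = READ_UPSTREAM
    else:
        upstream = None
    return upstream, path in WEBSOCKET_PATHS


if __name__ == "__main__":
    for target in ("/loki/api/v1/push", "/loki/api/v1/tail?query=x", "/metrics"):
        print(target, route_request(target))
```

The WebSocket flag matters because a proxied WebSocket only works when the proxy forwards the connection upgrade. In NGINX that typically means HTTP/1.1 to the upstream plus the Upgrade and Connection headers on the tail location.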

Current Results

Although the project is still in progress, the current stack already enables efficient backup monitoring. The next step is to implement email alerts in case of backup failures.

This setup provides a solid foundation for future use cases and further technical growth.

Next Steps

Integrate Active Directory as a new source of events to monitor
Implement email alerts for critical failures
Expand coverage to virtual machines and other key services
Explore additional components like Tempo or Alertmanager to move toward a full-featured monitoring platform
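
As a rough illustration of the planned email alerts, the sketch below shows the decision such an alert has to encode: a backup is a problem if its last run failed or if no run has finished recently enough. In a Prometheus and Grafana stack this logic would more likely live in an alert rule than in a script. The function name, the addresses, and the 24-hour threshold are all assumptions made for the example.

```python
from email.message import EmailMessage
from typing import Optional


def build_backup_alert(job_name: str, success: bool, finished_at: float,
                       now: float, max_age_seconds: float = 86400.0,
                       sender: str = "monitoring@example.com",
                       recipient: str = "admin@example.com") -> Optional[EmailMessage]:
    """Return an alert email, or None when the backup looks healthy.

    All parameter defaults are hypothetical placeholders.
    """
    age_seconds = now - finished_at
    if success and age_seconds <= max_age_seconds:
        return None  # last run succeeded and is recent enough

    if not success:
        reason = "last run failed"
    else:
        reason = f"no run finished in the last {age_seconds / 3600:.1f} hours"

    message = EmailMessage()
    message["Subject"] = f"[backup] {job_name}: {reason}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        f"Backup job: {job_name}\n"
        f"Problem: {reason}\n"
        f"Last run finished {age_seconds / 3600:.1f} hours ago.\n"
    )
    return message


if __name__ == "__main__":
    # Hypothetical values: a failed run that ended one hour ago.
    alert = build_backup_alert("nightly", False, finished_at=0.0, now=3600.0)
    print(alert["Subject"] if alert else "no alert")
```

The message object could then be handed to smtplib.SMTP(...).send_message(), with the server address and credentials depending on the environment.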

This project allows me to apply theoretical knowledge while building a practical, scalable, and secure monitoring solution. It serves as a solid first step toward mastering modern monitoring systems.