2025-05-26·2 min read

Building a Monitoring Stack with Grafana, Loki, and Prometheus (Part I)

The initial focus is on backup supervision

Monitoring

Introduction

As part of my learning journey in cybersecurity and systems administration, I began building a modern, scalable monitoring stack. The project lets me explore open-source tools like Grafana, Loki, and Prometheus while strengthening my skills in data visualization and infrastructure design.

The initial goal is to monitor backup processes (Veeam), with plans to extend the stack to include other critical systems such as Active Directory.

Project Objectives

  • Centralize logs for quick access to critical events.
  • Monitor system and application metrics to anticipate failures.
  • Provide a technically solid and understandable foundation for other technicians or students.

Key Features

  • Centralized logging with Loki: Distributed architecture with separate read/write nodes, MinIO storage, and log collection via Promtail and a custom PowerShell script.
  • Metrics collection with Prometheus: Scrapes backup metrics published through a custom .prom file, giving insight into backup status (success/failure, elapsed time, run frequency).
  • Custom Grafana dashboard: Built from scratch to visualize backup states in real time.
  • NGINX Load Balancer: Reverse proxy configured to route requests to the appropriate nodes based on their role (read or write).
  • Authentication and security: All entry points are secured to ensure data confidentiality.
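
To make the .prom idea more concrete, here is a minimal sketch of how backup results could be rendered in the Prometheus text exposition format and written to disk. It is in Python purely for illustration and is not the script used in this project. The metric names, the job_name label, and the file path are hypothetical. Files like this are commonly picked up by an exporter's textfile collector, which Prometheus then scrapes.

```python
import os
import tempfile


def escape_label(value):
    """Escape backslashes, quotes, and newlines as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_backup_metrics(job_name, success, duration_seconds, finished_at):
    """Render one backup job's state as Prometheus text-format lines.

    Metric names are hypothetical placeholders, not the project's real ones.
    """
    labels = f'job_name="{escape_label(job_name)}"'
    lines = [
        "# HELP veeam_backup_last_success 1 if the last run succeeded, 0 otherwise.",
        "# TYPE veeam_backup_last_success gauge",
        f"veeam_backup_last_success{{{labels}}} {1 if success else 0}",
        "# HELP veeam_backup_duration_seconds Duration of the last run in seconds.",
        "# TYPE veeam_backup_duration_seconds gauge",
        f"veeam_backup_duration_seconds{{{labels}}} {duration_seconds}",
        "# HELP veeam_backup_last_end_timestamp_seconds Unix time the last run ended.",
        "# TYPE veeam_backup_last_end_timestamp_seconds gauge",
        f"veeam_backup_last_end_timestamp_seconds{{{labels}}} {finished_at}",
    ]
    return "\n".join(lines) + "\n"


def write_prom_file(path, content):
    """Write to a temp file, then rename, so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(content)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Example values only; a real script would read them from the backup tool.
    print(render_backup_metrics("nightly", True, 812.5, 1748246400), end="")
```

Storing the end timestamp rather than a precomputed "time elapsed" lets the dashboard derive the age of the last backup at query time, for example with time() minus the metric.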

Tools and Technologies

  • Grafana for dashboards and visualization
  • Loki for log management
  • Prometheus for metrics scraping
  • NGINX for reverse proxy and load balancing
  • MinIO (local) as S3-compatible storage for logs
  • Proxmox as the hypervisor for hosting virtual machines
  • Docker and VirtualBox for the prototyping phase (an ELK stack was tested and discarded due to resource constraints)

Challenges & Learnings

  • Gained a deep understanding of the critical role of the Loki compactor in a distributed setup
  • Faced configuration challenges with NGINX, particularly with WebSockets and proper request routing
  • Evolved my architectural thinking to plan for scalability and resource optimization
  • Gained practical experience with high availability, network security, and modular system design
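
The routing decision that caused the most trouble can be summarized in a few lines. The sketch below is Python, not NGINX configuration, and only models the logic: Loki's push endpoint goes to the write nodes, everything else under /loki/api/ goes to the read nodes, and the tail endpoint additionally needs a WebSocket upgrade. The pool names "read" and "write" and the Route type are illustrative assumptions, not names from my actual configuration.

```python
from typing import NamedTuple


class Route(NamedTuple):
    pool: str        # hypothetical upstream pool name: "read" or "write"
    websocket: bool  # True when the proxy must forward the Upgrade handshake


def route_loki_request(path: str) -> Route:
    """Pick an upstream pool for a Loki HTTP API path."""
    # Ignore the query string and any trailing slash before matching.
    clean = path.split("?", 1)[0].rstrip("/")
    if clean == "/loki/api/v1/push":
        # Log ingestion is handled by the write nodes.
        return Route("write", False)
    if clean == "/loki/api/v1/tail":
        # Live tailing is served by the read nodes over a WebSocket.
        return Route("read", True)
    if clean.startswith("/loki/api/"):
        # Queries, labels, and series lookups go to the read nodes.
        return Route("read", False)
    raise ValueError(f"not a Loki API path: {path!r}")


if __name__ == "__main__":
    for example in ("/loki/api/v1/push",
                    "/loki/api/v1/query_range?query={job=\"veeam\"}",
                    "/loki/api/v1/tail?query={job=\"veeam\"}"):
        print(example, "->", route_loki_request(example))
```

In NGINX terms, the websocket flag corresponds to proxying with HTTP/1.1 and forwarding the Upgrade and Connection headers on the tail location. Without that, live tailing in Grafana fails even though regular queries work.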

Current Results

Although the project is still in progress, the current stack already enables efficient backup monitoring. The next step is to implement email alerts in case of backup failures.
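
As a rough idea of what the alerting step could look like, the sketch below selects jobs that failed or have not finished within an expected window, then builds an email for them. This is not implemented yet and may end up being handled by Alertmanager instead. The job dictionary shape, the 26-hour window, and the addresses are all hypothetical.

```python
from email.message import EmailMessage

MAX_AGE_SECONDS = 26 * 3600  # assumed window for a daily job, with some slack


def jobs_needing_alert(jobs, now, max_age_seconds=MAX_AGE_SECONDS):
    """Return (job name, reason) pairs for failed or stale backup jobs."""
    alerts = []
    for job in jobs:
        if not job["success"]:
            alerts.append((job["name"], "last run failed"))
        elif now - job["finished_at"] > max_age_seconds:
            alerts.append((job["name"], "no successful run within the expected window"))
    return alerts


def build_alert_email(alerts, sender, recipient):
    """Build (but do not send) a plain-text email summarizing the alerts."""
    message = EmailMessage()
    message["Subject"] = f"[Backup alert] {len(alerts)} job(s) need attention"
    message["From"] = sender
    message["To"] = recipient
    body = "\n".join(f"- {name}: {reason}" for name, reason in alerts)
    message.set_content(body)
    return message


if __name__ == "__main__":
    # Example data only; real values would come from the collected metrics.
    sample = [{"name": "nightly", "success": False, "finished_at": 1748246400}]
    found = jobs_needing_alert(sample, now=1748250000)
    print(build_alert_email(found, "monitor@example.com", "admin@example.com"))
```

Checking staleness as well as failure matters because a backup job that silently stops running never reports a failure at all. Sending the message would be a separate step, for example with smtplib from the standard library.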

This setup provides a solid foundation for future use cases and further technical growth.

Next Steps

  • Integrate Active Directory as a new source of events to monitor
  • Implement email alerts for critical failures
  • Expand coverage to virtual machines and other key services
  • Explore additional components like Tempo or Alertmanager to move toward a full-featured monitoring platform

This project allows me to apply theoretical knowledge while building a practical, scalable, and secure monitoring solution. It serves as a solid first step toward mastering modern monitoring systems.