Prometheus

Collects metrics from all VMs and services in the homelab.


Configuration

Image: prom/prometheus:v3.2
Port: 9090
Retention: 30 days
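
The three settings above could map onto a container definition roughly as follows. This is only a sketch: it assumes the container is run with Docker Compose, which the page does not state (the deployment is handled by the Ansible role), and the service name, volume name, and bind-mount path are hypothetical. The retention period corresponds to the --storage.tsdb.retention.time flag.

```yaml
# Hypothetical Compose sketch; the real deployment is handled by the Ansible role.
services:
  prometheus:                      # service name is an assumption
    image: prom/prometheus:v3.2
    ports:
      - "9090:9090"                # Prometheus web UI and HTTP API
    volumes:
      # Assumed host path for the rendered configuration file.
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    command:
      # Overriding the command replaces the image defaults,
      # so the config and storage paths are repeated here.
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=30d   # 30-day retention

volumes:
  prometheus-data:                 # volume name is an assumption
```

Without an explicit retention flag, Prometheus keeps data for 15 days by default, so the 30-day setting has to be passed on the command line one way or another.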

prometheus.yml file (deployed via the Ansible monitoring role):

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
        - '192.168.1.20:9100'  # traefik
        - '192.168.1.21:9100'  # gitlab
        - '192.168.1.22:9100'  # vault
        - '192.168.1.30:9100'  # harbor
        - '192.168.1.31:9100'  # monitoring
        - '192.168.1.32:9100'  # keycloak
        - '192.168.1.33:9100'  # defectdojo
        - '192.168.1.40:9100'  # k3s-master
        - '192.168.1.41:9100'  # k3s-worker01
        - '192.168.1.42:9100'  # k3s-worker02

  - job_name: 'traefik'
    static_configs:
      - targets: ['192.168.1.20:8080']

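  - job_name: 'k3s'
    scheme: https  # the kubelet serves port 10250 over TLS only
    tls_config:
      insecure_skip_verify: true  # assumption: the k3s CA is not available to Prometheus
    # The kubelet usually also requires a bearer token
    # (authorization.credentials_file); not shown here.
    static_configs:
      - targets:
        - '192.168.1.40:10250'  # kubelet master
        - '192.168.1.41:10250'  # kubelet worker01
        - '192.168.1.42:10250'  # kubelet worker02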

Alerting

Base rules (alerts defined in the Ansible role):

Alert       Condition                                               Severity
NodeDown    up == 0 for 5m                                          critical
DiskFull    node_filesystem_avail_bytes < 10% of filesystem size    warning
HighCPU     node_load1 > 2 for 10m                                  warning
HighMemory  node_memory_MemAvailable_bytes < 10% of total memory    warning
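
The rule file itself is not reproduced on this page, so the following is only a sketch of how these four alerts might be written as Prometheus alerting rules. It assumes the "< 10%" conditions are ratios against node_filesystem_size_bytes and node_memory_MemTotal_bytes; the group name is hypothetical, and no "for" duration is set on DiskFull or HighMemory because none is given above.

```yaml
# Hypothetical rules file; the real one lives in the Ansible role.
groups:
  - name: homelab-base             # group name is an assumption
    rules:
      - alert: NodeDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical

      - alert: DiskFull
        # Assumes "< 10%" means available bytes relative to filesystem size.
        expr: node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.10
        labels:
          severity: warning

      - alert: HighCPU
        expr: node_load1 > 2
        for: 10m
        labels:
          severity: warning

      - alert: HighMemory
        # Assumes "< 10%" means available memory relative to total memory.
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        labels:
          severity: warning
```

Prometheus only loads rule files listed under rule_files in prometheus.yml, which the configuration excerpt above does not show. A file like this can be syntax-checked with promtool check rules before deployment.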

Verification

curl http://192.168.1.31:9090/api/v1/status/config
curl http://192.168.1.31:9090/api/v1/targets

Further reading