Monitoring Stack

Complete observability for the NUC home server: metrics, logs, alerting, and dashboards.

Prometheus · VictoriaMetrics · VictoriaLogs · Grafana · Alertmanager · Alloy

Access

Service	URL	Project
Grafana	`https://grafana.home.robinwerner.net`	GitHub
Prometheus	`https://prometheus.home.robinwerner.net`	GitHub
Alertmanager	`https://alertmanager.home.robinwerner.net`	GitHub


Networks	`monitoring` (internal), `proxy_network` (Traefik), `influxdbnet` (InfluxDB)
Traefik	Yes (Grafana, Prometheus, Alertmanager)
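The three Traefik-routed services each expose a built-in health endpoint, so a quick probe can show whether they are answering. The sketch below assumes the default paths: `/api/health` for Grafana and `/-/healthy` for Prometheus and Alertmanager. It also assumes the URLs above resolve from wherever the script runs. It is an illustration only and is not part of the stack.

```python
"""Sketch: probe the health endpoints of the Traefik-routed services.

Assumes the default health paths of Grafana (/api/health) and of
Prometheus and Alertmanager (/-/healthy).
"""
from __future__ import annotations

import urllib.request

BASE_URLS = {
    "grafana": "https://grafana.home.robinwerner.net",
    "prometheus": "https://prometheus.home.robinwerner.net",
    "alertmanager": "https://alertmanager.home.robinwerner.net",
}
HEALTH_PATHS = {
    "grafana": "/api/health",
    "prometheus": "/-/healthy",
    "alertmanager": "/-/healthy",
}


def health_urls() -> dict[str, str]:
    """Combine each base URL with its service-specific health path."""
    return {name: base.rstrip("/") + HEALTH_PATHS[name] for name, base in BASE_URLS.items()}


def check_all(timeout: float = 5.0) -> dict[str, bool]:
    """Map each service to True if its health endpoint answers with HTTP 200."""
    results = {}
    for name, url in health_urls().items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[name] = resp.status == 200
        except OSError:
            # URLError, HTTPError (non-2xx) and timeouts are all OSError subclasses.
            results[name] = False
    return results
```

`check_all()` returns a dict that maps each service name to `True` or `False`. Any network error, HTTP error, or timeout counts as down. If Traefik puts authentication in front of a service, the probe will report that service as down unless credentials are added.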

Services

#	Service	Image	Version	Purpose
1	Prometheus	prom/prometheus	v3.10.0	Metrics collection, alerting, remote_write
2	VictoriaMetrics	victoriametrics/victoria-metrics	v1.137.0	Long-term metrics (5 years, NFS)
3	node-exporter	prom/node-exporter	v1.10.2	Host metrics (CPU, RAM, disk)
4	cAdvisor	gcr.io/cadvisor/cadvisor	v0.55.1	Container metrics
5	Grafana	grafana/grafana-oss	12.4.1	Dashboards and visualization
6	VictoriaLogs	victoriametrics/victoria-logs	v1.48.0	Log aggregation (5 years, NFS)
7	Alloy	grafana/alloy	v1.14.0	Log collector (Docker + UniFi syslog)
8	Alertmanager	prom/alertmanager	v0.31.1	Alert routing to ntfy
9	UnPoller	ghcr.io/unpoller/unpoller	v2.34.0	UniFi network metrics
10	Pi-hole Exporter	ekofr/pihole-exporter	v1.2.0	Pi-hole DNS metrics
11	MqDockerUp	micrib/mqdockerup	v1.23.7	Container update notifications via MQTT
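Every image in the table is pinned to an explicit version tag. This keeps upgrades deliberate, while MqDockerUp reports when newer versions exist. The sketch below flags references that are not pinned; its list simply joins the Image and Version columns. Treating a missing tag or the floating `latest` tag as "unpinned" is this example's own rule.

```python
"""Sketch: flag container images that are not pinned to an explicit tag."""
from __future__ import annotations

IMAGES = [
    "prom/prometheus:v3.10.0",
    "victoriametrics/victoria-metrics:v1.137.0",
    "prom/node-exporter:v1.10.2",
    "gcr.io/cadvisor/cadvisor:v0.55.1",
    "grafana/grafana-oss:12.4.1",
    "victoriametrics/victoria-logs:v1.48.0",
    "grafana/alloy:v1.14.0",
    "prom/alertmanager:v0.31.1",
    "ghcr.io/unpoller/unpoller:v2.34.0",
    "ekofr/pihole-exporter:v1.2.0",
    "micrib/mqdockerup:v1.23.7",
]


def image_tag(ref: str) -> str | None:
    """Return the tag (or digest) of an image reference, or None if it has none.

    Only the last path segment can carry a tag, so a registry port such as
    'registry:5000/app' is not mistaken for a tag.
    """
    if "@" in ref:  # digest references like 'image@sha256:...' are pinned
        return ref.split("@", 1)[1]
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return last_segment.split(":", 1)[1]
    return None


def unpinned(images: list[str]) -> list[str]:
    """Return all references without a tag or with the floating 'latest' tag."""
    return [ref for ref in images if image_tag(ref) in (None, "latest")]
```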

Architecture

NUC (10.10.10.x)
================

+--monitoring-stack/docker-compose.yml-----------------------------+
|                                                                  |
|  Prometheus ----remote_write----> VictoriaMetrics                |
|      |                            (NFS /mnt/unas)                |
|      +--scrape--> node_exporter                                  |
|      +--scrape--> cAdvisor                                       |
|      +--scrape--> Traefik (:8082, via proxy_network)             |
|      +--scrape--> UnPoller                                       |
|      +--scrape--> Pi-hole Exporter                               |
|      +--scrape--> Home Assistant (10.10.10.3:8123)               |
|      |                                                           |
|      +--evaluate--> Alert Rules                                  |
|                       |                                          |
|                       v                                          |
|                  Alertmanager ---webhook---> ntfy (Hetzner)      |
|                                      via HTTPS (public URL)      |
|                                                                  |
|  Alloy ---push---> VictoriaLogs (NFS /mnt/unas)                  |
|      +--collect--> Docker logs (all containers)                  |
|      +--collect--> UniFi syslog (UDP :514, CEF format)           |
|                                                                  |
|  Grafana <--- Prometheus (7d)                                    |
|          <--- VictoriaMetrics (5y)                               |
|          <--- VictoriaLogs (5y)                                  |
|          <--- InfluxDB (external, existing setup)                |
|                                                                  |
|  MqDockerUp ---MQTT---> Mosquitto (external) ---> Home Assistant |
|                                                                  |
+------------------------------------------------------------------+
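Alloy receives the UniFi syslog stream in CEF (Common Event Format). A CEF record starts with a `CEF:` version prefix. Six more pipe-separated header fields follow: vendor, product, version, signature ID, name, and severity. The record ends with an extension of space-separated `key=value` pairs. The Python sketch below splits such a record only to illustrate that structure; it is not how Alloy processes the data. The sample lines in the checks are invented and are not real UniFi output.

```python
"""Sketch: split a CEF (Common Event Format) record into header fields and extension.

Simplified: escaped backslashes directly before a pipe or '=' are not handled.
"""
from __future__ import annotations

import re

HEADER_FIELDS = [
    "version", "device_vendor", "device_product", "device_version",
    "signature_id", "name", "severity",
]

# Header fields are separated by pipes that are not escaped as "\|".
_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")
# An extension key is a token at the start or after whitespace, followed by '='.
_EXT_KEY = re.compile(r"(?:^|\s)([A-Za-z0-9_]+)=")


def parse_extension(ext: str) -> dict[str, str]:
    """Split 'k1=v1 k2=some value' into a dict; values may contain spaces."""
    keys = list(_EXT_KEY.finditer(ext))
    fields = {}
    for i, match in enumerate(keys):
        # A value runs until the whitespace that precedes the next key.
        end = keys[i + 1].start() if i + 1 < len(keys) else len(ext)
        value = ext[match.end():end].strip()
        fields[match.group(1)] = value.replace("\\=", "=")
    return fields


def parse_cef(line: str) -> dict:
    """Parse one syslog line containing a CEF record (text before 'CEF:' is ignored)."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("no CEF record in line")
    parts = _UNESCAPED_PIPE.split(line[start:], maxsplit=7)
    if len(parts) != 8:
        raise ValueError("expected 7 header fields followed by an extension")
    record = {
        key: value.replace("\\|", "|")
        for key, value in zip(HEADER_FIELDS, parts[:7])
    }
    record["version"] = record["version"][len("CEF:"):]
    record["extension"] = parse_extension(parts[7])
    return record
```

Splitting with `maxsplit=7` keeps any pipes inside the extension intact. The extension is parsed by locating keys rather than by splitting on spaces, because CEF values may contain spaces.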

Hetzner vServer (external, already in production)
=================================================
Uptime Kuma ---> ntfy (push notification when the NUC goes down)
Healthchecks <--- heartbeat ping from the NUC
ntfy          receives alerts from Alertmanager + Uptime Kuma
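The heartbeat is a plain HTTP request to a check-specific ping URL. If the requests stop arriving, Healthchecks raises an alert even when the whole NUC is offline. The sketch below assumes a self-hosted Healthchecks instance at a hypothetical `https://hc.example.net` and uses a placeholder check UUID. Self-hosted instances serve pings under `/ping/<uuid>`, while the hosted service uses `https://hc-ping.com/<uuid>`. Appending `/start` or `/fail` marks a run as started or failed.

```python
"""Sketch: send a heartbeat ping from the NUC to a Healthchecks instance.

HC_BASE and CHECK_UUID are hypothetical placeholders, not values from this setup.
"""
import urllib.request

HC_BASE = "https://hc.example.net"  # hypothetical self-hosted Healthchecks URL
CHECK_UUID = "00000000-0000-0000-0000-000000000000"  # placeholder check UUID

VALID_SIGNALS = {"", "start", "fail"}


def ping_url(base: str, uuid: str, signal: str = "") -> str:
    """Build /ping/<uuid> for success, or /ping/<uuid>/<signal> for start/fail."""
    if signal not in VALID_SIGNALS:
        raise ValueError(f"unsupported signal: {signal!r}")
    url = f"{base.rstrip('/')}/ping/{uuid}"
    return f"{url}/{signal}" if signal else url


def send_ping(signal: str = "", timeout: float = 10.0) -> bool:
    """Send the ping (HTTP GET); True on HTTP 200. Network errors propagate."""
    with urllib.request.urlopen(ping_url(HC_BASE, CHECK_UUID, signal), timeout=timeout) as resp:
        return resp.status == 200
```

Running `send_ping()` periodically from the NUC, for example from a timer, is enough. The absence of pings is what triggers the alert.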

Networks

Network	Type	Purpose
`monitoring`	bridge (internal)	Communication between all monitoring components
`proxy_network`	external (Traefik)	Grafana, Prometheus, Alertmanager (Traefik routing); MqDockerUp (Mosquitto access); Prometheus (scraping Traefik)
`influxdbnet`	external	Grafana -> InfluxDB (Speedtest, HA Longterm)
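On user-defined Docker networks, containers resolve each other by name only when they share at least one network. The sketch below models the table above as sets to show which connections that allows. The service keys are hypothetical names. The placement of the external Traefik, Mosquitto and InfluxDB containers is inferred from the table and is not confirmed.

```python
"""Sketch: which containers can reach each other via shared Docker networks.

Membership mirrors the network table; entries marked 'assumed' are inferred.
"""
from __future__ import annotations

NETWORKS: dict[str, set[str]] = {
    "prometheus": {"monitoring", "proxy_network"},
    "grafana": {"monitoring", "proxy_network", "influxdbnet"},
    "alertmanager": {"monitoring", "proxy_network"},
    "mqdockerup": {"monitoring", "proxy_network"},
    "victoriametrics": {"monitoring"},
    "alloy": {"monitoring"},
    "traefik": {"proxy_network"},    # assumed: external Traefik stack
    "mosquitto": {"proxy_network"},  # assumed: reachable via proxy_network
    "influxdb": {"influxdbnet"},     # assumed: external InfluxDB stack
}


def can_reach(a: str, b: str) -> bool:
    """True if containers a and b share at least one network."""
    return bool(NETWORKS[a] & NETWORKS[b])
```

Attaching only Grafana to `influxdbnet` keeps InfluxDB out of reach of the rest of the stack, while the internal `monitoring` network carries all stack-internal traffic.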

Security Hardening

All containers run with `security_opt: no-new-privileges:true` and `cap_drop: ALL`.

Service	Additional capabilities	Reason
Alloy	`cap_add: DAC_OVERRIDE`	Needed to create (mkdir) the WAL directory
All others	None	n/a

Only Alloy, cAdvisor, and MqDockerUp get access to the Docker socket, and only read-only.
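The hardening rules are easy to break when a service is added later. The sketch below checks a parsed compose file against them, for example the dict that PyYAML's `yaml.safe_load` returns. The service keys `alloy`, `cadvisor` and `mqdockerup` are assumed names, and only short-syntax volume strings are inspected.

```python
"""Sketch: check a parsed docker-compose.yml against the hardening rules.

Expects the compose file as a dict; service names in the allow-lists are assumed.
"""
from __future__ import annotations

ALLOWED_CAPS = {"alloy": {"DAC_OVERRIDE"}}
SOCKET_USERS = {"alloy", "cadvisor", "mqdockerup"}
DOCKER_SOCKET = "/var/run/docker.sock"


def _socket_mounts(volumes: list) -> list[str]:
    """Return short-syntax volume strings whose source is the Docker socket."""
    return [v for v in volumes if isinstance(v, str) and v.split(":")[0] == DOCKER_SOCKET]


def check_hardening(compose: dict) -> list[str]:
    """Return human-readable violations (an empty list means compliant)."""
    problems = []
    for name, svc in compose.get("services", {}).items():
        if "no-new-privileges:true" not in svc.get("security_opt", []):
            problems.append(f"{name}: missing no-new-privileges:true")
        if "ALL" not in svc.get("cap_drop", []):
            problems.append(f"{name}: cap_drop does not include ALL")
        extra = set(svc.get("cap_add", [])) - ALLOWED_CAPS.get(name, set())
        if extra:
            problems.append(f"{name}: unexpected cap_add {sorted(extra)}")
        for mount in _socket_mounts(svc.get("volumes", [])):
            if name not in SOCKET_USERS:
                problems.append(f"{name}: must not mount the Docker socket")
            elif not mount.endswith(":ro"):
                problems.append(f"{name}: Docker socket must be read-only (:ro)")
    return problems
```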

Secrets are supplied via `${ENV_VARS}` or `password_file`; they never appear as plain text in config files.
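Compose fills `${VAR}` placeholders from `.env`. Python's `string.Template` uses the same `${VAR}` syntax, so it can illustrate the idea in a small sketch. The sketch renders a template from environment variables and fails loudly when one is missing. It also includes a helper that reads a `password_file`-style secret. This is a simplified stand-in, not how Compose is implemented, and Compose extensions such as `${VAR:-default}` are not supported. The variable names in the checks (`NTFY_USER`, `NTFY_PASSWORD`) are hypothetical.

```python
"""Sketch: render ${VAR} placeholders from the environment and read a secret file.

A simplified stand-in for Compose interpolation: only the plain ${VAR} form
is supported, not extensions such as ${VAR:-default}.
"""
from __future__ import annotations

import os
from pathlib import Path
from string import Template


def render(template_text: str, env: dict[str, str] | None = None) -> str:
    """Substitute ${VAR} placeholders; a missing variable raises KeyError
    instead of silently producing an empty secret."""
    return Template(template_text).substitute(os.environ if env is None else env)


def read_secret(path: str | Path) -> str:
    """Read a password_file-style secret and drop trailing newlines."""
    return Path(path).read_text(encoding="utf-8").rstrip("\n")
```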

Storage Planning

SSD (NUC)

Component	Path	Size
Prometheus (7d)	`/mnt/ssd/container-data/monitoring-stack/prometheus`	~5 GB
Grafana	`/mnt/ssd/container-data/monitoring-stack/grafana`	~1 GB
Alertmanager	`/mnt/ssd/container-data/monitoring-stack/alertmanager`	<100 MB
Alloy WAL	`/mnt/ssd/container-data/monitoring-stack/alloy`	<1 GB
MqDockerUp	`/mnt/ssd/container-data/monitoring-stack/mqdockerup`	<100 MB
Total		~7 GB
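The ~5 GB for Prometheus can be sanity-checked with the rule of thumb from the Prometheus storage documentation. Disk space is roughly retention (in seconds) × ingested samples per second × bytes per sample, and Prometheus averages about 1 to 2 bytes per sample. The ingestion rate used below is a hypothetical value, not a measurement from this setup. The real rate can be read from `rate(prometheus_tsdb_head_samples_appended_total[5m])`.

```python
"""Sketch: Prometheus disk estimate using the rule of thumb from its storage docs:
disk ~= retention_seconds * ingested_samples_per_second * bytes_per_sample.
"""

SECONDS_PER_DAY = 86_400


def tsdb_size_gb(retention_days: float, samples_per_second: float,
                 bytes_per_sample: float = 2.0) -> float:
    """Estimated TSDB size in GB (10**9 bytes); 1 to 2 bytes/sample is typical."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample / 1e9


def max_samples_per_second(budget_gb: float, retention_days: float,
                           bytes_per_sample: float = 2.0) -> float:
    """Inverse of tsdb_size_gb: the ingestion rate that fits into a disk budget."""
    return budget_gb * 1e9 / (retention_days * SECONDS_PER_DAY * bytes_per_sample)
```

At a hypothetical 4,000 samples/s and 2 bytes per sample, `tsdb_size_gb(7, 4_000)` comes to about 4.8 GB, which is consistent with the ~5 GB budget. Conversely, `max_samples_per_second(5, 7)` shows that the budget holds roughly 4,100 samples/s.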

NFS (UNAS Pro)

Component	Path	Size
VictoriaMetrics (5y)	`/mnt/monitoring/victoriametrics`	~500 GB
VictoriaLogs (5y)	`/mnt/monitoring/victorialogs`	~50-125 GB
Total		~550-625 GB
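The NFS sizes are totals after the full five-year retention window. Dividing them by the window gives an average growth rate, which is easier to compare with what the NAS actually records from week to week. The sketch below assumes linear growth, decimal units (1 GB = 1000 MB) and 365-day years. Once the window is full, usage should level off, because data older than the retention period is deleted.

```python
"""Sketch: average daily growth implied by the NFS sizing table."""

DAYS_5Y = 5 * 365  # ignores leap days


def daily_growth_mb(total_gb: float, days: int = DAYS_5Y) -> float:
    """Average growth per day in MB (10**6 bytes) to reach total_gb after `days`."""
    return total_gb * 1000 / days


def days_until_full(free_gb: float, growth_mb_per_day: float) -> float:
    """How many days a given amount of free space lasts at a constant growth rate."""
    return free_gb * 1000 / growth_mb_per_day
```

`daily_growth_mb(500)` works out to about 274 MB/day for VictoriaMetrics. For VictoriaLogs, `daily_growth_mb(50)` and `daily_growth_mb(125)` give about 27 and 68 MB/day.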

External Dependencies

These services are already running and are not managed by the monitoring stack:

Service	Directory	Required for
Traefik	`traefik/`	Reverse proxy, SSL, metrics endpoint (:8082)
InfluxDB	`influxdb/`	Grafana data source (Speedtest, HA Longterm)
Mosquitto	Smart home stack	MQTT broker for MqDockerUp
Pi-hole	`pihole/`	DNS; data source for the Pi-hole Exporter
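Because these services run outside the stack, a broken dependency usually shows up first as a failed scrape. An example is Traefik's metrics endpoint on :8082 becoming unreachable. Prometheus records every scrape result in the `up` metric (1 = success, 0 = failure), and the metric can be queried through the HTTP API at `/api/v1/query`. The sketch below builds that query and extracts the failing targets from the JSON response. The response used in the checks is hand-written, and its job and instance labels are hypothetical.

```python
"""Sketch: list failing scrape targets via the Prometheus HTTP API (up == 0)."""
from __future__ import annotations

import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "https://prometheus.home.robinwerner.net"


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for /api/v1/query."""
    return f"{base.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def failing_targets(response: dict) -> list[tuple[str, str]]:
    """Return sorted (job, instance) pairs from the result of the query 'up == 0'."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error', 'unknown error')}")
    return sorted(
        (series["metric"].get("job", "?"), series["metric"].get("instance", "?"))
        for series in response["data"]["result"]
    )


def fetch_failing(base: str = PROMETHEUS_URL, timeout: float = 10.0) -> list[tuple[str, str]]:
    """Run 'up == 0' against Prometheus and return the failing targets."""
    with urllib.request.urlopen(query_url(base, "up == 0"), timeout=timeout) as resp:
        return failing_targets(json.load(resp))
```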

File Structure

monitoring-stack/
  docker-compose.yml              # All 11 services, 3 networks
  .env                            # Secrets (gitignored)
  secrets/
    ntfy-password                 # Alertmanager ntfy auth
    ha-prometheus-token           # Home Assistant bearer token
  configs/
    prometheus/
      prometheus.yml              # Scrape configs + remote_write
      rules/
        node-alerts.yml           # Host-level alerts
        container-alerts.yml      # Container alerts
    alertmanager/
      alertmanager.yml            # Routes to ntfy via HTTPS
    alloy/
      config.alloy                # Docker logs + UniFi syslog
    grafana/
      provisioning/
        datasources/
          datasources.yml         # Prometheus, VM, VLogs, InfluxDB
        dashboards/
          dashboards.yml          # Provisioning config
          json/                   # 17 provisioned dashboards
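`.env` and the files under `secrets/` are the only places that hold credentials, so they should not be accessible to arbitrary users on the host. The sketch below reports secret files that are missing or that grant any permission to "others". The file list mirrors the tree above. The "no access for others" policy is this example's own assumption. Whether it works in practice depends on the UIDs the containers run as, so treat it as a starting point.

```python
"""Sketch: report secret files under monitoring-stack/ that are missing or open to 'others'."""
from __future__ import annotations

import stat
from pathlib import Path

SECRET_FILES = [".env", "secrets/ntfy-password", "secrets/ha-prometheus-token"]


def exposed_secrets(stack_dir: str | Path) -> dict[str, str]:
    """Map each problematic secret file to a short reason (empty dict = all fine)."""
    root = Path(stack_dir)
    findings = {}
    for rel in SECRET_FILES:
        path = root / rel
        if not path.is_file():
            findings[rel] = "missing"
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & stat.S_IRWXO:  # any read/write/execute bit for 'others'
            findings[rel] = f"accessible by others (mode {mode:o})"
    return findings
```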