Monitoring Stack
Full observability for the NUC home server: metrics, logs, alerting, and dashboards.
Prometheus · VictoriaMetrics · VictoriaLogs · Grafana · Alertmanager · Alloy
Access
| Service | URL | Project |
|---|---|---|
| Grafana | https://grafana.home.robinwerner.net | GitHub |
| Prometheus | https://prometheus.home.robinwerner.net | GitHub |
| Alertmanager | https://alertmanager.home.robinwerner.net | GitHub |

| | |
|---|---|
| Networks | monitoring (internal), proxy_network (Traefik), influxdbnet (InfluxDB) |
| Traefik | Yes (Grafana, Prometheus, Alertmanager) |
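
The Traefik routing listed above could be expressed with Traefik's Docker labels. The following is a minimal sketch for Grafana only, assuming a Compose service named `grafana`, a Traefik entrypoint called `websecure`, and Grafana's default port 3000. None of these names are confirmed by this README, and the top-level `networks:` definitions are left out.

```yaml
# Sketch only: service, router, and entrypoint names are assumptions.
services:
  grafana:
    image: grafana/grafana-oss:12.4.1
    networks:
      - monitoring
      - proxy_network
      - influxdbnet
    labels:
      - "traefik.enable=true"
      # Tell Traefik which of the attached networks to route through.
      - "traefik.docker.network=proxy_network"
      - "traefik.http.routers.grafana.rule=Host(`grafana.home.robinwerner.net`)"
      - "traefik.http.routers.grafana.entrypoints=websecure"  # assumed entrypoint name
      - "traefik.http.routers.grafana.tls=true"
      # Grafana listens on port 3000 by default.
      - "traefik.http.services.grafana.loadbalancer.server.port=3000"
```

Prometheus and Alertmanager would follow the same pattern with their own router names and ports (9090 and 9093 by default).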
Services
| # | Service | Image | Version | Function |
|---|---|---|---|---|
| 1 | Prometheus | prom/prometheus | v3.10.0 | Metrics collection, alerting, remote_write |
| 2 | VictoriaMetrics | victoriametrics/victoria-metrics | v1.137.0 | Long-term metrics (5 years, NFS) |
| 3 | node-exporter | prom/node-exporter | v1.10.2 | Host metrics (CPU, RAM, disk) |
| 4 | cAdvisor | gcr.io/cadvisor/cadvisor | v0.55.1 | Container metrics |
| 5 | Grafana | grafana/grafana-oss | 12.4.1 | Dashboards and visualization |
| 6 | VictoriaLogs | victoriametrics/victoria-logs | v1.48.0 | Log aggregation (5 years, NFS) |
| 7 | Alloy | grafana/alloy | v1.14.0 | Log collector (Docker + UniFi syslog) |
| 8 | Alertmanager | prom/alertmanager | v0.31.1 | Alert routing to ntfy |
| 9 | UnPoller | ghcr.io/unpoller/unpoller | v2.34.0 | UniFi network metrics |
| 10 | Pi-hole Exporter | ekofr/pihole-exporter | v1.2.0 | Pi-hole DNS metrics |
| 11 | MqDockerUp | micrib/mqdockerup | v1.23.7 | Container update notifications via MQTT |
Architecture
NUC (10.10.10.x)
================
+--monitoring-stack/docker-compose.yml--------------------------+
| |
| Prometheus ----remote_write----> VictoriaMetrics |
| | (NFS /mnt/unas) |
| +--scrape--> node_exporter |
| +--scrape--> cAdvisor |
| +--scrape--> Traefik (:8082, via proxy_network) |
| +--scrape--> UnPoller |
| +--scrape--> Pi-hole Exporter |
| +--scrape--> Home Assistant (10.10.10.3:8123) |
| | |
| +--evaluate--> Alert Rules |
| | |
| v |
| Alertmanager ---webhook---> ntfy (Hetzner) |
| via HTTPS (public URL) |
| |
| Alloy ---push---> VictoriaLogs (NFS /mnt/unas) |
| +--collect--> Docker logs (all containers) |
| +--collect--> UniFi syslog (UDP :514, CEF format) |
| |
| Grafana <--- Prometheus (7d) |
| <--- VictoriaMetrics (5y) |
| <--- VictoriaLogs (5y) |
| <--- InfluxDB (external, existing setup) |
| |
| MqDockerUp ---MQTT---> Mosquitto (external) ---> Home Assistant |
| |
+---------------------------------------------------------------+
Hetzner vServer (external, already in production)
=================================================
Uptime Kuma ---> ntfy (push on NUC outage)
Healthchecks <--- heartbeat ping from the NUC
ntfy receives alerts from Alertmanager + Uptime Kuma
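
The scrape, alerting, and remote_write paths in the diagram could map onto a `prometheus.yml` roughly like the sketch below. It assumes that targets are reachable under their Compose service names, that each exporter uses its upstream default port, and that the Home Assistant token is mounted at the path shown. Only the Traefik port (:8082) and the Home Assistant address come from this README; the scrape interval is also an assumption.

```yaml
# configs/prometheus/prometheus.yml (sketch; hostnames, ports, and paths are assumptions)
global:
  scrape_interval: 30s            # assumed; the README does not state an interval

rule_files:
  - /etc/prometheus/rules/*.yml   # assumed container path for the rules/ directory

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

remote_write:
  # VictoriaMetrics accepts Prometheus remote_write on this endpoint.
  - url: http://victoriametrics:8428/api/v1/write

scrape_configs:
  - job_name: node-exporter
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: cadvisor
    static_configs:
      - targets: ["cadvisor:8080"]
  - job_name: traefik
    static_configs:
      - targets: ["traefik:8082"]   # metrics port from the diagram, reached via proxy_network
  - job_name: unpoller
    static_configs:
      - targets: ["unpoller:9130"]
  - job_name: pihole-exporter
    static_configs:
      - targets: ["pihole-exporter:9617"]
  - job_name: home-assistant
    metrics_path: /api/prometheus
    authorization:
      type: Bearer
      credentials_file: /run/secrets/ha-prometheus-token  # assumed mount path
    static_configs:
      - targets: ["10.10.10.3:8123"]
```

Reading the token from `credentials_file` keeps it out of the config file itself, which matches the secrets policy described in this README.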
Networks
| Network | Type | Usage |
|---|---|---|
| monitoring | bridge (internal) | Communication between all monitoring components |
| proxy_network | external (Traefik) | Grafana, Prometheus, Alertmanager (Traefik routing); MqDockerUp (Mosquitto access); Prometheus (Traefik scraping) |
| influxdbnet | external | Grafana -> InfluxDB (Speedtest, HA long-term) |
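
In Compose terms, the three networks above might be declared as in this sketch. It assumes that `proxy_network` and `influxdbnet` are created by the Traefik and InfluxDB stacks and are therefore only referenced here. Whether `monitoring` additionally sets `internal: true` is not stated in this README, so the option is left out.

```yaml
# Top-level networks section of docker-compose.yml (sketch)
networks:
  monitoring:
    driver: bridge      # stack-local network for all monitoring components
  proxy_network:
    external: true      # assumed to be created by the Traefik stack
  influxdbnet:
    external: true      # assumed to be created by the InfluxDB stack
```

A service then opts in per network, for example Grafana would list all three under its own `networks:` key, while an exporter would list only `monitoring`.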
Security hardening
All containers get security_opt: no-new-privileges:true and cap_drop: ALL.
| Service | Additional capabilities | Reason |
|---|---|---|
| Alloy | cap_add: DAC_OVERRIDE | mkdir for the WAL directory |
| All others | None | - |
Docker socket access (read-only) is granted only to Alloy, cAdvisor, and MqDockerUp.
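
As an illustration of the hardening rules above, a Compose fragment for Alloy (the one service with an extra capability) and node-exporter (a service with none) could look like this. The service names and the container-side data path are assumptions, not taken from the actual docker-compose.yml.

```yaml
# Sketch of the hardening options; service names and container paths are assumptions.
services:
  alloy:
    image: grafana/alloy:v1.14.0
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    cap_add:
      - DAC_OVERRIDE    # needed so Alloy can create its WAL directory
    volumes:
      # Read-only Docker socket for log discovery.
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /mnt/ssd/container-data/monitoring-stack/alloy:/var/lib/alloy/data  # assumed container path

  node-exporter:
    image: prom/node-exporter:v1.10.2
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL             # no capabilities added back
```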
Secrets are passed via ${ENV_VARS} or password_file, never as plaintext in configs.
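
The `password_file` approach might look like this for the Alertmanager receiver that forwards to ntfy. This is a sketch under stated assumptions: the URL and topic are placeholders, the user name is hypothetical, and the secret is assumed to be mounted at the path shown. How the webhook payload is formatted for ntfy is not covered here.

```yaml
# configs/alertmanager/alertmanager.yml (sketch; URL, user, and paths are placeholders)
route:
  receiver: ntfy

receivers:
  - name: ntfy
    webhook_configs:
      - url: https://ntfy.example.com/alerts          # placeholder host and topic
        http_config:
          basic_auth:
            username: alertmanager                     # hypothetical user name
            password_file: /run/secrets/ntfy-password  # assumed mount path of secrets/ntfy-password
```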
Storage planning
SSD (NUC)
| Component | Path | Size |
|---|---|---|
| Prometheus (7d) | /mnt/ssd/container-data/monitoring-stack/prometheus | ~5 GB |
| Grafana | /mnt/ssd/container-data/monitoring-stack/grafana | ~1 GB |
| Alertmanager | /mnt/ssd/container-data/monitoring-stack/alertmanager | <100 MB |
| Alloy WAL | /mnt/ssd/container-data/monitoring-stack/alloy | <1 GB |
| MqDockerUp | /mnt/ssd/container-data/monitoring-stack/mqdockerup | <100 MB |
| Total | | ~7 GB |

NFS (UNAS Pro)
| Component | Path | Size |
|---|---|---|
| VictoriaMetrics (5y) | /mnt/monitoring/victoriametrics | ~500 GB |
| VictoriaLogs (5y) | /mnt/monitoring/victorialogs | ~50-125 GB |
| Total | | ~550-625 GB |
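
The retention periods and host paths above could be wired up roughly as follows. The host paths and retention values come from the tables; the service names and container-side paths are assumptions chosen for this sketch.

```yaml
# Retention and storage mounts (sketch; container-side paths are assumptions)
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    command:
      # Overriding the command replaces the image defaults, so the config path is repeated.
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=7d
    volumes:
      - /mnt/ssd/container-data/monitoring-stack/prometheus:/prometheus

  victoriametrics:
    image: victoriametrics/victoria-metrics:v1.137.0
    command:
      - -storageDataPath=/storage
      - -retentionPeriod=5y
    volumes:
      - /mnt/monitoring/victoriametrics:/storage

  victorialogs:
    image: victoriametrics/victoria-logs:v1.48.0
    command:
      - -storageDataPath=/vlogs
      - -retentionPeriod=5y
    volumes:
      - /mnt/monitoring/victorialogs:/vlogs
```

Keeping only 7 days in Prometheus on the SSD and pushing everything to VictoriaMetrics on NFS is what keeps the local footprint near the ~7 GB estimate.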
External dependencies
These services are already running and are not managed in the monitoring stack:
| Service | Directory | Needed for |
|---|---|---|
| Traefik | traefik/ | Reverse proxy, SSL, metrics endpoint (:8082) |
| InfluxDB | influxdb/ | Grafana datasource (Speedtest, HA long-term) |
| Mosquitto | Smart home stack | MQTT broker for MqDockerUp |
| Pi-hole | pihole/ | DNS, data source for the Pi-hole Exporter |
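
For the InfluxDB dependency, Grafana's datasource provisioning might reference the external instance next to the stack's own backends, as in this sketch. Hostnames and ports are assumptions (Compose service names with upstream default ports). The InfluxDB entry deliberately omits database, bucket, and credentials because the README does not describe them, and VictoriaLogs is left out because it needs its own datasource plugin.

```yaml
# configs/grafana/provisioning/datasources/datasources.yml (sketch; hostnames are assumptions)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

  - name: VictoriaMetrics
    type: prometheus              # VictoriaMetrics is queried through its Prometheus-compatible API
    access: proxy
    url: http://victoriametrics:8428

  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb:8086     # reached via influxdbnet; database and auth settings not shown
```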
File structure
monitoring-stack/
  docker-compose.yml              # All 11 services, 3 networks
  .env                            # Secrets (gitignored)
  secrets/
    ntfy-password                 # ntfy auth for Alertmanager
    ha-prometheus-token           # Home Assistant bearer token
  configs/
    prometheus/
      prometheus.yml              # Scrape configs + remote_write
      rules/
        node-alerts.yml           # Host-level alerts
        container-alerts.yml      # Container alerts
    alertmanager/
      alertmanager.yml            # Routes to ntfy via HTTPS
    alloy/
      config.alloy                # Docker logs + UniFi syslog
    grafana/
      provisioning/
        datasources/
          datasources.yml         # Prometheus, VM, VLogs, InfluxDB
        dashboards/
          dashboards.yml          # Provisioning config
          json/                   # 17 provisioned dashboards
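
To give an idea of what a file such as rules/node-alerts.yml might contain, here is an illustrative host-level rule. It is not copied from the repository: the alert name, threshold, and duration are assumptions, and only the node-exporter metric names are standard.

```yaml
# Illustrative rule for rules/node-alerts.yml (not taken from the repository)
groups:
  - name: node-alerts
    rules:
      - alert: HostDiskSpaceLow         # hypothetical alert name
        # Fires when less than 10% of a real filesystem is still available.
        expr: |
          (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% disk space left on {{ $labels.mountpoint }}"
```

Prometheus evaluates such rules and hands firing alerts to Alertmanager, which routes them to ntfy as shown in the architecture section.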