Runbook: Grafana Dashboard
Duration: ~20 minutes
Role: DevOps, SRE
Prerequisites: Grafana, Prometheus as a datasource
Visualizing the gateway metrics in Grafana.
Workflow
```mermaid
flowchart TD
    A[Start] --> B[Add datasource]
    B --> C[Import dashboard]
    C --> D[Adjust panels]
    D --> E[Configure variables]
    E --> F[Save dashboard]
    F --> G[Done]
    style G fill:#e8f5e9
```
1. Prometheus Datasource
Grafana UI: Configuration → Data Sources → Add data source
Name: Prometheus
Type: Prometheus
URL: http://prometheus:9090
Access: Server (default)
Or via provisioning (access: proxy is the file equivalent of Server access in the UI):
```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```
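The same datasource can also be created through Grafana's HTTP API (POST /api/datasources), for example from a bootstrap script. The following is a minimal sketch. It assumes a Grafana API or service-account token in GRAFANA_TOKEN and the hostnames used in this runbook. The variable names GRAFANA_URL and PROMETHEUS_URL and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: create the Prometheus datasource via the Grafana HTTP API.
# GRAFANA_URL, PROMETHEUS_URL and the function names are hypothetical;
# GRAFANA_TOKEN must hold a valid Grafana API/service-account token.
GRAFANA_URL="${GRAFANA_URL:-http://grafana:3000}"
PROMETHEUS_URL="${PROMETHEUS_URL:-http://prometheus:9090}"

# Print the request body; the fields mirror the provisioning file above.
datasource_payload() {
  printf '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$1"
}

# POST the body to Grafana; -f makes curl exit non-zero on HTTP errors.
create_datasource() {
  datasource_payload "$PROMETHEUS_URL" | curl -fsS -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${GRAFANA_TOKEN:?GRAFANA_TOKEN is not set}" \
    -d @- \
    "$GRAFANA_URL/api/datasources"
}
```

Call create_datasource once. A second call is expected to fail because a datasource named Prometheus then already exists.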
2. Dashboard JSON
Import the dashboard: Create → Import → paste the JSON below
```json
{
  "title": "Data Gateway",
  "uid": "data-gateway",
  "timezone": "browser",
  "panels": [
    {
      "title": "Request Rate",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
      "targets": [{
        "expr": "sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) by (endpoint)",
        "legendFormat": "{{endpoint}}"
      }]
    },
    {
      "title": "Response Time (p95)",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
      "targets": [{
        "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job=\"data-gateway\"}[5m])) by (le))",
        "legendFormat": "p95"
      }]
    },
    {
      "title": "Error Rate",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8},
      "targets": [{
        "expr": "sum(rate(http_requests_total{job=\"data-gateway\",status=~\"5..\"}[5m])) / sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) * 100",
        "legendFormat": "Error %"
      }],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "thresholds": {
            "steps": [
              {"color": "green", "value": null},
              {"color": "yellow", "value": 1},
              {"color": "red", "value": 5}
            ]
          }
        }
      }
    },
    {
      "title": "Memory Usage",
      "type": "gauge",
      "gridPos": {"h": 4, "w": 6, "x": 6, "y": 8},
      "targets": [{
        "expr": "process_resident_memory_bytes{job=\"data-gateway\"} / 1024 / 1024",
        "legendFormat": "Memory MiB"
      }],
      "fieldConfig": {
        "defaults": {
          "unit": "mbytes",
          "max": 512,
          "thresholds": {
            "steps": [
              {"color": "green", "value": null},
              {"color": "yellow", "value": 300},
              {"color": "red", "value": 450}
            ]
          }
        }
      }
    },
    {
      "title": "Active Requests",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 12, "y": 8},
      "targets": [{
        "expr": "http_requests_in_progress{job=\"data-gateway\"}",
        "legendFormat": "Active"
      }]
    },
    {
      "title": "Uptime",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 18, "y": 8},
      "targets": [{
        "expr": "time() - process_start_time_seconds{job=\"data-gateway\"}",
        "legendFormat": "Uptime"
      }],
      "fieldConfig": {
        "defaults": {"unit": "s"}
      }
    }
  ],
  "templating": {
    "list": [{
      "name": "instance",
      "type": "query",
      "query": "label_values(http_requests_total{job=\"data-gateway\"}, instance)",
      "multi": true,
      "includeAll": true
    }]
  },
  "refresh": "10s"
}
```
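Grafana lays panels out on a grid that is 24 columns wide. In each gridPos, x and w count columns, while y and h count grid rows. Hand-edited dashboards can therefore end up with panels that overflow the grid or overlap. The following is a minimal sketch of a pre-import check in pure Bash. It assumes the gridPos values are fed in as lines of the form "x y w h". The function name check_grid is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: validate Grafana gridPos values read from stdin as "x y w h" lines.
# Returns 0 if every panel fits into the 24-column grid and none overlap.
check_grid() {
  local -a xs=() ys=() ws=() hs=()
  local x y w h i j ox oy rc=0
  # "|| [ -n "$x" ]" also handles a last line without a trailing newline
  while read -r x y w h || [ -n "$x" ]; do
    [ -z "$x" ] && continue                    # skip blank lines
    if (( x < 0 || w <= 0 || x + w > 24 )); then
      echo "panel at x=$x y=$y exceeds the 24-column grid" >&2
      rc=1
    fi
    xs+=("$x"); ys+=("$y"); ws+=("$w"); hs+=("$h")
  done
  # Two panels overlap if their column ranges AND their row ranges intersect.
  for (( i = 0; i < ${#xs[@]}; i++ )); do
    for (( j = i + 1; j < ${#xs[@]}; j++ )); do
      ox=$(( xs[i] < xs[j] + ws[j] && xs[j] < xs[i] + ws[i] ))
      oy=$(( ys[i] < ys[j] + hs[j] && ys[j] < ys[i] + hs[i] ))
      if (( ox && oy )); then
        echo "panels $i and $j overlap" >&2
        rc=1
      fi
    done
  done
  return "$rc"
}
```

With the dashboard saved as dashboard.json, the values can be extracted with jq:

jq -r '.panels[].gridPos | "\(.x) \(.y) \(.w) \(.h)"' dashboard.json | check_grid

For the six panels above the check passes. Two 12-wide panels in the first row are followed by four 6-wide panels starting at y=8.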
3. Key panels
| Panel | Query | Purpose |
| --- | --- | --- |
| Request Rate | sum(rate(http_requests_total[5m])) | Throughput |
| Response Time | histogram_quantile(0.95, …) | Latency (p95) |
| Error Rate | …status=~"5.."… * 100 | Share of 5xx responses in percent |
| Memory | process_resident_memory_bytes | RAM usage |
| CPU | rate(process_cpu_seconds_total[5m]) | CPU load (not included in the dashboard JSON above) |
| Active Requests | http_requests_in_progress | Concurrency |
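The Response Time panel uses histogram_quantile. This function estimates a quantile from cumulative le buckets by finding the bucket that contains the target rank and interpolating linearly inside it. The following sketch reproduces that estimate with awk, purely for illustration. It assumes the input is "le count" lines with cumulative counts, already rate-aggregated and sorted by le, with +Inf as the last bucket, and a quantile 0 < q <= 1. The function name quantile_from_buckets is hypothetical. The sketch skips some edge cases that Prometheus itself handles, such as sorting and NaN inputs.

```bash
#!/usr/bin/env bash
# Sketch (illustration only): approximate what histogram_quantile() computes.
# stdin: "le count" lines, cumulative counts sorted by le, last line is +Inf.
# $1: the quantile, 0 < q <= 1 (e.g. 0.95).
quantile_from_buckets() {
  awk -v q="$1" '
    { le[NR] = $1; cnt[NR] = $2 }
    END {
      n = NR
      total = cnt[n]                     # +Inf bucket counts all observations
      if (n < 2 || total == 0) { print "NaN"; exit }
      rank = q * total                   # target position among observations
      b = 1                              # first bucket whose count >= rank
      while (b < n && cnt[b] < rank) b++
      if (b == n) { print le[n - 1]; exit }        # rank lies in +Inf bucket
      if (b == 1 && le[1] <= 0) { print le[1]; exit }
      start = (b > 1) ? le[b - 1] : 0    # lower bound of the chosen bucket
      prev  = (b > 1) ? cnt[b - 1] : 0   # observations below that bucket
      # linear interpolation inside the chosen bucket
      print start + (le[b] - start) * (rank - prev) / (cnt[b] - prev)
    }'
}
```

Example: take the buckets 0.1 → 50, 0.25 → 80, 0.5 → 95, 1 → 100 and +Inf → 100, with q = 0.9. The rank is 90, which falls into the (0.25, 0.5] bucket. The estimate is 0.25 + 0.25 × 10/15 ≈ 0.4167 s. Because of this interpolation, p95 values are only as precise as the bucket boundaries allow.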
4. Dashboard variables
For multi-instance setups:
Name: instance
Type: Query
Query: label_values(http_requests_total{job="data-gateway"}, instance)
Multi-value: enabled
Include All: enabled
Then use the variable in the queries: http_requests_total{instance=~"$instance"}
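The regex matcher =~ is needed because multi-value and "All" selections expand to an alternation of several instances. Adding the matcher by hand to every query is error-prone. The following is a minimal sketch that rewrites a PromQL string. It assumes the queries select the gateway with exactly job="data-gateway" in plain PromQL, not inside the dashboard JSON, where the quotes appear escaped as \". The function name add_instance_filter is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: append instance=~"$instance" to every job="data-gateway" matcher.
# The single-quoted sed script keeps $instance literal, so Grafana (not the
# shell) substitutes the dashboard variable at query time.
add_instance_filter() {
  printf '%s\n' "$1" |
    sed 's/job="data-gateway"/job="data-gateway",instance=~"$instance"/g'
}
```

Apply it to the panel queries only. Do not apply it to the label_values(...) query that defines the instance variable itself.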
5. Checklist
| # | Check | ✓ |
| --- | --- | --- |
| 1 | Prometheus datasource configured | ☐ |
| 2 | Dashboard imported | ☐ |
| 3 | Metrics are displayed | ☐ |
| 4 | Variables work | ☐ |
| 5 | Dashboard saved | ☐ |
Troubleshooting
| Problem | Cause | Solution |
| --- | --- | --- |
| No data | Wrong job name | Check the job="data-gateway" label |
| Datasource error | Prometheus not reachable | Check the datasource URL |
| Empty graphs | No traffic | Send requests through the gateway |
| Wrong values | Incorrect query | Check the PromQL syntax |
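For "No data", querying Prometheus directly shows whether the problem is a wrong job label or Grafana itself. Prometheus adds an up series for every scrape target, and GET /api/v1/query is its instant-query endpoint. In the compact JSON response, an empty result appears as "result":[]. The following is a minimal sketch. It assumes Prometheus is reachable at http://prometheus:9090 from where the script runs. PROMETHEUS_URL and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check whether Prometheus has any series for the gateway job.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://prometheus:9090}"

# Succeeds if a query response read from stdin holds at least one result;
# an empty instant vector is serialized as "result":[].
has_result() {
  local body
  body="$(cat)"
  [[ "$body" == *'"status":"success"'* && "$body" != *'"result":[]'* ]]
}

# Run an instant query; --data-urlencode escapes braces and quotes.
prom_query() {
  curl -fsS -G --data-urlencode "query=$1" "$PROMETHEUS_URL/api/v1/query"
}

check_job() {
  if prom_query "up{job=\"$1\"}" | has_result; then
    echo "job=\"$1\": series found"
  else
    echo "job=\"$1\": no series, check the job name in the scrape config" >&2
    return 1
  fi
}
```

Usage: check_job data-gateway. If series are found but up has the value 0, Prometheus knows the target but cannot scrape it.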
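Dashboard export
```bash
# Export the dashboard as JSON (only the "dashboard" object, without "meta")
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  "http://grafana:3000/api/dashboards/uid/data-gateway" | jq '.dashboard' > dashboard.json

# Import the dashboard: /api/dashboards/db expects it wrapped in {"dashboard": ...};
# "id" is reset to null, "overwrite": true replaces a dashboard with the same uid
jq '{dashboard: (.id = null), overwrite: true}' dashboard.json | \
  curl -X POST -H "Content-Type: application/json" \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -d @- \
    "http://grafana:3000/api/dashboards/db"
```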
Related runbooks
- Prometheus – data source
- Alerting – notifications
- Health Check – manual check
Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional
Last modified: 29.01.2026, 15:12