Runbook: Grafana Dashboard

Duration: ~20 minutes
Role: DevOps, SRE
Prerequisite: Grafana with Prometheus as a data source

Visualize the Gateway metrics in Grafana.


Workflow

flowchart TD
    A[Start] --> B[Add data source]
    B --> C[Import dashboard]
    C --> D[Customize panels]
    D --> E[Configure variables]
    E --> F[Save dashboard]
    F --> G[Done]
    style G fill:#e8f5e9


1. Prometheus Datasource

Grafana UI: Configuration → Data Sources → Add data source

Name: Prometheus
Type: Prometheus
URL: http://prometheus:9090
Access: Server (default)

Or via provisioning:

# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

2. Dashboard JSON

Import the dashboard: Create → Import → Paste JSON

{
  "title": "Data Gateway",
  "uid": "data-gateway",
  "timezone": "browser",
  "panels": [
    {
      "title": "Request Rate",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
      "targets": [{
        "expr": "sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) by (endpoint)",
        "legendFormat": "{{endpoint}}"
      }]
    },
    {
      "title": "Response Time (p95)",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
      "targets": [{
        "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job=\"data-gateway\"}[5m])) by (le))",
        "legendFormat": "p95"
      }]
    },
    {
      "title": "Error Rate",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8},
      "targets": [{
        "expr": "sum(rate(http_requests_total{job=\"data-gateway\",status=~\"5..\"}[5m])) / sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) * 100",
        "legendFormat": "Error %"
      }],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "thresholds": {
            "steps": [
              {"color": "green", "value": null},
              {"color": "yellow", "value": 1},
              {"color": "red", "value": 5}
            ]
          }
        }
      }
    },
    {
      "title": "Memory Usage",
      "type": "gauge",
      "gridPos": {"h": 4, "w": 6, "x": 6, "y": 8},
      "targets": [{
        "expr": "process_resident_memory_bytes{job=\"data-gateway\"} / 1024 / 1024",
        "legendFormat": "Memory MB"
      }],
      "fieldConfig": {
        "defaults": {
          "unit": "mbytes",
          "max": 512,
          "thresholds": {
            "steps": [
              {"color": "green", "value": null},
              {"color": "yellow", "value": 300},
              {"color": "red", "value": 450}
            ]
          }
        }
      }
    },
    {
      "title": "Active Requests",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 12, "y": 8},
      "targets": [{
        "expr": "http_requests_in_progress{job=\"data-gateway\"}",
        "legendFormat": "Active"
      }]
    },
    {
      "title": "Uptime",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 18, "y": 8},
      "targets": [{
        "expr": "time() - process_start_time_seconds{job=\"data-gateway\"}",
        "legendFormat": "Uptime"
      }],
      "fieldConfig": {
        "defaults": {"unit": "s"}
      }
    }
  ],
  "templating": {
    "list": [{
      "name": "instance",
      "type": "query",
      "query": "label_values(http_requests_total{job=\"data-gateway\"}, instance)",
      "multi": true,
      "includeAll": true
    }]
  },
  "refresh": "10s"
}
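
Before pasting the JSON, the panel layout can be sanity-checked offline. The following is a minimal sketch, assuming only that Grafana's dashboard grid is 24 columns wide; the helper names `overlaps` and `layout_problems` are hypothetical and not part of any Grafana tooling. The gridPos values are copied from the dashboard JSON above.

```python
from itertools import combinations

GRID_COLUMNS = 24  # Grafana's dashboard grid is 24 columns wide


def overlaps(a, b):
    """Return True if two gridPos rectangles share any grid cell."""
    return (
        a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
        and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"]
    )


def layout_problems(panels):
    """List human-readable layout problems for a list of panels."""
    problems = []
    for panel in panels:
        pos = panel["gridPos"]
        if pos["x"] + pos["w"] > GRID_COLUMNS:
            problems.append(f"{panel['title']}: exceeds {GRID_COLUMNS} columns")
    for first, second in combinations(panels, 2):
        if overlaps(first["gridPos"], second["gridPos"]):
            problems.append(f"{first['title']} overlaps {second['title']}")
    return problems


# gridPos values taken from the "Data Gateway" dashboard JSON
panels = [
    {"title": "Request Rate", "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}},
    {"title": "Response Time (p95)", "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}},
    {"title": "Error Rate", "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8}},
    {"title": "Memory Usage", "gridPos": {"h": 4, "w": 6, "x": 6, "y": 8}},
    {"title": "Active Requests", "gridPos": {"h": 4, "w": 6, "x": 12, "y": 8}},
    {"title": "Uptime", "gridPos": {"h": 4, "w": 6, "x": 18, "y": 8}},
]

print(layout_problems(panels))  # prints [] for the layout above
```

An empty list means the two rows tile the grid without gaps or collisions; a check like this is mainly useful after adding or resizing panels by hand.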

3. Key Panels

Panel            Query                                Purpose
---------------  -----------------------------------  -----------
Request Rate     sum(rate(http_requests_total[5m]))   Throughput
Response Time    histogram_quantile(0.95, …)          Latency
Error Rate       …status=~"5.."… * 100                Error rate
Memory           process_resident_memory_bytes        RAM usage
CPU              rate(process_cpu_seconds_total[5m])  CPU load
Active Requests  http_requests_in_progress            Concurrency
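
To make the Error Rate arithmetic and its color thresholds concrete, here is a small sketch. It assumes that each threshold step applies from its value upward, with the `null` step acting as the base color. The function names and the sample rates (2 errors/s out of 80 requests/s) are hypothetical illustration values, not measurements.

```python
# Threshold steps copied from the Error Rate panel in the dashboard JSON
ERROR_RATE_STEPS = [
    {"color": "green", "value": None},
    {"color": "yellow", "value": 1},
    {"color": "red", "value": 5},
]


def error_rate_percent(errors_per_second, requests_per_second):
    """Share of failed requests in percent, mirroring the panel's PromQL.

    Returns 0.0 when there is no traffic; PromQL itself would yield NaN
    for a division by zero, so this is a simplification.
    """
    if requests_per_second == 0:
        return 0.0
    return 100.0 * errors_per_second / requests_per_second


def threshold_color(value, steps):
    """Return the color of the last step whose lower bound is <= value."""
    color = None
    for step in steps:  # steps are listed in ascending order
        if step["value"] is None or value >= step["value"]:
            color = step["color"]
    return color


rate = error_rate_percent(2, 80)  # hypothetical sample rates
print(rate, threshold_color(rate, ERROR_RATE_STEPS))  # prints: 2.5 yellow
```

With these steps the panel stays green below 1 %, turns yellow from 1 % and red from 5 %.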

4. Dashboard Variables

For multi-instance setups:

Name: instance
Type: Query
Query: label_values(http_requests_total{job="data-gateway"}, instance)
Multi-value: enabled
Include All: enabled

Then reference the variable in each query: http_requests_total{instance=~"$instance"}
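
The panel queries in the dashboard JSON filter only on the job label, so the variable has no effect until each expression also carries the instance matcher. The following sketch shows one way to add it in bulk before importing; `add_instance_filter` and `apply_to_dashboard` are hypothetical helpers, and the plain string replacement assumes every selector contains `job="data-gateway"` exactly as written in this runbook.

```python
import copy

JOB_MATCHER = 'job="data-gateway"'
INSTANCE_MATCHER = 'instance=~"$instance"'


def add_instance_filter(expr):
    """Append the instance matcher to every job selector in a PromQL string."""
    if INSTANCE_MATCHER in expr:
        return expr  # already filtered, keep the call idempotent
    return expr.replace(JOB_MATCHER, f"{JOB_MATCHER},{INSTANCE_MATCHER}")


def apply_to_dashboard(dashboard):
    """Return a copy of the dashboard with all panel targets filtered."""
    result = copy.deepcopy(dashboard)
    for panel in result.get("panels", []):
        for target in panel.get("targets", []):
            target["expr"] = add_instance_filter(target["expr"])
    return result


print(add_instance_filter('http_requests_in_progress{job="data-gateway"}'))
# prints: http_requests_in_progress{job="data-gateway",instance=~"$instance"}
```

The variable's own `label_values(...)` query lives under `templating`, not under `panels`, so it is left untouched.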


5. Checklist

#  Checkpoint                         ✓
-  ---------------------------------  ---
1  Prometheus data source configured  [ ]
2  Dashboard imported                 [ ]
3  Metrics displayed                  [ ]
4  Variables working                  [ ]
5  Dashboard saved                    [ ]

Troubleshooting

Problem           Cause                   Solution
----------------  ----------------------  ----------------------------------
No data           Wrong job name          Verify job="data-gateway"
Datasource error  Prometheus unreachable  Check the data source URL
Empty graphs      No traffic              Send requests through the Gateway
Wrong values      Incorrect query         Check the PromQL syntax
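
For the "No data" case it can help to ask Prometheus directly whether it scrapes any target under the expected job name. The sketch below builds the query URL for the Prometheus HTTP API (`/api/v1/query`) and reads the standard instant-query response. The helper names and the sample response, including the instance `gateway-1:8080`, are hypothetical; the actual HTTP call is left out so the example runs offline.

```python
import json
from urllib.parse import urlencode


def query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def instances_up(response_body):
    """Return the instances whose `up` sample equals 1 in a query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return []
    return [
        sample["metric"].get("instance", "")
        for sample in payload["data"]["result"]
        if sample["value"][1] == "1"  # sample values arrive as strings
    ]


# Hypothetical response body, shaped like a Prometheus instant-vector result
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "data-gateway", "instance": "gateway-1:8080"},
             "value": [1769726100, "1"]},
        ],
    },
})

print(query_url("http://prometheus:9090", 'up{job="data-gateway"}'))
print(instances_up(sample_response))
```

An empty list points to a wrong job name or a target that is down, which matches the first two rows of the table.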

Dashboard Export

# Export the dashboard as JSON
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
    "http://grafana:3000/api/dashboards/uid/data-gateway" | jq '.dashboard' > dashboard.json
 
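# Import the dashboard (the API expects it wrapped in a "dashboard" key;
# id is reset so the target instance matches on uid instead)
jq -c '{dashboard: (. + {id: null}), overwrite: true}' dashboard.json |
  curl -X POST -H "Content-Type: application/json" \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -d @- \
    "http://grafana:3000/api/dashboards/db"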
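
The import endpoint takes a request body in which the dashboard model sits under a `dashboard` key, next to options such as `overwrite`. As a rough sketch of that wrapping step, assuming the exported model was saved to `dashboard.json` as above (`build_import_payload` is a hypothetical helper, not a Grafana function):

```python
import json


def build_import_payload(dashboard, overwrite=True):
    """Wrap an exported dashboard model for POST /api/dashboards/db."""
    body = dict(dashboard)  # shallow copy, the input stays unchanged
    body["id"] = None       # drop the source instance's numeric id
    return {"dashboard": body, "overwrite": overwrite}


# Hypothetical excerpt of an exported dashboard model
exported = {"id": 42, "uid": "data-gateway", "title": "Data Gateway", "panels": []}

payload = build_import_payload(exported)
print(json.dumps(payload, indent=2))
```

Keeping the `uid` and setting `overwrite` lets a repeated import update the existing dashboard instead of failing on a name or uid conflict.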

Related Runbooks

« Prometheus | Alerting »


Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional

Last modified: 29/01/2026 at 23:35