Runbook: Grafana Dashboard

Duration: ~20 minutes
Role: DevOps, SRE
Prerequisite: Grafana, Prometheus as datasource

This runbook sets up a Grafana dashboard that visualizes Data Gateway metrics collected by Prometheus.


Workflow

flowchart TD
    A[Start] --> B[Add datasource]
    B --> C[Import dashboard]
    C --> D[Customize panels]
    D --> E[Configure variables]
    E --> F[Save dashboard]
    F --> G[Done]
    style G fill:#e8f5e9


1. Prometheus Datasource

Grafana UI: Configuration → Data Sources → Add data source

Name: Prometheus
Type: Prometheus
URL: http://prometheus:9090
Access: Server (default)

Or via provisioning:

# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
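
Before adding the datasource, it can help to confirm that the Prometheus URL answers at all. The following is a minimal sketch, assuming Prometheus exposes its standard `/-/healthy` endpoint at the URL used above; the helper names `health_url` and `is_healthy` are illustrative and not part of Grafana or Prometheus.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Return the Prometheus health endpoint for a datasource URL."""
    return base_url.rstrip("/") + "/-/healthy"


def is_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if Prometheus answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or an HTTP error status.
        return False
```

Because the datasource uses server (proxy) access, the Grafana backend issues the requests, so a call such as `is_healthy("http://prometheus:9090")` is most meaningful when run from the Grafana host or container.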

2. Dashboard JSON

Import dashboard: Create → Import → Paste JSON

{
  "title": "Data Gateway",
  "uid": "data-gateway",
  "timezone": "browser",
  "panels": [
    {
      "title": "Request Rate",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
      "targets": [{
        "expr": "sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) by (endpoint)",
        "legendFormat": "{{endpoint}}"
      }]
    },
    {
      "title": "Response Time (p95)",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
      "targets": [{
        "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job=\"data-gateway\"}[5m])) by (le))",
        "legendFormat": "p95"
      }]
    },
    {
      "title": "Error Rate",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8},
      "targets": [{
        "expr": "sum(rate(http_requests_total{job=\"data-gateway\",status=~\"5..\"}[5m])) / sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) * 100",
        "legendFormat": "Error %"
      }],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "thresholds": {
            "steps": [
              {"color": "green", "value": null},
              {"color": "yellow", "value": 1},
              {"color": "red", "value": 5}
            ]
          }
        }
      }
    },
    {
      "title": "Memory Usage",
      "type": "gauge",
      "gridPos": {"h": 4, "w": 6, "x": 6, "y": 8},
      "targets": [{
        "expr": "process_resident_memory_bytes{job=\"data-gateway\"} / 1024 / 1024",
        "legendFormat": "Memory MB"
      }],
      "fieldConfig": {
        "defaults": {
          "unit": "mbytes",
          "max": 512,
          "thresholds": {
            "steps": [
              {"color": "green", "value": null},
              {"color": "yellow", "value": 300},
              {"color": "red", "value": 450}
            ]
          }
        }
      }
    },
    {
      "title": "Active Requests",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 12, "y": 8},
      "targets": [{
        "expr": "http_requests_in_progress{job=\"data-gateway\"}",
        "legendFormat": "Active"
      }]
    },
    {
      "title": "Uptime",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 18, "y": 8},
      "targets": [{
        "expr": "time() - process_start_time_seconds{job=\"data-gateway\"}",
        "legendFormat": "Uptime"
      }],
      "fieldConfig": {
        "defaults": {"unit": "s"}
      }
    }
  ],
  "templating": {
    "list": [{
      "name": "instance",
      "type": "query",
      "query": "label_values(http_requests_total{job=\"data-gateway\"}, instance)",
      "multi": true,
      "includeAll": true
    }]
  },
  "refresh": "10s"
}
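
When panels are customized by hand, layout mistakes in `gridPos` are easy to make. The sketch below checks a dashboard dictionary for panels that exceed Grafana's 24-column grid, panels that overlap, and panels without a query expression. The function names `overlaps` and `check_dashboard` are hypothetical helpers, not part of any Grafana tooling.

```python
GRID_COLUMNS = 24  # Grafana lays panels out on a 24-column grid


def overlaps(a: dict, b: dict) -> bool:
    """Return True if two gridPos rectangles share at least one grid cell."""
    return (a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
            and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"])


def check_dashboard(dashboard: dict) -> list:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    panels = dashboard.get("panels", [])
    for panel in panels:
        title = panel.get("title", "<untitled>")
        pos = panel["gridPos"]
        if pos["x"] + pos["w"] > GRID_COLUMNS:
            problems.append(f"{title}: exceeds the {GRID_COLUMNS}-column grid")
        targets = panel.get("targets", [])
        if not targets or not all(t.get("expr") for t in targets):
            problems.append(f"{title}: missing query expression")
    # Compare every pair of panels once.
    for i, first in enumerate(panels):
        for second in panels[i + 1:]:
            if overlaps(first["gridPos"], second["gridPos"]):
                problems.append(f"{first.get('title')} overlaps {second.get('title')}")
    return problems
```

A possible use is to load the JSON with `json.load` before importing it and to stop if `check_dashboard` returns any findings.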

3. Important Panels

| Panel | Query | Purpose |
|-------|-------|---------|
| Request Rate | sum(rate(http_requests_total[5m])) | Throughput |
| Response Time | histogram_quantile(0.95, …) | Latency |
| Error Rate | …status=~"5.."… * 100 | Error percentage |
| Memory | process_resident_memory_bytes | RAM usage |
| CPU | rate(process_cpu_seconds_total[5m]) | CPU load |
| Active Requests | http_requests_in_progress | Concurrency |
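
The table lists a CPU query that has no panel in the section 2 JSON. The sketch below shows one way such a panel could be built and appended to the dashboard model. The helper name `cpu_panel`, the default grid position (a row below the existing stat panels), and the `percentunit` unit are assumptions for illustration, not taken from the original dashboard.

```python
def cpu_panel(job: str = "data-gateway", grid_pos: dict = None) -> dict:
    """Build a timeseries panel for the CPU query from the panel table."""
    if grid_pos is None:
        # Assumed position: full-width row directly below the y=8 stat row.
        grid_pos = {"h": 8, "w": 24, "x": 0, "y": 12}
    return {
        "title": "CPU",
        "type": "timeseries",
        "gridPos": grid_pos,
        "targets": [{
            # rate() over CPU seconds yields the fraction of one core in use.
            "expr": f'rate(process_cpu_seconds_total{{job="{job}"}}[5m])',
            "legendFormat": "CPU",
        }],
        # Assumption: display the 0.0-1.0 fraction as a percentage.
        "fieldConfig": {"defaults": {"unit": "percentunit"}},
    }


def add_cpu_panel(dashboard: dict, job: str = "data-gateway") -> dict:
    """Append the CPU panel to the dashboard's panel list and return the dashboard."""
    dashboard.setdefault("panels", []).append(cpu_panel(job))
    return dashboard
```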

4. Dashboard Variables

For multi-instance setups:

Name: instance
Type: Query
Query: label_values(http_requests_total{job="data-gateway"}, instance)
Multi-value: enabled
Include All: enabled

Then in queries: http_requests_total{instance=~"$instance"}
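
The panel expressions in the section 2 JSON filter only on `job`, so the variable has no effect until each query also matches on `instance`. The sketch below adds the matcher to every target expression in a dashboard dictionary. It assumes each selector already contains `job="data-gateway"`, and the helper names are hypothetical. The regex matcher `=~` is used because a multi-value variable expands to an alternation of the selected instances.

```python
INSTANCE_MATCHER = 'instance=~"$instance"'


def add_instance_filter(expr: str, job: str = "data-gateway") -> str:
    """Append the $instance matcher to every selector that filters on the job."""
    if INSTANCE_MATCHER in expr:
        return expr  # already filtered; keeps the function idempotent
    job_matcher = f'job="{job}"'
    return expr.replace(job_matcher, f"{job_matcher},{INSTANCE_MATCHER}")


def apply_to_dashboard(dashboard: dict, job: str = "data-gateway") -> dict:
    """Rewrite all target expressions in place and return the dashboard."""
    for panel in dashboard.get("panels", []):
        for target in panel.get("targets", []):
            target["expr"] = add_instance_filter(target["expr"], job)
    return dashboard
```

Plain string replacement is enough here because all queries in this runbook share the same `job` matcher; queries with a different selector layout would need manual editing.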


5. Checklist

| # | Check | Done |
|---|-------|------|
| 1 | Prometheus datasource configured | [ ] |
| 2 | Dashboard imported | [ ] |
| 3 | Metrics are displayed | [ ] |
| 4 | Variables work | [ ] |
| 5 | Dashboard saved | [ ] |

Troubleshooting

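| Problem | Cause | Solution |
|---------|-------|----------|
| No data | Wrong job name | Check that the queries use job="data-gateway" |
| Datasource error | Prometheus not reachable | Check the datasource URL |
| Empty graphs | No traffic | Send requests through the Gateway |
| Wrong values | Wrong query | Check the PromQL syntax |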
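
For the "No data" case, the job names Prometheus actually knows can be listed through its HTTP API (`/api/v1/label/job/values`). The sketch below separates the network call from the parsing so the logic can be checked offline. The function names and messages are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request


def label_values_url(base_url: str, label: str) -> str:
    """Build the Prometheus API URL that lists all values of a label."""
    return f"{base_url.rstrip('/')}/api/v1/label/{urllib.parse.quote(label)}/values"


def parse_label_values(payload: dict) -> list:
    """Extract the value list from a Prometheus API response body."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus returned status {payload.get('status')!r}")
    return payload.get("data", [])


def fetch_job_names(base_url: str, timeout: float = 5.0) -> list:
    """Query Prometheus for all known values of the job label."""
    with urllib.request.urlopen(label_values_url(base_url, "job"), timeout=timeout) as resp:
        return parse_label_values(json.load(resp))


def diagnose(job: str, known_jobs: list) -> str:
    """Return a short hint on whether the expected job name exists."""
    if job in known_jobs:
        return f"job {job!r} exists; check traffic and the PromQL query next"
    listed = ", ".join(sorted(known_jobs)) or "none"
    return f"job {job!r} not found; Prometheus knows: {listed}"
```

For example, `diagnose("data-gateway", fetch_job_names("http://prometheus:9090"))` would indicate whether the job name used in the dashboard queries matches a scrape job.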

Dashboard Export

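# Export dashboard as JSON
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
    "http://grafana:3000/api/dashboards/uid/data-gateway" | jq '.dashboard' > dashboard.json

# Import dashboard: the API expects the model wrapped in a "dashboard" key.
# "id": null lets Grafana match by uid; "overwrite": true replaces an existing dashboard.
jq '{dashboard: (. + {id: null}), overwrite: true}' dashboard.json | \
curl -s -X POST -H "Content-Type: application/json" \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -d @- \
    "http://grafana:3000/api/dashboards/db"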




Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional