Runbook: Grafana Dashboard

Duration: ~20 minutes
Role: DevOps, SRE
Prerequisite: Grafana, with Prometheus available as a data source

Visualize the Gateway metrics in Grafana.


Workflow

flowchart TD
    A[Start] --> B[Add data source]
    B --> C[Import dashboard]
    C --> D[Adjust panels]
    D --> E[Configure variables]
    E --> F[Save dashboard]
    F --> G[Done]
    style G fill:#e8f5e9


1. Prometheus Datasource

Grafana UI: Configuration → Data Sources → Add data source

Name: Prometheus
Type: Prometheus
URL: http://prometheus:9090
Access: Server (default)

Or via provisioning:

# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

2. Dashboard JSON

Import the dashboard: Create → Import → Paste JSON

{
  "title": "Data Gateway",
  "uid": "data-gateway",
  "timezone": "browser",
  "panels": [
    {
      "title": "Request Rate",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
      "targets": [{
        "expr": "sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) by (endpoint)",
        "legendFormat": "{{endpoint}}"
      }]
    },
    {
      "title": "Response Time (p95)",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
      "targets": [{
        "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job=\"data-gateway\"}[5m])) by (le))",
        "legendFormat": "p95"
      }]
    },
    {
      "title": "Error Rate",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8},
      "targets": [{
        "expr": "sum(rate(http_requests_total{job=\"data-gateway\",status=~\"5..\"}[5m])) / sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) * 100",
        "legendFormat": "Error %"
      }],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "thresholds": {
            "steps": [
              {"color": "green", "value": null},
              {"color": "yellow", "value": 1},
              {"color": "red", "value": 5}
            ]
          }
        }
      }
    },
    {
      "title": "Memory Usage",
      "type": "gauge",
      "gridPos": {"h": 4, "w": 6, "x": 6, "y": 8},
      "targets": [{
        "expr": "process_resident_memory_bytes{job=\"data-gateway\"} / 1024 / 1024",
        "legendFormat": "Memory MiB"
      }],
      "fieldConfig": {
        "defaults": {
          "unit": "mbytes",
          "max": 512,
          "thresholds": {
            "steps": [
              {"color": "green", "value": null},
              {"color": "yellow", "value": 300},
              {"color": "red", "value": 450}
            ]
          }
        }
      }
    },
    {
      "title": "Active Requests",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 12, "y": 8},
      "targets": [{
        "expr": "http_requests_in_progress{job=\"data-gateway\"}",
        "legendFormat": "Active"
      }]
    },
    {
      "title": "Uptime",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 18, "y": 8},
      "targets": [{
        "expr": "time() - process_start_time_seconds{job=\"data-gateway\"}",
        "legendFormat": "Uptime"
      }],
      "fieldConfig": {
        "defaults": {"unit": "s"}
      }
    }
  ],
  "templating": {
    "list": [{
      "name": "instance",
      "type": "query",
      "query": "label_values(http_requests_total{job=\"data-gateway\"}, instance)",
      "multi": true,
      "includeAll": true
    }]
  },
  "refresh": "10s"
}
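
Before pasting a hand-edited dashboard, it can help to check that the panels fit Grafana's 24-column grid and do not overlap. The following is a minimal sketch, not part of the Gateway tooling; the function names are hypothetical, and the sample data only repeats the gridPos values from the JSON above.

```python
GRID_COLUMNS = 24  # Grafana lays panels out on a 24-column grid


def overlaps(a, b):
    """Return True if two gridPos rectangles share any grid cell."""
    return (a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
            and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"])


def layout_problems(dashboard):
    """Collect human-readable layout problems for a dashboard dict."""
    problems = []
    panels = dashboard.get("panels", [])
    for panel in panels:
        pos = panel["gridPos"]
        if pos["x"] + pos["w"] > GRID_COLUMNS:
            problems.append(f"{panel['title']}: exceeds {GRID_COLUMNS} columns")
    for i, first in enumerate(panels):
        for second in panels[i + 1:]:
            if overlaps(first["gridPos"], second["gridPos"]):
                problems.append(f"{first['title']} overlaps {second['title']}")
    return problems


# gridPos values copied from the dashboard JSON in this runbook
example = {"panels": [
    {"title": "Request Rate", "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}},
    {"title": "Response Time (p95)", "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}},
    {"title": "Error Rate", "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8}},
    {"title": "Memory Usage", "gridPos": {"h": 4, "w": 6, "x": 6, "y": 8}},
    {"title": "Active Requests", "gridPos": {"h": 4, "w": 6, "x": 12, "y": 8}},
    {"title": "Uptime", "gridPos": {"h": 4, "w": 6, "x": 18, "y": 8}},
]}
print(layout_problems(example))  # prints []
```

To check a file instead, load it with json.load and pass the resulting dict to layout_problems.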

3. Key Panels

| Panel | Query | Purpose |
|-------|-------|---------|
| Request Rate | sum(rate(http_requests_total[5m])) | Throughput |
| Response Time | histogram_quantile(0.95, …) | Latency |
| Error Rate | …status=~"5.."… * 100 | Error rate |
| Memory | process_resident_memory_bytes | RAM usage |
| CPU | rate(process_cpu_seconds_total[5m]) | CPU load |
| Active Requests | http_requests_in_progress | Concurrency |
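
The table lists a CPU panel, but the dashboard JSON in section 2 does not define one. The sketch below builds one in the same shape as the existing panels. The title, the placement (a half-width panel at y=12, below the existing rows), and the percentunit display unit are assumptions, and cpu_panel is a hypothetical helper name.

```python
import json


def cpu_panel(job="data-gateway", x=0, y=12):
    """Build a timeseries panel dict for the CPU usage of the given job."""
    return {
        "title": "CPU Usage",
        "type": "timeseries",
        # Placement is an assumption: a half-width panel below the existing rows.
        "gridPos": {"h": 8, "w": 12, "x": x, "y": y},
        "targets": [{
            "expr": f'rate(process_cpu_seconds_total{{job="{job}"}}[5m])',
            "legendFormat": "{{instance}}",
        }],
        # rate() of CPU seconds is a fraction of one core, so 0.5 renders as 50%.
        "fieldConfig": {"defaults": {"unit": "percentunit"}},
    }


print(json.dumps(cpu_panel(), indent=2))
```

The printed object can be appended to the "panels" array before importing.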

4. Dashboard Variables

For multi-instance setups:

Name: instance
Type: Query
Query: label_values(http_requests_total{job="data-gateway"}, instance)
Multi-value: enabled
Include All: enabled

Then use the variable in each query: http_requests_total{instance=~"$instance"}
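
The panel queries in the dashboard JSON filter only on job, so the variable has no effect until each expression also carries the instance matcher. A minimal sketch of adding it in bulk, assuming every selector that should be filtered contains job="data-gateway"; the function names are hypothetical.

```python
JOB_MATCHER = 'job="data-gateway"'
INSTANCE_MATCHER = 'instance=~"$instance"'


def add_instance_filter(expr):
    """Append the instance matcher next to every job matcher in a PromQL string."""
    if INSTANCE_MATCHER in expr:
        return expr  # already filtered; keeps the call idempotent
    return expr.replace(JOB_MATCHER, f"{JOB_MATCHER},{INSTANCE_MATCHER}")


def filter_dashboard(dashboard):
    """Rewrite every panel target of a dashboard dict in place and return it."""
    for panel in dashboard.get("panels", []):
        for target in panel.get("targets", []):
            target["expr"] = add_instance_filter(target["expr"])
    return dashboard


print(add_instance_filter('http_requests_in_progress{job="data-gateway"}'))
# prints http_requests_in_progress{job="data-gateway",instance=~"$instance"}
```

Only the panel targets are rewritten. The variable's own label_values query under "templating" is left alone, since filtering it by $instance would make the variable depend on itself.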


5. Checklist

| # | Check | Yes/No |
|---|-------|--------|
| 1 | Prometheus data source configured | - |
| 2 | Dashboard imported | - |
| 3 | Metrics are displayed | - |
| 4 | Variables work | - |
| 5 | Dashboard saved | - |

Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| No data | Wrong job name | Check job="data-gateway" |
| Datasource error | Prometheus is unreachable | Check the URL |
| Empty graphs | No traffic | Send requests through the Gateway |
| Wrong values | Incorrect query | Check the PromQL syntax |
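
To tell a "No data" problem in Grafana apart from one in Prometheus, the same selector can be sent straight to the Prometheus HTTP API (GET /api/v1/query). A minimal sketch using only the standard library; the helper names are hypothetical, and the sample payload is illustrative rather than captured output.

```python
import json
import urllib.parse
import urllib.request


def series_count(payload):
    """Return how many series an instant-query response contains."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return len(payload["data"]["result"])


def query_prometheus(base_url, expr, timeout=5):
    """Run an instant query against the Prometheus HTTP API and decode the JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Shape of a successful response that matched nothing (the "No data" case).
sample = {"status": "success", "data": {"resultType": "vector", "result": []}}
print(series_count(sample))  # prints 0
```

Against a live server, series_count(query_prometheus("http://prometheus:9090", 'http_requests_total{job="data-gateway"}')) returning 0 points to the job label or missing traffic, while a connection error points to the URL.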

Dashboard Export

# Export the dashboard as JSON
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
    "http://grafana:3000/api/dashboards/uid/data-gateway" | jq '.dashboard' > dashboard.json

# Import the dashboard. The API expects the model wrapped in a "dashboard" key;
# id is set to null so that the dashboard is matched by its uid.
jq '{dashboard: (. + {id: null}), overwrite: true}' dashboard.json | \
curl -X POST -H "Content-Type: application/json" \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -d @- \
    "http://grafana:3000/api/dashboards/db"

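The import endpoint takes the dashboard model wrapped in a "dashboard" key rather than the bare model. For scripted imports, for example from a CI job, that body can be built in Python. This is a minimal sketch; build_import_payload is a hypothetical helper, and the exported dict is illustrative.

```python
import json


def build_import_payload(dashboard, overwrite=True):
    """Wrap a dashboard model in the body that POST /api/dashboards/db expects."""
    model = dict(dashboard)  # shallow copy so the caller's dict is not modified
    # A null id lets Grafana match the dashboard by uid instead of a
    # numeric id that only means something on the instance it came from.
    model["id"] = None
    return {"dashboard": model, "overwrite": overwrite}


# Illustrative stand-in for the contents of dashboard.json
exported = {"id": 17, "uid": "data-gateway", "title": "Data Gateway", "panels": []}
print(json.dumps(build_import_payload(exported)))
```

The serialized result is what goes into the request body of the POST to /api/dashboards/db.
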
Related runbooks


<- Prometheus | Alerting ->


Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional