====== Runbook: Grafana Dashboard ======

**Duration:** ~20 minutes \\
**Role:** DevOps, SRE \\
**Prerequisite:** Grafana, with Prometheus as a data source

Visualize the Gateway metrics in Grafana.

----

===== Workflow =====

  flowchart TD
      A[Start] --> B[Add data source]
      B --> C[Import dashboard]
      C --> D[Adjust panels]
      D --> E[Configure variables]
      E --> F[Save dashboard]
      F --> G[Done]
      style G fill:#e8f5e9

----

===== 1. Prometheus Datasource =====

**Grafana UI:** Configuration -> Data Sources -> Add data source

  Name: Prometheus
  Type: Prometheus
  URL: http://prometheus:9090
  Access: Server (default)

Or via provisioning:

  # /etc/grafana/provisioning/datasources/prometheus.yaml
  apiVersion: 1
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://prometheus:9090
      isDefault: true

----

===== 2. Dashboard JSON =====

**Import the dashboard:** Create -> Import -> Paste JSON

  {
    "title": "Data Gateway",
    "uid": "data-gateway",
    "timezone": "browser",
    "panels": [
      {
        "title": "Request Rate",
        "type": "timeseries",
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
        "targets": [{
          "expr": "sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) by (endpoint)",
          "legendFormat": "{{endpoint}}"
        }]
      },
      {
        "title": "Response Time (p95)",
        "type": "timeseries",
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
        "targets": [{
          "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job=\"data-gateway\"}[5m])) by (le))",
          "legendFormat": "p95"
        }]
      },
      {
        "title": "Error Rate",
        "type": "stat",
        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8},
        "targets": [{
          "expr": "sum(rate(http_requests_total{job=\"data-gateway\",status=~\"5..\"}[5m])) / sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) * 100",
          "legendFormat": "Error %"
        }],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 1},
                {"color": "red", "value": 5}
              ]
            }
          }
        }
      },
      {
        "title": "Memory Usage",
        "type": "gauge",
        "gridPos": {"h": 4, "w": 6, "x": 6, "y": 8},
        "targets": [{
          "expr": "process_resident_memory_bytes{job=\"data-gateway\"} / 1024 / 1024",
          "legendFormat": "Memory MB"
        }],
        "fieldConfig": {
          "defaults": {
            "unit": "decmbytes",
            "max": 512,
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 300},
                {"color": "red", "value": 450}
              ]
            }
          }
        }
      },
      {
        "title": "Active Requests",
        "type": "stat",
        "gridPos": {"h": 4, "w": 6, "x": 12, "y": 8},
        "targets": [{
          "expr": "http_requests_in_progress{job=\"data-gateway\"}",
          "legendFormat": "Active"
        }]
      },
      {
        "title": "Uptime",
        "type": "stat",
        "gridPos": {"h": 4, "w": 6, "x": 18, "y": 8},
        "targets": [{
          "expr": "time() - process_start_time_seconds{job=\"data-gateway\"}",
          "legendFormat": "Uptime"
        }],
        "fieldConfig": {
          "defaults": {"unit": "s"}
        }
      }
    ],
    "templating": {
      "list": [{
        "name": "instance",
        "type": "query",
        "query": "label_values(http_requests_total{job=\"data-gateway\"}, instance)",
        "multi": true,
        "includeAll": true
      }]
    },
    "refresh": "10s"
  }

----

===== 3. Important Panels =====

| Panel | Query | Purpose |
|-------|-------|-------|
| Request Rate | ''sum(rate(http_requests_total[5m]))'' | Throughput |
| Response Time | ''histogram_quantile(0.95, ...)'' | Latency |
| Error Rate | ''...status=~"5.."... * 100'' | Error rate |
| Memory | ''process_resident_memory_bytes'' | RAM usage |
| CPU | ''rate(process_cpu_seconds_total[5m])'' | CPU load |
| Active Requests | ''http_requests_in_progress'' | Concurrency |

----

===== 4. Dashboard Variables =====

For multi-instance setups:

  Name: instance
  Type: Query
  Query: label_values(http_requests_total{job="data-gateway"}, instance)
  Multi-value: enabled
  Include All: enabled

Then reference the variable in the panel queries: ''http_requests_total{instance=~"$instance"}''

----

===== 5. Checklist =====

| # | Check | Yes/No |
|---|-----------|---|
| 1 | Prometheus data source configured | - |
| 2 | Dashboard imported | - |
| 3 | Metrics are displayed | - |
| 4 | Variables work | - |
| 5 | Dashboard saved | - |

----

===== Troubleshooting =====

| Problem | Cause | Solution |
|---------|---------|--------|
| ''No data'' | Wrong job name | Check ''job="data-gateway"'' |
| ''Datasource error'' | Prometheus is unreachable | Check the URL |
| Empty graphs | No traffic | Send requests through the Gateway |
| Wrong values | Incorrect query | Check the PromQL syntax |

----

===== Dashboard Export =====

  # Export the dashboard as JSON
  curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
    "http://grafana:3000/api/dashboards/uid/data-gateway" | jq '.dashboard' > dashboard.json

  # Import the dashboard (the API expects the model wrapped in a "dashboard" key)
  jq '{dashboard: (. + {id: null}), overwrite: true}' dashboard.json |
    curl -s -X POST -H "Content-Type: application/json" \
      -H "Authorization: Bearer $GRAFANA_TOKEN" \
      -d @- \
      "http://grafana:3000/api/dashboards/db"

----

===== Related Runbooks =====

  * [[.:prometheus|Prometheus]] - Data source
  * [[.:alerting|Alerting]] - Notifications
  * [[..:tagesgeschaeft:health-check|Health Check]] - Manual check

----

<< [[.:prometheus|<- Prometheus]] | [[.:alerting|-> Alerting]] >>

----

//Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional//

{{tag>operator runbook grafana dashboard visualisierung}}
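
**Appendix: import payload sketch.** The export step in the Dashboard Export section writes only the bare dashboard model to ''dashboard.json'', while Grafana's ''POST /api/dashboards/db'' endpoint expects that model nested under a ''dashboard'' key, optionally with an ''overwrite'' flag. The following is a minimal Python sketch of that wrapping step, not part of the original runbook. The function name ''build_import_payload'' and the sample data are hypothetical and chosen only for illustration.

```python
import json


def build_import_payload(dashboard: dict, overwrite: bool = True) -> dict:
    """Wrap an exported dashboard model for POST /api/dashboards/db.

    Hypothetical helper for illustration; the name is not from the runbook.
    """
    model = dict(dashboard)  # shallow copy so the caller's dict stays unchanged
    model["id"] = None       # drop the numeric id so Grafana matches on "uid"
    return {"dashboard": model, "overwrite": overwrite}


if __name__ == "__main__":
    # Sample data standing in for the contents of dashboard.json (assumption).
    exported = {"id": 42, "uid": "data-gateway", "title": "Data Gateway", "panels": []}
    print(json.dumps(build_import_payload(exported), indent=2))
```

Clearing ''id'' is a design choice for moving a dashboard between Grafana instances, where numeric ids generally differ while the ''uid'' stays stable. The resulting JSON can be sent as the request body of the import call.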