====== Runbook: Grafana Dashboard ======

**Duration:** ~20 minutes \\
**Role:** DevOps, SRE \\
**Prerequisite:** Grafana, Prometheus as data source

Visualize the Gateway metrics in Grafana.

----

===== Workflow =====

<mermaid>
flowchart TD
    A[Start] --> B[Add datasource]
    B --> C[Import dashboard]
    C --> D[Customize panels]
    D --> E[Configure variables]
    E --> F[Save dashboard]
    F --> G[Done]

    style G fill:#e8f5e9
</mermaid>

----

===== 1. Prometheus Datasource =====

**Grafana UI:** Configuration -> Data Sources -> Add data source

<code>
Name: Prometheus
Type: Prometheus
URL: http://prometheus:9090
Access: Server (default)
</code>

Alternatively, via provisioning:

<code yaml>
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
</code>

----

===== 2. Dashboard JSON =====

**Import the dashboard:** Create -> Import -> Paste JSON

<code json>
{
  "title": "Data Gateway",
  "uid": "data-gateway",
  "timezone": "browser",
  "panels": [
    {
      "title": "Request Rate",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
      "targets": [{
        "expr": "sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) by (endpoint)",
        "legendFormat": "{{endpoint}}"
      }]
    },
    {
      "title": "Response Time (p95)",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
      "targets": [{
        "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job=\"data-gateway\"}[5m])) by (le))",
        "legendFormat": "p95"
      }]
    },
    {
      "title": "Error Rate",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8},
      "targets": [{
        "expr": "sum(rate(http_requests_total{job=\"data-gateway\",status=~\"5..\"}[5m])) / sum(rate(http_requests_total{job=\"data-gateway\"}[5m])) * 100",
        "legendFormat": "Error %"
      }],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "thresholds": {
            "steps": [
              {"color": "green", "value": null},
              {"color": "yellow", "value": 1},
              {"color": "red", "value": 5}
            ]
          }
        }
      }
    },
    {
      "title": "Memory Usage",
      "type": "gauge",
      "gridPos": {"h": 4, "w": 6, "x": 6, "y": 8},
      "targets": [{
        "expr": "process_resident_memory_bytes{job=\"data-gateway\"} / 1024 / 1024",
        "legendFormat": "Memory MiB"
      }],
      "fieldConfig": {
        "defaults": {
          "unit": "mbytes",
          "max": 512,
          "thresholds": {
            "steps": [
              {"color": "green", "value": null},
              {"color": "yellow", "value": 300},
              {"color": "red", "value": 450}
            ]
          }
        }
      }
    },
    {
      "title": "Active Requests",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 12, "y": 8},
      "targets": [{
        "expr": "http_requests_in_progress{job=\"data-gateway\"}",
        "legendFormat": "Active"
      }]
    },
    {
      "title": "Uptime",
      "type": "stat",
      "gridPos": {"h": 4, "w": 6, "x": 18, "y": 8},
      "targets": [{
        "expr": "time() - process_start_time_seconds{job=\"data-gateway\"}",
        "legendFormat": "Uptime"
      }],
      "fieldConfig": {
        "defaults": {"unit": "s"}
      }
    }
  ],
  "templating": {
    "list": [{
      "name": "instance",
      "type": "query",
      "query": "label_values(http_requests_total{job=\"data-gateway\"}, instance)",
      "multi": true,
      "includeAll": true
    }]
  },
  "refresh": "10s"
}
</code>

The Memory Usage query divides by 1024 twice, so the value is in MiB; the panel therefore uses the binary unit ''mbytes'' rather than the decimal ''decmbytes''.

----

===== 3. Key Panels =====

| Panel | Query | Purpose |
|-------|-------|-------|
| Request Rate | ''sum(rate(http_requests_total[5m]))'' | Throughput |
| Response Time | ''histogram_quantile(0.95, ...)'' | Latency |
| Error Rate | ''...status=~"5.."... * 100'' | Error rate |
| Memory | ''process_resident_memory_bytes'' | RAM usage |
| CPU | ''rate(process_cpu_seconds_total[5m])'' | CPU load |
| Active Requests | ''http_requests_in_progress'' | Concurrency |

The CPU panel is not part of the JSON in section 2; add it manually if needed.

----

===== 4. Dashboard Variables =====

For multi-instance setups:

<code>
Name: instance
Type: Query
Query: label_values(http_requests_total{job="data-gateway"}, instance)
Multi-value: enabled
Include All: enabled
</code>

Then use the variable in the queries: ''http_requests_total{instance=~"$instance"}''

The panel queries in the JSON from section 2 filter only by ''job''; add the ''instance'' matcher to every query that should follow the variable.

----

===== 5. Checklist =====

| # | Checkpoint | ✓ |
|---|-----------|---|
| 1 | Prometheus datasource configured | ☐ |
| 2 | Dashboard imported | ☐ |
| 3 | Metrics displayed | ☐ |
| 4 | Variables working | ☐ |
| 5 | Dashboard saved | ☐ |

----

===== Troubleshooting =====

| Problem | Cause | Solution |
|---------|---------|--------|
| ''No data'' | Wrong job name | Check ''job="data-gateway"'' |
| ''Datasource error'' | Prometheus not reachable | Check the URL |
| Empty graphs | No traffic | Send requests through the Gateway |
| Wrong values | Incorrect query | Check the PromQL syntax |

----

===== Dashboard Export =====

<code bash>
# Export the dashboard as JSON
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  "http://grafana:3000/api/dashboards/uid/data-gateway" | jq '.dashboard' > dashboard.json

# Import the dashboard
# The API expects the dashboard wrapped in a "dashboard" field.
# id is reset to null so the dashboard is matched by its uid;
# overwrite: true replaces an existing dashboard with the same uid.
jq '{dashboard: (. + {id: null}), overwrite: true}' dashboard.json | \
  curl -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAFANA_TOKEN" \
  -d @- \
  "http://grafana:3000/api/dashboards/db"
</code>

----

===== Related Runbooks =====

  * [[.:prometheus|Prometheus]] - Data source
  * [[.:alerting|Alerting]] - Notifications
  * [[..:tagesgeschaeft:health-check|Health Check]] - Manual check

----

<< [[.:prometheus|<- Prometheus]] | [[.:alerting|-> Alerting]] >>

----

//Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional//

{{tag>operator runbook grafana dashboard visualisierung}}
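
===== Appendix: Checking the Job Label =====

The ''No data'' row in the troubleshooting table points at a wrong job name. The following is a minimal shell sketch for checking this outside Grafana. It assumes that Prometheus is reachable at the datasource URL used in this runbook and that the standard ''up'' metric exists for the job. The helper names ''job_selector'', ''classify_response'' and ''check_job'' and the variables ''PROM_URL'' and ''JOB'' are hypothetical and not part of the Gateway tooling.

```shell
#!/bin/sh
# Sketch: check whether Prometheus knows the job label used by the dashboard.
# PROM_URL and JOB are hypothetical variables; their defaults mirror this runbook.
PROM_URL="${PROM_URL:-http://prometheus:9090}"
JOB="${JOB:-data-gateway}"

# Build the PromQL selector for the scrape-health metric `up` of one job.
job_selector() {
    printf 'up{job="%s"}' "$1"
}

# Classify the body of a Prometheus /api/v1/query response.
# Assumes the compact JSON that Prometheus normally returns (no extra spaces).
# Prints "empty" if no series matched, "ok" if any did, "error" otherwise.
classify_response() {
    case "$1" in
        *'"status":"success"'*'"result":[]'*) echo "empty" ;;
        *'"status":"success"'*)               echo "ok" ;;
        *)                                    echo "error" ;;
    esac
}

# Run the instant query against Prometheus and classify the answer.
# curl -G with --data-urlencode takes care of URL-encoding the selector.
check_job() {
    body=$(curl -s -G "$PROM_URL/api/v1/query" \
        --data-urlencode "query=$(job_selector "$JOB")") || body=""
    classify_response "$body"
}
```

After sourcing the file, ''check_job'' prints ''ok'', ''empty'' or ''error''. An ''empty'' result suggests that Prometheus scrapes no target with that job label, so the ''job'' matcher in the dashboard queries or the scrape configuration needs a second look.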