Runbook: Prometheus

Duration: ~15 minutes
Role: DevOps, SRE
Prerequisites: Prometheus server and Gateway running

Collect metrics from the Data Gateway with Prometheus.


Workflow

flowchart TD
    A[Start] --> B[Enable metrics]
    B --> C[Prometheus config]
    C --> D[Add scrape job]
    D --> E[Reload Prometheus]
    E --> F[Verify targets]
    F --> G{Up?}
    G -->|Yes| H[Done]
    G -->|No| I[Check firewall/endpoint]
    style H fill:#e8f5e9
    style I fill:#ffebee


1. Enable metrics in the Gateway

appsettings.json:

{
  "Metrics": {
    "Enabled": true,
    "Endpoint": "/metrics"
  }
}

Or via NuGet (if metrics are not built in):

# prometheus-net.AspNetCore
dotnet add package prometheus-net.AspNetCore

Program.cs:

using Prometheus;

// HTTP metrics middleware (register it after UseRouting so route labels are captured)
app.UseHttpMetrics();
app.MapMetrics(); // exposes the /metrics endpoint

2. Test the metrics endpoint

curl http://localhost:5000/metrics
 
# Expected output (Prometheus text format):
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
# http_requests_total{method="GET",endpoint="/api/v1/dsn/demo/tables",status="200"} 42
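The same check can be scripted. The following is a minimal sketch, assuming the endpoint returns the Prometheus text exposition format shown above; `parse_metrics` and `SAMPLE` are hypothetical names, and the parser deliberately ignores optional timestamps and other corner cases of the format.

```python
import re

# Sample payload in the Prometheus text exposition format (illustrative values).
SAMPLE = '''\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/api/v1/dsn/demo/tables",status="200"} 42
'''

# metric_name{labels} value   (an optional trailing timestamp is ignored)
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) for every sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line; a stricter checker could raise here
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

In practice the text would come from the body of the `curl` request above instead of the inline `SAMPLE` string.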

3. Prometheus configuration

/etc/prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Data Gateway
  - job_name: 'data-gateway'
    static_configs:
      - targets: ['gateway.example.com:5000']
    metrics_path: /metrics
    scheme: http  # or https

  # Multiple instances
  - job_name: 'data-gateway-cluster'
    static_configs:
      - targets:
          - 'gateway-1.example.com:5000'
          - 'gateway-2.example.com:5000'
          - 'gateway-3.example.com:5000'
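A common mistake is to put a full URL into `targets`. Prometheus expects `host:port` there, with the protocol in `scheme` and the path in `metrics_path`. The sketch below shows how a metrics URL maps onto those three fields; `scrape_fields` is a hypothetical helper, not part of Prometheus.

```python
from urllib.parse import urlsplit


def scrape_fields(url):
    """Split a metrics URL into the scheme, target and metrics_path fields."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme in {url!r}")
    if parts.hostname is None:
        raise ValueError(f"no host in {url!r}")
    # Targets are host:port only; fall back to the scheme's default port.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return {
        "scheme": parts.scheme,
        "target": f"{parts.hostname}:{port}",
        # /metrics is the Prometheus default when no path is configured.
        "metrics_path": parts.path or "/metrics",
    }


print(scrape_fields("http://gateway.example.com:5000/metrics"))
```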

4. Reload Prometheus

# Reload the config without a restart
# (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
 
# Or restart
sudo systemctl restart prometheus

5. Verify targets

Web UI: http://prometheus:9090/targets

Or via the API:

curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

Expected output:

{
  "job": "data-gateway",
  "health": "up"
}
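With several instances it is easier to list only the targets that are not healthy. The sketch below filters a `/api/v1/targets` response for that purpose. The inline response is a trimmed, made-up example (the second target and its error text are assumptions for illustration), and `unhealthy_targets` is a hypothetical helper.

```python
import json

# Trimmed example of a /api/v1/targets response; values are illustrative.
SAMPLE_RESPONSE = json.loads('''
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"job": "data-gateway", "instance": "gateway.example.com:5000"},
       "health": "up", "lastError": ""},
      {"labels": {"job": "data-gateway-cluster", "instance": "gateway-2.example.com:5000"},
       "health": "down", "lastError": "connection refused"}
    ]
  }
}
''')


def unhealthy_targets(response):
    """Return (job, instance, lastError) for every active target that is not up."""
    targets = response.get("data", {}).get("activeTargets", [])
    return [
        (t["labels"].get("job"), t["labels"].get("instance"), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]


for job, instance, error in unhealthy_targets(SAMPLE_RESPONSE):
    print(f"{job} {instance}: {error}")
```

An empty result means every active target reported `up`.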

6. Important queries

PromQL examples:

# Request rate (per second)
rate(http_requests_total{job="data-gateway"}[5m])

# Average response time (seconds)
rate(http_request_duration_seconds_sum{job="data-gateway"}[5m])
/
rate(http_request_duration_seconds_count{job="data-gateway"}[5m])

# Error rate (share of 5xx responses)
sum(rate(http_requests_total{job="data-gateway",status=~"5.."}[5m]))
/
sum(rate(http_requests_total{job="data-gateway"}[5m]))

# Memory usage (bytes)
process_resident_memory_bytes{job="data-gateway"}

# Requests currently in progress
http_requests_in_progress{job="data-gateway"}
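To make the arithmetic behind the request-rate and error-rate queries concrete, here is a simplified sketch of what `rate()` computes over a window of counter samples. It is an approximation: PromQL's `rate()` also extrapolates to the window boundaries, which this version does not. The sample values and the name `simple_rate` are made up for illustration.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples.

    A value lower than its predecessor is treated as a counter reset,
    i.e. a restart from zero. No extrapolation to the window boundaries
    is done, unlike PromQL's rate().
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical samples scraped 15 s apart: (unix_seconds, counter_value)
all_requests = [(0, 1000), (15, 1030), (30, 1060), (45, 1090), (60, 1120)]
errors_5xx = [(0, 10), (15, 10), (30, 13), (45, 13), (60, 16)]

request_rate = simple_rate(all_requests)              # requests per second
error_ratio = simple_rate(errors_5xx) / request_rate  # share of 5xx responses
print(request_rate, error_ratio)
```

Here 120 requests in 60 seconds give 2 requests per second, and 6 errors over the same window give an error ratio of 5%.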

7. Checklist

#  Checkpoint                     Done
-----------------------------------------
1  Metrics endpoint enabled       [ ]
2  /metrics reachable             [ ]
3  Prometheus config updated      [ ]
4  Prometheus reloaded            [ ]
5  Target "up" in Prometheus      [ ]
6  Metrics visible in Grafana     [ ]

Troubleshooting

Problem              Cause                   Solution
-------------------------------------------------------------------
Target "down"        Endpoint unreachable    Check firewall and URL
connection refused   Gateway not running     Start the Gateway
404 Not Found        Metrics not enabled     Check appsettings.json
No metrics           Wrong path              Check metrics_path
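The table can also be expressed as a small decision function, for example inside a health-check script. This is only a sketch under simplifying assumptions: `diagnose` is a hypothetical helper, and treating a 200 response without sample lines as a wrong path is a heuristic, not something Prometheus itself reports.

```python
def diagnose(status=None, error=None, body=""):
    """Map the result of a /metrics request to a likely cause and fix.

    status: HTTP status code, or None if no response was received
    error:  exception raised while connecting, or None
    body:   response body text
    """
    if isinstance(error, ConnectionRefusedError):
        return "Gateway not running: start the Gateway"
    if error is not None:
        return "Endpoint unreachable: check firewall and URL"
    if status == 404:
        return "Metrics not enabled: check appsettings.json"
    if status == 200:
        # A sample line is any non-empty line that is not a # comment.
        has_samples = any(
            line.strip() and not line.startswith("#") for line in body.splitlines()
        )
        return "OK" if has_samples else "No metrics: check metrics_path"
    return f"Unexpected HTTP status {status}"


print(diagnose(error=ConnectionRefusedError()))
print(diagnose(status=404))
print(diagnose(status=200, body="# HELP up\nup 1\n"))
```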

Kubernetes ServiceMonitor

For the Prometheus Operator:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: data-gateway
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: data-gateway
  namespaceSelector:
    matchNames:
      - data-gateway
  endpoints:
    - port: http
      path: /metrics
      interval: 15s

Related runbooks

<- Monitoring | -> Grafana Dashboard


Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional