Runbook: Prometheus

Duration: ~15 minutes
Role: DevOps, SRE
Prerequisite: Prometheus server and Gateway running

Collect metrics from the Data Gateway with Prometheus.

Workflow

flowchart TD
    A[Start] --> B[Enable metrics]
    B --> C[Prometheus config]
    C --> D[Add scrape job]
    D --> E[Reload Prometheus]
    E --> F[Verify targets]
    F --> G{Up?}
    G -->|Yes| H[Done]
    G -->|No| I[Check firewall/endpoint]
    style H fill:#e8f5e9
    style I fill:#ffebee

1. Enable metrics in the Gateway

appsettings.json:

{
  "Metrics": {
    "Enabled": true,
    "Endpoint": "/metrics"
  }
}

Or via NuGet (if metrics are not built in):

# prometheus-net.AspNetCore
dotnet add package prometheus-net.AspNetCore

Program.cs:

// Metrics middleware: records HTTP request metrics
app.UseHttpMetrics();
app.MapMetrics(); // exposes the /metrics endpoint

2. Test the metrics endpoint

curl http://localhost:5000/metrics
 
# Expected output (Prometheus format):
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
# http_requests_total{method="GET",endpoint="/api/v1/dsn/demo/tables",status="200"} 42
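
The manual check above can also be scripted. The following is a minimal sketch, assuming the endpoint returns the Prometheus text exposition format shown in the expected output. The helper name `parse_samples` is hypothetical, and the parser is deliberately simplified: it ignores timestamps and leaves escape sequences in label values as-is.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # simplification: ignore lines this sketch cannot parse
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Example input modelled on the expected output above.
EXAMPLE = '''# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/api/v1/dsn/demo/tables",status="200"} 42
'''

if __name__ == "__main__":
    for name, labels, value in parse_samples(EXAMPLE):
        print(name, labels, value)
```

In practice the text would come from the body returned by `curl http://localhost:5000/metrics`; an empty result list suggests the endpoint is reachable but not serving metrics.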

3. Prometheus configuration

/etc/prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Data Gateway
  - job_name: 'data-gateway'
    static_configs:
      - targets: ['gateway.example.com:5000']
    metrics_path: /metrics
    scheme: http  # or https
 
  # Multiple instances
  - job_name: 'data-gateway-cluster'
    static_configs:
      - targets:
          - 'gateway-1.example.com:5000'
          - 'gateway-2.example.com:5000'
          - 'gateway-3.example.com:5000'
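
A common mistake in `static_configs` is to put a scheme or path into the target string; in the configuration above these belong in `scheme` and `metrics_path`. The sketch below checks target strings before they are added to the file. The function `check_target` is hypothetical, and requiring an explicit port is a convention assumed here to match this runbook's examples, not a rule taken from Prometheus.

```python
def check_target(target):
    """Return a description of the problem with a target string, or None if it looks fine."""
    if "://" in target:
        return "contains a scheme; use the 'scheme' setting instead"
    if "/" in target:
        return "contains a path; use the 'metrics_path' setting instead"
    host, sep, port = target.rpartition(":")
    if not sep or not host:
        return "expected the form host:port"
    if not (port.isascii() and port.isdigit()) or not 0 < int(port) < 65536:
        return "port must be a number between 1 and 65535"
    return None


if __name__ == "__main__":
    for target in ["gateway.example.com:5000", "http://gateway.example.com:5000/metrics"]:
        print(target, "->", check_target(target) or "ok")
```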

4. Reload Prometheus

# Reload config (without restart); requires Prometheus to run with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload
 
# Or restart
sudo systemctl restart prometheus
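
When the reload is triggered from automation, the response is worth inspecting, because a failed reload leaves the previous configuration active. The following sketch wraps the same `POST /-/reload` call; the function names are hypothetical and the default base URL is the one used in this runbook.

```python
import urllib.error
import urllib.request


def build_reload_request(base_url):
    """Build the POST request for the /-/reload lifecycle endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(base_url="http://localhost:9090", timeout=10):
    """Trigger a config reload and return (ok, detail)."""
    try:
        with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The response body usually explains the failure.
        body = exc.read().decode(errors="replace").strip()
        return False, f"HTTP {exc.code}: {body}"
    except urllib.error.URLError as exc:
        return False, f"cannot reach Prometheus: {exc.reason}"


if __name__ == "__main__":
    ok, detail = reload_prometheus()
    print("reloaded" if ok else "reload failed", "-", detail)
```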

5. Verify targets

Web UI: http://prometheus:9090/targets

Or via the API:

curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

Expected output:

{
  "job": "data-gateway",
  "health": "up"
}
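
For more than a handful of targets it is easier to list only the ones that are not healthy. The sketch below filters a parsed `/api/v1/targets` response; `unhealthy_targets` is a hypothetical helper, and `SAMPLE` is made-up data shaped like the API response, not output from a real system.

```python
def unhealthy_targets(payload):
    """Return (job, instance, last_error) for every active target that is not 'up'."""
    problems = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            labels = target.get("labels", {})
            problems.append(
                (labels.get("job"), labels.get("instance"), target.get("lastError", ""))
            )
    return problems


# Hypothetical response with one healthy and one failing target.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "data-gateway", "instance": "gateway-1.example.com:5000"},
             "health": "up", "lastError": ""},
            {"labels": {"job": "data-gateway", "instance": "gateway-2.example.com:5000"},
             "health": "down", "lastError": "connection refused"},
        ]
    },
}

if __name__ == "__main__":
    for job, instance, error in unhealthy_targets(SAMPLE):
        print(f"{job} {instance}: {error}")
```

The `lastError` field is often the quickest pointer to the matching row in the Troubleshooting table.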

6. Important queries

PromQL examples:

# Request rate (per second)
rate(http_requests_total{job="data-gateway"}[5m])

# Average response time (seconds)
rate(http_request_duration_seconds_sum{job="data-gateway"}[5m])
/
rate(http_request_duration_seconds_count{job="data-gateway"}[5m])

# Error rate (share of 5xx responses, 0 to 1)
sum(rate(http_requests_total{job="data-gateway",status=~"5.."}[5m]))
/
sum(rate(http_requests_total{job="data-gateway"}[5m]))

# Memory usage (bytes)
process_resident_memory_bytes{job="data-gateway"}

# Requests currently in progress
http_requests_in_progress{job="data-gateway"}
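
These expressions can also be evaluated outside the web UI through the HTTP API endpoint `/api/v1/query`. The following is a minimal sketch for instant queries; the helper names are hypothetical, and the metric and label names are the ones this runbook assumes the Gateway exposes.

```python
import json
import urllib.parse
import urllib.request

# Error-rate expression from the examples above, as a single string.
ERROR_RATE = (
    'sum(rate(http_requests_total{job="data-gateway",status=~"5.."}[5m]))'
    ' / sum(rate(http_requests_total{job="data-gateway"}[5m]))'
)


def build_query_url(base_url, promql):
    """Build the URL for an instant query against /api/v1/query."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def vector_values(response):
    """Extract (labels, value) pairs from an instant-vector query response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    return [(item["metric"], float(item["value"][1])) for item in response["data"]["result"]]


def run_query(base_url, promql, timeout=10):
    """Send the query to Prometheus and return the parsed values."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return vector_values(json.load(resp))


if __name__ == "__main__":
    for labels, value in run_query("http://localhost:9090", ERROR_RATE):
        print(labels, value)
```

An empty result for the error-rate query usually means no matching series exist in the 5-minute window, not that the error rate is zero.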

7. Checklist

#	Check	Done
---	---	---
1	Metrics endpoint enabled	☐
2	/metrics reachable	☐
3	Prometheus config updated	☐
4	Prometheus reloaded	☐
5	Target "up" in Prometheus	☐
6	Metrics visible in Grafana	☐

Troubleshooting

Problem	Cause	Solution
---	---	---
Target "down"	Endpoint not reachable	Check firewall and URL
`connection refused`	Gateway not running	Start the Gateway
`404 Not Found`	Metrics not enabled	Check appsettings.json
No metrics	Wrong path	Check `metrics_path`
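
The table can be turned into a quick probe that requests the metrics URL and returns the matching hint. This is a sketch only: `hint_for` and `probe` are hypothetical helpers, and the mapping covers just the rows listed above.

```python
import urllib.error
import urllib.request


def hint_for(status=None, error=None):
    """Map an HTTP status or a connection error to a hint from the table."""
    if error is not None:
        if isinstance(error, ConnectionRefusedError):
            return "Gateway not running: start the Gateway"
        return "Endpoint not reachable: check firewall and URL"
    if status == 404:
        return "Metrics not enabled: check appsettings.json"
    if status == 200:
        return "Endpoint OK: if Prometheus shows no metrics, check metrics_path"
    return f"Unexpected HTTP status {status}"


def probe(url, timeout=5):
    """Request the metrics URL and return a troubleshooting hint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return hint_for(status=resp.status)
    except urllib.error.HTTPError as exc:
        return hint_for(status=exc.code)
    except urllib.error.URLError as exc:
        return hint_for(error=exc.reason)  # reason is usually the underlying OSError
    except OSError as exc:
        return hint_for(error=exc)  # e.g. a timeout while reading the response


if __name__ == "__main__":
    print(probe("http://localhost:5000/metrics"))
```

Run the probe from the Prometheus host, not from the Gateway host, so that firewall rules between the two are exercised.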

Kubernetes ServiceMonitor

For the Prometheus Operator:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: data-gateway
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: data-gateway
  namespaceSelector:
    matchNames:
      - data-gateway
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
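
A ServiceMonitor only produces targets if a Service matches its namespace selector, its `matchLabels`, and the named port in `endpoints`. The sketch below checks those three conditions on plain dictionaries. The `SERVICE` manifest is hypothetical (the runbook does not show the actual Service), and the namespace handling reflects my understanding of the operator's defaults, so treat it as an assumption.

```python
def mismatch_reasons(monitor, service):
    """Return the reasons why the ServiceMonitor would not select the Service."""
    reasons = []
    spec = monitor["spec"]
    svc_meta = service["metadata"]

    ns_selector = spec.get("namespaceSelector", {})
    if ns_selector.get("any"):
        allowed = None  # every namespace is allowed
    else:
        # Assumption: without matchNames only the monitor's own namespace is searched.
        allowed = ns_selector.get("matchNames") or [monitor["metadata"]["namespace"]]
    if allowed is not None and svc_meta["namespace"] not in allowed:
        reasons.append(f"namespace {svc_meta['namespace']!r} not in {allowed}")

    svc_labels = svc_meta.get("labels", {})
    for key, value in spec["selector"].get("matchLabels", {}).items():
        if svc_labels.get(key) != value:
            reasons.append(f"label {key}={value} missing on Service")

    port_names = {port.get("name") for port in service["spec"].get("ports", [])}
    for endpoint in spec.get("endpoints", []):
        if endpoint.get("port") not in port_names:
            reasons.append(f"no Service port named {endpoint.get('port')!r}")
    return reasons


# ServiceMonitor from this runbook, reduced to the fields that matter here.
MONITOR = {
    "metadata": {"name": "data-gateway", "namespace": "monitoring"},
    "spec": {
        "selector": {"matchLabels": {"app": "data-gateway"}},
        "namespaceSelector": {"matchNames": ["data-gateway"]},
        "endpoints": [{"port": "http", "path": "/metrics", "interval": "15s"}],
    },
}

# Hypothetical Service that the monitor is expected to select.
SERVICE = {
    "metadata": {"name": "data-gateway", "namespace": "data-gateway",
                 "labels": {"app": "data-gateway"}},
    "spec": {"ports": [{"name": "http", "port": 5000}]},
}

if __name__ == "__main__":
    print(mismatch_reasons(MONITOR, SERVICE) or "ServiceMonitor selects the Service")
```

Note that `port: http` refers to the name of the Service port, not to a port number, which is why an unnamed port in the Service leaves the target list empty.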

Related runbooks

Grafana Dashboard - Visualization
Alerting - Notifications
Kubernetes - K8s deployment


Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional
