====== Runbook: Prometheus ======

<WRAP round info>
**Duration:** ~15 minutes \\
**Role:** DevOps, SRE \\
**Prerequisites:** Prometheus server installed, gateway running
</WRAP>

Collect metrics from the Data Gateway with Prometheus.

----

===== Workflow =====

<mermaid>
flowchart TD
    A[Start] --> B[Enable metrics]
    B --> C[Prometheus config]
    C --> D[Add scrape job]
    D --> E[Reload Prometheus]
    E --> F[Check targets]
    F --> G{Up?}
    G -->|Yes| H[Done]
    G -->|No| I[Check firewall/endpoint]

    style H fill:#e8f5e9
    style I fill:#ffebee
</mermaid>

----

===== 1. Enable metrics in the gateway =====

**appsettings.json:**

<code json>
{
  "Metrics": {
    "Enabled": true,
    "Endpoint": "/metrics"
  }
}
</code>

**Or via NuGet (if metrics are not built in):**

<code bash>
# prometheus-net.AspNetCore
dotnet add package prometheus-net.AspNetCore
</code>

**Program.cs:**

<code csharp>
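using Prometheus; // namespace of prometheus-net

// HTTP metrics middleware (place after UseRouting() when routing is configured explicitly)
app.UseHttpMetrics();
app.MapMetrics(); // exposes the /metrics endpoint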
</code>

----

===== 2. Test the metrics endpoint =====

<code bash>
curl http://localhost:5000/metrics

# Expected output (Prometheus text format):
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
# http_requests_total{method="GET",endpoint="/api/v1/dsn/demo/tables",status="200"} 42
</code>
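
The check above can also be scripted. The following is a minimal sketch, assuming the endpoint returns the text format shown in the expected output. ''parse_metrics'' is a hypothetical helper, not part of any Prometheus library, and it deliberately ignores details such as escaped label values and timestamps.

<code python>
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: name="value"
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Sample text taken from the expected output above (without the shell "#" prefix).
SAMPLE_OUTPUT = '''\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/api/v1/dsn/demo/tables",status="200"} 42
'''

samples = parse_metrics(SAMPLE_OUTPUT)
print(samples)
</code>

In practice the text would come from the ''/metrics'' endpoint instead of the embedded sample string.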

----

===== 3. Prometheus configuration =====

**/etc/prometheus/prometheus.yml:**

<code yaml>
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Data Gateway
  - job_name: 'data-gateway'
    static_configs:
      - targets: ['gateway.example.com:5000']
    metrics_path: /metrics
    scheme: http  # or https

  # Multiple instances
  - job_name: 'data-gateway-cluster'
    static_configs:
      - targets:
          - 'gateway-1.example.com:5000'
          - 'gateway-2.example.com:5000'
          - 'gateway-3.example.com:5000'
</code>
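
For clusters with many instances, the scrape job can be generated instead of written by hand. The sketch below is only an illustration: ''render_scrape_job'' is a hypothetical helper that emits the same keys as the configuration above, and it assumes plain ''host:port'' targets (no IPv6 literals).

<code python>
def render_scrape_job(job_name, targets, metrics_path="/metrics", scheme="http"):
    """Render one scrape_configs entry as YAML text (hypothetical helper)."""
    for target in targets:
        host, sep, port = target.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"target must be host:port, got {target!r}")
    lines = [
        f"  - job_name: '{job_name}'",
        f"    metrics_path: {metrics_path}",
        f"    scheme: {scheme}",
        "    static_configs:",
        "      - targets:",
    ]
    lines += [f"          - '{target}'" for target in targets]
    return "\n".join(lines)


# Same three example hosts as in the cluster job above.
instances = [f"gateway-{n}.example.com:5000" for n in range(1, 4)]
snippet = render_scrape_job("data-gateway-cluster", instances)
print(snippet)
</code>

The printed text is indented to fit under ''scrape_configs:'' in ''prometheus.yml''.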

----

===== 4. Reload Prometheus =====

<code bash>
# Reload the config without a restart
# (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

# Or restart the service
sudo systemctl restart prometheus
</code>

----

===== 5. Check targets =====

**Web UI:** ''http://prometheus:9090/targets''

Or via the API:

<code bash>
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'
</code>

**Expected output:**

<code json>
{
  "job": "data-gateway",
  "health": "up"
}
</code>
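
For automated checks, the same API response can be evaluated in a script. This is a minimal sketch, assuming the usual response shape of ''/api/v1/targets'' (''data.activeTargets[]'' with ''labels'', ''health'' and ''lastError''). ''unhealthy_targets'' and the sample payload are made up for illustration.

<code python>
import json


def unhealthy_targets(response_body, job):
    """Return (instance, lastError) pairs for targets of `job` that are not up."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"unexpected API status: {payload.get('status')!r}")
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target["labels"].get("job") != job:
            continue  # belongs to another scrape job
        if target["health"] != "up":
            problems.append(
                (target["labels"].get("instance"), target.get("lastError", ""))
            )
    return problems


# Hypothetical response with one healthy and one failing gateway instance.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "data-gateway", "instance": "gateway-1.example.com:5000"},
             "health": "up", "lastError": ""},
            {"labels": {"job": "data-gateway", "instance": "gateway-2.example.com:5000"},
             "health": "down", "lastError": "connection refused"},
        ]
    },
})

failing = unhealthy_targets(SAMPLE_RESPONSE, "data-gateway")
print(failing)
</code>

An empty list means every target of the job reports ''up''.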

----

===== 6. Important queries =====

**PromQL examples:**

<code>
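# Note: with prometheus-net's UseHttpMetrics(), the request counter is named
# http_requests_received_total and the status label is "code"; adjust the
# metric and label names below to what /metrics actually exposes.

# Request rate (per second)
rate(http_requests_total{job="data-gateway"}[5m])

# Average response time (seconds)
rate(http_request_duration_seconds_sum{job="data-gateway"}[5m])
/
rate(http_request_duration_seconds_count{job="data-gateway"}[5m])

# Error ratio (share of 5xx responses)
sum(rate(http_requests_total{job="data-gateway",status=~"5.."}[5m]))
/
sum(rate(http_requests_total{job="data-gateway"}[5m]))

# Memory usage (resident set size in bytes)
process_resident_memory_bytes{job="data-gateway"}

# Requests currently in progress
http_requests_in_progress{job="data-gateway"}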
</code>
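
To make the arithmetic behind these queries concrete, here is a simplified worked example. It treats ''rate()'' as "counter increase divided by window length". The real function additionally handles counter resets and extrapolates to the window edges. All sample values are invented.

<code python>
def per_second_rate(first, last, window_seconds):
    """Simplified rate(): counter increase divided by the window length."""
    return (last - first) / window_seconds


WINDOW = 300  # seconds, corresponds to [5m]

# Invented counter readings at the start and end of the window.
requests_start, requests_end = 12_000, 13_500   # all requests
errors_start, errors_end = 40, 70               # 5xx responses only
duration_start, duration_end = 900.0, 1_050.0   # summed request duration in seconds

request_rate = per_second_rate(requests_start, requests_end, WINDOW)
error_rate = per_second_rate(errors_start, errors_end, WINDOW)
duration_rate = per_second_rate(duration_start, duration_end, WINDOW)

# Same shape as the PromQL expressions: one rate divided by another.
error_ratio = error_rate / request_rate
avg_response_time = duration_rate / request_rate

print(f"requests/s:        {request_rate}")
print(f"error ratio:       {error_ratio:.2%}")
print(f"avg response time: {avg_response_time:.3f} s")
</code>

Because both sides of each division use the same window, the window length cancels out of the ratios.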

----

===== 7. Checklist =====

^ # ^ Check ^ ✓ ^
| 1 | Metrics endpoint enabled | ☐ |
| 2 | /metrics reachable | ☐ |
| 3 | Prometheus config updated | ☐ |
| 4 | Prometheus reloaded | ☐ |
| 5 | Target "up" in Prometheus | ☐ |
| 6 | Metrics visible in Grafana | ☐ |

----

===== Troubleshooting =====

^ Problem ^ Cause ^ Solution ^
| Target "down" | Endpoint not reachable | Check firewall and URL |
| ''connection refused'' | Gateway is not running | Start the gateway |
| ''404 Not Found'' | Metrics not enabled | Check appsettings.json |
| No metrics | Wrong path | Check ''metrics_path'' |

----

===== Kubernetes ServiceMonitor =====

For the Prometheus Operator:

<code yaml>
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: data-gateway
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: data-gateway
  namespaceSelector:
    matchNames:
      - data-gateway
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
</code>
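
A ServiceMonitor only picks up a Service if the labels, the namespace and the port //name// all match. The sketch below mirrors that matching logic in plain Python as an aid for debugging. The Service manifest is hypothetical (this runbook does not define it) and ''is_selected'' is not a Kubernetes API.

<code python>
# The relevant fields of the ServiceMonitor above.
SERVICE_MONITOR = {
    "match_labels": {"app": "data-gateway"},
    "namespaces": ["data-gateway"],
    "port": "http",
}

# Hypothetical Service manifest for the gateway (assumption, for illustration only).
SERVICE = {
    "metadata": {
        "name": "data-gateway",
        "namespace": "data-gateway",
        "labels": {"app": "data-gateway"},
    },
    "spec": {"ports": [{"name": "http", "port": 5000}]},
}


def is_selected(service, monitor):
    """Check whether the ServiceMonitor fields would match this Service."""
    meta = service["metadata"]
    labels = meta.get("labels", {})
    # matchLabels: every listed label must be present with the same value.
    labels_ok = all(labels.get(k) == v for k, v in monitor["match_labels"].items())
    # namespaceSelector.matchNames: the Service must live in a listed namespace.
    namespace_ok = meta.get("namespace") in monitor["namespaces"]
    # endpoints[].port refers to the *name* of a Service port, not its number.
    port_ok = any(p.get("name") == monitor["port"] for p in service["spec"]["ports"])
    return labels_ok and namespace_ok and port_ok


print(is_selected(SERVICE, SERVICE_MONITOR))
</code>

A common reason for a missing target is a Service port without a name, in which case ''port: http'' cannot match.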

----

===== Related runbooks =====

  * [[.:grafana-dashboard|Grafana Dashboard]] – Visualization
  * [[.:alerting|Alerting]] – Notifications
  * [[..:automatisierung:kubernetes|Kubernetes]] – K8s deployment

----

<< [[.:start|← Monitoring]] | [[.:grafana-dashboard|→ Grafana Dashboard]] >>

----
//Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional//

{{tag>operator runbook prometheus metrics monitoring}}