====== Monitoring ======
**Target audience:** DevOps, SRE \\
**Content:** Metrics, dashboards, alerting \\
**Tools:** Prometheus, Grafana, Alertmanager
This page covers monitoring the Data Gateway so that errors are detected proactively.
----
===== Workflow =====
<mermaid>
flowchart LR
  subgraph GATEWAY["DATA GATEWAY"]
    G1["/metrics Endpoint"]
    G2["/health Endpoint"]
  end
  subgraph COLLECT["COLLECTION"]
    P[Prometheus]
  end
  subgraph VISUAL["VISUALIZATION"]
    GR[Grafana]
  end
  subgraph ALERT["ALERTING"]
    AM[Alertmanager]
    E["E-Mail / Slack"]
  end
  G1 --> P
  G2 --> P
  P --> GR
  P --> AM
  AM --> E
  style G1 fill:#e3f2fd
  style P fill:#fff3e0
  style GR fill:#e8f5e9
  style AM fill:#ffebee
</mermaid>
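The data flow in the diagram maps onto a Prometheus configuration roughly as follows. This is a minimal sketch, not the shipped configuration (see [[.:prometheus|Prometheus]] and [[.:alerting|Alerting]] for that). It assumes the gateway is reachable at ''localhost:5000'' as in the Quick Test and that Alertmanager listens on its default port ''9093''. The job name ''data-gateway'', the file names and the alert names are hypothetical. The two alert rules reuse thresholds from the Important Metrics table.

```bash
#!/usr/bin/env bash
# Sketch: write a minimal Prometheus config for the Data Gateway.
# Assumptions (not from this handbook): gateway at localhost:5000,
# Alertmanager at localhost:9093, hypothetical job/file/alert names.
set -euo pipefail

CONFIG_DIR="${1:-./prometheus}"   # target directory, defaults to ./prometheus
mkdir -p "$CONFIG_DIR"

# Main config: scrape the gateway's /metrics endpoint and route alerts.
cat > "$CONFIG_DIR/prometheus.yml" <<'EOF'
global:
  scrape_interval: 15s

rule_files:
  - gateway-rules.yml          # resolved relative to this file

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

scrape_configs:
  - job_name: data-gateway
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:5000"]
EOF

# Alert rules: fire when a threshold from the metrics table is exceeded.
cat > "$CONFIG_DIR/gateway-rules.yml" <<'EOF'
groups:
  - name: data-gateway
    rules:
      - alert: GatewayTooManyInProgressRequests
        expr: sum by (instance) (http_requests_in_progress{job="data-gateway"}) >= 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "100 or more requests in progress on {{ $labels.instance }}"
      - alert: GatewayHighCpu
        expr: rate(process_cpu_seconds_total{job="data-gateway"}[5m]) * 100 >= 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage at or above 80 % of one core on {{ $labels.instance }}"
EOF

echo "Wrote $CONFIG_DIR/prometheus.yml and $CONFIG_DIR/gateway-rules.yml"
```

Before (re)loading Prometheus, ''promtool check config prometheus/prometheus.yml'' validates the main file together with the rule file it references.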
----
===== Runbooks =====
^ Runbook ^ Description ^ Duration ^
| [[.:prometheus|Prometheus]] | Metric collection, scrape configuration | ~15 min |
| [[.:grafana-dashboard|Grafana Dashboard]] | Visualization, pre-built dashboards | ~20 min |
| [[.:alerting|Alerting]] | Thresholds, notifications | ~15 min |
----
===== Important Metrics =====
^ Metric ^ Description ^ Threshold ^
| ''http_requests_total'' | Total number of HTTP requests (counter) | - |
| ''http_request_duration_seconds'' | Request duration (response time) | < 1 s |
| ''http_requests_in_progress'' | Requests currently being processed | < 100 |
| ''dotnet_gc_memory_total_available_bytes'' | Memory available to the .NET garbage collector | > 100 MB |
| ''process_cpu_seconds_total'' | Cumulative CPU time (counter); apply the threshold to its ''rate()'' | < 80 % |
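The thresholds can also be checked directly against the raw ''/metrics'' output. The sketch below assumes the Prometheus text exposition format and unlabelled samples; ''metric_value'', ''cpu_percent'' and ''within_threshold'' are hypothetical helper names. Because ''process_cpu_seconds_total'' only ever increases, the 80 % limit applies to its rate: two samples taken Δt seconds apart give (cpu2 - cpu1) / Δt × 100 percent of one core, which is essentially what PromQL's ''rate()'' computes (it additionally handles counter resets). On multi-core hosts the value can exceed 100.

```bash
#!/usr/bin/env bash
# Sketch: evaluate gateway metrics against the thresholds in the table.
# Assumption: samples are unlabelled; labelled series need a finer match.
set -euo pipefail

# metric_value NAME: print the value of the first unlabelled sample NAME
# read from Prometheus text exposition on stdin (# HELP/# TYPE are skipped
# because their first field is "#").
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# cpu_percent CPU1 CPU2 SECONDS: CPU usage in percent of one core between
# two samples of process_cpu_seconds_total taken SECONDS apart.
cpu_percent() {
  awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN { printf "%.1f\n", (b - a) / dt * 100 }'
}

# within_threshold VALUE OP LIMIT: exit 0 if "VALUE OP LIMIT" holds.
# OP must be "<" or ">"; any other operator counts as a failed check.
within_threshold() {
  awk -v v="$1" -v op="$2" -v lim="$3" 'BEGIN {
    if (op == "<")      ok = (v + 0 < lim + 0)
    else if (op == ">") ok = (v + 0 > lim + 0)
    else                ok = 0
    exit !ok
  }'
}
```

For example, ''curl -fsS http://localhost:5000/metrics | metric_value http_requests_in_progress'' prints the current number of in-flight requests, provided the gateway exports that metric without labels.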
----
===== Quick Test =====
<code bash>
# Health Check
curl http://localhost:5000/health

# Metrics (when enabled)
curl http://localhost:5000/metrics
</code>
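For cron jobs or CI pipelines, the two Quick Test calls can be wrapped so that the exit status reports the result. This is a minimal sketch; ''BASE_URL'', ''check_endpoint'' and ''smoke_test'' are hypothetical names, and the 5-second timeout is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Sketch: non-interactive smoke test for the gateway endpoints.
# Assumption: BASE_URL defaults to the Quick Test address; override per host.
set -euo pipefail

BASE_URL="${BASE_URL:-http://localhost:5000}"

# check_endpoint PATH: return 0 if BASE_URL/PATH answers within 5 seconds
# without an HTTP error status (curl --fail treats 4xx/5xx as failure).
check_endpoint() {
  local url="${BASE_URL}$1"
  if curl --fail --silent --show-error --max-time 5 --output /dev/null "$url"; then
    echo "OK   $url"
  else
    echo "FAIL $url" >&2
    return 1
  fi
}

# smoke_test: check both endpoints, return 1 if at least one check failed.
smoke_test() {
  local rc=0
  check_endpoint /health || rc=1
  check_endpoint /metrics || rc=1   # only meaningful when metrics are enabled
  return "$rc"
}
```

After sourcing the script, ''smoke_test'' returns non-zero if either endpoint is unreachable or answers with an HTTP error status; redirects are not followed. If metrics are disabled, expect the ''/metrics'' check to fail.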
----
===== Related Runbooks =====
* [[..:tagesgeschaeft:health-check|Health Check]] - Manual check
* [[..:tagesgeschaeft:logs-pruefen|Check Logs]] - Error analysis
* [[..:sicherheit:start|Security]] - TLS for metrics
----
<< [[..:start|<- Operator Handbook]] | [[.:prometheus|-> Prometheus]] >>
----
//Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional//
{{tag>operator monitoring prometheus grafana}}