====== Monitoring ======

**Target audience:** DevOps, SRE \\
**Content:** Metrics, dashboards, alerting \\
**Tools:** Prometheus, Grafana, Alertmanager

How to monitor the Data Gateway so that errors are detected proactively.

----

===== Workflow =====

<mermaid>
flowchart LR
    subgraph GATEWAY["DATA GATEWAY"]
        G1["/metrics Endpoint"]
        G2["/health Endpoint"]
    end
    subgraph COLLECT["COLLECTION"]
        P[Prometheus]
    end
    subgraph VISUAL["VISUALIZATION"]
        GR[Grafana]
    end
    subgraph ALERT["ALERTING"]
        AM[Alertmanager]
        E[E-Mail/Slack]
    end

    G1 --> P
    G2 --> P
    P --> GR
    P --> AM
    AM --> E

    style G1 fill:#e3f2fd
    style P fill:#fff3e0
    style GR fill:#e8f5e9
    style AM fill:#ffebee
</mermaid>

----

===== Runbooks =====

^ Runbook ^ Description ^ Duration ^
| [[.:prometheus|Prometheus]] | Collect metrics, scrape config | ~15 min |
| [[.:grafana-dashboard|Grafana Dashboard]] | Visualization, pre-built dashboards | ~20 min |
| [[.:alerting|Alerting]] | Thresholds, notifications | ~15 min |

----

===== Important Metrics =====

^ Metric ^ Description ^ Threshold ^
| ''http_requests_total'' | Total number of HTTP requests | - |
| ''http_request_duration_seconds'' | Response time | < 1 s |
| ''http_requests_in_progress'' | Active requests | < 100 |
| ''dotnet_gc_memory_total_available_bytes'' | Available memory | > 100 MB |
| ''process_cpu_seconds_total'' | CPU usage (derived from the rate of this counter) | < 80% |

----

===== Quick Test =====

<code bash>
# Health check
curl http://localhost:5000/health

# Metrics (when enabled)
curl http://localhost:5000/metrics
</code>

----

===== Related Runbooks =====

  * [[..:tagesgeschaeft:health-check|Health Check]] - Manual check
  * [[..:tagesgeschaeft:logs-pruefen|Check Logs]] - Error analysis
  * [[..:sicherheit:start|Security]] - TLS for metrics

----

<< [[..:start|<- Operator Handbook]] | [[.:prometheus|-> Prometheus]] >>

----

//Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional//

{{tag>operator monitoring prometheus grafana}}
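
===== Example: Threshold Spot Check =====

The thresholds listed under Important Metrics can be spot-checked from the command line before any alerting rules exist. The following is a minimal sketch, not part of the Data Gateway: ''check_metric'' is a hypothetical helper name, and it assumes the ''/metrics'' endpoint is enabled and returns the Prometheus text exposition format. Summing all label sets of a metric is also an assumption. It is reasonable for gauges such as ''http_requests_in_progress'', but not for histograms.

```shell
#!/bin/sh
# check_metric NAME OP LIMIT  (hypothetical helper, for illustration only)
# Reads Prometheus text exposition format on stdin, sums every sample of
# metric NAME across its label sets, and compares the sum with LIMIT.
# OP is "lt" (sum must stay below LIMIT) or "gt" (sum must stay above LIMIT).
# Exit status: 0 = OK, 1 = ALERT, 2 = metric not found.
check_metric() {
    awk -v name="$1" -v op="$2" -v limit="$3" '
        /^#/ { next }                          # skip HELP/TYPE comment lines
        {
            metric = $0
            sub(/[{ ].*/, "", metric)          # keep only the metric name
            if (metric != name) next
            rest = $0
            sub(/^[^{ ]+([{].*[}])?[ ]+/, "", rest)   # drop name and label set
            split(rest, parts, " ")            # parts[1] = value, parts[2] = optional timestamp
            total += parts[1]
            found = 1
        }
        END {
            if (!found) { printf "MISSING %s\n", name; exit 2 }
            ok = (op == "lt") ? (total < limit + 0) : (total > limit + 0)
            printf "%s %s=%s (%s %s)\n", (ok ? "OK" : "ALERT"), name, total, op, limit
            exit (ok ? 0 : 1)
        }
    '
}
```

Possible usage against the local endpoint from the Quick Test, with 104857600 bytes standing in for the 100 MB threshold:

  curl -s http://localhost:5000/metrics | check_metric http_requests_in_progress lt 100
  curl -s http://localhost:5000/metrics | check_metric dotnet_gc_memory_total_available_bytes gt 104857600

The non-zero exit status on ALERT or MISSING makes the helper usable in cron jobs or simple scripts, but it is no substitute for the Alertmanager rules described in the Alerting runbook.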