Monitoring

Target audience: DevOps, SRE
Content: Metrics, dashboards, alerting
Tools: Prometheus, Grafana, Alertmanager

This section covers monitoring the Data Gateway so that errors are detected proactively.

Workflow

flowchart LR
    subgraph GATEWAY["DATA GATEWAY"]
        G1["/metrics Endpoint"]
        G2["/health Endpoint"]
    end
    subgraph COLLECT["COLLECTION"]
        P[Prometheus]
    end
    subgraph VISUAL["VISUALIZATION"]
        GR[Grafana]
    end
    subgraph ALERT["ALERTING"]
        AM[Alertmanager]
        E["E-Mail/Slack"]
    end
    G1 --> P
    G2 --> P
    P --> GR
    P --> AM
    AM --> E
    style G1 fill:#e3f2fd
    style P fill:#fff3e0
    style GR fill:#e8f5e9
    style AM fill:#ffebee
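
The collection and alerting legs of this workflow could be wired up with a Prometheus configuration along the following lines. This is a minimal sketch, not the project's shipped configuration: the job name `data-gateway`, the 15-second scrape interval, and the Alertmanager address `localhost:9093` are assumptions, and the Gateway address `localhost:5000` is taken from the Quick Test. Only the `/metrics` path is covered here.

```shell
#!/bin/sh
# Write a minimal Prometheus configuration to the given path
# (default: ./prometheus.yml). All values inside are illustrative.
write_prometheus_config() {
    target_file="${1:-prometheus.yml}"
    cat > "$target_file" <<'EOF'
global:
  scrape_interval: 15s                # assumed interval, tune as needed

scrape_configs:
  - job_name: data-gateway            # hypothetical job name
    metrics_path: /metrics
    static_configs:
      - targets: ['localhost:5000']   # host and port from the Quick Test

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093'] # assumed Alertmanager address
EOF
}
```

Calling `write_prometheus_config ./prometheus.yml` produces the file; running `promtool check config ./prometheus.yml` afterwards is a reasonable way to validate it before reloading Prometheus.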

Runbooks

Runbook	Description	Duration
Prometheus	Collect metrics, scrape config	~15 min
Grafana Dashboard	Visualization, pre-built dashboards	~20 min
Alerting	Thresholds, notifications	~15 min

Important Metrics

Metric	Description	Threshold
`http_requests_total`	Total number of HTTP requests (counter)	-
`http_request_duration_seconds`	Response time	< 1 s
`http_requests_in_progress`	Requests currently in progress	< 100
`dotnet_gc_memory_total_available_bytes`	Available memory	> 100 MB
`process_cpu_seconds_total`	CPU time consumed (counter; CPU usage is its rate over time)	< 80%
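
For a quick manual check of one of these thresholds without Prometheus, the text output of the metrics endpoint can be evaluated directly. The helper below is a sketch under assumptions: `metric_below` is a hypothetical name, it adds up all label variants of a metric, and it only makes sense for gauge-style values such as `http_requests_in_progress`. Counters such as `process_cpu_seconds_total` need a rate over time, which is what Prometheus is for.

```shell
#!/bin/sh
# metric_below NAME LIMIT: read Prometheus text format on stdin, add up all
# series of NAME, and succeed only if the total is below LIMIT.
# Exit status: 0 = below limit, 1 = at or above limit, 2 = metric not found.
metric_below() {
    awk -v name="$1" -v limit="$2" '
        /^#/ || NF == 0 { next }            # skip HELP/TYPE comments and blank lines
        {
            metric = $0
            sub(/[{ ].*/, "", metric)       # the name ends at "{" or the first space
            if (metric != name) next
            value = $0
            if (index(value, "}")) sub(/^.*[}] */, "", value)  # drop the label set
            else sub(/^[^ ]+ +/, "", value)                    # drop the bare name
            split(value, fields, " ")       # ignore an optional timestamp
            total += fields[1]
            found = 1
        }
        END {
            if (!found) { print name ": not found"; exit 2 }
            printf "%s = %g (limit %g)\n", name, total, limit
            exit (total < limit + 0 ? 0 : 1)
        }
    '
}
```

Assuming the metrics endpoint is enabled, a call could look like `curl -s http://localhost:5000/metrics | metric_below http_requests_in_progress 100`.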

Quick Test

# Health Check
curl http://localhost:5000/health
 
# Metrics (when enabled)
curl http://localhost:5000/metrics
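
The two calls above can be combined into a small smoke test that reports a result per endpoint and returns a non-zero status on failure, which is convenient in scripts. This is a sketch: the function names `check_endpoint` and `smoke_test` and the 5-second timeout are assumptions. If metrics are not enabled, the `/metrics` probe is expected to fail.

```shell
#!/bin/sh
# check_endpoint URL: succeed unless the request fails or the server
# answers with an HTTP status of 400 or above.
check_endpoint() {
    # -f: treat HTTP errors as failure, -sS: quiet but still print errors,
    # --max-time: give up instead of hanging on an unresponsive host
    curl -fsS --max-time 5 -o /dev/null "$1"
}

# smoke_test BASE_URL: probe /health and /metrics and report each result.
# Returns 0 only if both endpoints responded successfully.
smoke_test() {
    base_url="${1:-http://localhost:5000}"
    failed=0
    for path in /health /metrics; do
        if check_endpoint "$base_url$path" 2>/dev/null; then
            echo "OK   $path"
        else
            echo "FAIL $path"
            failed=1
        fi
    done
    return "$failed"
}
```

Running `smoke_test` without arguments probes `http://localhost:5000`; a different base URL can be passed as the first argument.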

Related Runbooks

Health Check - Manual check
Check Logs - Error analysis
Security - TLS for metrics


Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional

operator, monitoring, prometheus, grafana
