====== Runbook: Health Check ======

**Duration:** ~2 minutes \\
**Role:** Gateway Operator \\
**Frequency:** Daily / automated

Verifies that the gateway is available and working.

----

===== Workflow =====

<mermaid>
flowchart TD
    A[Start Health Check] --> B["/health endpoint"]
    B --> C{Healthy?}
    C -->|Yes| D[API test]
    C -->|No| E[Check logs]
    D --> F{Data returned?}
    F -->|Yes| G[OK - done]
    F -->|No| H[Check DSN]
    E --> I[Restart server]
    H --> I

    style G fill:#e8f5e9
    style E fill:#ffebee
    style H fill:#fff3e0
</mermaid>

----

===== 1. Basic Health Check =====

<code bash>
# Status code only
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5000/health
# Expected: 200

# Status code plus response body
curl -s http://localhost:5000/health
# Expected: Healthy
</code>

----

===== 2. Extended Health Check =====

<code bash>
# Is Swagger reachable?
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5000/swagger

# Check the API version (jq . pretty-prints the JSON response)
curl -s http://localhost:5000/api/v1/info | jq .
</code>

----

===== 3. DSN Connectivity =====

<code bash>
# Test each DSN
for dsn in demo production reporting; do
    echo "Testing $dsn..."
    curl -s -o /dev/null -w "$dsn: %{http_code}\n" \
        "http://localhost:5000/api/v1/dsn/$dsn/tables"
done
</code>

**PowerShell version:**

<code powershell>
$dsns = @("demo", "production", "reporting")
foreach ($dsn in $dsns) {
    # Invoke-WebRequest throws on non-2xx responses and connection errors, so catch and report them
    try {
        $result = Invoke-WebRequest -Uri "http://localhost:5000/api/v1/dsn/$dsn/tables" -UseBasicParsing -TimeoutSec 10
        Write-Host "$dsn : $($result.StatusCode)"
    } catch {
        Write-Host "$dsn : FAILED ($($_.Exception.Message))"
    }
}
</code>

----

===== 4. Measure Response Time =====

<code bash>
# Single query
curl -s -o /dev/null -w "Time: %{time_total}s\n" \
    "http://localhost:5000/api/v1/dsn/demo/tables/Products?\$top=10"

# Average over five runs
for i in {1..5}; do
    curl -s -o /dev/null -w "%{time_total}\n" \
        "http://localhost:5000/api/v1/dsn/demo/tables/Products?\$top=10"
done | awk '{sum+=$1} END {print "Average: " sum/NR "s"}'
</code>
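The individual checks in sections 1 to 4 can be chained into one script that follows the workflow diagram. This is a minimal sketch and not a shipped tool. It assumes bash, curl and awk are available, that the gateway listens on ''http://localhost:5000'' (overridable through a hypothetical ''GATEWAY_URL'' variable), and that the ''demo'' DSN exposes a ''Products'' table. The function names ''is_healthy'', ''classify_response_time'' and ''run_health_check'' are hypothetical helpers. Response times are rated using the green/yellow/red limits from the Thresholds table below.

```shell
#!/usr/bin/env bash
# Minimal sketch of the daily health check (workflow + sections 1 to 4).
# Assumptions: gateway on http://localhost:5000, DSN "demo" with table "Products".
# All function names are hypothetical helpers, not part of Data Gateway.

GATEWAY_URL="${GATEWAY_URL:-http://localhost:5000}"

# Healthy means HTTP 200 and the body "Healthy" (section 1).
is_healthy() {
    [ "$1" = "200" ] && [ "$2" = "Healthy" ]
}

# Map a response time in seconds to the Thresholds table:
# green < 0.5 s, yellow 0.5 s to 2 s, red > 2 s.
classify_response_time() {
    awk -v t="$1" 'BEGIN {
        if (t < 0.5) print "green"
        else if (t <= 2) print "yellow"
        else print "red"
    }'
}

run_health_check() {
    local response code body elapsed level

    # Step 1: /health. -w appends the status code on its own line after the body.
    response=$(curl -s --max-time 5 -w '\n%{http_code}' "$GATEWAY_URL/health") \
        || { echo "FAIL: gateway not reachable, check logs / restart server"; return 1; }
    code=${response##*$'\n'}   # last line: status code
    body=${response%$'\n'*}    # everything before it: response body
    is_healthy "$code" "$body" \
        || { echo "FAIL: /health returned $code '$body', check logs"; return 1; }

    # Step 2: API test against the demo DSN.
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' \
        "$GATEWAY_URL/api/v1/dsn/demo/tables")
    [ "$code" = "200" ] || { echo "FAIL: DSN demo returned $code, check DSN"; return 1; }

    # Step 3: response time of a small query, rated against the thresholds.
    elapsed=$(curl -s -o /dev/null --max-time 10 -w '%{time_total}' \
        "$GATEWAY_URL/api/v1/dsn/demo/tables/Products?\$top=10")
    level=$(classify_response_time "$elapsed")
    echo "Response time: ${elapsed}s ($level)"

    # Non-zero return only when the response time is in the red range.
    [ "$level" != "red" ]
}
```

To use it, source the file and call ''run_health_check''. It prints one status line and returns 0 when all checks pass with a green or yellow response time, and 1 otherwise, so its exit code could drive the same kind of restart logic as the cron job in the Automated Health Check section.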
----

===== 5. Checklist =====

^ # ^ Check ^ Expected ^ Done ^
| 1 | ''/health'' | 200 + "Healthy" | [ ] |
| 2 | ''/swagger'' | 200 | [ ] |
| 3 | DSN "demo" reachable | 200 | [ ] |
| 4 | Response time | < 1 s | [ ] |
| 5 | No errors in logs | No ''ERROR'' entries | [ ] |

----

===== Automated Health Check =====

**Cron (Linux):**

<code bash>
# /etc/cron.d/gateway-health
# Every 5 minutes: restart the service if /health fails or does not answer within 5 seconds
*/5 * * * * root curl -sf --max-time 5 http://localhost:5000/health > /dev/null || systemctl restart data-gateway
</code>

**Scheduled Task (Windows):**

<code powershell>
# health-check.ps1
try {
    # Invoke-WebRequest throws on timeouts, connection errors and non-2xx responses
    $healthy = (Invoke-WebRequest -Uri "http://localhost:5000/health" -UseBasicParsing -TimeoutSec 5).StatusCode -eq 200
} catch { $healthy = $false }
if (-not $healthy) {
    Restart-Service -Name "DataGateway" -Force
    # -From is mandatory; the sender address and SMTP server are placeholders
    Send-MailMessage -To "admin@example.com" -From "gateway@example.com" -SmtpServer "smtp.example.com" -Subject "Gateway Restart" -Body "Gateway was automatically restarted"
}
</code>

----

===== Troubleshooting =====

^ Problem ^ Cause ^ Solution ^
| ''Connection refused'' | Gateway not running | Start the server |
| ''503 Service Unavailable'' | Startup not yet complete | Wait 30 s, then retry |
| ''500 Internal Server Error'' | Configuration error | Check the logs |
| Timeout | Gateway overloaded | Reduce load or increase resources |

----

===== Thresholds =====

^ Metric ^ Green ^ Yellow ^ Red ^
| Response time | < 500 ms | 500 ms - 2 s | > 2 s |
| Error rate | < 1 % | 1 - 5 % | > 5 % |
| CPU | < 50 % | 50 - 80 % | > 80 % |
| Memory | < 70 % | 70 - 90 % | > 90 % |

----

===== Related Runbooks =====

  * [[.:server-starten|Start Server]] - on failure
  * [[.:logs-pruefen|Check Logs]] - error analysis
  * [[..:monitoring:prometheus|Prometheus]] - automated monitoring

----

<< [[.:dsn-verwalten|<- Manage DSN]] | [[.:logs-pruefen|-> Check Logs]] >>

----

//Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional//

{{tag>operator runbook health check monitoring}}