Runbook: Health Check

Duration: ~2 minutes
Role: Gateway Operator
Frequency: Daily / Automated

Check Gateway availability and functionality.


Workflow

flowchart TD
    A[Start Health Check] --> B["/health Endpoint"]
    B --> C{Healthy?}
    C -->|Yes| D[API Test]
    C -->|No| E[Check logs]
    D --> F{Data returned?}
    F -->|Yes| G[OK - Done]
    F -->|No| H[Check DSN]
    E --> I[Restart server]
    H --> I
    style G fill:#e8f5e9
    style E fill:#ffebee
    style H fill:#fff3e0


1. Basic Health Check

# Simple health check: print only the HTTP status code
curl -s -o /dev/null -w "%{http_code}" http://localhost:5000/health
# Expected response: 200

# With response body
curl -s http://localhost:5000/health
# Expected response: "Healthy"
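
The two checks above can be combined into one request. The following is a minimal sketch, not part of the Gateway itself: the function name `check_health_output` is hypothetical, and it assumes the body is the plain text `Healthy` (optionally wrapped in double quotes) and that curl is called with `-w '\n%{http_code}'` so the status code arrives on the last line.

```shell
#!/usr/bin/env bash
# Sketch: evaluate the /health response from a single curl call.
# Input format (an assumption of this sketch): response body, a newline,
# then the HTTP status code, as produced by: curl -s -w '\n%{http_code}' URL

check_health_output() {
  local output="$1"
  local nl=$'\n'
  local code=${output##*"$nl"}   # last line: HTTP status code
  local body=${output%"$nl"*}    # everything before the last line
  body=${body//\"/}              # tolerate a body wrapped in double quotes
  if [ "$code" = "200" ] && [ "$body" = "Healthy" ]; then
    echo "OK"
  else
    echo "FAIL (status=$code, body=$body)"
    return 1
  fi
}

# Usage against the local Gateway:
#   check_health_output "$(curl -s -w '\n%{http_code}' http://localhost:5000/health)"
```

When the Gateway is not reachable, curl reports the status code as `000`, so the function prints `FAIL (status=000, body=)` and returns a non-zero exit code.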

2. Extended Health Check

# Swagger reachable?
curl -s -o /dev/null -w "%{http_code}" http://localhost:5000/swagger
 
# Check API version
curl -s http://localhost:5000/api/v1/info | jq .

3. DSN Connectivity

# Test all DSNs
for dsn in demo production reporting; do
  echo "Testing $dsn..."
  curl -s -o /dev/null -w "$dsn: %{http_code}\n" \
    "http://localhost:5000/api/v1/dsn/$dsn/tables"
done

PowerShell version:

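$dsns = @("demo", "production", "reporting")
foreach ($dsn in $dsns) {
    try {
        # Invoke-WebRequest throws on non-2xx responses, so failures land in catch
        $result = Invoke-WebRequest -Uri "http://localhost:5000/api/v1/dsn/$dsn/tables" -UseBasicParsing
        Write-Host "$dsn : $($result.StatusCode)"
    } catch {
        Write-Host "$dsn : FAILED ($($_.Exception.Message))"
    }
}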
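
For unattended runs it can help to turn the output of the bash loop into a single pass/fail result. The sketch below reads the `name: status` lines that the loop prints and reports every DSN that did not return 200. The function name `summarize_dsn_results` is hypothetical, and treating only 200 as reachable is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Sketch: summarize "name: status" lines as printed by the DSN loop.
# Only HTTP 200 counts as reachable here (an assumption of this sketch).

summarize_dsn_results() {
  local failed=0 line name code
  while IFS= read -r line; do
    case "$line" in
      *": "[0-9][0-9][0-9]) ;;   # a "name: status" result line
      *) continue ;;             # skip progress lines such as "Testing demo..."
    esac
    name=${line%%: *}
    code=${line##*: }
    if [ "$code" != "200" ]; then
      echo "FAILED: $name (HTTP $code)"
      failed=$((failed + 1))
    fi
  done
  echo "$failed DSN(s) failed"
  [ "$failed" -eq 0 ]            # exit status: 0 only if every DSN answered 200
}
```

To use it, pipe the bash loop from this section into the function (`for dsn in ...; do ...; done | summarize_dsn_results`); the exit status can then drive an alert or a monitoring check.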

4. Measure Response Time

# Single query
curl -s -o /dev/null -w "Time: %{time_total}s\n" \
  "http://localhost:5000/api/v1/dsn/demo/tables/Products?\$top=10"
 
# Multiple runs
for i in {1..5}; do
  curl -s -o /dev/null -w "%{time_total}\n" \
    "http://localhost:5000/api/v1/dsn/demo/tables/Products?\$top=10"
done | awk '{sum+=$1} END {print "Average: " sum/NR "s"}'

5. Checklist

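| # | Check                | Expected        | Done |
|---|----------------------|-----------------|------|
| 1 | /health              | 200 + "Healthy" | [ ]  |
| 2 | /swagger             | 200             | [ ]  |
| 3 | DSN "demo" reachable | 200             | [ ]  |
| 4 | Response time        | < 1s            | [ ]  |
| 5 | No errors in logs    | No ERROR        | [ ]  |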
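
Checklist item 5 has no command of its own in this runbook. One possible way to check it is to count the lines containing `ERROR` in the Gateway log. This is a sketch only: the function name `count_log_errors` and the log path in the usage comment are hypothetical, because the runbook does not state where the Gateway writes its logs.

```shell
#!/usr/bin/env bash
# Sketch: checklist item 5, "No errors in logs".
# The log location is NOT given in the runbook; pass the real path as argument.

count_log_errors() {
  local logfile="$1"
  if [ ! -r "$logfile" ]; then
    echo "cannot read $logfile" >&2
    return 2
  fi
  # grep -c prints the number of matching lines. It exits with status 1 when
  # the count is 0, so "|| true" keeps the function usable under "set -e".
  grep -c 'ERROR' "$logfile" || true
}

# Usage (placeholder path):
#   count_log_errors /var/log/data-gateway/gateway.log
```

A result of `0` satisfies the checklist item; any other number means the matching lines should be reviewed.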

Automated Health Check

Cron (Linux):

# /etc/cron.d/gateway-health
*/5 * * * * root curl -sf -o /dev/null --max-time 5 http://localhost:5000/health || systemctl restart data-gateway

Scheduled Task (Windows):

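# health-check.ps1
$healthy = $false
try {
    # Invoke-WebRequest throws on timeouts, refused connections, and non-2xx responses
    $response = Invoke-WebRequest -Uri "http://localhost:5000/health" -UseBasicParsing -TimeoutSec 5
    $healthy = ($response.StatusCode -eq 200)
} catch { }
if (-not $healthy) {
    Restart-Service -Name "DataGateway" -Force
    # -From and -SmtpServer are placeholders; Send-MailMessage needs a sender and an SMTP server
    Send-MailMessage -From "gateway@example.com" -SmtpServer "smtp.example.com" -To "admin@example.com" -Subject "Gateway Restart" -Body "Gateway was automatically restarted"
}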

Troubleshooting

| Problem                   | Cause                | Solution                        |
|---------------------------|----------------------|---------------------------------|
| Connection refused        | Gateway not started  | Start server                    |
| 503 Service Unavailable   | Startup not finished | Wait 30s, retry                 |
| 500 Internal Server Error | Config error         | Check logs                      |
| Timeout                   | Gateway overloaded   | Reduce load, increase resources |
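
The table can be turned into a small lookup for scripted checks. The sketch below maps the value of curl's `%{http_code}` to the matching action. The function name `suggest_action` is hypothetical, and the fallback for status codes that are not in the table is an assumption. curl prints `000` when no HTTP response arrived at all, which covers both a refused connection and a timeout; curl's exit status (7 for a failed connection, 28 for a timeout) can tell the two apart.

```shell
#!/usr/bin/env bash
# Sketch: map a curl %{http_code} value to the action from the troubleshooting table.
# 000 means curl received no HTTP response (connection refused or timeout).

suggest_action() {
  case "$1" in
    200) echo "OK" ;;
    000) echo "No response: start the server, or reduce load if requests time out" ;;
    503) echo "Startup not finished: wait 30s, retry" ;;
    500) echo "Config error: check logs" ;;
    *)   echo "Unexpected status $1: check logs" ;;   # fallback, not from the table
  esac
}

# Usage:
#   suggest_action "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:5000/health)"
```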

Thresholds

| Metric        | Green   | Yellow   | Red   |
|---------------|---------|----------|-------|
| Response time | < 500ms | 500ms-2s | > 2s  |
| Error rate    | < 1%    | 1-5%     | > 5%  |
| CPU           | < 50%   | 50-80%   | > 80% |
| Memory        | < 70%   | 70-90%   | > 90% |
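
The response-time row can be applied directly to the `time_total` value that curl reports in seconds. A minimal sketch, assuming the boundary values 500ms and 2s both count as yellow; the function name `classify_response_time` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: classify a curl time_total value (in seconds) against the
# response-time thresholds: green < 0.5s, yellow 0.5s to 2s, red > 2s.

classify_response_time() {
  # awk does the floating-point comparison; bash arithmetic is integer-only.
  awk -v t="$1" 'BEGIN {
    if (t < 0.5)       print "green"
    else if (t <= 2.0) print "yellow"
    else               print "red"
  }'
}

# Usage:
#   t="$(curl -s -o /dev/null -w '%{time_total}' \
#        "http://localhost:5000/api/v1/dsn/demo/tables/Products?\$top=10")"
#   classify_response_time "$t"
```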




Wolfgang van der Stille @ EMSR DATA d.o.o. - Data Gateway Professional

Last modified: 2026/01/29 at 11:32 PM