====== Runbook: Health Check ====== **Duration:** ~5 minutes \\ **Role:** PKI Operator \\ **Frequency:** Daily (recommended: morning) ---- ===== Workflow ===== flowchart LR subgraph CHECKS["CHECKS"] C1[CA Availability] C2[CRL Validity] C3[OCSP Status] C4[Expiring Certificates] C5[Disk Space] end subgraph STATUS["STATUS"] S1{OK?} S2[Dashboard] end subgraph ACTION["ACTION"] A1[Create ticket] A2[Escalate] end C1 --> S1 C2 --> S1 C3 --> S1 C4 --> S1 C5 --> S1 S1 -->|Yes| S2 S1 -->|No| A1 A1 -->|Critical| A2 style S1 fill:#fff3e0 style A2 fill:#ffebee ---- ===== Quick Check (1 Command) ===== #!/bin/bash # pki-health-check.sh - Daily PKI check echo "=== PKI Health Check $(date -Iseconds) ===" # 1. Check CA certificates echo -e "\n[1] CA Certificates:" for ca in /etc/pki/CA/ca-*.pem; do days=$(( ($(openssl x509 -enddate -noout -in "$ca" | cut -d= -f2 | date -f - +%s) - $(date +%s)) / 86400 )) status="OK" [ "$days" -lt 365 ] && status="WARNING" [ "$days" -lt 90 ] && status="CRITICAL" echo " $(basename $ca): $days days [$status]" done # 2. CRL validity echo -e "\n[2] CRL Status:" for crl in /var/www/pki/*.crl; do next=$(openssl crl -in "$crl" -nextupdate -noout 2>/dev/null | cut -d= -f2) if [ -n "$next" ]; then days=$(( ($(date -d "$next" +%s) - $(date +%s)) / 86400 )) status="OK" [ "$days" -lt 3 ] && status="WARNING" [ "$days" -lt 1 ] && status="CRITICAL" echo " $(basename $crl): $days days until update [$status]" fi done # 3. OCSP Responder echo -e "\n[3] OCSP Responder:" ocsp_status=$(curl -s -o /dev/null -w "%{http_code}" http://ocsp.example.com/status) [ "$ocsp_status" = "200" ] && echo " Status: OK" || echo " Status: ERROR ($ocsp_status)" # 4. Expiring certificates (30 days) echo -e "\n[4] Expiring Certificates (<30 days):" count=$(find /etc/ssl/certs -name "*.pem" -exec openssl x509 -checkend 2592000 -noout -in {} \; 2>/dev/null | grep -c "will expire") echo " Count: $count" # 5. Disk space echo -e "\n[5] Disk Space:" df -h /etc/pki /var/log | tail -n +2 echo -e "\n=== End Health Check ===" ---- ===== Detailed Checks ===== ==== 1. CA Certificates ==== # Root CA validity openssl x509 -in /etc/pki/CA/root-ca.pem -enddate -noout # Expected result: > 10 years # Intermediate CA validity openssl x509 -in /etc/pki/CA/intermediate-ca.pem -enddate -noout # Expected result: > 2 years # Verify certificate chain openssl verify -CAfile /etc/pki/CA/root-ca.pem /etc/pki/CA/intermediate-ca.pem # Expected result: OK **Thresholds:** | CA Type | Warning | Critical | |---------|---------|----------| | Root CA | < 5 years | < 2 years | | Intermediate CA | < 1 year | < 6 months | ---- ==== 2. CRL Validity ==== # CRL metadata openssl crl -in /var/www/pki/crl.pem -text -noout | head -20 # Check Next Update openssl crl -in /var/www/pki/crl.pem -nextupdate -noout # Fetch CRL from CDP curl -s http://crl.example.com/crl.der | openssl crl -inform DER -text -noout **Thresholds:** | Metric | Warning | Critical | |--------|---------|----------| | Next Update | < 3 days | < 1 day | | CRL Size | > 10 MB | > 50 MB | ---- ==== 3. OCSP Responder ==== # OCSP availability curl -s -o /dev/null -w "HTTP: %{http_code}, Time: %{time_total}s\n" http://ocsp.example.com/status # OCSP response for test certificate openssl ocsp \ -issuer /etc/pki/CA/intermediate-ca.pem \ -cert /etc/ssl/certs/test.pem \ -url http://ocsp.example.com \ -resp_text # Expected result: "Cert Status: good" (or "revoked" if revoked) **Thresholds:** | Metric | Warning | Critical | |--------|---------|----------| | Response time | > 500ms | > 2s | | HTTP status | != 200 | Timeout | ---- ==== 4. Expiring Certificates ==== # Server certificates (30 days) find /etc/ssl/certs -name "*.pem" -exec sh -c ' if openssl x509 -checkend 2592000 -noout -in "$1" 2>/dev/null | grep -q "will expire"; then echo "EXPIRING: $1" fi ' _ {} \; # CA certificates (1 year) for ca in /etc/pki/CA/*.pem; do if openssl x509 -checkend 31536000 -noout -in "$ca" 2>/dev/null | grep -q "will expire"; then echo "CA WARNING: $ca" fi done # PowerShell: Expiring certificates Get-ChildItem Cert:\LocalMachine\My | Where-Object { $_.NotAfter -lt (Get-Date).AddDays(30) } | Format-Table Subject, NotAfter, Thumbprint -AutoSize ---- ==== 5. System Resources ==== # Disk space for PKI directories df -h /etc/pki /var/log/pki /var/www/pki # Log file sizes du -sh /var/log/pki/* # Processes ps aux | grep -E "(ocsp|openssl)" **Thresholds:** | Resource | Warning | Critical | |----------|---------|----------| | Disk usage | > 80% | > 95% | | Log size | > 1 GB | > 5 GB | ---- ===== Automation ===== ==== Cron Job ==== # /etc/cron.d/pki-health-check # Daily at 08:00 0 8 * * * root /usr/local/bin/pki-health-check.sh | mail -s "PKI Health Check $(date +%Y-%m-%d)" pki-team@example.com ==== Prometheus Exporter (Optional) ==== # prometheus-pki-exporter.yml - job_name: 'pki' static_configs: - targets: ['pki-server:9115'] metrics_path: /probe params: module: [certificate] ---- ===== Dashboard Template ===== | Metric | Status | Value | Threshold | |--------|--------|-------|-----------| | Root CA validity | OK | 15 years | > 5 years | | Intermediate CA validity | WARN | 8 months | > 1 year | | CRL Next Update | OK | 5 days | > 3 days | | OCSP Response | OK | 120ms | < 500ms | | Expiring certificates | WARN | 3 | 0 | | Disk /etc/pki | OK | 45% | < 80% | **Legend:** * OK - Green * WARN - Yellow * CRITICAL - Red ---- ===== Escalation Matrix ===== | Finding | Action | Timeframe | |---------|--------|-----------| | CA < 6 months | Plan CA renewal | 1 week | | CRL expired | Renew CRL immediately | 1 hour | | OCSP unreachable | Restart responder | 30 min | | > 10 expiring certificates | Renewal sprint | 1 day | | Disk > 95% | Rotate/delete logs | Immediately | ---- ===== Related Runbooks ===== * [[.:zertifikat-erneuern|Renew Certificate]] - For expiring certificates * [[..:monitoring:ablauf-monitoring|Expiry Monitoring]] - Automated monitoring * [[..:monitoring:alerting-setup|Alerting Setup]] - Notifications ---- << [[.:zertifikat-widerrufen|<- Revoke Certificate]] | [[..:start|-> Operator Scenarios]] >> ---- //Wolfgang van der Stille @ EMSR DATA d.o.o. - Post-Quantum Cryptography Professional// {{tag>runbook health-check monitoring operator}}