====== Monitoring & Alerting ======

**Priority 2** - Critical for production operation \\
**Target audience:** DevOps, SRE, IT-Operations

Monitoring of PKI infrastructure with a focus on certificate expiry, availability, and compliance.

----

===== Overview =====

<mermaid>
flowchart TB
    subgraph COLLECT["DATA COLLECTION"]
        C1[Prometheus Exporter]
        C2[cert-checker]
        C3[API Polling]
    end
    subgraph STORE["STORAGE"]
        S1[Prometheus]
        S2[InfluxDB]
        S3[Elasticsearch]
    end
    subgraph VISUALIZE["VISUALIZATION"]
        V1[Grafana]
        V2[Kibana]
    end
    subgraph ALERT["ALERTING"]
        A1[Alertmanager]
        A2[PagerDuty]
        A3[MS Teams]
        A4[E-Mail]
    end
    C1 --> S1 --> V1
    C2 --> S1 --> A1
    C3 --> S3 --> V2
    A1 --> A2 & A3 & A4
    style A1 fill:#ffebee
    style V1 fill:#e8f5e9
</mermaid>

----

===== Scenarios =====

^ Scenario ^ Description ^ Tools ^
| [[.:ablauf-monitoring|Expiry Monitoring]] | Monitor certificate expiry | Prometheus, Grafana |
| [[.:revocation-check|Revocation Check]] | Check CRL/OCSP availability | curl, OpenSSL |
| [[.:audit-logging|Audit Logging]] | Logging that meets compliance requirements | Syslog, ELK |
| [[.:alerting-setup|Alerting Setup]] | Configure notifications | Alertmanager, PagerDuty |

----

===== Metrics Overview =====

^ Metric ^ Description ^ Thresholds ^
| ''cert_expiry_days'' | Days until expiry | Warn: 30, Crit: 7 |
| ''crl_next_update_days'' | Days until next CRL update | Warn: 3, Crit: 1 |
| ''ocsp_response_time_ms'' | OCSP response time (ms) | Warn: 500, Crit: 2000 |
| ''ca_availability'' | CA reachable (0/1) | Crit: 0 |
| ''signing_ops_per_hour'' | Signing operations per hour | Info |

----

===== Quick Start =====

**Minimal setup (5 minutes):**

<code bash>
# 1. Install cert-exporter
wget https://github.com/enix/cert-exporter/releases/download/v2.0.0/cert-exporter_2.0.0_linux_amd64.tar.gz
tar xzf cert-exporter_*.tar.gz
./cert-exporter --kubeconfig="" --files /etc/ssl/certs/*.pem &

# 2. Check expiring certificates
curl -s localhost:9793/metrics | grep cert_expires_in_seconds
</code>

----

===== Stack Recommendations =====

^ Environment ^ Stack ^ Description ^
| Small (<100 certs) | Script + E-Mail | Cron job with e-mail alerts |
| Medium (100-1000 certs) | Prometheus + Grafana | Standard monitoring |
| Large (>1000 certs) | ELK + Grafana + PagerDuty | Enterprise stack |
| Kubernetes | cert-manager + Prometheus | Native integration |

----

===== Related Documentation =====

  * [[..:tagesgeschaeft:health-check|Health Check]] - Daily check
  * [[..:automatisierung:start|Automation]] - Auto-renewal
  * [[en:int:pqcrypt:administrator:betrieb|Operations]] - System maintenance

----

<< [[..:start|<- Operator Scenarios]] | [[.:ablauf-monitoring|-> Expiry Monitoring]] >>

----

//Wolfgang van der Stille @ EMSR DATA d.o.o. - Post-Quantum Cryptography Professional//

{{tag>operator monitoring alerting prometheus grafana}}
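
**Supplement: mapping exporter output to the expiry thresholds.** The Quick Start reads ''cert_expires_in_seconds'', while the Metrics Overview lists the expiry thresholds in days (Warn: 30, Crit: 7). The following is a minimal sketch of how the two could be connected. The helper name ''classify_expiry'' is hypothetical, and treating the thresholds as inclusive ("at or below") is an assumption, not something the table specifies.

```shell
#!/bin/sh
# Sketch only: classify remaining certificate lifetime against the
# Warn/Crit values from the Metrics Overview table.

WARN_DAYS=30   # table value: cert_expiry_days, Warn
CRIT_DAYS=7    # table value: cert_expiry_days, Crit

# classify_expiry SECONDS  (hypothetical helper)
# Prints CRIT, WARN, or OK for a remaining lifetime given in seconds.
# awk does the arithmetic so fractional values are accepted as well.
classify_expiry() {
  awk -v s="$1" -v warn="$WARN_DAYS" -v crit="$CRIT_DAYS" 'BEGIN {
    days = s / 86400                      # 86400 seconds per day
    if (days <= crit)      print "CRIT"   # assumption: threshold is inclusive
    else if (days <= warn) print "WARN"
    else                   print "OK"
  }'
}

# Example calls with fixed sample values (no exporter required):
classify_expiry 432000    # 5 days
classify_expiry 1728000   # 20 days
classify_expiry 7776000   # 90 days
```

To apply this to live data, the last field of each sample line returned by the Quick Start ''curl'' command could be passed to ''classify_expiry''. This assumes the exporter emits plain Prometheus text format without trailing timestamps.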