====== Monitoring & Alerting ======
**Priority 2** - Critical for production operation \\
**Target audience:** DevOps, SRE, IT-Operations

This section covers monitoring of the PKI infrastructure, with a focus on certificate expiry, service availability, and compliance.
----
===== Overview =====
<mermaid>
flowchart TB
    subgraph COLLECT["DATA COLLECTION"]
        C1[Prometheus Exporter]
        C2[cert-checker]
        C3[API Polling]
    end
    subgraph STORE["STORAGE"]
        S1[Prometheus]
        S2[InfluxDB]
        S3[Elasticsearch]
    end
    subgraph VISUALIZE["VISUALIZATION"]
        V1[Grafana]
        V2[Kibana]
    end
    subgraph ALERT["ALERTING"]
        A1[Alertmanager]
        A2[PagerDuty]
        A3[MS Teams]
        A4[E-Mail]
    end
    C1 --> S1 --> V1
    C2 --> S1 --> A1
    C3 --> S3 --> V2
    A1 --> A2 & A3 & A4
    style A1 fill:#ffebee
    style V1 fill:#e8f5e9
</mermaid>
----
===== Scenarios =====
^ Scenario ^ Description ^ Tools ^
| [[.:ablauf-monitoring|Expiry Monitoring]] | Monitor certificate expiry | Prometheus, Grafana |
| [[.:revocation-check|Revocation Check]] | Check CRL/OCSP availability | curl, OpenSSL |
| [[.:audit-logging|Audit Logging]] | Audit logging that meets compliance requirements | Syslog, ELK |
| [[.:alerting-setup|Alerting Setup]] | Configure notifications | Alertmanager, PagerDuty |
----
===== Metrics Overview =====
^ Metric ^ Description ^ Thresholds ^
| ''cert_expiry_days'' | Days until the certificate expires | Warn: 30, Crit: 7 |
| ''crl_next_update_days'' | Days until the next CRL update is due | Warn: 3, Crit: 1 |
| ''ocsp_response_time_ms'' | OCSP response time in milliseconds | Warn: 500, Crit: 2000 |
| ''ca_availability'' | CA reachable (1 = yes, 0 = no) | Crit: 0 |
| ''signing_ops_per_hour'' | Signing operations per hour | Info |
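
The thresholds above can be read as a simple classification rule. The following is a minimal sketch, not part of any exporter: the function names ''classify_low'' and ''classify_high'' are hypothetical, and it assumes integer values, with the thresholds meaning "at or below" for the day-based metrics and "at or above" for the OCSP response time.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the threshold table.
# Assumption: thresholds are inclusive and all values are integers.

# classify_low VALUE WARN CRIT: lower values are worse (days remaining).
classify_low() {
  local value=$1 warn=$2 crit=$3
  if [ "$value" -le "$crit" ]; then
    echo critical
  elif [ "$value" -le "$warn" ]; then
    echo warning
  else
    echo ok
  fi
}

# classify_high VALUE WARN CRIT: higher values are worse (latency).
classify_high() {
  local value=$1 warn=$2 crit=$3
  if [ "$value" -ge "$crit" ]; then
    echo critical
  elif [ "$value" -ge "$warn" ]; then
    echo warning
  else
    echo ok
  fi
}

classify_low 45 30 7        # cert_expiry_days = 45, prints: ok
classify_low 5 30 7         # cert_expiry_days = 5, prints: critical
classify_high 800 500 2000  # ocsp_response_time_ms = 800, prints: warning
```

In a Prometheus setup the same comparison would normally live in alert rules rather than in a script; the sketch only makes the direction of each threshold explicit.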
----
===== Quick Start =====
**Minimal setup (5 minutes):**
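<code bash>
# 1. Install cert-exporter: download the release archive, unpack it,
#    and start the exporter in the background
wget https://github.com/enix/cert-exporter/releases/download/v2.0.0/cert-exporter_2.0.0_linux_amd64.tar.gz
tar xzf cert-exporter_*.tar.gz
./cert-exporter --kubeconfig="" --files /etc/ssl/certs/*.pem &

# 2. Check expiring certificates
#    (the metric reports the remaining lifetime in seconds; 86400 s = 1 day)
curl -s localhost:9793/metrics | grep cert_expires_in_seconds
</code>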
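
The exporter output from step 2 is in seconds, while the thresholds in the metrics overview are in days. The sketch below converts one into the other. The function name ''expiry_days'' is hypothetical, and it assumes each sample line has the form ''name{labels} value'' without a trailing timestamp.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print "<whole days> <metric name and labels>" for each expiry sample.
# Assumption: sample lines end with the value (no timestamp field).
expiry_days() {
  awk '/^#/ { next }                  # skip HELP and TYPE comment lines
       /cert_expires_in_seconds/ {
         value = $NF                  # the sample value is the last field
         sub(/[ \t]+[^ \t]+$/, "")    # strip the value, keep name{labels}
         printf "%d %s\n", value / 86400, $0
       }'
}
```

A possible use is ''curl -s localhost:9793/metrics | expiry_days | sort -n | head'', which lists the certificates closest to expiry first.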
----
===== Stack Recommendations =====
^ Environment ^ Stack ^ Description ^
| Small (< 100 certificates) | Script + E-Mail | Cron job with e-mail alerts |
| Medium (100-1000 certificates) | Prometheus + Grafana | Standard monitoring |
| Large (> 1000 certificates) | ELK + Grafana + PagerDuty | Enterprise stack |
| Kubernetes | cert-manager + Prometheus | Native integration |
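
For the small environment, the "Script + E-Mail" row could look roughly like the following sketch. It assumes ''openssl'' is installed and relies on ''openssl x509 -checkend''. The names ''expiring_certs'', ''CERT_DIR'', and ''WARN_DAYS'' are hypothetical placeholders. E-mail delivery is left to cron, which mails any output of a job to the address set in ''MAILTO''.

```shell
#!/usr/bin/env bash
# Minimal cron-style expiry check for a small environment (sketch).
# Assumptions: openssl is installed; CERT_DIR and WARN_DAYS are
# hypothetical defaults, not values prescribed by this wiki.
CERT_DIR="${CERT_DIR:-/etc/ssl/certs}"
WARN_DAYS="${WARN_DAYS:-30}"

# expiring_certs DIR DAYS: print every *.pem in DIR expiring within DAYS days.
# Files that openssl cannot parse are reported as well, because -checkend
# then also exits with a non-zero status.
expiring_certs() {
  local dir=$1 days=$2 cert
  for cert in "$dir"/*.pem; do
    [ -e "$cert" ] || continue   # the glob matched nothing
    # -checkend N exits non-zero if the certificate expires within N seconds
    if ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null 2>&1; then
      echo "$cert"
    fi
  done
}

# Print a report only when something is due, so cron stays silent otherwise.
report=$(expiring_certs "$CERT_DIR" "$WARN_DAYS")
if [ -n "$report" ]; then
  printf 'Certificates expiring within %s days:\n%s\n' "$WARN_DAYS" "$report"
fi
```

Run once a day from cron with ''MAILTO'' set, this produces a message only when at least one certificate falls inside the warning window.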
----
===== Related Documentation =====
* [[..:tagesgeschaeft:health-check|Health Check]] - Daily check
* [[..:automatisierung:start|Automation]] - Auto-renewal
* [[en:int:pqcrypt:administrator:betrieb|Operations]] - System maintenance
----
<< [[..:start|<- Operator Scenarios]] | [[.:ablauf-monitoring|-> Expiry Monitoring]] >>
----
//Wolfgang van der Stille @ EMSR DATA d.o.o. - Post-Quantum Cryptography Professional//
{{tag>operator monitoring alerting prometheus grafana}}