====== Alerting Setup ======

**Complexity:** Low-Medium \\
**Duration:** 30-60 minutes setup \\
**Goal:** Proactive notification of PKI problems

This page describes how to configure alerting for PKI monitoring across several notification channels.

----

===== Architecture =====

<mermaid>
flowchart LR
    subgraph TRIGGER["TRIGGER"]
        T1[Prometheus Alert]
        T2[Grafana Alert]
        T3[Custom Script]
    end

    subgraph ROUTE["ROUTING"]
        R[Alertmanager]
    end

    subgraph NOTIFY["NOTIFICATION"]
        N1[E-Mail]
        N2[Slack]
        N3[MS Teams]
        N4[PagerDuty]
        N5[OpsGenie]
    end

    T1 --> R
    T2 --> R
    T3 --> R
    R --> N1 & N2 & N3 & N4 & N5

    style R fill:#fff3e0
    style N4 fill:#e8f5e9
</mermaid>

----

===== Alert Categories =====

| Category | Examples | Severity | Response |
|----------|----------|----------|----------|
| **Critical** | Certificate expired, CA down | P1 | Immediate |
| **Warning** | Certificate expires in < 7 days, CRL expires in < 24h | P2 | 4h |
| **Info** | Certificate expires in < 30 days | P3 | Next business day |

----

===== Prometheus Alertmanager =====

==== Installation ====

<code bash>
# Download Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar xzf alertmanager-*.tar.gz

# Install the binaries
sudo mv alertmanager-*/alertmanager /usr/local/bin/
sudo mv alertmanager-*/amtool /usr/local/bin/

# Create the config and data directories used further down
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
</code>

==== Configuration ====

The routes below match on the ''team: pki'' label, which the alert rules on this page attach to every PKI alert.

<code yaml>
# /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager'
  smtp_auth_password: 'secret'

route:
  receiver: 'default'
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

  routes:
    # Critical PKI alerts -> E-Mail + PagerDuty + Slack
    - matchers:
        - severity = "critical"
        - team = "pki"
      receiver: 'pki-critical'
      repeat_interval: 15m

    # Warnings -> E-Mail + Slack
    - matchers:
        - severity = "warning"
        - team = "pki"
      receiver: 'pki-warning'
      repeat_interval: 4h

    # Info -> Slack only
    - matchers:
        - severity = "info"
        - team = "pki"
      receiver: 'pki-info'
      repeat_interval: 24h

receivers:
  - name: 'default'
    email_configs:
      - to: 'ops@example.com'

  - name: 'pki-critical'
    email_configs:
      - to: 'pki-team@example.com'
        send_resolved: true
    pagerduty_configs:
      - service_key: ''   # set your PagerDuty integration key
        severity: critical
    slack_configs:
      - api_url: ''       # set your Slack webhook URL
        channel: '#pki-alerts'
        title: 'PKI CRITICAL: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

  - name: 'pki-warning'
    email_configs:
      - to: 'pki-team@example.com'
    slack_configs:
      - api_url: ''       # set your Slack webhook URL
        channel: '#pki-alerts'
        title: 'PKI Warning: {{ .GroupLabels.alertname }}'

  - name: 'pki-info'
    slack_configs:
      - api_url: ''       # set your Slack webhook URL
        channel: '#pki-info'
        title: 'PKI Info: {{ .GroupLabels.alertname }}'

inhibit_rules:
  # Suppress warnings while a critical alert is active.
  # Applies only when both alerts carry the same alertname.
  - source_matchers:
      - severity = "critical"
    target_matchers:
      - severity = "warning"
    equal: ['alertname']
</code>

==== Systemd Service ====

<code ini>
# /etc/systemd/system/alertmanager.service
[Unit]
Description=Prometheus Alertmanager
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager
Restart=always

[Install]
WantedBy=multi-user.target
</code>

----

===== Microsoft Teams =====

<code yaml>
# Alertmanager Teams Webhook
receivers:
  - name: 'pki-teams'
    webhook_configs:
      - url: 'https://outlook.office.com/webhook/...'
        send_resolved: true
        http_config:
          bearer_token: ''
</code>

Note that ''webhook_configs'' posts Alertmanager's generic JSON payload, so the URL has to point at a relay that renders the card template below. Alertmanager 0.26 and later also provide a native ''msteams_configs'' receiver.

**Teams Message Card Template:**

<code>
{
  "@type": "MessageCard",
  "@context": "http://schema.org/extensions",
  "themeColor": "{{ if eq .Status \"firing\" }}FF0000{{ else }}00FF00{{ end }}",
  "summary": "PKI Alert: {{ .GroupLabels.alertname }}",
  "sections": [{
    "activityTitle": "{{ .GroupLabels.alertname }}",
    "activitySubtitle": "{{ .Status | toUpper }}",
    "facts": [
      {{ range $i, $alert := .Alerts }}{{ if $i }},{{ end }}
      {
        "name": "{{ $alert.Labels.instance }}",
        "value": "{{ $alert.Annotations.summary }}"
      }
      {{ end }}
    ],
    "markdown": true
  }],
  "potentialAction": [{
    "@type": "OpenUri",
    "name": "Open Runbook",
    "targets": [{
      "os": "default",
      "uri": "{{ (index .Alerts 0).Annotations.runbook_url }}"
    }]
  }]
}
</code>

The ''{{ if $i }},{{ end }}'' guard emits a comma only between facts, so the rendered JSON has no trailing comma after the last alert.

----

===== Slack =====

<code yaml>
# Alertmanager Slack Configuration
receivers:
  - name: 'pki-slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
        channel: '#pki-alerts'
        username: 'PKI-Alertmanager'
        icon_emoji: ':lock:'
        send_resolved: true
        # "slack.title" and "slack.text" are custom templates and must be
        # defined in a template file that Alertmanager loads.
        title: '{{ template "slack.title" . }}'
        text: '{{ template "slack.text" . }}'
        actions:
          - type: button
            text: 'Runbook'
            url: '{{ (index .Alerts 0).Annotations.runbook_url }}'
          - type: button
            text: 'Dashboard'
            url: 'https://grafana.example.com/d/pki'
</code>

----

===== PagerDuty =====

<code yaml>
# Alertmanager PagerDuty Integration
receivers:
  - name: 'pki-pagerduty'
    pagerduty_configs:
      - service_key: ''   # set your PagerDuty integration key
        severity: '{{ if eq .GroupLabels.severity "critical" }}critical{{ else }}warning{{ end }}'
        description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'
        details:
          # "pagerduty.firing" is a custom template and must be defined
          # in a template file that Alertmanager loads.
          firing: '{{ template "pagerduty.firing" . }}'
          num_firing: '{{ .Alerts.Firing | len }}'
          num_resolved: '{{ .Alerts.Resolved | len }}'
</code>

----

===== E-Mail Templates =====

<code html>
# /etc/alertmanager/templates/email.tmpl
{{ define "email.subject" }}[{{ .Status | toUpper }}] PKI Alert: {{ .GroupLabels.alertname }}{{ end }}

{{ define "email.html" }}
<h2>PKI Alert: {{ .GroupLabels.alertname }}</h2>
<p><strong>Status:</strong> {{ .Status | toUpper }}</p>

{{ range .Alerts }}
<div>
  <h3>{{ .Labels.instance }}</h3>
  <p><strong>Summary:</strong> {{ .Annotations.summary }}</p>
  <p><strong>Description:</strong> {{ .Annotations.description }}</p>
  {{ if .Annotations.runbook_url }}
  <p><a href="{{ .Annotations.runbook_url }}">Open Runbook</a></p>
  {{ end }}
</div>
{{ end }}

<p>
  <a href="https://grafana.example.com/d/pki">Dashboard</a> |
  <a href="{{ .ExternalURL }}">Alertmanager</a>
</p>
{{ end }}
</code>
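
The templates above only take effect once Alertmanager loads the template file and a receiver refers to the two definitions. The fragment below is a minimal sketch of that wiring, not a complete configuration. It assumes the template file is stored under ''/etc/alertmanager/templates/'' as in the comment above and reuses the ''pki-critical'' e-mail receiver; the glob pattern is an assumption and can be replaced by an explicit file name.

<code yaml>
# /etc/alertmanager/alertmanager.yml (fragment, merge into the existing file)

# Load every template file in the directory (assumed location).
templates:
  - '/etc/alertmanager/templates/*.tmpl'

receivers:
  - name: 'pki-critical'
    email_configs:
      - to: 'pki-team@example.com'
        send_resolved: true
        # Use the custom subject line instead of the built-in default.
        headers:
          Subject: '{{ template "email.subject" . }}'
        # Use the custom HTML body instead of the built-in default.
        html: '{{ template "email.html" . }}'
</code>

After editing, running ''amtool check-config /etc/alertmanager/alertmanager.yml'' should report the template files it found, which is a quick way to catch a wrong path before restarting the service.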
----

===== Alert Rules with Runbook Links =====

<code yaml>
# /etc/prometheus/rules/pki-alerts.yml
groups:
  - name: pki-alerts
    rules:
      - alert: CertificateExpiringSoon
        expr: x509_cert_not_after - time() < 7 * 86400
        for: 1h
        labels:
          severity: warning
          team: pki
        annotations:
          summary: "Certificate {{ $labels.filepath }} expires in < 7 days"
          description: "Time remaining: {{ $value | humanizeDuration }}"
          runbook_url: "https://wiki.example.com/pki/runbook/renew-certificate"

      - alert: CertificateExpired
        expr: x509_cert_not_after - time() < 0
        labels:
          severity: critical
          team: pki
        annotations:
          summary: "CRITICAL: Certificate {{ $labels.filepath }} has EXPIRED"
          runbook_url: "https://wiki.example.com/pki/runbook/issue-certificate"

      - alert: CANotReachable
        expr: up{job="ca"} == 0
        for: 2m
        labels:
          severity: critical
          team: pki
        annotations:
          summary: "CA server not reachable"
          runbook_url: "https://wiki.example.com/pki/runbook/ca-troubleshooting"
</code>

----

===== Grafana Alerting (Alternative) =====

<code yaml>
# Grafana Alert Rule (UI or Provisioning)
apiVersion: 1
groups:
  - orgId: 1
    name: PKI Alerts
    folder: PKI
    interval: 1m
    rules:
      - uid: cert-expiry-warning
        title: Certificate Expiring Soon
        condition: B
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: prometheus
            model:
              expr: x509_cert_not_after - time() < 7 * 86400
          - refId: B
            datasourceUid: '-100'
            model:
              conditions:
                - evaluator:
                    params: [0]
                    type: gt
                  operator:
                    type: and
                  query:
                    params: [A]
                  reducer:
                    type: count
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: Certificate expiring soon
</code>

----

===== Test & Validation =====

<code bash>
# Check Alertmanager configuration
amtool check-config /etc/alertmanager/alertmanager.yml

# Send test alert
amtool alert add alertname=TestAlert severity=warning instance=test \
    --alertmanager.url=http://localhost:9093

# Show active alerts
amtool alert --alertmanager.url=http://localhost:9093

# Create silence (e.g., for maintenance)
amtool silence add alertname=CertificateExpiringSoon \
    --alertmanager.url=http://localhost:9093 \
    --comment="Planned maintenance" \
    --duration=2h
</code>

----

===== Checklist =====

| # | Checkpoint | Done |
|---|------------|------|
| 1 | Alertmanager installed | |
| 2 | Routing configured | |
| 3 | E-Mail receiver | |
| 4 | Slack/Teams webhook | |
| 5 | PagerDuty integration | |
| 6 | Alert rules defined | |
| 7 | Runbook links inserted | |
| 8 | Test alert sent | |

----

===== Related Documentation =====

  * [[.:ablauf-monitoring|Expiry Monitoring]] - Collect metrics
  * [[..:tagesgeschaeft:start|Daily Operations]] - Runbooks
  * [[.:audit-logging|Audit Logging]] - Event logging

----

<< [[.:audit-logging|<- Audit Logging]] | [[..:start|-> Operator Scenarios]] >>

----

//Wolfgang van der Stille @ EMSR DATA d.o.o. - Post-Quantum Cryptography Professional//

{{tag>alerting prometheus alertmanager slack teams pagerduty operator}}