Alerting Setup

Complexity: Low-Medium
Duration: 30-60 minutes setup
Goal: Proactive notification of PKI problems

Configuration of alerting for PKI monitoring with various notification channels.


Architecture

flowchart LR
    subgraph TRIGGER["🎯 TRIGGER"]
        T1[Prometheus Alert]
        T2[Grafana Alert]
        T3[Custom Script]
    end
    subgraph ROUTE["🔀 ROUTING"]
        R[Alertmanager]
    end
    subgraph NOTIFY["📧 NOTIFICATION"]
        N1[E-Mail]
        N2[Slack]
        N3[MS Teams]
        N4[PagerDuty]
        N5[OpsGenie]
    end
    T1 --> R
    T2 --> R
    T3 --> R
    R --> N1 & N2 & N3 & N4 & N5
    style R fill:#fff3e0
    style N4 fill:#e8f5e9


Alert Categories

Category | Examples                        | Severity | Response
---------|---------------------------------|----------|------------------
Critical | Certificate expired, CA down    | P1       | Immediate
Warning  | Certificate < 7 days, CRL < 24h | P2       | 4h
Info     | Certificate < 30 days           | P3       | Next business day

Prometheus Alertmanager

Installation

# Download Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar xzf alertmanager-*.tar.gz
sudo mv alertmanager-*/alertmanager /usr/local/bin/
sudo mv alertmanager-*/amtool /usr/local/bin/

Konfiguration

# /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager'
  smtp_auth_password: 'secret'

templates:
  - '/etc/alertmanager/templates/*.tmpl'

route:
  receiver: 'default'
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

  routes:
    # Critical PKI alerts → PagerDuty + e-mail
    - match:
        severity: critical
        job: pki
      receiver: 'pki-critical'
      repeat_interval: 15m
 
    # Warnings → e-mail + Slack
    - match:
        severity: warning
        job: pki
      receiver: 'pki-warning'
      repeat_interval: 4h
 
    # Info → Slack only
    - match:
        severity: info
        job: pki
      receiver: 'pki-info'
      repeat_interval: 24h

receivers:
  - name: 'default'
    email_configs:
      - to: 'ops@example.com'

  - name: 'pki-critical'
    email_configs:
      - to: 'pki-team@example.com'
        send_resolved: true
        headers:
          Subject: '{{ template "email.subject" . }}'
        html: '{{ template "email.html" . }}'
    pagerduty_configs:
      - service_key: '<PAGERDUTY_SERVICE_KEY>'
        severity: critical
    slack_configs:
      - api_url: '<SLACK_WEBHOOK_URL>'
        channel: '#pki-alerts'
        title: '🚨 PKI CRITICAL: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

  - name: 'pki-warning'
    email_configs:
      - to: 'pki-team@example.com'
    slack_configs:
      - api_url: '<SLACK_WEBHOOK_URL>'
        channel: '#pki-alerts'
        title: '⚠️ PKI Warning: {{ .GroupLabels.alertname }}'

  - name: 'pki-info'
    slack_configs:
      - api_url: '<SLACK_WEBHOOK_URL>'
        channel: '#pki-info'
        title: 'ℹ️ PKI Info: {{ .GroupLabels.alertname }}'

inhibit_rules:
  # Suppress warnings while a critical alert is firing for the same target.
  # Matching on 'instance' (rather than 'alertname') also suppresses
  # CertificateExpiringSoon while CertificateExpired is firing.
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['instance']
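The routing tree above always delivers to the first matching route. As a rough mental model, the label matching can be sketched like this (hypothetical, heavily simplified helper; it ignores grouping, timing, and nested routes):

```python
# Simplified sketch of Alertmanager's first-match routing:
# each route is a set of required label values plus a receiver.
ROUTES = [
    ({"severity": "critical", "job": "pki"}, "pki-critical"),
    ({"severity": "warning", "job": "pki"}, "pki-warning"),
    ({"severity": "info", "job": "pki"}, "pki-info"),
]

def pick_receiver(labels: dict) -> str:
    """Return the receiver of the first route whose matchers all apply."""
    for matchers, receiver in ROUTES:
        if all(labels.get(k) == v for k, v in matchers.items()):
            return receiver
    return "default"  # fallback receiver of the root route

print(pick_receiver({"severity": "critical", "job": "pki"}))   # pki-critical
print(pick_receiver({"severity": "warning", "job": "backup"}))  # default
```

Note that a non-PKI alert falls through all three child routes and ends up at the root receiver, which is why the `default` receiver must always exist.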

Systemd Service

# /etc/systemd/system/alertmanager.service
[Unit]
Description=Prometheus Alertmanager
After=network.target
 
[Service]
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager
Restart=always
 
[Install]
WantedBy=multi-user.target

Microsoft Teams

# Alertmanager Teams webhook
receivers:
  - name: 'pki-teams'
    webhook_configs:
      - url: 'https://outlook.office.com/webhook/...'
        send_resolved: true

Teams Message Card template (note: Alertmanager's webhook sends its own JSON payload, so a converter such as prometheus-msteams typically sits between Alertmanager and Teams to render a card like this):

{
  "@type": "MessageCard",
  "@context": "http://schema.org/extensions",
  "themeColor": "{{ if eq .Status \"firing\" }}FF0000{{ else }}00FF00{{ end }}",
  "summary": "PKI Alert: {{ .GroupLabels.alertname }}",
  "sections": [{
    "activityTitle": "{{ .GroupLabels.alertname }}",
    "activitySubtitle": "{{ .Status | toUpper }}",
    "facts": [
      {{ range $i, $alert := .Alerts }}{{ if $i }},{{ end }}
      {
        "name": "{{ $alert.Labels.instance }}",
        "value": "{{ $alert.Annotations.summary }}"
      }
      {{ end }}
    ],
    "markdown": true
  }],
  "potentialAction": [{
    "@type": "OpenUri",
    "name": "Open runbook",
    "targets": [{
      "os": "default",
      "uri": "{{ (index .Alerts 0).Annotations.runbook_url }}"
    }]
  }]
}
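Whatever renders the template must produce valid JSON: a trailing comma after the last `facts` entry would make Teams reject the payload. A hypothetical example of the rendered result for a firing alert, built and validated in code:

```python
import json

# Example of what the rendered MessageCard should look like for one
# firing alert (hypothetical values; instance name is made up).
card = {
    "@type": "MessageCard",
    "@context": "http://schema.org/extensions",
    "themeColor": "FF0000",  # red while the alert is firing
    "summary": "PKI Alert: CertificateExpired",
    "sections": [{
        "activityTitle": "CertificateExpired",
        "activitySubtitle": "FIRING",
        "facts": [
            {"name": "ca01.example.com", "value": "Certificate has expired"},
        ],
        "markdown": True,
    }],
}

# Round-trip through json to prove the payload is well-formed.
payload = json.dumps(card)
print(json.loads(payload)["sections"][0]["facts"][0]["name"])
```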

Slack

# Alertmanager Slack configuration
receivers:
  - name: 'pki-slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
        channel: '#pki-alerts'
        username: 'PKI-Alertmanager'
        icon_emoji: ':lock:'
        send_resolved: true
        title: '{{ template "slack.title" . }}'
        text: '{{ template "slack.text" . }}'
        actions:
          - type: button
            text: 'Runbook'
            url: '{{ (index .Alerts 0).Annotations.runbook_url }}'
          - type: button
            text: 'Dashboard'
            url: 'https://grafana.example.com/d/pki'

PagerDuty

# Alertmanager PagerDuty Integration
receivers:
  - name: 'pki-pagerduty'
    pagerduty_configs:
      - service_key: '<INTEGRATION_KEY>'
        severity: '{{ if eq .GroupLabels.severity "critical" }}critical{{ else }}warning{{ end }}'
        description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'
        details:
          # "pagerduty.firing" is a custom template; define it in
          # /etc/alertmanager/templates/ before referencing it here
          firing: '{{ template "pagerduty.firing" . }}'
          num_firing: '{{ .Alerts.Firing | len }}'
          num_resolved: '{{ .Alerts.Resolved | len }}'

E-Mail Templates

# /etc/alertmanager/templates/email.tmpl
{{ define "email.subject" }}
[{{ .Status | toUpper }}] PKI Alert: {{ .GroupLabels.alertname }}
{{ end }}
 
{{ define "email.html" }}
<!DOCTYPE html>
<html>
<head>
  <style>
    .critical { background-color: #ffebee; border-left: 4px solid #f44336; }
    .warning { background-color: #fff3e0; border-left: 4px solid #ff9800; }
    .resolved { background-color: #e8f5e9; border-left: 4px solid #4caf50; }
  </style>
</head>
<body>
  <h2>PKI Alert: {{ .GroupLabels.alertname }}</h2>
  <p>Status: <strong>{{ .Status | toUpper }}</strong></p>

  {{ range .Alerts }}
  <div class="{{ .Labels.severity }}">
    <h3>{{ .Labels.instance }}</h3>
    <p><strong>Summary:</strong> {{ .Annotations.summary }}</p>
    <p><strong>Description:</strong> {{ .Annotations.description }}</p>
    {{ if .Annotations.runbook_url }}
    <p><a href="{{ .Annotations.runbook_url }}">📖 Open runbook</a></p>
    {{ end }}
  </div>
  {{ end }}
 
  <hr>
  <p>
    <a href="https://grafana.example.com/d/pki">📊 Dashboard</a> |
    <a href="https://alertmanager.example.com">🔔 Alertmanager</a>
  </p>
</body>
</html>
{{ end }}

Prometheus Alert Rules

# /etc/prometheus/rules/pki-alerts.yml
groups:
  - name: pki-alerts
    rules:
      - alert: CertificateExpiringSoon
        expr: x509_cert_not_after - time() < 7 * 86400
        for: 1h
        labels:
          severity: warning
          team: pki
        annotations:
          summary: "Certificate {{ $labels.filepath }} expires in < 7 days"
          description: "Time remaining: {{ $value | humanizeDuration }}"
          runbook_url: "https://wiki.example.com/pki/runbook/zertifikat-erneuern"

      - alert: CertificateExpired
        expr: x509_cert_not_after - time() < 0
        labels:
          severity: critical
          team: pki
        annotations:
          summary: "CRITICAL: Certificate {{ $labels.filepath }} has EXPIRED"
          runbook_url: "https://wiki.example.com/pki/runbook/zertifikat-ausstellen"

      - alert: CANotReachable
        expr: up{job="ca"} == 0
        for: 2m
        labels:
          severity: critical
          team: pki
        annotations:
          summary: "CA server not reachable"
          runbook_url: "https://wiki.example.com/pki/runbook/ca-troubleshooting"
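The two expiry expressions can be traced in a few lines (hypothetical helper; `x509_cert_not_after` is a Unix timestamp, and `7 * 86400` matches the warning window in the rule):

```python
import time

WARN_WINDOW = 7 * 86400  # seconds, same threshold as the rule above

def firing_alerts(not_after: float, now: float) -> list:
    """Return which of the two expiry alerts would fire."""
    remaining = not_after - now
    alerts = []
    if remaining < 0:
        alerts.append("CertificateExpired")       # critical
    if remaining < WARN_WINDOW:
        alerts.append("CertificateExpiringSoon")  # warning
    return alerts

now = time.time()
print(firing_alerts(now + 3 * 86400, now))  # ['CertificateExpiringSoon']
print(firing_alerts(now - 60, now))         # ['CertificateExpired', 'CertificateExpiringSoon']
```

An expired certificate satisfies both expressions, so both alerts fire at once; that is exactly the case the inhibit rule between critical and warning severities is meant to cover.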

Grafana Alerting (Alternative)

# Grafana alert rule (UI or provisioning)
apiVersion: 1
groups:
  - orgId: 1
    name: PKI Alerts
    folder: PKI
    interval: 1m
    rules:
      - uid: cert-expiry-warning
        title: Certificate Expiring Soon
        condition: B
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: prometheus
            model:
              expr: x509_cert_not_after - time() < 7 * 86400
          - refId: B
            datasourceUid: '-100'
            model:
              conditions:
                - evaluator:
                    params: [0]
                    type: gt
                  operator:
                    type: and
                  query:
                    params: [A]
                  reducer:
                    type: count
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: Certificate expires soon

Test & Validierung

# Validate the Alertmanager configuration
amtool check-config /etc/alertmanager/alertmanager.yml
 
# Send a test alert
amtool alert add alertname=TestAlert severity=warning instance=test \
    --alertmanager.url=http://localhost:9093
 
# List active alerts
amtool alert --alertmanager.url=http://localhost:9093
 
# Create a silence (e.g. for maintenance)
amtool silence add alertname=CertificateExpiringSoon \
    --alertmanager.url=http://localhost:9093 \
    --comment="Scheduled maintenance" \
    --duration=2h

Checklist

#  | Check
---|------------------------
1  | Alertmanager installed
2  | Routing configured
3  | E-mail receiver
4  | Slack/Teams webhook
5  | PagerDuty integration
6  | Alert rules defined
7  | Runbook links added
8  | Test alert sent



Wolfgang van der Stille @ EMSR DATA d.o.o. - Post-Quantum Cryptography Professional