
Alerting Setup

Complexity: Low-Medium
Duration: 30-60 minutes of setup
Goal: Proactive notification of PKI problems

Alerting configuration for PKI monitoring with several notification channels.


Architecture

flowchart LR
  subgraph TRIGGER["🎯 TRIGGER"]
    T1[Prometheus Alert]
    T2[Grafana Alert]
    T3[Custom Script]
  end
  subgraph ROUTE["🔀 ROUTING"]
    R[Alertmanager]
  end
  subgraph NOTIFY["📧 NOTIFICATION"]
    N1[E-Mail]
    N2[Slack]
    N3[MS Teams]
    N4[PagerDuty]
    N5[OpsGenie]
  end
  T1 --> R
  T2 --> R
  T3 --> R
  R --> N1 & N2 & N3 & N4 & N5
  style R fill:#fff3e0
  style N4 fill:#e8f5e9


Alert categories

| Category | Examples | Severity | Response |
|---|---|---|---|
| Critical | Certificate expired, CA unreachable | P1 | Immediately |
| Warning | Certificate expires in < 7 days, CRL expires in < 24 h | P2 | Within 4 h |
| Info | Certificate expires in < 30 days | P3 | Next business day |
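The certificate thresholds in the table can be written as a small lookup. A minimal sketch, assuming remaining validity is measured in seconds (the same quantity as x509_cert_not_after - time() in the alert rules further down); the helper name classify and its return tuples are illustrative only, and the CA and CRL checks are not modelled.

```python
from typing import Optional, Tuple

DAY = 86400  # seconds per day, the same factor used in the PromQL rules


def classify(seconds_left: float) -> Optional[Tuple[str, str, str]]:
    """Map remaining certificate validity to (category, priority, response).

    Hypothetical helper mirroring the alert-category table;
    returns None when no alert is warranted.
    """
    if seconds_left < 0:
        return ("Critical", "P1", "Immediately")
    if seconds_left < 7 * DAY:
        return ("Warning", "P2", "Within 4 h")
    if seconds_left < 30 * DAY:
        return ("Info", "P3", "Next business day")
    return None


if __name__ == "__main__":
    # Print the category for a few example remaining lifetimes (in days).
    for days in (-1, 3, 20, 90):
        print(days, classify(days * DAY))
```

The boundaries are strict "less than" comparisons, so a certificate with exactly 7 days left is still in the Info band, matching the "< 7 days" wording.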

Prometheus Alertmanager

Installation

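# Download Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar xzf alertmanager-0.27.0.linux-amd64.tar.gz
sudo mv alertmanager-0.27.0.linux-amd64/alertmanager /usr/local/bin/
sudo mv alertmanager-0.27.0.linux-amd64/amtool /usr/local/bin/

# Directories for the configuration, templates and data used below
sudo mkdir -p /etc/alertmanager/templates /var/lib/alertmanager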

Configuration

# /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager'
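  smtp_auth_password: 'secret'

# Custom notification templates (e-mail, Slack, PagerDuty) referenced by the receivers
templates:
  - '/etc/alertmanager/templates/*.tmpl'
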
route:
  receiver: 'default'
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

  routes:
    # PKI routes match the team: pki label set by the alert rules; the job label
    # is the scrape job (e.g. job="ca") and does not identify all PKI alerts.

    # Critical PKI alerts → PagerDuty + e-mail + Slack
    - match:
        severity: critical
        team: pki
      receiver: 'pki-critical'
      repeat_interval: 15m

    # Warnings → e-mail + Slack
    - match:
        severity: warning
        team: pki
      receiver: 'pki-warning'
      repeat_interval: 4h

    # Info → Slack only
    - match:
        severity: info
        team: pki
      receiver: 'pki-info'
      repeat_interval: 24h

receivers:
  - name: 'default'
    email_configs:
      - to: 'ops@example.com'

  - name: 'pki-critical'
    email_configs:
      - to: 'pki-team@example.com'
        send_resolved: true
    pagerduty_configs:
      - service_key: '<PAGERDUTY_SERVICE_KEY>'
        severity: critical
    slack_configs:
      - api_url: '<SLACK_WEBHOOK_URL>'
        channel: '#pki-alerts'
        title: '🚨 PKI CRITICAL: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

  - name: 'pki-warning'
    email_configs:
      - to: 'pki-team@example.com'
    slack_configs:
      - api_url: '<SLACK_WEBHOOK_URL>'
        channel: '#pki-alerts'
        title: '⚠️ PKI Warning: {{ .GroupLabels.alertname }}'

  - name: 'pki-info'
    slack_configs:
      - api_url: '<SLACK_WEBHOOK_URL>'
        channel: '#pki-info'
        title: 'ℹ️ PKI Info: {{ .GroupLabels.alertname }}'

inhibit_rules:
  # Suppress warnings while a critical alert with the same alertname is firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname']
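To reason about where a given alert ends up, the routing tree and the inhibit rule can be approximated in a few lines. This is a simplified model, not Alertmanager's implementation: it assumes equality-only match blocks, first match wins (no continue: true), no nested routes, and it ignores grouping and timing. The route table in the code is a hypothetical copy keyed on the team: pki label that the alert rules set; change it to whatever label your routes actually match on.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]

# Hypothetical copy of the child routes as (match block, receiver).
# Assumption: equality matching only, first matching route wins.
ROUTES: List[Tuple[Labels, str]] = [
    ({"severity": "critical", "team": "pki"}, "pki-critical"),
    ({"severity": "warning", "team": "pki"}, "pki-warning"),
    ({"severity": "info", "team": "pki"}, "pki-info"),
]
DEFAULT_RECEIVER = "default"  # receiver of the root route


def matches(match: Labels, labels: Labels) -> bool:
    """True if every label in the match block has the same value on the alert."""
    return all(labels.get(k) == v for k, v in match.items())


def route(labels: Labels) -> str:
    """Receiver of the first matching child route, else the root receiver."""
    for match, receiver in ROUTES:
        if matches(match, labels):
            return receiver
    return DEFAULT_RECEIVER


def inhibited(target: Labels, firing: List[Labels],
              equal: Tuple[str, ...] = ("alertname",)) -> bool:
    """Model of the inhibit rule: a warning is muted while a firing critical
    alert has identical values (missing counts as empty) for all equal labels."""
    if target.get("severity") != "warning":
        return False
    return any(
        source.get("severity") == "critical"
        and all(source.get(k, "") == target.get(k, "") for k in equal)
        for source in firing
    )


if __name__ == "__main__":
    expiring = {"alertname": "CertificateExpiringSoon", "severity": "warning", "team": "pki"}
    expired = {"alertname": "CertificateExpired", "severity": "critical", "team": "pki"}
    print(route(expiring))                                           # pki-warning
    print(route({"alertname": "TestAlert", "severity": "warning"}))  # default
    print(inhibited(expiring, [expired]))                            # False
```

The last call shows that, with equal: ['alertname'], a critical alert only mutes warnings that carry the same alert name.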

Systemd Service

# /etc/systemd/system/alertmanager.service
[Unit]
Description=Prometheus Alertmanager
After=network.target
 
[Service]
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager
Restart=always
 
[Install]
WantedBy=multi-user.target

Microsoft Teams

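# Alertmanager Teams webhook
# Teams incoming webhooks expect a MessageCard, not the generic JSON that
# webhook_configs sends, so in practice the url points to a proxy that renders
# the template below and forwards it to Teams. Alertmanager v0.26+ can instead
# post to Teams directly with msteams_configs.
receivers:
  - name: 'pki-teams'
    webhook_configs:
      - url: 'https://outlook.office.com/webhook/...'
        send_resolved: true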

Teams Message Card Template:

{
  "@type": "MessageCard",
  "@context": "http://schema.org/extensions",
  "themeColor": "{{ if eq .Status "firing" }}FF0000{{ else }}00FF00{{ end }}",
  "summary": "PKI Alert: {{ .GroupLabels.alertname }}",
  "sections": [{
    "activityTitle": "{{ .GroupLabels.alertname }}",
    "activitySubtitle": "{{ .Status | toUpper }}",
    "facts": [
      {{ range $i, $a := .Alerts }}{{ if $i }},{{ end }}
      {
        "name": "{{ $a.Labels.instance }}",
        "value": "{{ $a.Annotations.summary }}"
      }
      {{ end }}
    ],
    "markdown": true
  }],
  "potentialAction": [{
    "@type": "OpenUri",
    "name": "Open Runbook",
    "targets": [{
      "os": "default",
      "uri": "{{ (index .Alerts 0).Annotations.runbook_url }}"
    }]
  }]
}
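If a small translation proxy sits between webhook_configs and Teams, building the card as a data structure and serialising it avoids hand-written JSON pitfalls such as a trailing comma after the last fact or unescaped quotes in a summary. A minimal sketch under stated assumptions: the function name build_message_card is hypothetical, it reads only the status, groupLabels and alerts fields of Alertmanager's webhook payload, and receiving the webhook and posting to Teams are left out.

```python
import json
from typing import Any, Dict


def build_message_card(payload: Dict[str, Any]) -> str:
    """Hypothetical proxy helper: Alertmanager webhook payload -> MessageCard JSON.

    Mirrors the template above; only the fields the template uses are read.
    """
    status = payload.get("status", "firing")
    alertname = payload.get("groupLabels", {}).get("alertname", "unknown")
    alerts = payload.get("alerts", [])
    card: Dict[str, Any] = {
        "@type": "MessageCard",
        "@context": "http://schema.org/extensions",
        "themeColor": "FF0000" if status == "firing" else "00FF00",
        "summary": f"PKI Alert: {alertname}",
        "sections": [{
            "activityTitle": alertname,
            "activitySubtitle": status.upper(),
            # One fact per alert; the list is serialised without trailing commas.
            "facts": [
                {"name": a.get("labels", {}).get("instance", ""),
                 "value": a.get("annotations", {}).get("summary", "")}
                for a in alerts
            ],
            "markdown": True,
        }],
    }
    # Add the runbook button only if the first alert carries a runbook_url.
    runbook = alerts[0].get("annotations", {}).get("runbook_url") if alerts else None
    if runbook:
        card["potentialAction"] = [{
            "@type": "OpenUri",
            "name": "Open Runbook",
            "targets": [{"os": "default", "uri": runbook}],
        }]
    # json.dumps escapes quotes and newlines inside summaries.
    return json.dumps(card)
```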

Slack

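# Alertmanager Slack configuration
# slack.title and slack.text are custom templates: define them in a file loaded via templates:
# (the built-in ones are named slack.default.title and slack.default.text)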
receivers:
  - name: 'pki-slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
        channel: '#pki-alerts'
        username: 'PKI-Alertmanager'
        icon_emoji: ':lock:'
        send_resolved: true
        title: '{{ template "slack.title" . }}'
        text: '{{ template "slack.text" . }}'
        actions:
          - type: button
            text: 'Runbook'
            url: '{{ (index .Alerts 0).Annotations.runbook_url }}'
          - type: button
            text: 'Dashboard'
            url: 'https://grafana.example.com/d/pki'

PagerDuty

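# Alertmanager PagerDuty integration
# service_key is the integration key of a "Prometheus" (Events API v1) integration;
# for an Events API v2 integration use routing_key instead.
# pagerduty.firing is a custom template that must be defined in a loaded template file.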
receivers:
  - name: 'pki-pagerduty'
    pagerduty_configs:
      - service_key: '<INTEGRATION_KEY>'
        severity: '{{ if eq .GroupLabels.severity "critical" }}critical{{ else }}warning{{ end }}'
        description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'
        details:
          firing: '{{ template "pagerduty.firing" . }}'
          num_firing: '{{ .Alerts.Firing | len }}'
          num_resolved: '{{ .Alerts.Resolved | len }}'

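E-mail templates

These templates only take effect when the file is loaded via templates: and an e-mail receiver references them, for example with html: '{{ template "email.html" . }}' and headers: { Subject: '{{ template "email.subject" . }}' }.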

# /etc/alertmanager/templates/email.tmpl
{{ define "email.subject" }}[{{ .Status | toUpper }}] PKI Alert: {{ .GroupLabels.alertname }}{{ end }}
 
{{ define "email.html" }}
<!DOCTYPE html>
<html>
<head>
  <style>
    .critical { background-color: #ffebee; border-left: 4px solid #f44336; }
    .warning { background-color: #fff3e0; border-left: 4px solid #ff9800; }
    .resolved { background-color: #e8f5e9; border-left: 4px solid #4caf50; }
  </style>
</head>
<body>
  <h2>PKI Alert: {{ .GroupLabels.alertname }}</h2>
  <p>Status: <strong>{{ .Status | toUpper }}</strong></p>

  {{ range .Alerts }}
  <div class="{{ if eq .Status "resolved" }}resolved{{ else }}{{ .Labels.severity }}{{ end }}">
    <h3>{{ .Labels.instance }}</h3>
    <p><strong>Summary:</strong> {{ .Annotations.summary }}</p>
    <p><strong>Description:</strong> {{ .Annotations.description }}</p>
    {{ if .Annotations.runbook_url }}
    <p><a href="{{ .Annotations.runbook_url }}">📖 Open Runbook</a></p>
    {{ end }}
  </div>
  {{ end }}
 
  <hr>
  <p>
    <a href="https://grafana.example.com/d/pki">📊 Dashboard</a> |
    <a href="https://alertmanager.example.com">🔔 Alertmanager</a>
  </p>
</body>
</html>
{{ end }}

Alert rules with runbook links

# /etc/prometheus/rules/pki-alerts.yml
groups:
  - name: pki-alerts
    rules:
      - alert: CertificateExpiringSoon
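        # > 0 excludes already-expired certificates (covered by CertificateExpired)
        expr: (x509_cert_not_after - time()) < 7 * 86400 and (x509_cert_not_after - time()) > 0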
        for: 1h
        labels:
          severity: warning
          team: pki
        annotations:
          summary: "Certificate {{ $labels.filepath }} expires in < 7 days"
          description: "Time remaining: {{ $value | humanizeDuration }}"
          runbook_url: "https://wiki.example.com/pki/runbook/obnova-certifikata"

      - alert: CertificateExpired
        expr: x509_cert_not_after - time() < 0
        labels:
          severity: critical
          team: pki
        annotations:
          summary: "CRITICAL: Certificate {{ $labels.filepath }} has EXPIRED"
          runbook_url: "https://wiki.example.com/pki/runbook/izdavanje-certifikata"

      - alert: CANotReachable
        expr: up{job="ca"} == 0
        for: 2m
        labels:
          severity: critical
          team: pki
        annotations:
          summary: "CA server is not reachable"
          runbook_url: "https://wiki.example.com/pki/runbook/ca-troubleshooting"
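For hosts without an exporter (the Custom Script trigger in the architecture diagram), the same thresholds can be checked from a certificate's notAfter date. A minimal sketch, assuming the date comes in OpenSSL's text form, as printed by openssl x509 -enddate -noout (for example notAfter=Jun  1 12:00:00 2025 GMT); the standard library's ssl.cert_time_to_seconds parses that format. The function names are hypothetical, and how the result is forwarded to Alertmanager is left open.

```python
import ssl
import time
from typing import Optional

DAY = 86400  # seconds per day, as in the PromQL expressions


def seconds_left(enddate: str, now: Optional[float] = None) -> float:
    """Seconds until expiry, i.e. the equivalent of x509_cert_not_after - time().

    Accepts 'notAfter=Jun  1 12:00:00 2025 GMT' with or without the prefix.
    """
    value = enddate.strip()
    if value.startswith("notAfter="):
        value = value[len("notAfter="):]
    not_after = ssl.cert_time_to_seconds(value)  # '%b %d %H:%M:%S %Y GMT', interpreted as UTC
    return not_after - (time.time() if now is None else now)


def alert_for(enddate: str, now: Optional[float] = None) -> Optional[str]:
    """Name of the rule whose condition currently holds, most severe first.

    Hypothetical helper; the 'for: 1h' pending period is not modelled.
    """
    left = seconds_left(enddate, now)
    if left < 0:
        return "CertificateExpired"
    if left < 7 * DAY:
        return "CertificateExpiringSoon"
    return None
```

For example, alert_for("notAfter=Jun  1 12:00:00 2025 GMT") returns "CertificateExpired" once that date has passed.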

Grafana Alerting (alternative)

# Grafana alert rule (UI or file provisioning)
apiVersion: 1
groups:
  - orgId: 1
    name: PKI Alerts
    folder: PKI
    interval: 1m
    rules:
      - uid: cert-expiry-warning
        title: Certificate Expiring Soon
        condition: B
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: prometheus
            model:
              expr: x509_cert_not_after - time() < 7 * 86400
          - refId: B
            datasourceUid: '-100'
            model:
              type: classic_conditions
              conditions:
                - evaluator:
                    params: [0]
                    type: gt
                  operator:
                    type: and
                  query:
                    params: [A]
                  reducer:
                    type: count
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: Certificate expiring soon

Testing and validation

# Validate the Alertmanager configuration
amtool check-config /etc/alertmanager/alertmanager.yml
 
# Send a test alert (without the labels the PKI routes match on, it goes to the 'default' receiver)
amtool alert add alertname=TestAlert severity=warning instance=test \
    --alertmanager.url=http://localhost:9093
 
# List active alerts
amtool alert --alertmanager.url=http://localhost:9093
 
# Create a silence (e.g. for maintenance)
amtool silence add alertname=CertificateExpiringSoon \
    --alertmanager.url=http://localhost:9093 \
    --comment="Planned maintenance" \
    --duration=2h
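amtool talks to Alertmanager's HTTP API, so the same test can be scripted, for example from a CI job. A minimal sketch, assuming the v2 API (POST /api/v2/alerts with a JSON list of alerts carrying labels, annotations, startsAt and endsAt) on http://localhost:9093; the helper names and label values are illustrative, and the labels should be set to whatever your PKI routes select on.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def build_test_alert(labels: Dict[str, str], summary: str,
                     minutes: int = 5) -> List[dict]:
    """Hypothetical helper: payload with one alert that resolves after `minutes`."""
    start = datetime.now(timezone.utc)
    return [{
        "labels": labels,
        "annotations": {"summary": summary},
        "startsAt": start.isoformat(),  # RFC 3339 timestamp with UTC offset
        "endsAt": (start + timedelta(minutes=minutes)).isoformat(),
    }]


def send(alerts: List[dict], base_url: str = "http://localhost:9093") -> int:
    """POST the alerts to Alertmanager's v2 API and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


# Usage (needs a running Alertmanager; labels are examples):
# send(build_test_alert({"alertname": "TestAlert", "severity": "warning",
#                        "team": "pki", "instance": "test"}, "Test alert from script"))
```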

Checklist

| # | Checkpoint |
|---|---|
| 1 | Alertmanager installed |
| 2 | Routing configured |
| 3 | E-mail receiver configured |
| 4 | Slack/Teams webhook configured |
| 5 | PagerDuty integration configured |
| 6 | Alert rules defined |
| 7 | Runbook links added |
| 8 | Test alert sent |



Wolfgang van der Stille @ EMSR DATA d.o.o. - Post-Quantum Cryptography Professional