====== Alerting configuration ======

**Complexity:** Low to medium \\
**Duration:** 30-60 minutes for setup \\
**Goal:** Proactive notification when PKI problems occur

This page configures alerting for PKI monitoring with several notification channels.

----

===== Architecture =====

```mermaid
flowchart LR
    subgraph TRIGGER["🎯 TRIGGER"]
        T1[Prometheus Alert]
        T2[Grafana Alert]
        T3[Custom Script]
    end

    subgraph ROUTE["🔀 ROUTING"]
        R[Alertmanager]
    end

    subgraph NOTIFY["📧 NOTIFICATION"]
        N1[E-Mail]
        N2[Slack]
        N3[MS Teams]
        N4[PagerDuty]
        N5[OpsGenie]
    end

    T1 --> R
    T2 --> R
    T3 --> R
    R --> N1 & N2 & N3 & N4 & N5

    style R fill:#fff3e0
    style N4 fill:#e8f5e9
```

----

===== Alert categories =====

| Category | Examples | Severity | Response time |
|----------|----------|----------|---------------|
| **Critical** | Certificate expired, CA down | P1 | Immediate |
| **Warning** | Certificate expires in < 7 days, CRL expires in < 24 h | P2 | Within 4 h |
| **Info** | Certificate expires in < 30 days | P3 | Next business day |

----

===== Prometheus Alertmanager =====

==== Installation ====

```bash
# Download and unpack Alertmanager v0.27.0 (linux-amd64)
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar xzf alertmanager-0.27.0.linux-amd64.tar.gz

# Install the binaries
sudo mv alertmanager-0.27.0.linux-amd64/alertmanager /usr/local/bin/
sudo mv alertmanager-0.27.0.linux-amd64/amtool /usr/local/bin/

# Create the config, template and storage directories used below
sudo mkdir -p /etc/alertmanager/templates /var/lib/alertmanager
```

==== Configuration ====

Global settings and routing tree:

```yaml
# /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager'
  smtp_auth_password: 'secret'

# Custom notification templates (e.g. email.tmpl further below)
templates:
  - '/etc/alertmanager/templates/*.tmpl'

route:
  receiver: 'default'
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Routes match team=pki, the label set by the PKI alert rules;
    # the job label comes from the scrape job (e.g. job="ca").

    # Critical PKI alerts → PagerDuty + e-mail + Slack
    - match:
        severity: critical
        team: pki
      receiver: 'pki-critical'
      repeat_interval: 15m

    # Warnings → e-mail + Slack
    - match:
        severity: warning
        team: pki
      receiver: 'pki-warning'
      repeat_interval: 4h

    # Info → Slack only
    - match:
        severity: info
        team: pki
      receiver: 'pki-info'
      repeat_interval: 24h
```
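To sanity-check the category thresholds against the routing tree, the following Python sketch maps a certificate's remaining lifetime to a severity label and then picks a receiver the way the child routes do: the first route whose label matchers all match wins, otherwise the root receiver applies. It is a simplified model, not Alertmanager's implementation. It assumes the alerts carry the `team: pki` label set by the alert rules on this page, ignores grouping, `continue` and regex matchers, and does not model the CRL and CA-down cases. The names `categorize`, `route`, `ROUTES` and `ROOT` are hypothetical.

```python
from datetime import timedelta
from typing import Dict, Optional, Tuple


def categorize(remaining: timedelta) -> Optional[str]:
    """Map a certificate's remaining lifetime to a severity label (category table)."""
    if remaining < timedelta(0):
        return "critical"  # already expired, like CertificateExpired
    if remaining < timedelta(days=7):
        return "warning"   # < 7 days, like CertificateExpiringSoon
    if remaining < timedelta(days=30):
        return "info"      # < 30 days
    return None            # no alert


# Simplified mirror of the child routes: (equality matchers, receiver, repeat_interval).
# Assumption: alerts carry team="pki", as set by the PKI alert rules.
ROUTES = [
    ({"severity": "critical", "team": "pki"}, "pki-critical", "15m"),
    ({"severity": "warning", "team": "pki"}, "pki-warning", "4h"),
    ({"severity": "info", "team": "pki"}, "pki-info", "24h"),
]
ROOT = ("default", "4h")  # root route receiver and repeat_interval


def route(labels: Dict[str, str]) -> Tuple[str, str]:
    """Return (receiver, repeat_interval) of the first child route whose matchers all hold."""
    for matchers, receiver, repeat_interval in ROUTES:
        if all(labels.get(name) == value for name, value in matchers.items()):
            return receiver, repeat_interval
    return ROOT
```

For example, `route({"severity": categorize(timedelta(days=3)), "team": "pki"})` yields the `pki-warning` receiver with a 4 h repeat interval, while an alert without `team: pki` falls through to `default`.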
The receivers and the inhibition rule complete the same file:

```yaml
# /etc/alertmanager/alertmanager.yml (continued)
receivers:
  - name: 'default'
    email_configs:
      - to: 'ops@example.com'

  - name: 'pki-critical'
    email_configs:
      - to: 'pki-team@example.com'
        send_resolved: true
    pagerduty_configs:
      - service_key: ''  # PagerDuty integration key (placeholder)
        severity: critical
    slack_configs:
      - api_url: ''  # Slack incoming-webhook URL (placeholder)
        channel: '#pki-alerts'
        title: '🚨 PKI CRITICAL: {{ .GroupLabels.alertname }}'
        # One summary per line
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ "\n" }}{{ end }}'

  - name: 'pki-warning'
    email_configs:
      - to: 'pki-team@example.com'
    slack_configs:
      - api_url: ''
        channel: '#pki-alerts'
        title: '⚠️ PKI Warning: {{ .GroupLabels.alertname }}'

  - name: 'pki-info'
    slack_configs:
      - api_url: ''
        channel: '#pki-info'
        title: 'ℹ️ PKI Info: {{ .GroupLabels.alertname }}'

inhibit_rules:
  # Suppress a warning while a critical alert with the same alertname is firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname']
```

==== Systemd service ====

```ini
# /etc/systemd/system/alertmanager.service
[Unit]
Description=Prometheus Alertmanager
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager
Restart=always

[Install]
WantedBy=multi-user.target
```

Then reload systemd and start the service with `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now alertmanager`.
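Once the service is running, Alertmanager's built-in `/-/ready` endpoint answers with HTTP 200 when the instance is ready to serve. A minimal standard-library Python sketch of such a readiness probe, for example for a deployment script; the base URL `http://localhost:9093` matches the `amtool` examples on this page, and the function name `alertmanager_ready` is hypothetical.

```python
import urllib.error
import urllib.request


def alertmanager_ready(base_url: str = "http://localhost:9093", timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/-/ready answers with HTTP 200."""
    url = base_url.rstrip("/") + "/-/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status (HTTPError is a URLError)
        return False
```

`alertmanager_ready()` returns `False` while the service is still starting or the port is closed, so it can be polled in a loop after `systemctl enable --now`.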
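----

===== Microsoft Teams =====

```yaml
# Alertmanager Teams webhook
receivers:
  - name: 'pki-teams'
    webhook_configs:
      - url: 'https://outlook.office.com/webhook/...'
        send_resolved: true
        http_config:
          bearer_token: ''
```

Note: `webhook_configs` sends Alertmanager's generic webhook JSON and does not render a custom body. The MessageCard template below is therefore meant for a bridge service between Alertmanager and Teams (for example prometheus-msteams). Alertmanager 0.26 and later also provide a native `msteams_configs` receiver.

**Teams MessageCard template:**

```json
{
  "@type": "MessageCard",
  "@context": "http://schema.org/extensions",
  "themeColor": "{{ if eq .Status "firing" }}FF0000{{ else }}00FF00{{ end }}",
  "summary": "PKI Alert: {{ .GroupLabels.alertname }}",
  "sections": [{
    "activityTitle": "{{ .GroupLabels.alertname }}",
    "activitySubtitle": "{{ .Status | toUpper }}",
    "facts": [
      {{ range $i, $alert := .Alerts }}{{ if $i }},{{ end }}
      {
        "name": "{{ $alert.Labels.instance }}",
        "value": "{{ $alert.Annotations.summary }}"
      }
      {{ end }}
    ],
    "markdown": true
  }],
  "potentialAction": [{
    "@type": "OpenUri",
    "name": "Open runbook",
    "targets": [{
      "os": "default",
      "uri": "{{ (index .Alerts 0).Annotations.runbook_url }}"
    }]
  }]
}
```

The `{{ if $i }},{{ end }}` guard writes a comma only between facts, so the rendered array has no trailing comma and stays valid JSON.

----

===== Slack =====

```yaml
# Alertmanager Slack configuration
receivers:
  - name: 'pki-slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
        channel: '#pki-alerts'
        username: 'PKI-Alertmanager'
        icon_emoji: ':lock:'
        send_resolved: true
        title: '{{ template "slack.title" . }}'
        text: '{{ template "slack.text" . }}'
        actions:
          - type: button
            text: 'Runbook'
            url: '{{ (index .Alerts 0).Annotations.runbook_url }}'
          - type: button
            text: 'Dashboard'
            url: 'https://grafana.example.com/d/pki'
```

`slack.title` and `slack.text` are not built-in templates (Alertmanager's defaults are named `slack.default.title` and `slack.default.text`); they must be defined in a template file loaded via the top-level `templates:` key.

----

===== PagerDuty =====

```yaml
# Alertmanager PagerDuty integration
receivers:
  - name: 'pki-pagerduty'
    pagerduty_configs:
      - service_key: ''
        severity: '{{ if eq .GroupLabels.severity "critical" }}critical{{ else }}warning{{ end }}'
        description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'
        details:
          firing: '{{ template "pagerduty.firing" . }}'
          num_firing: '{{ .Alerts.Firing | len }}'
          num_resolved: '{{ .Alerts.Resolved | len }}'
```

As with Slack, `pagerduty.firing` is a custom template that must be defined in a loaded template file. `service_key` belongs to PagerDuty's Events API v1 integrations; for Events API v2 integrations use `routing_key` instead.

----

===== E-mail template =====

```html
# /etc/alertmanager/templates/email.tmpl
{{ define "email.subject" }}[{{ .Status | toUpper }}] PKI Alert: {{ .GroupLabels.alertname }}{{ end }}

{{ define "email.html" }}
<html>
<body>
  <h2>PKI Alert: {{ .GroupLabels.alertname }}</h2>
  <p><strong>Status:</strong> {{ .Status | toUpper }}</p>
  {{ range .Alerts }}
  <div>
    <h3>{{ .Labels.instance }}</h3>
    <p><strong>Summary:</strong> {{ .Annotations.summary }}</p>
    <p><strong>Description:</strong> {{ .Annotations.description }}</p>
    {{ if .Annotations.runbook_url }}
    <p><a href="{{ .Annotations.runbook_url }}">📖 Open runbook</a></p>
    {{ end }}
  </div>
  {{ end }}
  <p>
    <a href="https://grafana.example.com/d/pki">📊 Dashboard</a> |
    <a href="{{ .ExternalURL }}">🔔 Alertmanager</a>
  </p>
</body>
</html>
{{ end }}
```

The subject template is kept on one line so that no line breaks end up in the mail header. To use these templates, load the file via the top-level `templates:` key in alertmanager.yml and, under `email_configs`, set `html` to the `email.html` template and the `Subject` entry of `headers` to the `email.subject` template.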
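Because `webhook_configs` delivers Alertmanager's generic JSON payload (top-level fields such as `status`, `groupLabels` and `alerts`, each alert with `labels` and `annotations`), something has to turn it into the Teams MessageCard from the Microsoft Teams section. The following Python sketch shows only that conversion step of a hypothetical bridge; receiving the webhook and posting the card to Teams are left out. Building the card as a dict and serializing it with `json.dumps` avoids the comma and quoting pitfalls of hand-written JSON templates. The function name `to_message_card` is hypothetical.

```python
import json
from typing import Any, Dict


def to_message_card(payload: Dict[str, Any]) -> str:
    """Convert an Alertmanager webhook payload into a Teams MessageCard JSON string."""
    status = payload.get("status", "firing")
    alertname = payload.get("groupLabels", {}).get("alertname", "unknown")
    alerts = payload.get("alerts", [])

    card: Dict[str, Any] = {
        "@type": "MessageCard",
        "@context": "http://schema.org/extensions",
        "themeColor": "FF0000" if status == "firing" else "00FF00",
        "summary": f"PKI Alert: {alertname}",
        "sections": [{
            "activityTitle": alertname,
            "activitySubtitle": status.upper(),
            # One fact per alert: instance label -> summary annotation
            "facts": [
                {
                    "name": a.get("labels", {}).get("instance", ""),
                    "value": a.get("annotations", {}).get("summary", ""),
                }
                for a in alerts
            ],
            "markdown": True,
        }],
    }

    # Add the runbook button only if the first alert carries a runbook_url
    runbook = alerts[0].get("annotations", {}).get("runbook_url") if alerts else None
    if runbook:
        card["potentialAction"] = [{
            "@type": "OpenUri",
            "name": "Open runbook",
            "targets": [{"os": "default", "uri": runbook}],
        }]
    return json.dumps(card, ensure_ascii=False)
```

Unlike the template, this sketch omits the runbook button when no `runbook_url` is present instead of emitting an empty link.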
----

===== Alert rules with runbook links =====

```yaml
# /etc/prometheus/rules/pki-alerts.yml
groups:
  - name: pki-alerts
    rules:
      - alert: CertificateExpiringSoon
        expr: x509_cert_not_after - time() < 7 * 86400
        for: 1h
        labels:
          severity: warning
          team: pki
        annotations:
          summary: "Certificate {{ $labels.filepath }} expires in < 7 days"
          description: "Time remaining: {{ $value | humanizeDuration }}"
          runbook_url: "https://wiki.example.com/pki/runbook/rinnovo-certificato"

      - alert: CertificateExpired
        expr: x509_cert_not_after - time() < 0
        labels:
          severity: critical
          team: pki
        annotations:
          summary: "CRITICAL: Certificate {{ $labels.filepath }} has EXPIRED"
          runbook_url: "https://wiki.example.com/pki/runbook/emissione-certificato"

      - alert: CANotReachable
        expr: up{job="ca"} == 0
        for: 2m
        labels:
          severity: critical
          team: pki
        annotations:
          summary: "CA server not reachable"
          runbook_url: "https://wiki.example.com/pki/runbook/ca-troubleshooting"
```

----

===== Grafana Alerting (alternative) =====

```yaml
# Grafana alert rule (UI or provisioning)
apiVersion: 1
groups:
  - orgId: 1
    name: PKI Alerts
    folder: PKI
    interval: 1m
    rules:
      - uid: cert-expiry-warning
        title: Certificate Expiring Soon
        condition: B
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: prometheus
            model:
              expr: x509_cert_not_after - time() < 7 * 86400
          - refId: B
            datasourceUid: '-100'
            model:
              conditions:
                - evaluator:
                    params: [0]
                    type: gt
                  operator:
                    type: and
                  query:
                    params: [A]
                  reducer:
                    type: count
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: Certificate expiring soon
```

----

===== Testing and validation =====

```bash
# Validate the Alertmanager configuration
amtool check-config /etc/alertmanager/alertmanager.yml

# Validate the Prometheus alert rules
promtool check rules /etc/prometheus/rules/pki-alerts.yml

# Send a test alert
amtool alert add alertname=TestAlert severity=warning instance=test \
  --alertmanager.url=http://localhost:9093

# List active alerts
amtool alert --alertmanager.url=http://localhost:9093
```
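`amtool alert add` posts to Alertmanager's HTTP API, and the "Custom Script" trigger from the architecture diagram can do the same directly. A minimal Python sketch, assuming Alertmanager's v2 endpoint `POST /api/v2/alerts`, which accepts a JSON list of alerts with `labels`, `annotations` and optional `startsAt`/`endsAt`. The function names `build_alert` and `push_alerts` and the example labels (including the `CRLExpiringSoon` alert name for the CRL case in the category table) are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def build_alert(labels: Dict[str, str], annotations: Dict[str, str],
                valid_for: timedelta = timedelta(hours=1)) -> Dict[str, Any]:
    """Build one alert object in the shape expected by POST /api/v2/alerts."""
    now = datetime.now(timezone.utc)
    return {
        "labels": labels,
        "annotations": annotations,
        "startsAt": now.isoformat(),
        # Alertmanager treats the alert as resolved after endsAt unless it is re-sent
        "endsAt": (now + valid_for).isoformat(),
    }


def push_alerts(alerts: List[Dict[str, Any]],
                base_url: str = "http://localhost:9093",
                timeout: float = 10.0) -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example (hypothetical labels; requires a running Alertmanager):
# push_alerts([build_alert(
#     {"alertname": "CRLExpiringSoon", "severity": "warning", "team": "pki", "instance": "ca01"},
#     {"summary": "CRL of ca01 expires in < 24 h"},
# )])
```

A script like this should re-send its alerts periodically while the condition persists, since each alert resolves automatically once `endsAt` has passed.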
```bash
# Create a silence (e.g. for planned maintenance)
amtool silence add alertname=CertificateExpiringSoon \
  --alertmanager.url=http://localhost:9093 \
  --comment="Planned maintenance" \
  --duration=2h
```

----

===== Checklist =====

| # | Check item | ✓ |
|---|------------|---|
| 1 | Alertmanager installed | ☐ |
| 2 | Routing configured | ☐ |
| 3 | E-mail receiver | ☐ |
| 4 | Slack/Teams webhook | ☐ |
| 5 | PagerDuty integration | ☐ |
| 6 | Alert rules defined | ☐ |
| 7 | Runbook links added | ☐ |
| 8 | Test alert sent | ☐ |

----

===== Related documentation =====

  * [[.:ablauf-monitoring|Expiry monitoring]] – Collecting metrics
  * [[..:tagesgeschaeft:start|Day-to-day operations]] – Runbooks
  * [[.:audit-logging|Audit logging]] – Event logging

----

<< [[.:audit-logging|← Audit logging]] | [[..:start|→ Operator scenarios]] >>

----

//Wolfgang van der Stille @ EMSR DATA d.o.o. - Post-Quantum Cryptography Professional//

{{tag>alerting prometheus alertmanager slack teams pagerduty operator}}