Prometheus Alert Manager

The Prometheus Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.

To make sure alerts get delivered, it is a good idea to run several Alertmanager processes clustered together, so in this example we will deploy two on different machines.

These instructions assume you already have a running Prometheus server - see here for an example.

Installing Alert Manager

Initially this will be done on the existing Prometheus server. These instructions can also be downloaded as an install script.

Set the required versions

export VERSION=0.25.0
export ARCH=arm64
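The ARCH value above is hard-coded for an arm64 machine. As a minimal sketch, it could be derived from the output of uname -m instead. The function name detect_arch is hypothetical, and the mapping assumes the release archives use the suffixes amd64, arm64 and armv7; it only covers those three cases.

```shell
# Map a machine name (as printed by `uname -m`) to a release architecture suffix.
# Only three common cases are handled; the mapping itself is an assumption.
detect_arch() {
    case "$1" in
        x86_64)        echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        armv7l)        echo "armv7" ;;
        *)             echo "unsupported machine type: $1" >&2; return 1 ;;
    esac
}

# Usage: derive ARCH from the running machine instead of hard-coding it.
# export ARCH=$(detect_arch "$(uname -m)")
```

If the function returns an error, set ARCH by hand as shown above.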

and create a user to run the process under

useradd -M -r -s /bin/false alertmanager

Download the release archive and install the binaries in /usr/local/bin

wget https://github.com/prometheus/alertmanager/releases/download/v$VERSION/alertmanager-$VERSION.linux-$ARCH.tar.gz
tar xvfz alertmanager-$VERSION.linux-$ARCH.tar.gz

cp alertmanager-$VERSION.linux-$ARCH/{alertmanager,amtool} /usr/local/bin/
chown alertmanager:alertmanager /usr/local/bin/{alertmanager,amtool}
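Since the archive is fetched over the network, it may be worth checking it against a checksum list before unpacking. The sketch below assumes that a file named sha256sums.txt has been downloaded from the same release page into the current directory and that it uses the format written by sha256sum. Both are assumptions, and verify_tarball is a hypothetical helper name.

```shell
# Check a downloaded tarball against a checksum list in the current directory.
# Assumes lines of the form "<sha256>  <file name>", as written by sha256sum.
verify_tarball() {
    name=$1   # tarball file name, e.g. alertmanager-$VERSION.linux-$ARCH.tar.gz
    sums=$2   # checksum list, assumed here to be called sha256sums.txt
    # Select the line for this file and let sha256sum recompute and compare.
    grep -F "  ${name}" "$sums" | sha256sum -c - >/dev/null 2>&1
}

# Usage (run in the download directory, before unpacking):
# verify_tarball "alertmanager-$VERSION.linux-$ARCH.tar.gz" sha256sums.txt \
#     || echo "checksum mismatch, do not install"
```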

Copy the sample config files and create the config file for amtool

mkdir -p /etc/alertmanager
cp alertmanager-$VERSION.linux-$ARCH/alertmanager.yml /etc/alertmanager
chown -R alertmanager:alertmanager /etc/alertmanager
mkdir -p /var/lib/alertmanager
chown alertmanager:alertmanager /var/lib/alertmanager
mkdir -p /etc/amtool

cat > /etc/amtool/config.yml <<EOF
alertmanager.url: http://localhost:9093
EOF

Create the systemd service to run alertmanager, then enable and start it

# EOF is quoted so the backslash line continuations are written to the unit file unchanged
cat > /etc/systemd/system/alertmanager.service <<'EOF'
[Unit]
Description=Prometheus Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
 --config.file /etc/alertmanager/alertmanager.yml \
 --storage.path /var/lib/alertmanager/
[Install]
WantedBy=multi-user.target
EOF

systemctl enable alertmanager
systemctl start alertmanager
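After starting the service it can take a moment before Alertmanager accepts requests. As a rough sketch, the readiness endpoint /-/ready can be polled with curl until it answers. The function name wait_for_alertmanager, the default URL and the retry count are illustrative assumptions, and curl is assumed to be installed.

```shell
# Poll the Alertmanager readiness endpoint until it responds or retries run out.
wait_for_alertmanager() {
    url=${1:-http://localhost:9093}   # base URL of the instance (assumed default)
    tries=${2:-10}                    # number of attempts, one second apart
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl return non-zero on HTTP errors, -sS keeps it quiet
        if curl -fsS "${url}/-/ready" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage:
# wait_for_alertmanager http://localhost:9093 10 || systemctl status alertmanager
```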

Use amtool to check the configuration

neilarmitage@prom1:~/prometheus$ amtool config show
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
    enable_http2: true
  smtp_hello: localhost
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
  telegram_api_url: https://api.telegram.org
  webex_api_url: https://webexapis.com/v1/messages
route:
  receiver: web.hook
  group_by:
  - alertname
  continue: false
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
inhibit_rules:
- source_match:
    severity: critical
  target_match:
    severity: warning
  equal:
  - alertname
  - dev
  - instance
receivers:
- name: web.hook
  webhook_configs:
  - send_resolved: true
    http_config:
      follow_redirects: true
      enable_http2: true
    url: http://127.0.0.1:5001/
    max_alerts: 0
templates: []
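amtool config show prints the configuration of the running Alertmanager. When the file in /etc/alertmanager is edited later, it may help to validate it with amtool check-config before restarting, so a typo does not take the service down. The helper name restart_if_valid is hypothetical; the sketch is meant to be run as root, like the install steps above.

```shell
# Restart Alertmanager only if the configuration file passes amtool's check.
restart_if_valid() {
    config=${1:-/etc/alertmanager/alertmanager.yml}
    if amtool check-config "$config"; then
        systemctl restart alertmanager
    else
        echo "config check failed, not restarting" >&2
        return 1
    fi
}

# Usage:
# restart_if_valid /etc/alertmanager/alertmanager.yml
```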

Prometheus now needs to know about the Alertmanager, so edit /etc/prometheus/prometheus.yml (for example with vi) and add

alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]

and then restart Prometheus with sudo systemctl restart prometheus
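YAML indentation mistakes in prometheus.yml are easy to make and will stop Prometheus from starting. A small sketch, assuming promtool was installed alongside Prometheus and is on the PATH, is to run promtool check config on the file before each restart. The wrapper name check_prometheus_config is hypothetical.

```shell
# Validate the Prometheus configuration file with promtool before a restart.
check_prometheus_config() {
    config=${1:-/etc/prometheus/prometheus.yml}
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found on PATH" >&2
        return 127
    fi
    promtool check config "$config"
}

# Usage:
# check_prometheus_config && sudo systemctl restart prometheus
```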

The new Alertmanager should now show up under the Status tab in Prometheus.

[Image: Alert Manager]

Adding a second Alert Manager Instance

To ensure alerts get delivered, a second instance of Alertmanager is recommended. The setup on a separate host is the same as the initial setup, except for one additional command-line option when starting Alertmanager, --cluster.peer=, which tells it where the other host is.

[Unit]
Description=Prometheus Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
 --config.file /etc/alertmanager/alertmanager.yml \
 --storage.path /var/lib/alertmanager/ \
 --cluster.peer=<IP of first alertmanager>:9094

[Install]
WantedBy=multi-user.target

Once the second instance is running, update the systemd unit file on the first Alertmanager to add --cluster.peer=<IP of second alertmanager>:9094, then reload the systemd configuration and restart the service

sudo systemctl daemon-reload
sudo systemctl restart alertmanager
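The --cluster.peer option can be given more than once, so the same unit file pattern extends to more than two hosts. As an illustration only, the following sketch builds the flags from a list of peer addresses; the function name peer_flags and the example addresses are hypothetical, and port 9094 is the cluster port used above.

```shell
# Print one --cluster.peer flag per address given as an argument.
peer_flags() {
    flags=""
    for peer in "$@"; do
        # ${flags:+ } adds a separating space only after the first flag
        flags="${flags}${flags:+ }--cluster.peer=${peer}:9094"
    done
    echo "$flags"
}

# Usage: the output can be pasted onto the ExecStart line of the unit file.
# peer_flags 10.0.0.2 10.0.0.3
```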

Prometheus now needs to know about the second Alertmanager, so edit /etc/prometheus/prometheus.yml again and extend the targets list

alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093", "<IP of 2nd alertmanager>:9093"]

and then restart Prometheus with sudo systemctl restart prometheus

The new Alertmanager should now show up under the Status tab in Prometheus.

[Image: Alert Manager]

Writing Alerts

First of all, edit /etc/prometheus/prometheus.yml (sudo vi /etc/prometheus/prometheus.yml) and update the rule_files option to point at a rules directory. This allows us to create multiple files in this directory without having to update the config again.

rule_files:
- "/etc/prometheus/rules/*.yml"

Create the rules file directory

sudo mkdir /etc/prometheus/rules/
sudo chown prometheus:prometheus /etc/prometheus/rules/

Now we can create a test rule in /etc/prometheus/rules/test.yml (sudo vi /etc/prometheus/rules/test.yml)

groups:
- name: test-server
  rules:
  - alert: TestServerDown
    expr: up{job="test server"}==0
    labels:
      severity: critical
    annotations:
      summary: Test Server Down
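Rule files can be checked for syntax errors before Prometheus is restarted. The sketch below, which assumes promtool is on the PATH, runs promtool check rules over every .yml file in the rules directory and stops at the first failure. The function name check_rules is hypothetical.

```shell
# Run `promtool check rules` on each rule file in a directory.
check_rules() {
    dir=${1:-/etc/prometheus/rules}
    for f in "$dir"/*.yml; do
        [ -e "$f" ] || continue          # skip the literal pattern if nothing matches
        promtool check rules "$f" || return 1
    done
}

# Usage:
# check_rules /etc/prometheus/rules && sudo systemctl restart prometheus
```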

Now restart Prometheus to pick up the change with sudo systemctl restart prometheus

The alert should now show in the UI as Inactive.

[Image: Alert Inactive]

If you stop the node_exporter on the test server (sudo systemctl stop node_exporter) and wait a minute, the alert should show as active.

[Image: Alert Active]
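The Prometheus UI shows that the rule fired, but not whether the alert reached Alertmanager. One way to check from the command line might be amtool alert query with an alertname matcher, using the amtool config file created during installation. The helper name alert_is_firing is hypothetical, and the sketch assumes that the alert name appears in the command's output only when a matching alert exists.

```shell
# Succeed if Alertmanager currently lists an alert with the given name.
alert_is_firing() {
    name=$1
    # Query by alertname and look for the name in the returned listing.
    amtool alert query "alertname=${name}" | grep -q "$name"
}

# Usage:
# alert_is_firing TestServerDown && echo "alert reached Alertmanager"
```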
