5

Effective Alerting with Prometheus

Thus far, we’ve looked primarily at how to get data into Prometheus through scrape jobs, how to discover scrape targets, and how to query data manually. But no monitoring system is truly useful if you have to keep checking whether everything is okay; we need a system running in the background that evaluates the state of our systems and alerts us when they’re not working correctly. In this chapter, we’ll look at how Prometheus achieves that through a combination of its rule subsystem and the separate Alertmanager component.

We’ll cover the following main topics:

  • Alertmanager configuration and routing
  • Alertmanager templating
  • Highly available (HA) alerting
  • Making robust alerts
  • Unit-testing alerting rules

Let’s get started!

