Chapter 9. Probability and Statistics for SLIs and SLOs

You’ve identified some meaningful SLIs and brought together stakeholders to build thoughtful SLOs from them. Once you collect some data from your system to help you set those targets, that should be it, right? But as you’ve seen, when measuring SLIs, you need to ensure you have data that allows for multiple analyses and interpretations. The data in and of itself does not tell a complete story: how you analyze it is key to its usefulness. What’s more, systems change rapidly, so the SLOs you set may need to change as the systems themselves evolve. How do you determine the appropriate SLOs without being able to peer into the future?

This chapter is all about the interpretation of the data you’re collecting. Reliability is expensive, and figuring out how much reliability you need is crucial for making the most of your resources. An incorrect analysis doesn’t mean all your hard work has gone to waste, but it does mean you can’t be sure that the results support what you want to accomplish. Misinterpreting the data can mean triggering alerts unnecessarily or, worse, remaining blissfully unaware of underlying problems that will violate your SLOs and lead to customer dissatisfaction.

This chapter is broadly concerned with two difficult problems that arise when implementing SLIs and SLOs:

  • Figuring out what an SLO ought to be

  • Calculating the value of an SLI

The former arises when, for example, ...
