Chapter 22. SRE, at Any Size, Is Cultural

Matthew Huxtable

Modern business environments are complex places that move fast with limited resources in pursuit of continually delivering customer value. Maintaining reliable systems is an intricate, detail-oriented task that is difficult to prioritize in this broader context. Traditionally, the effort required to build systems while maintaining production uptime has been poorly understood: an implicit requirement in the margins, its burden delegated to technical teams.

Leaders make this trade-off at their peril. An understanding of expected reliability and a well-developed risk thermostat are not cutesy optional extras; today, they are first-class requirements. Although engineers and leaders understand this, hierarchies and a lack of shared context across an organization are hazards that prevent the development of an integrated approach to building reliable systems.

SRE ushers in cultures that recognize these challenges. Through quantitative means, SRE makes explicit the relationship between operational reliability and customer happiness. By prioritizing long-term, objective measures of success, SRE facilitates continual negotiations of reliability whose outcomes are supported by broader organizational objectives. Done well, it emphasizes the importance of humans in continually creating the conditions for success, rather than ...
