Book description
Two previous O'Reilly books from Google--Site Reliability Engineering and The Site Reliability Workbook--demonstrated how and why a commitment to the entire service life cycle enables your organization to successfully build, deploy, monitor, and maintain software systems. In this detailed report, Google Cloud Reliability Advocate Steve McGhee and Google Cloud Solutions Architect James Brookbank dive deeper into the specific challenges engineers face when adopting SRE in their organization.
Despite SRE's popularity, many enterprises have experienced a significant gap between initial enthusiasm for SRE and its often modest level of adoption. If you're a product owner or have a stake in reliable services and need to know more about SRE adoption, this report will methodically guide you through the process.
- Get started by evaluating your existing environment and setting expectations
- Examine SRE's approach to reliability, and learn why reliability is the most desired product feature
- Learn how to map SRE's guiding principles, such as embracing risk, to your existing organization
- Develop a set of SRE practices for your team, based on what team members can do, what they know, and what tools they use
- Learn tips on how to actively nurture success and keep SRE working in your organization
Table of contents
- Preface
- 1. Getting Started with Enterprise SRE
- 2. Why the SRE Approach to Reliability?
- 3. SRE Principles
  - Embracing Risk (SRE Book Chapter 3)
  - Service-Level Objectives (SRE Book Chapter 4)
  - Eliminating Toil (SRE Book Chapter 5)
  - Monitoring Distributed Systems (SRE Book Chapter 6)
  - The Evolution of Automation at Google (SRE Book Chapter 7)
  - Release Engineering (SRE Book Chapter 8)
  - Simplicity (SRE Book Chapter 9)
  - How Do You Map These Principles to Your Existing Organization?
  - Preventing Org-Destroying Mistakes
  - Create a Safe-to-Fail Environment for Your Adoption Journey
  - Beware Diverging Priorities
  - How Do You Get Buy-In to These Principles, with the Critical Sign-Off and Backing You Need?
- 4. SRE Practices
- 5. Actively Nurturing Success
- 6. Not Just Google
- Conclusion
- About the Authors
Product information
- Title: Enterprise Roadmap to SRE
- Author(s): Steve McGhee, James Brookbank
- Release date: January 2022
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098117733