Book description
Improve Your Service Scalability and Reliability with SRE
“The techniques and principles of SRE are not only clearly defined here, but also the rationale behind them is explained in a way that will stick. This is not some dry definition, this is practical, usable understanding. . . . I can whole-heartedly recommend this book without any reservation. This is a very good book on an important topic that helps to move the game forward for our discipline!”
–From the Foreword by David Farley, Founder and CEO of Continuous Delivery Ltd.
Pioneered by Google to create more scalable and reliable large-scale systems, Site Reliability Engineering (SRE) has become one of today’s most valuable software innovation opportunities. Establishing SRE Foundations is a concise, practical guide that shows how to drive successful SRE adoption in your own organization. Dr. Vladyslav Ukis presents a step-by-step approach to establishing the right cultural, organizational, and technical process foundations, quickly achieving a "minimum viable SRE" and continually improving from there.
Dr. Ukis draws extensively on his own experiences leading an SRE transformation journey at a major healthcare company. Throughout, he answers specific questions that organizations ask about SRE, identifies pitfalls, and shows how to avoid or overcome them. Whatever your role in software development, engineering, or operations, this guide will help you apply SRE to improve what matters most: user and customer experience.
- Understand how SRE works, its role in software operations, and the challenges of SRE transformation
- Assess your organization's current operations and readiness for SRE transformation
- Achieve organizational buy-in and initiate foundational activities, including SLO definitions, alerting, on-call rotations, incident response, and error budget-based decision-making
- Align organizational structures to support a full SRE transformation
- Measure the progress and success of your SRE initiative
- Sustain and advance your SRE transformation beyond the foundations
Table of contents
- Cover
- Title Page
- Contents
- Table of Contents
- Foreword
- Preface
- Acknowledgments
- About the Author
- Part I: Foundations
- Part II: Running the Transformation
- Chapter 5. Achieving Organizational Buy-In
- Chapter 6. Laying Down the Foundations
- 6.1 Introductory Talks by Team
- 6.2 Conveying the Basics
- 6.3 SLI Standardization
- 6.4 Enabling Logging
- 6.5 Teaching the Log Query Language
- 6.6 Defining Initial SLOs
- 6.7 Default SLOs
- 6.8 Providing Basic Infrastructure
- 6.9 Engaging Champions
- 6.10 Dealing with Detractors
- 6.11 Creating Documentation
- 6.12 Broadcast Success
- 6.13 Summary
- Chapter 7. Reacting to Alerts on SLO Breaches
- Chapter 8. Implementing Alert Dispatching
- Chapter 9. Implementing Incident Response
- Chapter 10. Setting Up an Error Budget Policy
- 10.1 Motivation
- 10.2 Terminology
- 10.3 Error Budget Policy Structure
- 10.4 Error Budget Policy Conditions
- 10.5 Error Budget Policy Consequences
- 10.6 Error Budget Policy Governance
- 10.7 Extending the Error Budget Policy
- 10.8 Agreeing to the Error Budget Policy
- 10.9 Storing the Error Budget Policy
- 10.10 Enacting the Error Budget Policy
- 10.11 Reviewing the Error Budget Policy
- 10.12 Related Concepts
- 10.13 Summary
- Chapter 11. Enabling Error Budget–Based Decision-Making
- Chapter 12. Implementing Organizational Structure
- Part III: Measuring and Sustaining the Transformation
- Chapter 13. Measuring the SRE Transformation
- Chapter 14. Sustaining the SRE Movement
- 14.1 Maturing the SRE CoP
- 14.2 SRE Minutes
- 14.3 Availability Newsletter
- 14.4 SRE Column in the Engineering Blog
- 14.5 Promote Long-Form SRE Wiki Articles
- 14.6 SRE Broadcasting
- 14.7 Combining SRE and CD Indicators
- 14.8 SRE Feedback Loops
- 14.9 New Hypotheses
- 14.10 Providing Learning Opportunities
- 14.11 Supporting SRE Coaches
- 14.12 Summary
- Chapter 15. The Road Ahead
- Appendix: Topics for Quick Reference
Product information
- Title: Establishing SRE Foundations: A Step-by-Step Guide to Introducing Site Reliability Engineering in Software Delivery Organizations
- Author(s): Vladyslav Ukis
- Release date: September 2022
- Publisher(s): Addison-Wesley Professional
- ISBN: 9780137424887