Chapter 6. Responding and Recovering

truth is a seed

planted deep

if you want to get it

you have to dig

Katherena Vermette, “river woman”

As is true for any complex system, stressors and surprises are inevitable. Resilience entails the ability to gracefully recover from these adverse scenarios. Therefore, responding to and recovering from incidents reflects a critical phase in software delivery for systems resilience and its subset of systems security. Incidents astound us into seeing the differences between reality and our mental models in vivid, visceral relief. Incidents are a call to action for us to learn and revise our mental models, a signal that the system’s normal functioning may not be sufficient to maintain resilience against attack. Every incident may feel “irregular,” but if we dig deeper, we may distinguish patterns that challenge our assumptions and beliefs about system design.

In this chapter, we will explore how we can learn from those incidents and ensure we digest insights in all the other phases of software delivery to inform change, which is crucial for completing the resilience potion. There are tactics we can employ to make incident response efforts more decentralized too, sharpening our sensing abilities. And, as we’ll cover in depth, fostering a blameless culture promotes the final ingredients of our resilience potion: learning and willingness to change—resulting ...
