Chapter 7. When Recovery Is Required
You have now built out your reliable AvailableTrade application. You have integrated its frontend and backend and configured those components to be resilient. As discussed earlier, however, there will be failure modes from which you need to recover your application. Mitigating some of those failure modes may require you to use fault boundaries so that recovery time stays bounded.
Testing your recovery process is not just a task; it is a responsibility, and it forms the backbone of your application’s resilience. Because recovery involves people, processes, and technology, all three must be tested thoroughly so that you can be confident the process will work during a critical situation.
The primary purpose of these tests is to validate that every recovery mechanism functions as intended during an actual disruption. Running recovery tests regularly exposes weaknesses and gaps in your recovery plans, so you can fix them before a real incident occurs. This proactive approach minimizes the risk of prolonged downtime and data loss, both of which can have significant financial, operational, and reputational repercussions.
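As a rough illustration of what validating a recovery mechanism can look like in practice, the following Python sketch times a recovery drill against a recovery time objective (RTO). The function `run_recovery_drill` and its callbacks (`inject_failure`, `start_recovery`, `is_healthy`) are hypothetical placeholders, not part of AvailableTrade or any AWS API. In a real test you would presumably wire them to your own fault-injection tooling, recovery automation, and health checks; this is a minimal sketch under those assumptions, not a complete test framework.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    recovered: bool          # did the health check pass before we stopped waiting?
    elapsed_seconds: float   # time from starting recovery to the final health check
    within_rto: bool         # recovered AND elapsed time did not exceed the RTO


def run_recovery_drill(
    inject_failure: Callable[[], None],   # hypothetical: simulate the disruption
    start_recovery: Callable[[], None],   # hypothetical: trigger the recovery procedure
    is_healthy: Callable[[], bool],       # hypothetical: check that the app serves traffic
    rto_seconds: float,
    poll_interval: float = 1.0,
) -> DrillResult:
    """Inject a failure, start recovery, and poll health until recovery or the RTO."""
    inject_failure()
    start = time.monotonic()  # monotonic clock is unaffected by wall-clock changes
    start_recovery()
    while True:
        healthy = is_healthy()
        elapsed = time.monotonic() - start
        if healthy:
            return DrillResult(True, elapsed, elapsed <= rto_seconds)
        if elapsed >= rto_seconds:
            # Stop waiting once the RTO is exceeded; the drill has failed.
            return DrillResult(False, elapsed, False)
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulated system that reports healthy on the third health check.
    checks = {"count": 0}

    def simulated_health() -> bool:
        checks["count"] += 1
        return checks["count"] >= 3

    result = run_recovery_drill(
        inject_failure=lambda: print("simulated failure injected"),
        start_recovery=lambda: print("simulated recovery started"),
        is_healthy=simulated_health,
        rto_seconds=5.0,
        poll_interval=0.1,
    )
    print(result)
```

Recording the elapsed time on every drill, rather than only a pass or fail, lets you see whether recovery time is drifting toward the RTO before it is actually exceeded.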
Recovery process testing does more than familiarize the team with the procedures and tools; it also gives you a sense of control and confidence. That familiarity helps ensure a swift, efficient response when an actual incident occurs, reducing the impact on ...