Chapter 3. Operate

In the days of shipping software on CDs, a release was considered done once the CDs were in users’ hands. Now, with software delivered digitally through the cloud or hosted remotely, there isn’t an equivalent sense of “done.” Complexity abounds and entropy encroaches. The interplay among your code, user behavior, third-party integrations, internal tooling, cloud hosting, and SaaS vendors means that your software can behave unexpectedly at any time. While the previous two chapters focused on deploying changes and releasing software, the next stage in our revised operating continuously model, operate (see Figure 3-1), focuses on managing the impact of these changes as new features are released into your application landscape.

Figure 3-1. The operate stage

While operations are not exclusively about managing incidents, incident management is a critical piece of getting operations running correctly. When incidents aren’t handled well, your operations team may find it has little time for anything else. Thus, our focus for this chapter is on identifying and managing incidents.

There are two aspects to dealing with the inevitability of incidents. The first involves what you do before an incident occurs, since one will occur. The second is how you handle an incident after it happens.

Before an incident occurs, you want to create processes that ...
