Chapter 36. Making Work Visible

Lorin Hochstein

Wait a second... you used a REPL to figure it out?

I was taking notes for a colleague who was interviewing an engineer after an incident. One particular service had gotten stuck, and the engineer was discussing how they figured out what the problem was. Before that moment, I had no idea that we supported launching a REPL on a production box to interrogate the state of that service.

Most of the work that we do is invisible to others; they see the results, but not how we got there. Even during an incident, where we’re working in close collaboration with others, our peers rarely have the opportunity to observe exactly what we’re doing. They don’t see which queries we’re running, which graphs or logs we’re looking at, how we interpret these results, and how we decide where to look next.

There is enormous value in making this work visible: in providing coworkers with a window into the messy details of our day-to-day work.

To address the problems people encounter in our organizations, we need to understand what those problems are: for example, an operational tool has an error-prone user interface, or a team has a workload so high that its members must constantly context switch.

Nobody in your organization has a complete understanding of how the system works, and we are often bitten by an important bit of context that we didn’t have. ...
