Infrastructure & Ops Superstream: Observability

Intermediate

Observability—a measure of how well internal states of a system can be inferred from its outputs—is crucial for engineering, managing, and improving complex business-critical systems. Join us to learn how observability can help any software engineering team gain a deeper understanding of system performance, so you can perform ongoing maintenance and ship the features your customers need.

About the Infrastructure & Ops Superstream Series: This four-part Superstream series guides you through what you need to know about modernizing your organization’s infrastructure and operations, with each event day covering different topics and lasting no more than four hours. They’re packed with the expert insights, skills, and tools that will help you effectively manage existing legacy systems while migrating to modern, scalable, cost-effective infrastructures—with no interruption to your business.

What you’ll learn and how you can apply it

Gain a deeper understanding of system performance so you can perform ongoing maintenance and ship the features your customers need
Understand how to build an observability-driven development practice
Discover how your production services are really performing right now

This live event is for you because...

You’re a developer who wants to learn the basics of observability and how to use it in your system.
You want to better understand how observability can be used with data.
You want to know what the future holds for observability and infrastructure and operations.

Prerequisites

Come with your questions
Have a pen and paper handy to capture notes, insights, and inspiration

Recommended follow-up:

Read Observability Engineering (book)
Read Kubernetes Security and Observability (book)
Watch Observability at Google (video)
Read Linux Observability with BPF (book)
Watch How Lightstep Implemented Observability (case study)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Sam Newman: Introduction (5 minutes) - 8:00am PT | 11:00am ET | 4:00pm UTC/GMT

Sam Newman welcomes you to the Infrastructure & Ops Superstream.

Ben Sigelman: From the Trenches (45 minutes) - 8:05am PT | 11:05am ET | 4:05pm UTC/GMT

Join us for a special conversation with Lightstep cofounder and CEO Ben Sigelman. He’ll recount some of the challenges he’s faced during his career and shed light on the things that worked well (and those that didn’t) as he worked on the OpenTracing and OpenTelemetry projects and built his own company.
Ben Sigelman is a cofounder and GM at Lightstep. He’s a cocreator of Dapper (Google’s distributed tracing system), Monarch (Google’s metrics system), and the CNCF’s OpenTelemetry project. Ben’s work and interests gravitate toward observability, especially where microservices, high transaction volumes, and large engineering organizations are involved.

Break (10 minutes)

Charity Majors: The Glorious Future of Observability (45 minutes) - 9:00am PT | 12:00pm ET | 5:00pm UTC/GMT

Distributed systems, microservices, containers and schedulers, continuous delivery… We’ve been through one paradigm shift after another when it comes to architecture, but when it comes to observability we’re still using crufty old logging and metrics tools that date back to the LAMP stack era. These tools fall apart past a certain level of complexity. Charity Majors digs into some of the deep technical reasons why, then shares some modern approaches to debugging complex systems, including Honeycomb and distributed tracing.
Charity Majors is an ops engineer and accidental startup founder at Honeycomb. Previously, she worked on infrastructure and developer tools at Parse, Facebook, and Linden Lab—and always seemed to wind up running the databases. She’s the coauthor of Database Reliability Engineering and the upcoming Observability Engineering (both from O’Reilly). She loves free speech, free software, and single malt scotch.

Break (10 minutes)

Jiaqi Liu: Observability for Data Pipelines—Monitoring, Alerting, and Tracing Lineage (50 minutes) - 9:55am PT | 12:55pm ET | 5:55pm UTC/GMT

Data-intensive applications, with their many layers of transformations and movement from different data sources, can often be challenging to maintain after they’re initially built and validated. To truly expand and develop a codebase, developers must be able to test confidently during the development process and monitor the production system. But monitoring and testing data pipelines or real-time streaming processes can be very different from monitoring web services. Jiaqi Liu draws on her experience building and maintaining both batch and real-time stream data pipelines to explain how to leverage monitoring tools like Prometheus and Grafana to define and visualize metrics, how and when to alert on common health indicators, and how to gain visibility into not just the system’s health but also the health of the data.
Jiaqi Liu is a senior engineering manager at GitHub, where she leads the database infrastructure team. Previously, she was a platform engineer and data scientist, roles that both involved building data pipelines that had multiple points of failure. Jiaqi is passionate about open source and investing in infrastructure and champions inclusivity in the workplace. She’s also active in the Write/Speak/Code and Women Who Code communities. Outside of work, you can find her hiking with her very hyper dog, Remi.

Break (10 minutes)

Michael Hausenblas: Embracing Observability in Distributed Systems (55 minutes) - 10:55am PT | 1:55pm ET | 6:55pm UTC/GMT

Michael Hausenblas discusses good practices for observability in distributed systems and explores current developments around CNCF open source projects and specifications, including OpenTelemetry, Prometheus, and FluentBit.
Michael Hausenblas is a solution engineering lead on the open source observability service team at AWS. He covers Prometheus, Grafana, and OpenTelemetry upstream and in managed services. Previously, Michael worked at Red Hat, Mesosphere (now D2iQ), and MapR (now part of HPE), and spent 10 years in applied research.

Sam Newman: Closing Remarks (10 minutes) - 11:50am PT | 2:50pm ET | 7:50pm UTC/GMT

Sam Newman closes out today’s event.

Upcoming Infrastructure & Ops Superstream events:

Distributed Computing - April 27, 2022
Linux Fundamentals - September 14, 2022
Operationalizing Kubernetes - November 9, 2022

Your Host

Sam Newman
Sam Newman is a technologist focusing on the areas of cloud, microservices, and continuous delivery—three topics which seem to overlap frequently. He provides consulting, training, and advisory services to startups and large multinational enterprises alike, drawing on his more than 20 years in IT as a developer, sysadmin, and architect. Sam is the author of the best-selling Building Microservices and Monolith to Microservices, both from O’Reilly, and is also an experienced conference speaker.
