Database reliability engineering
Five Questions for Laine Campbell about building dependable databases.
I recently sat down with Laine Campbell, principal consultant at OpsArtisan, to talk about the practice of database reliability engineering and ways that DBAs can build their expertise in this area. Here are some highlights from our chat.
How would you define “database reliability engineering”?
The practice of reliability engineering, with its focus on automation, the elimination of toil, and a disciplined approach to systems and operational processes, applies just as strongly to the database tier as to the application and web tiers. Today's database professionals must be engineers, not administrators. We build things. We create things. We are all in this together, and nothing is someone else's problem. As engineers, we apply repeatable processes, established knowledge, and expert judgment to design, build, and operate production datastores and the data structures within them. As database reliability engineers, we must take the operational principles and the depth of database expertise that we possess one step further.
If you look at the non-storage components of today's infrastructures, you will see systems that are easily built, run, and destroyed by programmatic, and often automatic, means. The lifetimes of these components can be measured in days, sometimes even hours or minutes. When one goes away, any number of others can step in and keep the quality of service at expected levels. Databases have traditionally lagged behind the rest of the infrastructure: they have stayed monolithic, hard to reproduce, and, frankly, quite fragile. The database reliability engineer focuses on making databases good infrastructure citizens by using the same tools and processes as the rest of the engineering and operations organization.
How do organizations take advantage of this set of practices? What are the challenges, and how do they overcome them?
The first step is to bring the DBA teams out of their silos and into centralized, cross-functional teams, such as platform teams. There will never be enough DBAs to scale to the number of developers being hired into engineering shops. Having these heavily outnumbered teams focus on education and collaboration, on building effective patterns for developers, and on supporting shared services for databases and their ecosystems is the only way to scale the database reliability function.
Any number of challenges can come up through this process. A particularly interesting one is the idea that DBAs can no longer be gatekeepers to their datastores. Giving software engineering teams guardrails and effective patterns goes a long way, but it's a significant shift in mindset that takes time and a lot of work. There will be mistakes and availability impacts, any of which can create enough fear to cause a backslide to the old ways. Effective leadership and management support are required to push through these cultural sea changes.
Do things like microservices and containerization provide new challenges with regard to databases?
I wouldn't say challenges so much as opportunities. Microservices help us get away from the monolithic, fragile databases that have become millstones weighing down our development and operational velocity. Containers allow rapid deployment of environments for testing new features, automated deployments and rollbacks, and operational processes. Building those environments has traditionally created a lot of toil for DBAs.
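To make that concrete, here is a minimal sketch of the kind of disposable test database that containers make cheap. It assumes the Docker SDK for Python (`pip install docker`) and a running Docker daemon; the image tag, port, and credentials are illustrative choices rather than anything Campbell prescribes.

```python
"""Spin up a throwaway PostgreSQL container for a test run, then tear it down."""
import time

import docker

client = docker.from_env()

# Start a disposable database; nothing in it is precious, so it can be
# created and destroyed as part of every test run.
container = client.containers.run(
    "postgres:16",
    detach=True,
    environment={"POSTGRES_PASSWORD": "test-only"},
    ports={"5432/tcp": 5433},
)

try:
    # Crude readiness check: wait until Postgres reports it is accepting connections.
    for _ in range(30):
        if b"ready to accept connections" in container.logs():
            break
        time.sleep(1)

    # ... run schema migrations, load fixtures, and execute the test suite here ...

finally:
    # The whole environment is disposable: stop and remove it when the run ends.
    container.stop()
    container.remove()
```

Because the environment is rebuilt from scratch on every run, the same script also exercises the provisioning and teardown process itself, which is exactly the kind of toil that used to fall on DBAs by hand.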
Still, moving to microservices does introduce a new concern: maintaining data integrity across distributed datastores. Architectures such as event-driven and CQRS models become necessary, as does a greater focus on data integrity and validation pipelines.
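As a rough illustration of what a CQRS-style, event-driven split can look like, the toy sketch below separates the write path (an append-only event log with a validation hook) from the read path (a projection rebuilt from the log). The names `EventLog` and `project_balances` are hypothetical and not tied to any particular framework.

```python
"""Toy sketch of a CQRS-style command/query split with a validation hook."""
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DepositRecorded:
    account_id: str
    amount_cents: int


class EventLog:
    """Append-only store standing in for a real event stream (e.g., Kafka)."""

    def __init__(self) -> None:
        self._events: List[DepositRecorded] = []

    def append(self, event: DepositRecorded) -> None:
        # Validation pipeline hook: reject events that violate invariants
        # before they ever reach downstream read models.
        if event.amount_cents <= 0:
            raise ValueError("deposits must be positive")
        self._events.append(event)

    def replay(self) -> List[DepositRecorded]:
        return list(self._events)


def project_balances(log: EventLog) -> Dict[str, int]:
    """Query side: a read model derived entirely from the event log."""
    balances: Dict[str, int] = {}
    for event in log.replay():
        balances[event.account_id] = balances.get(event.account_id, 0) + event.amount_cents
    return balances


if __name__ == "__main__":
    log = EventLog()
    log.append(DepositRecorded("acct-1", 5_000))
    log.append(DepositRecorded("acct-1", 2_500))
    print(project_balances(log))  # {'acct-1': 7500}
```

The validation step sits on the write path, so bad data is rejected before any read model or downstream consumer sees it, which is roughly where the validation pipelines mentioned above would live.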
What are the new skill sets that DBAs need to acquire to get ahead in their careers?
I have a few recommendations:
- Continuing to deepen their knowledge of datastores. This means relational, document, and Dynamo-style key/value stores: all of the ways to effectively store data for durability and retrieval.
- Developing the mindset of a software engineer. Learning programming practices, algorithms, and data structures. Getting comfortable working in version control systems and shared code repositories. Working with cross-functional teams rather than staying siloed.
- Developing a deep understanding of all of the services their database infrastructure relies on. This includes monitoring, configuration management, orchestration, and similar shared services, as well as OS kernels, networking, and cloud-based infrastructure.
You’re speaking at the O’Reilly Velocity Conference in San Jose this month. What presentations are you looking forward to attending while there?
Everything in the data and distributed systems tracks. Having been given the honor of chairing the data track, I’m incredibly excited to see the results of the new approach Velocity is taking to the conference. I know that’s a bit of a cop-out, but I figured this is the chance to get the word out!