What’s driving cloud native and distributed systems in 2019
Cloud native, security, performance, and SRE are areas of emphasis for the O’Reilly Velocity Conference in Berlin.
O’Reilly’s Velocity conferences attract some of the leading names in the fields of distributed systems architecture, engineering, and application development. Velocity is a great resource for architects, engineers, and developers to acquire or hone skills, explore provocative ideas, and network with peers.
An analysis of the proposals submitted for O’Reilly’s upcoming Velocity Conference in Berlin (November 4-7, 2019) yields a trove of insights as to how experts are pushing the frontiers of these fields, in addition to innovating in well-understood topic areas to improve or optimize established practices and patterns. This same analysis shines a light on which ideas, practices, and technologies are ascendant—and which are in decline.
A variety of qualitative and quantitative signals have already shown us that cloud native architecture, security, observability, and DevOps, along with related practices such as site reliability engineering (SRE), tend to predominate in distributed systems architecture, engineering, and development. This is why we identified these topics as key themes for Velocity Berlin 2019, and it’s why our call for speakers solicited contributions in these topic areas. But we wanted to see what the proposal data might reveal about the evolution of these topics. After all, different speakers are likely to have different takes—different emphases, different tool preferences, perhaps even different interpretations—on these topics. In short, we asked, what are the issues, trends, and technologies we should be watching?[1]
Our analysis of speaker proposals for Velocity Berlin surfaced several key findings:
- Cloud native is preeminent. The language, practices, and tools of cloud native architecture are prominent in Velocity Berlin proposals. From the term “cloud native” itself (No. 25 in the tally of the highest-weighted proposal terms, ranked by TF*IDF) to foundational cloud native technologies such as “Kubernetes” (No. 2), cloud native is coming on strong.
- Security is a source of some concern. The term “security” not only cracked the top 5, it surged to No. 3, following Kubernetes. This suggests that even as cloud native comes on strong, there’s a degree of uncertainty—and perhaps also uneasiness—about how to secure the new paradigm.
- Performance is still paramount. Architects, engineers, and developers are using new tools, metrics, and even new concepts to observe, manage, and optimize performance. This is as much a shift in language (with terms such as “observable” and “observe” rising and “monitoring” falling) as in technology.
- Site Reliability Engineering (SRE) is growing. Terms associated with SRE continue to ascend the rankings. SRE is a very different way of thinking about software development. Our analysis suggests SRE-like terms, concepts, and practices are beginning to catch on.
- Europe and the United States are different regions. And it isn’t just the metric system. For example, “observability” is a thing in Europe, but seems to be a slightly bigger thing in the US. It’s one of several terms that tend to be more popular on one side of the pond than on the other. Another term, oddly enough, is “cloud native,” which is more popular in the EU than the US.
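The rankings cited in these findings come from weighting proposal terms with TF*IDF. The article does not describe the exact weighting or preprocessing used, so the following Python sketch only illustrates the general technique. It assumes simple lowercase word tokens, a smoothed inverse document frequency, and a ranking by summed weight across proposals; the function names and sample proposals are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a proposal and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_terms(proposals):
    """Return (term, weight) pairs sorted by summed TF*IDF weight, highest first."""
    docs = [Counter(tokenize(p)) for p in proposals]
    n_docs = len(docs)
    # Document frequency: the number of proposals in which each term appears.
    df = Counter(term for doc in docs for term in doc)
    weights = Counter()
    for doc in docs:
        total = sum(doc.values())
        for term, count in doc.items():
            tf = count / total                       # term frequency within one proposal
            idf = math.log(n_docs / df[term]) + 1.0  # smoothing choice is an assumption
            weights[term] += tf * idf
    # Sort by weight (descending), then alphabetically so ties are deterministic.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample proposals, for illustration only.
proposals = [
    "Kubernetes security for cloud native teams",
    "Scaling Kubernetes with serverless functions",
    "Observability and security in production",
]
ranked = rank_terms(proposals)
for position, (term, weight) in enumerate(ranked[:5], start=1):
    print(f"No. {position}: {term} ({weight:.3f})")
```

A term’s rank here is simply its position in the sorted list, which is how a label such as “No. 2” can be read. A real analysis would likely also handle multiword phrases such as “cloud native” and filter out stop words.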
Cloud native, serverless, and directions in software architecture
Cloud native architecture is a different way of thinking about how we build, deploy, scale, and maintain software. Its basic building blocks are container virtualization (e.g., Docker) and container orchestration/management, typically via Kubernetes, although cloud native architecture may also make use of serverless cloud services (e.g., AWS Lambda, Azure Functions), service mesh architecture, and other still-emerging architectural patterns. In practice, the scope of cloud native innovation spans a teeming landscape of technologies and practices, from containers and Kubernetes to a slew of complementary technologies, including Helm (a package manager for Kubernetes), Knative (a platform for deploying and running serverless workloads on Kubernetes), Prometheus (a monitoring and alerting toolkit widely used in cloud native environments), and others.
The language, practices, and tools of cloud native architecture are preeminent in this year’s Velocity Berlin speaker proposals. This makes sense: the shift to cloud native brings with it a new set of challenges, starting with a dizzying array of concepts. It’s incumbent upon architects, engineers, developers, and other technologists to rebuild their knowledge bases—What is microservice architecture? How does serverless computing relate to it? What is a service mesh?—as well as cultivate new skills in the technologies and practices that enable or underpin these architectures and patterns.
Compared to the speaker proposals for Velocity EU 2018 (held in London), the term “cloud native” itself surged by 104 positions in 2019 to No. 25. It’s up 1,475 positions relative to 2017; in 2015, by contrast, “cloud native” didn’t appear in a single Velocity EU proposal topic. (This shouldn’t be surprising: the seminal Cloud Native Computing Foundation was founded that same year.) “Kubernetes” was the No. 2 term in the 2019 proposals, down one place from 2018—but up 306 places from 2015. The term “serverless,” for its part, exploded in frequency: “serverless” was No. 6 in 2019—up 5 places from 2018 and 129 places from 2016, the year of its first appearance. “Service mesh” fell 26 places in 2019 to No. 182; however, this is most likely a function of its relative newness: it appeared for the first time in proposals in 2018, debuting at No. 156. “Service mesh” is, arguably, still settling in.
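The rank movements quoted above can be cross-checked with simple arithmetic. As a minimal sketch, assuming rank positions are directly comparable from year to year, the 2018 ranks implied by the 2019 figures can be recovered as follows. The helper name `previous_rank` is hypothetical; the numbers come from the paragraph above.

```python
def previous_rank(current_rank, places_gained):
    """Recover an earlier rank from the current rank and the places gained.

    A lower number is a better rank, so a term that gained places had a
    numerically larger rank before. Pass a negative value for places lost.
    """
    return current_rank + places_gained


# term -> (2019 rank, places gained since 2018), as stated in the text
changes_2018_to_2019 = {
    "cloud native": (25, 104),
    "kubernetes": (2, -1),
    "serverless": (6, 5),
    "service mesh": (182, -26),
}

ranks_2018 = {
    term: previous_rank(rank, gained)
    for term, (rank, gained) in changes_2018_to_2019.items()
}
for term, rank in ranks_2018.items():
    print(f"{term}: No. {rank} in 2018")
```

The implied 2018 rank for “service mesh” agrees with the No. 156 debut stated in the text.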
In the US, the term “cloud native” appears in the top 50 of Velocity proposal terms; in the EU it’s in the top 25. We don’t understand what’s behind the difference, but we think it worth noting that EU proposers find the phrase more semantically useful to explain what they hope to cover. On the other hand, topics of growing interest in the US—e.g., observability and chaos—seem to have less salience for audiences in the EU. This isn’t true of “microservices,” up 3 positions (to No. 14) from 2018 and up 28 positions from 2015. It has fluctuated across a similar range in US Velocity proposals: it ranked No. 22 in proposals for Velocity San Jose (June 2019) and No. 16 in Velocity New York proposals (October 2018). In the US and EU, microservices is a site of stability.
The staying power of the term “microservices” is surprising, in one sense: in another recent survey, we saw several established concepts or technologies decline in frequency in proposal topics. Our analysis suggests this decline is a function of maturity: as a technology, concept, or practice becomes better understood, it becomes less conspicuous, too: less problematic, less controversial, less strange or foreign. This is true of “containers” in the Velocity Berlin 2019 speaker proposals, for example: at No. 34, it tumbled 14 places from 2018 and 31 places from 2015. “Microservices” continues to sit comfortably near the top of the rankings because of its salience in and for system and software architecture, engineering, application development, and similar disciplines.
DevOps and SRE are sites of ongoing transformation
“DevOps,” the No. 13 term, is ensconced among the most popular proposal terms. This is true, too, of its frequency in the Velocity US proposals, where (with a single outlier) DevOps has ping-ponged between points inside the top 20 since 2015. Its prominence is indicative of DevOps’ importance in application development, SRE, and enterprise architecture. (“Enterprise architecture” is a shorthand term that encompasses system, software, and data architecture.) It reflects something else, too, in all likelihood: DevOps is at once a site of ongoing activity and innovation—hence its constancy in the first tier of the rankings—and a relatively well-understood discipline. DevOps is core to how we think about, build, deploy, and change software; a focus for developers, software engineers, and architects today, as well as a prime facilitator for the move toward microservices, a key component of Next Architecture; and an important part of cutting-edge practices like SRE. It’s a factor in each of these disciplines, but it isn’t a locus of feverish transformation and disruption.
DevOps-adjacent terms, some new, some old, remain sites of considerable activity. The term “automated” cracked the top 100 for the first time in 2019, at No. 88. Interestingly, “automated” was more likely to appear in European than in US speaker proposals: it appeared 58 times in proposals for Velocity Berlin 2019 and just 43 times in proposals for Velocity San Jose 2019. (It ranks much lower in the 2019 San Jose proposals, too, at No. 352.) “Automation” was at No. 191, an increase of 87 positions year-over-year. And “reliability,” a term that relates to SRE, was at No. 224 for Velocity Berlin. “Reliability” is used much more often in the US, cracking the top 150 in terms (No. 128) for the 2019 Velocity Conference in San Jose.
“Chaos,” a concept that is newly germane to SRE, rose to No. 76 in 2019. It was up 163 positions relative to 2018, and it was unranked as recently as 2016. Chaos is yet another concept that seems to have more cachet on one side of the pond than on the other: it rocketed to No. 12 in the 2019 Velocity San Jose proposals, having already climbed across O’Reilly’s two 2018 US Velocity conferences, from No. 214 in San Jose (June 2018) to No. 174 in New York (October 2018). Another SRE-related term, “observability,” is much more common in Velocity US than in Velocity EU proposals. We discuss why this might be the case in a separate section, below.
Observability
If IT administration and security are complicated enough in highly virtualized environments with hundreds or thousands of physical servers, they’re exponentially more complicated in a cloud native context, in which a single physical system could host dozens or potentially hundreds of programs and services. And with tens or potentially hundreds of thousands of services and/or functions, most of them instantiated as ephemeral containers, the software architecture of the future will have an awful lot of moving parts. At this scale, conventional diagnostic technologies premised on concepts such as monitoring and alerting are insufficient as a means to diagnose and address problems.
Future software architectures must develop a software-defined autonomic capacity of some kind. At a certain level, this reduces to the problem of designing software that is capable of detecting and diagnosing problems on the basis of feedback (i.e., data) it receives from an observable system. This last bit is the hard part. A “system” in this context isn’t the familiar application, database, operating system, or device of old: it’s the interactions of a user with the services and functions that comprise a task, a feature, an application, etc. It’s the interactions of machine-orchestrated services and functions with one another. It’s the overall experience of the person attempting to add items to a shopping cart, change a travel itinerary, or subscribe or unsubscribe to services. Experiences and interactions, along with events, alerts, and exceptions, must be made observable.
This emphasis on observability, and related terms, is reflected in the Velocity Berlin proposal topics. The frequency of the term “observability” declined somewhat year over year, to No. 176. On the other hand, terms related to “observability,” like “observable” and “observe,” all showed big jumps in usage, from small bases, in the last year. Interestingly, the verb “observe” could soon supplant “monitor” in the argot of architects, engineers, and developers: since rising to No. 245 in the 2017 Velocity EU proposals, “monitor” fell in 2018 (to No. 739) and again this year, dropping 140 places to No. 879. It is down a total of 345 places from 2015. The term “monitoring” is also falling out of favor. In 2019, it very nearly dropped out of the top 50, plunging 21 places to No. 45. All told, it is down 36 positions relative to 2015, when it cracked the top 10, at No. 9.
Another way to think about this is that “monitoring” (and its related cluster of terms, such as “alerting,” down 1,041 places year over year) denotes an outside view of an observable system. Observability, by contrast, is about the gestalt: in other words, the “interior” view of the “health” of an application, resource, service, etc., but also the contextual view: i.e., the viability of the experience, interaction, or use of which the IT resource is a part. This doesn’t mean the old concepts have somehow been supplanted, either: even as “alerting”—which designates a particular application of a concept or technique—has plunged recently, “alert” (the core concept itself) has steadily climbed the rankings, rising 732 places to No. 202 in this year’s tally. “Alert” is up 1,670 places since 2015. This is true of monitoring, too: it’s one element of observability, of an observable system—no longer the thing itself.
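The distinction between an outside view and a contextual view can be made concrete with a small Python sketch. It is illustrative only and assumes a deliberately simplified setup: a threshold check stands in for monitoring, and a structured event that ties each service call to the user interaction it belongs to stands in for observability. All function and field names are hypothetical.

```python
from collections import defaultdict


def cpu_alert(cpu_percent, threshold=90.0):
    """Monitoring-style check: one resource measured against a fixed threshold."""
    return cpu_percent > threshold


def interaction_event(trace_id, user_action, service, duration_ms, ok):
    """Observability-style record: a service call plus the interaction it was part of."""
    return {
        "trace_id": trace_id,        # groups all calls made for one user interaction
        "user_action": user_action,  # the experience the call belongs to
        "service": service,
        "duration_ms": duration_ms,
        "ok": ok,
    }


def failing_actions(events):
    """Map each user action that had a failure to the services that failed."""
    failures = defaultdict(set)
    for event in events:
        if not event["ok"]:
            failures[event["user_action"]].add(event["service"])
    return {action: sorted(services) for action, services in failures.items()}


events = [
    interaction_event("t1", "add_to_cart", "cart", 35, True),
    interaction_event("t1", "add_to_cart", "inventory", 2100, False),
    interaction_event("t2", "change_itinerary", "booking", 80, True),
]
print(cpu_alert(42.0))          # the host looks healthy from the outside
print(failing_actions(events))  # yet one user-facing interaction is failing
```

In this sketch the threshold check raises no alert, while the structured events still show which experience is broken and which service is involved.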
Concerns about cloud native security common to both Europe and the US
Add it all up and you’ve got clear evidence of the preeminence of cloud native architecture and engineering on both sides of the Atlantic. You’ve got clear areas of difference, too, with cloud native a more salient topic of interest in Europe than in the US. Similarly, topics that are areas of growing interest in the US (e.g., observability and chaos) seem to have less salience for EU audiences. One topic does seem to preoccupy people on both sides of the pond, however. “Security” was the No. 3 term in Velocity Berlin proposals; it was the No. 8 term in proposals for the Velocity San Jose event in June 2019; and it was No. 1 in proposals for Velocity New York in October 2018. It’s less that architects, engineers, and developers are uneasy about how to secure the cloud native environments of the future than that they’re curious and, at least to a degree, uncertain.
Next-generation distributed architectures will have an abundance of moving parts and will make use of a number of new, still-gestating, or (in some cases) as-yet-unidentified technologies, practices, and patterns. Yes, cloud native’s core enabling technologies (chiefly, containers and container orchestration) are fairly well understood. And, yes, there’s an emerging consensus that containers, in particular, boast advantages relative to both other schemes for virtualization and conventional physically instantiated resources. But technologists have concerns.
Security seems to be one of those topics of concern. Many less frequently referenced phrases related to security, such as “infrastructure security” and “security container,” showed substantial increases from small bases. At different times and in different eras, security has, arguably, been an issue of secondary importance in the context of system and software architecture and engineering. In the shift to cloud native, it seems to be a topic of especially keen interest.
Concluding thoughts
It’s possible that many of the terms that ascended the rankings in 2019 will plunge, perhaps precipitously, in next year’s speaker proposals. Volatility of this kind is a constant, as we saw in this year’s analysis. But the practices, technologies, and concepts that are core to the cloud native paradigm likely won’t change significantly. Nor, it seems likely, will the pronounced emphasis on experimentation (with terms such as “testing,” “failure,” and “tracing,” among others, trending upward) that seems to have supplanted the more conventional emphasis on terms such as stability, performance, or even availability. In 2015 and 2016, “performance” was the No. 1 term in Velocity EU proposals; it fell to No. 8 in 2017 and plunged to No. 62 in 2018. “Performance” recovered much of that ground in 2019, climbing to No. 17. “Scale,” the No. 22 term in Velocity EU 2019 proposals, has dropped about 10 rank positions from its consistent position just outside of the top 10 from 2016 to 2018. Both “reliability” and “availability” continue to fall in the rankings, too: “reliability,” at No. 224, was down a single slot year over year, but down 127 places relative to 2016. “Availability,” however, plunged in frequency and importance in Velocity EU 2019 proposals, dropping 545 places to No. 795, year over year.
It isn’t that these issues are no longer important; it’s that they’ve been taken up as part of new frames—i.e., new ways of thinking about the viability of the gestalt of infrastructure. We used to understand infrastructure primarily in terms of the performance of its constitutive resources—e.g., the indicators or metrics of this system, this application, or this service. This understanding gave priority to running stable, reliable, predictably performant resources in production. We’re shifting to a radically different kind of thinking that conceives of instability and unpredictability not as anomalous or aberrant phenomena but as the inevitable vicissitudes of developing, deploying, and improving software. This change in thinking and understanding is mirrored in practices (such as SRE) and concepts (observability) that emphasize maturation and improvement. Core to this is the insight that maturation and improvement are accelerated by deploying and testing in production.
An increase in the use of terms that relate to experimentation is consonant with SRE and its test-in-production ethos. For the present, this ethos (and its practices) seems to be ascendant. Will that still be the case in two or three years’ time? We plan on monitoring the technology ecosystem around Velocity and SRE to see how the space evolves over the next few years.