The real story on container, cloud, and data adoption
Poll results reveal where and why organizations choose to use containers, cloud platforms, and data pipelines.
Mesosphere conducted a poll of approximately 1,000 IT professionals to understand where they are on their container, cloud, and data adoption. Above all, the poll shows that companies are investing heavily in migrating to containers, running those containers in the cloud, and improving their data pipelines.
At first glance, these three pieces don’t seem related, but they go hand-in-hand. Organizations start by looking at how to improve their data pipelines. The ops team asks how they’re going to monitor hundreds of processes running on dozens of machines. Then someone says that you can monitor and separate those processes using containers. Finally, the manager looks at their hardware budget for the year and asks how much all of this new hardware will cost. The team tells the manager about the glorious land of the cloud where there aren’t upfront costs. Even better, costs can fluctuate with dynamic and usage-based allocation of resources. Everyone rejoices, pandemonium ensues, and then everyone realizes they’re nerds.
With paradisaic conditions like these, why would anyone not be using containers, cloud, and data pipelines? The results of the poll help us see why.
Data strategies
About half the people who responded said they’re either the decision maker or an influencer in the decision. By looking at the results, we can see why they’re making their technology choices. One driver for improved operations is the need for a better data pipeline and data strategy. This includes using tools that have been around for a while, such as MySQL (58%) and PostgreSQL (2%). We’re also seeing large upticks in adoption of newer technologies such as MongoDB (45%), Apache Kafka (41%), Hadoop (35%), and Apache Spark (33%).
Let’s take Kafka as an example of a technology pushing the need for better operational efficiency. With Kafka, you’ll want to run producers, consumers, and processes that both produce and consume. At scale, there can be hundreds or thousands of these processes running. How do we know all 100 or 1,000 of those processes are alive and healthy? How do we dynamically make sure the right number of processes is running? Without the right processes and monitoring in place, your new data pipelines could be an albatross for your operations team.
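To make those two questions concrete, here is a minimal Python sketch of the bookkeeping involved. It is only an illustration under assumptions: the names WorkerRegistry, heartbeat, healthy, and reconcile are hypothetical, the 30-second timeout is arbitrary, and nothing here talks to Kafka or to any real orchestrator. Each worker reports a heartbeat, a worker counts as healthy if its last heartbeat is recent enough, and a reconcile step compares the healthy count against the desired count.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkerRegistry:
    """Hypothetical registry that records the last heartbeat of each worker."""
    timeout_s: float = 30.0  # assumed threshold; tune for your workload
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, worker_id: str, now: Optional[float] = None) -> None:
        # Record when this worker last reported in.
        self.last_seen[worker_id] = time.monotonic() if now is None else now

    def healthy(self, now: Optional[float] = None) -> List[str]:
        # A worker is healthy if its last heartbeat is within the timeout.
        now = time.monotonic() if now is None else now
        return sorted(
            worker_id
            for worker_id, seen in self.last_seen.items()
            if now - seen <= self.timeout_s
        )


def reconcile(desired: int, registry: WorkerRegistry,
              now: Optional[float] = None) -> int:
    """Return how many workers to start (positive) or stop (negative)."""
    return desired - len(registry.healthy(now))


if __name__ == "__main__":
    registry = WorkerRegistry(timeout_s=30.0)
    registry.heartbeat("consumer-1", now=0.0)    # stale by the time we check
    registry.heartbeat("consumer-2", now=100.0)  # still fresh
    print(registry.healthy(now=110.0))
    print(reconcile(3, registry, now=110.0))
```

Doing this by hand for hundreds of processes across dozens of machines is exactly the work that gets painful, which is why teams tend to hand it to tooling built for the job instead of maintaining their own version of this loop.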
I’ve taught Kafka extensively. This operational complexity is one of the first problems that companies need to solve for Kafka or any other technology that’s facilitating their data strategy.
Containerization
When asked about their top initiatives this year, 61% of respondents said they are modernizing their infrastructure. The actual technologies and changes for “modernizing infrastructure” are wide-ranging. The poll dove even deeper into the sorts of projects people are undertaking. 59% said they’re trying to improve their developer productivity and 55% said they are embarking on a containerization project. There are the usual greenfield projects and 18% said they’re just going to containerize new things. Another 13% are going back to old applications to containerize them. Meanwhile, 34% are going to do both.
These organizations will face some challenges. It isn’t going to be easy to go back and update those old applications, as 47% of respondents indicated. Like many technology trends, the wetware (people) lags behind and 45% said keeping up with old applications is an issue.
Cloud
So, where do we run all of these new Kafka brokers and containers? The respondents’ companies had a wide range of annual revenues, but 44% had less than $1 billion in revenue. That size of company usually isn’t eager to expand their data center footprint. In fact, many are trying to reduce or eliminate their data centers. This is underscored by 51% of respondents saying they’re migrating to the cloud.
These moves prove difficult, too. Many applications running in the enterprise were never envisioned running anywhere outside of the organization’s data center. Developers are grappling with the task of trying to run these legacy applications in the cloud and 50% responded that they’re facing this issue.
Speaking of the data center, there are the organizations that have heard of the cloud and would like to use it. But 45% of respondents weren’t “all in” on the cloud, and those folks said they can’t replicate the levels of warm fuzzies, security, and compliance that their current data center gives them.
I was having a conversation about the cloud with a friend of mine. We were discussing whether there will be a point in the future where everything is in the cloud. His thought was that the cloud providers would have such economies of scale that no one could be persuaded to run their own data center. I contended that there won’t ever be a time when everything is in the cloud. No cost difference will be large enough to stop a security or compliance person from arguing that any savings would be wiped out by the first security incident.
That’s mirrored by 28% of respondents saying they’re going to use a hybrid cloud. They’ll use cloud resources when it makes sense and use on-premises resources when it makes sense. Still another 17% have a “cloud first” approach where everything goes into the cloud.
Going deeper: Next steps
If you’re reading this post to compare notes on what you should be doing, there are a few good pointers. Companies are modernizing and improving their data pipelines.
You might be reading this to double-check that you’re on the right track and to learn what to watch out for on the way. These efforts are well worth your time. My clients are experiencing great value in their data strategy and infrastructure modernization efforts.
If you’re interested in exploring container orchestration further, our free Kubernetes: Up and Running excerpt is a good place to start.
This post is part of a collaboration between O’Reilly and Mesosphere. See our statement of editorial independence.