Chapter 3. Data Organization in the Cloud
Organizing data in the cloud has a lot in common with organizing clothes in a closet. Some folks arrange clothes based on season, shape, or color, making it easy to quickly find a pair of blue pants or an overcoat. Others may forgo organization altogether, haphazardly chucking clothes into a rumpled pile and resorting to heap search when they want to find something.
You can see these same scenarios in cloud storage. At its best, data is neatly organized and managed to reduce costs and improve performance. At its worst, it can be a bit of a black hole where disorganized piles of data accumulate, driving up costs and dragging down performance.
It’s very easy to put data into cloud storage, close the (figurative) door, and forget about it. That is, until performance starts to suffer or someone starts to complain about the bill. Fortunately, unlike an overstuffed closet, cloud storage won’t bury you in a physical avalanche when you start combing through its forgotten corners.
This chapter covers cloud storage techniques to help keep data organized, performant, and cost-effective. To begin, you’ll see the different ways storage costs add up in terms of both the cloud bill and engineering overhead. Before you say “Storage is so cheap! Why should I care?” and decide to skip this chapter, I encourage you to read on. You’ll see several real-world situations where cloud storage costs accumulate, along with ways to mitigate them.
After an ...