5 Organizing and processing data

This chapter covers

  • Organizing and processing data in your cloud data platform
  • Understanding the different stages of data processing
  • Discussing the rationale for separating storage from compute
  • Organizing data in cloud storage and designing a data flow
  • Implementing common data processing patterns
  • Choosing the right file formats for archive, staging, and production
  • Creating a single parameter-driven pipeline with common data transformations

We will introduce a number of concepts, including the difference between common data processing steps (such as file format conversion, deduplication, and schema management) and custom business logic (such as the rules each company chooses to apply to transform their data ...
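To make the distinction between common steps and custom business logic concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the chapter's own implementation: the names `deduplicate`, `run_pipeline`, `orders_config`, and `add_total`, along with the sample records, are all hypothetical. The idea is that a common step such as deduplication is written once and driven by parameters, while the business rule is supplied separately for each data source.

```python
from typing import Any, Callable, Iterable

Record = dict[str, Any]


def deduplicate(records: Iterable[Record], key_fields: list[str]) -> list[Record]:
    """Common step: keep the first record seen for each combination of key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def run_pipeline(
    records: Iterable[Record],
    config: dict[str, Any],
    business_rule: Callable[[Record], Record],
) -> list[Record]:
    """Run the shared, parameter-driven step, then the source-specific rule."""
    deduped = deduplicate(records, config["key_fields"])
    return [business_rule(record) for record in deduped]


# Hypothetical per-source parameters: only the configuration changes per source.
orders_config = {"key_fields": ["order_id"]}


def add_total(record: Record) -> Record:
    """Hypothetical business rule: derive an order total."""
    return {**record, "total": record["quantity"] * record["unit_price"]}


orders = [
    {"order_id": 1, "quantity": 2, "unit_price": 5.0},
    {"order_id": 1, "quantity": 2, "unit_price": 5.0},  # duplicate of the first row
    {"order_id": 2, "quantity": 1, "unit_price": 3.5},
]

result = run_pipeline(orders, orders_config, add_total)
```

In this sketch, onboarding a new source would mean passing a different configuration and business rule to `run_pipeline`, leaving the shared deduplication code untouched.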
