Chapter 6. Collect Your Data

In the preceding chapter we talked about the process of modernizing your entire data infrastructure to make it one integrated, efficient platform. A consistently audited, asset-optimized, and well-instrumented IT infrastructure is essential for flexible and cost-efficient operation in an AI-centric world.

You may have noticed that there were a few important things we didn’t talk about in that chapter. For example, we haven’t talked about how you acquire data. Nor have we talked about the quality of the data that resides in that platform, or how to make it available to the AI processes and programs that may require it or benefit from it. We certainly haven’t talked about how to get the data you already have under control. Many companies think “Of course we have data,” only to find that there are many reasons why this data isn’t really accessible. Those reasons may be technical, political, regulatory, or some combination of the three—but they’re real. We always tell people that big data without analytics is…well…just a bunch of data. Never forget: data may be an asset, but it’s a valueless asset if you can’t use it.

In this chapter we will talk about getting access to all relevant data and evaluating its utility. We'll look at ways to consolidate data sources, because at far too many companies, data resides in departmental silos that prevent it from being used effectively. We'll discuss the pros and cons of combining data sources into "data lakes" and ...
