Chapter 7. Organize Your Data

In Chapter 5 we discussed the use of hybrid multicloud architectures, composed of private clouds, public clouds, and on-premises systems, to host databases of all kinds. We looked at ways to combine data sources into “data lakes” and explored how to determine whether that’s worth doing. Modernizing entails building an infrastructure that gives you the flexibility to choose the right platform to host each data source or application.

In Chapter 6 we detailed the Collect rung, which is dedicated to removing barriers to accessing your data and evaluating its utility. We looked at sources of data that you can use to augment your own. We also discussed getting a head start on building data catalogs (a prelude to this chapter): recording where your data comes from and uncovering exactly what that data means.

Now it’s time to look at how to improve the processes that track, protect, catalog, characterize, and govern data, organizing it to make sure that it’s suitable for use in AI applications (Figure 7-1).

Figure 7-1. The Organize rung of the AI Ladder

This work is important because the standards for data used in AI applications are higher than most organizations are used to. There are two reasons for these higher standards: poor data leads to poor AI, and regulation demands quality data.

Poor Data Leads to Poor AI

While inconsistent, incomplete, and otherwise ...
