Chapter 3: Data Acquisition

How data integration experts combine data and make it ready for advanced analysis

Data preparation typically involves selecting data from different sources. Integrating data from these sources is a critical and time-consuming step. Data are rarely centrally available, summarized, and/or ready to be consumed in advanced analytical tasks.

Data integration involves combining data residing in different sources and providing users with a unified view of these data.¹ This process matters in many situations. For instance, megaresorts must pull data from their casinos, retail outlets, restaurants, golf courses, hotels, and so on to conduct effective marketing campaigns.
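
As a minimal sketch of the megaresort example, assume each operational system exports records that share a common guest identifier. The field name `guest_id`, the sample systems, and the `integrate` helper are all hypothetical. Under that assumption, integration can be as simple as merging each system's records into one unified entry per guest:

```python
from collections import defaultdict

# Hypothetical extracts from three separate operational systems.
# Each record is assumed to carry a shared "guest_id" key.
casino = [{"guest_id": 1, "gaming_spend": 500.0}]
hotel = [{"guest_id": 1, "nights": 3}, {"guest_id": 2, "nights": 1}]
restaurant = [{"guest_id": 2, "dining_spend": 80.0}]


def integrate(*sources):
    """Combine records from several sources into one dict per guest."""
    unified = defaultdict(dict)
    for source in sources:
        for record in source:
            # Each source adds its fields to the same guest entry.
            unified[record["guest_id"]].update(record)
    return dict(unified)


view = integrate(casino, hotel, restaurant)
print(view[1])  # {'guest_id': 1, 'gaming_spend': 500.0, 'nights': 3}
```

Note that `dict.update` silently lets a later source overwrite a field with the same name, and the sketch assumes every system uses the same key. Resolving such conflicts and mismatched keys is a large part of why integration is time-consuming in practice.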

Bringing together data from different sources creates a need to transform the data so that they remain intelligible when combined, for example, in a data warehouse. Metadata are data about the data. In simple terms, metadata are a dictionary of the data contained in a particular system, complete with definitions, transformations, and other characteristics.
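
A data dictionary of this kind can be sketched in a few lines. The sketch below is a hypothetical illustration, not the format of any particular metadata tool. Each entry records a field's definition, expected type, and transformation, and the hypothetical `check_record` helper uses that metadata to flag records that do not match it:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """One entry in a data dictionary: data about a single field."""
    name: str
    definition: str
    dtype: type
    transformation: str


# Hypothetical data dictionary for a guest table in a warehouse.
DATA_DICTIONARY = {
    "guest_id": FieldMetadata("guest_id", "Unique guest identifier", int,
                              "Copied unchanged from the loyalty system"),
    "nights": FieldMetadata("nights", "Nights stayed in the hotel", int,
                            "Summed over all stays"),
}


def check_record(record):
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    for name, meta in DATA_DICTIONARY.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], meta.dtype):
            problems.append(f"{name}: expected {meta.dtype.__name__}")
    for name in record:
        if name not in DATA_DICTIONARY:
            problems.append(f"undocumented field: {name}")
    return problems


print(check_record({"guest_id": 1, "nights": "3", "vip": True}))
# ['nights: expected int', 'undocumented field: vip']
```

Keeping definitions and transformations next to the type information means the same metadata can serve both as documentation for analysts and as a check that integrated data still match what the warehouse expects.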

A data transformation converts a set of data values from the data format of a source data system into the data format of a destination data system.
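
A minimal sketch of such a conversion follows. It assumes a hypothetical source system that stores dates as `MM/DD/YYYY` strings and amounts as integer cents, and a destination that expects ISO 8601 dates and decimal dollar amounts:

```python
from datetime import datetime
from decimal import Decimal


def transform(source_record):
    """Convert one record from the (hypothetical) source format to the destination format."""
    # Parse the source's MM/DD/YYYY string into a date object.
    visit = datetime.strptime(source_record["visit_date"], "%m/%d/%Y").date()
    return {
        "visit_date": visit.isoformat(),  # ISO 8601, e.g. "2023-07-04"
        # Integer cents -> exact Decimal dollars (avoids float rounding errors).
        "amount_usd": Decimal(source_record["amount_cents"]) / 100,
    }


print(transform({"visit_date": "07/04/2023", "amount_cents": 12999}))
# {'visit_date': '2023-07-04', 'amount_usd': Decimal('129.99')}
```

The values themselves are unchanged; only their representation moves from the source system's format to the destination system's format.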

Data transformation for system needs can be divided into two steps: data mapping and extract, transform, and load (ETL) code generation. In addition, ...
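
The two steps, data mapping and ETL code generation, can be sketched as follows. This assumes the mapping is a simple table of (source field, destination field, conversion) entries. The mapping, the field names, and the `generate_transform` and `run_etl` helpers are hypothetical; real ETL tools generate code from mappings in their own ways. Rather than emitting source text, this sketch builds the transform function directly from the mapping. An in-memory SQLite table stands in for the warehouse in the load step:

```python
import sqlite3

# Step 1: data mapping. Each entry says which source field feeds which
# destination column and how the value is converted (hypothetical fields).
MAPPING = [
    ("cust_no", "guest_id", int),
    ("full_nm", "guest_name", str.strip),
    ("spend_cents", "spend_usd", lambda cents: int(cents) / 100),
]


def generate_transform(mapping):
    """Step 2 (simplified): build a transform function from the mapping."""
    def transform(row):
        return {dest: convert(row[src]) for src, dest, convert in mapping}
    return transform


def run_etl(source_rows, mapping):
    """Extract rows, transform them with the generated function, load into SQLite."""
    transform = generate_transform(mapping)
    columns = [dest for _, dest, _ in mapping]
    conn = sqlite3.connect(":memory:")  # stands in for a data warehouse
    conn.execute(f"CREATE TABLE guests ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    for row in source_rows:  # extract
        record = transform(row)  # transform
        conn.execute(f"INSERT INTO guests VALUES ({placeholders})",
                     [record[c] for c in columns])  # load
    conn.commit()
    return conn


conn = run_etl(
    [{"cust_no": "42", "full_nm": " Ada Lovelace ", "spend_cents": "1250"}],
    MAPPING,
)
print(conn.execute("SELECT * FROM guests").fetchall())
# [(42, 'Ada Lovelace', 12.5)]
```

Keeping the mapping as data rather than hand-written code means a change in a source field only requires editing one mapping entry; the transform logic is regenerated from it.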

Get Understanding the Predictive Analytics Lifecycle now with the O’Reilly learning platform.
