Batch processing

Traditionally, the data processing pipeline within data warehousing systems consisted of Extracting, Transforming, and Loading the data for analysis and actions (ETL). With the new paradigm of file-based distributed computing, there has been a shift in the ETL process sequence. Now the data is Extracted, Loaded, and Transformed repetitively for analysis (ELTTT) a number of times:

In batch processing, the data is collected from various sources in the staging areas and loaded and transformed with defined frequencies and schedules. In most use cases with batch processing, there is no critical need to process the data in real ...

Get Artificial Intelligence for Big Data now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.