Brief Table of Contents
Chapter 2. Accelerating large dataset work: Map and parallel computing
Chapter 3. Function pipelines for mapping complex transformations
Chapter 4. Processing large datasets with lazy workflows
Chapter 5. Accumulation operations with reduce
Chapter 6. Speeding up map and reduce with advanced parallelization
Chapter 7. Processing truly big datasets with Hadoop and Spark
Chapter 8. Best practices for large data with Apache Streaming and mrjob
Chapter 9. PageRank with map and reduce in PySpark
Chapter 10. Faster decision-making with machine learning ...
Get Mastering Large Datasets with Python now with the O’Reilly learning platform.