Chapter 11. Large datasets in the cloud with Amazon Web Services and S3
This chapter covers
- Understanding distributed object storage in the cloud
- Using the AWS web interface to set up buckets and upload objects
- Working with the boto3 library to upload data to an S3 bucket
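As a preview of the boto3 workflow this chapter builds toward, here is a minimal sketch of creating a bucket and uploading a file to it. The bucket name, file paths, and region below are placeholders rather than values from the chapter, and the sketch assumes AWS credentials are already configured (for example, with `aws configure`).

```python
import boto3

# Minimal sketch: create an S3 bucket and upload a local file to it.
# Assumes AWS credentials are already configured on this machine.
s3 = boto3.client("s3", region_name="us-east-1")  # placeholder region

# Bucket names must be globally unique; this name is a placeholder.
s3.create_bucket(Bucket="my-large-datasets-bucket")
# Outside us-east-1, also pass
# CreateBucketConfiguration={"LocationConstraint": "<your-region>"}.

# Upload a local file as an object; the paths here are hypothetical.
s3.upload_file(
    Filename="data/records.csv",        # local file to upload
    Bucket="my-large-datasets-bucket",
    Key="records/records.csv",          # object key inside the bucket
)
```

The same steps can be done by hand in the AWS web console, which is where we'll start before moving to boto3.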
In chapters 7–10, we saw the power of the distributed frameworks Hadoop and Spark. These frameworks can take advantage of clusters of computers to parallelize massive data processing tasks and complete them in short order. Most of us, however, don’t have access to physical compute clusters.
In contrast, we can all get access to compute clusters from cloud service providers such as Amazon, Microsoft, and Google. These cloud providers have platforms that we can use for ...