
Big Data and the Cloud

WHAT YOU WILL LEARN IN THIS CHAPTER:

  • Exploring the Leading Cloud Providers: Amazon and Microsoft
  • Using the Amazon Elastic Compute Cloud (EC2) and Microsoft HDInsight Services as Cloud Processing Options
  • Hosting and Storing Your Data in the Cloud with Amazon Simple Storage Service (S3) and Microsoft Azure Blob Storage

Although many organizations will inevitably consider an on-premises solution for big data, the broad and ever-expanding appeal of the cloud makes big data approachable to those who would otherwise have neither the resources nor the expertise. The focus of this chapter is threefold. First, we introduce the cloud and look at two of the leading cloud providers: Amazon and Microsoft.

Next, we walk through setting up development or sandbox big data environments to explore the Amazon Elastic Compute Cloud (EC2) service and the Microsoft HDInsight service as cloud processing options.

We then explore two options for hosting or storing your data in the cloud: Amazon Simple Storage Service (S3) and Microsoft Azure Blob Storage (ASV). This discussion weighs the pros and cons of cloud storage, reviews best practices for storing your data in the cloud, and identifies proven patterns and tools for loading and managing that data.

Defining the Cloud

One of the most overused (and abused) contemporary technology terms is the cloud.
