Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud

by Robert Ilijason

Released June 2020

Publisher(s): Apress

ISBN: 9781484257814


Book description

Analyze vast amounts of data in record time using Apache Spark with Databricks in the cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while still getting the results you need, faster.

This book explains how the confluence of these pivotal technologies gives you enormous power, at low cost, when working with huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to large numbers of processing units without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can put all those CPUs to work on data analytics. Finally, you will see how services such as Databricks provide the power of Apache Spark without you having to know anything about configuring hardware or software. By removing the need for expensive experts and hardware, your resources can instead be allocated to actually finding business value in the data.

This book guides you through some advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned.

What You Will Learn

Discover the value of big data analytics that leverage the power of the cloud
Get started with Databricks using SQL and Python in either Microsoft Azure or AWS
Understand the underlying technology, and how the cloud and Apache Spark fit into the bigger picture
See how these tools are used in the real world
Run basic analytics, including machine learning, on billions of rows at a fraction of the cost, or even for free

Who This Book Is For

Data engineers, data scientists, and cloud architects who want or need to run advanced analytics in the cloud. It is assumed that the reader has data experience, but perhaps minimal exposure to Apache Spark and Azure Databricks. The book is also recommended for people who want to get started in the analytics field, as it provides a strong foundation.

You might also like

video

Apache Spark Streaming with Python and PySpark

by James Lee, Matthew P. McAteer, Tao W

Spark Streaming is becoming incredibly popular, and with good reason. According to IBM, 90% of the …

book

Distributed Data Systems with Azure Databricks

by Alan Bernardo Palacio

Quickly build and deploy massive data pipelines and improve productivity using Azure Databricks Key Features Get …

book

ETL with Azure Cookbook

by Christian Cote, Matija Lah, Madina Saitakhmetova

Explore the latest Azure ETL techniques both on-premises and in the cloud using Azure services such …

book

Implementing Azure DevOps Solutions

by Henry Been, Maik van der Gaag

A comprehensive guide to becoming a skilled Azure DevOps engineer Key Features Explore a step-by-step approach …
