Video description
You're a software developer somewhat familiar with Apache Spark and how it's used to analyze Big Data. You've been tasked with a Big Data analysis job and you want to rent space on a cluster to do it. But where to begin?
This is a hands-on course where Amazon Web Services pro Frank Kane shows you how to rent Amazon's Elastic MapReduce service (EMR) at minimal cost and use it to run Spark scripts on top of a real Hadoop cluster. Kane's approach is fun: You'll learn a Big Data analysis process by actually deploying Spark on EMR to build a working movie recommendation engine using real movie ratings data.
- Learn Amazon EMR's undocumented "gotchas", so they don't take you by surprise
- Save money on EMR costs by learning to stage scripts, data, and actions ahead of time
- Understand how to provision an EMR cluster configured for Apache Spark
- Explore two different ways to run Spark scripts on EMR
- Learn how to set up security and monitor a Spark cluster through a web UI
- Understand how to interactively develop Spark code on EMR with Apache Zeppelin
- Gain experience with Spark and AWS, two skills that are highly valued by employers
Table of contents
- Introduction
  - Welcome To The Course 00:02:23
  - About The Author 00:01:50
- Overview Of Spark On AWS
  - What Is Spark? 00:06:06
  - Elastic MapReduce And Spark 00:04:42
  - Setting Up An AWS Account 00:02:29
- Preparing Your Spark Script
  - Overview Of Our Spark Script 00:09:19
  - Packaging Your Script With SBT 00:07:27
  - Uploading To S3 00:06:41
- Launching Your EMR Cluster
  - Provisioning Your Cluster 00:05:21
  - Connecting To The Master 00:04:52
  - Running Your Spark Script Manually 00:03:20
  - Running Your Spark Script As A Step In EMR 00:06:55
  - Overriding Spark Configuration Settings 00:07:04
- Interacting With Your EMR Cluster
  - Setting Up An SSH Tunnel 00:05:44
  - Using Zeppelin With Spark On EMR 00:05:57
- Conclusion
  - Wrap Up And Thank You 00:02:48
Product information
- Title: Analyzing Big Data with Spark and Amazon EMR
- Author(s): Frank Kane
- Release date: March 2017
- Publisher(s): Infinite Skills
- ISBN: 9781491985113