
Advanced Web Scraping

Published by O'Reilly Media, Inc.

Content level: Advanced
This live event utilizes Jupyter Notebook technology

Scraping data from a website like Wikipedia or sports-reference.com is pretty easy. Everything is rendered with vanilla HTML/CSS, and the tag elements are predictable and well labeled.

But what if the data you need to scrape isn’t tagged properly? Or it’s locked behind a login page, requires clicking and scrolling to get at, or is rendered with JavaScript? What then? In the past, you might have given up and moved on. No more!

In this live training, Max will help you take your web scraping skills to the next level so that you will be better equipped for the next pesky page that you have to scrape!

What you’ll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • Why some websites are harder to scrape than others
  • How to scrape data that is rendered in-browser with JavaScript
  • How to automate some browser tasks (like clicking and scrolling)

And you’ll be able to:

  • Schedule scraping jobs on a server
  • Set up notification and email triggers based on certain events

This live event is for you because...

  • You already have some web scraping experience, such as by taking Web Scraping in 60 Minutes (live online training course with Max Humber)
  • You want to scrape more difficult websites for personal and professional projects
  • You want to learn about the latest and greatest scraping tools

Prerequisites

  • Required: Experience with Python, and familiarity with BeautifulSoup
  • Optional: Take Web Scraping in 60 Minutes (live online training course with Max Humber)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Introduction (5 minutes)

  • Who am I, and who are you?
  • Two polls
  • Learning Agenda

Basics (5 minutes)

  • A quick review of how to fetch and parse HTML
  • How to target HTML element tags and attributes
  • Exercise: Scrape a “simple” website
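As a rough illustration of what targeting tags and attributes can look like, here is a minimal sketch. It is not taken from the course materials: it uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup so that it runs without extra installs, and the sample HTML, the `LinkCollector` class, and the `extract_links` helper are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would fetch this over HTTP first.
SAMPLE_HTML = """
<ul>
  <li><a class="player" href="/players/1">Ada</a></li>
  <li><a class="team" href="/teams/9">Owls</a></li>
  <li><a class="player star" href="/players/2">Lin</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Target by tag name first, then by attribute values.
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def extract_links(html, css_class):
    """Return the hrefs of all <a> tags in `html` that have `css_class`."""
    parser = LinkCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML, "player"))
```

With BeautifulSoup, the same selection is usually written as a single CSS-selector or `find_all` call; the sketch above only shows the underlying idea of filtering on a tag name and its attributes.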

Pesky Pages (15 minutes)

  • How to scrape data locked behind a login page
  • How to scrape data rendered with JavaScript
  • Exercise: Scrape a website with login credentials
  • Q&A (5 minutes)

Scheduling (10 minutes)

  • How to put a scraper on a schedule
  • How to send emails with scraping results
  • Exercise: Schedule a scraper
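One possible shape for a scheduled scraper that reports by email is sketched below. This is an assumption about the general approach, not the tooling used in the course: it relies only on Python's standard-library `sched` and `email.message` modules, and the function names, addresses, and the `fake_scrape` job are hypothetical placeholders.

```python
import sched
import time
from email.message import EmailMessage


def build_report_email(results, sender, recipient):
    """Build (but do not send) an email summarizing scraped results."""
    msg = EmailMessage()
    msg["Subject"] = f"Scrape finished: {len(results)} item(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(results) or "No results.")
    return msg


def run_on_schedule(job, interval_seconds, runs):
    """Call `job` a fixed number of times, `interval_seconds` apart."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    for i in range(runs):
        # Queue each run at its offset from now; priority 1 for all of them.
        scheduler.enter(i * interval_seconds, 1, job)
    scheduler.run()  # Blocks until every queued run has finished.


if __name__ == "__main__":
    collected = []

    def fake_scrape():
        # Placeholder for a real scraping function.
        collected.append(f"row {len(collected) + 1}")

    run_on_schedule(fake_scrape, interval_seconds=0.1, runs=3)
    # Placeholder addresses; sending would additionally need smtplib
    # and real SMTP server credentials.
    report = build_report_email(collected, "bot@example.com", "me@example.com")
    print(report["Subject"])
```

On a server, a system scheduler such as cron is a common alternative to an in-process loop like this one, since it keeps running after the Python process exits.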

Browser Automation (15 minutes)

  • How to replicate scrolling and browser clicks to get at hard-to-parse data
  • How to leverage Optical Character Recognition (OCR)
  • How to scrape images and other multimedia types
  • Exercise: Use OCR to parse text that isn’t stored as text (such as text in images)

Conclusion + Q&A (5 minutes)

Your Instructor

  • Max Humber

    Max Humber helps individuals, startups, Fortune 500 companies, and (sometimes) government agencies solve problems with technology. He also independently publishes apps at bracket and teaches at General Assembly.