Web Scraping Tutorial with Scrapy and Python for Beginners

Video description

Web scraping is the process of automatically collecting web pages and extracting the data you need from them. In this course, you will learn and master web scraping with Python and Scrapy through a step-by-step, in-depth guide.

The course begins with an introduction to the web scraping process, presented with infographics and no code. It covers how data is scraped from websites and how Scrapy is used to do it. Once the basics are clear, you will build a web scraper with Python and the Scrapy framework, which shows you first-hand how web scraping works. You will then study the essential concepts of web scraping and Scrapy. These essentials are enough to build complete scrapers, but the course goes further and teaches advanced web scraping techniques so that you can become an expert.

Advanced topics are explained thoroughly. These include crawling multiple pages and extracting their data (pagination), selecting data with regular expressions (RegEx), and scraping dynamic, JavaScript-rendered websites with Scrapy Playwright. The course ends with three projects: Champions League Table (ESPN), Product Tracker (Amazon), and Scraper Application (GUI).

By the end of this course, you will have learned how to do web scraping using Python and Scrapy.

What You Will Learn

  • Send requests to URLs with a Scrapy spider to scrape websites
  • Get the HTML response from a URL and parse it to extract data
  • Use the Scrapy shell to test and verify CSS selectors or XPath expressions
  • Export scraped data and save it to databases such as MongoDB
  • Scrape data from multiple web pages using Scrapy pagination
  • Log in to websites using Scrapy FormRequest with CSRF tokens

Audience

This course is ideal for beginner Python developers who want to master web scraping and for freelance web scrapers looking to polish their skills. It also suits individuals and college students who want to use web scraping with Python and the Scrapy framework in their own projects. A basic understanding of Python programming is required. Elementary knowledge of HTML is helpful but not mandatory.

About The Author

Rahul Mula: Rahul Mula is a developer specializing in Python, Flutter, and web development. He was intrigued by programming from the moment he first learned about it and realized what could be built with it. He enjoys exploring different technologies and creating new applications. He developed Keyviz, a free and open-source tool for visualizing keystrokes in real time. He has also written books and created courses on Python programming that have taught thousands of students.

Table of contents

  1. Chapter 1 : Introduction to the Course
    1. What Is Web Scraping
    2. How Web Scraping Works
    3. Web Scraping with Scrapy
  2. Chapter 2 : Scrapy Installation
    1. Scrapy Installation for Windows
    2. Scrapy Installation for Ubuntu (Linux)
    3. Creating Scrapy Project
    4. Project Walkthrough
  3. Chapter 3 : Scrapy Spider
    1. Creating Spider
    2. Sending Request
    3. Getting the Response
    4. Scrapy CSS Selector
    5. Selecting All the Data
    6. Extracting Data
    7. Spider Overview
  4. Chapter 4 : CSS Selectors
    1. CSS Selectors Versus XPath: How to Select Web Elements
    2. Tagname, Class, and Id Selectors
    3. Attribute Selectors
  5. Chapter 5 : XPath
    1. XPath Expressions
    2. XPath Attribute Selectors
    3. XPath text() Function
  6. Chapter 6 : Scrapy Shell
    1. What Is the Scrapy Shell and How to Use It?
    2. fetch() Response
    3. Shell Configuration
  7. Chapter 7 : Scrapy Items
    1. Structuring Data into Scrapy Item
    2. Using Item in Spiders
    3. Define Input and Output Processors for Item Fields
    4. Loading Items with Scrapy ItemLoaders
    5. Items, Processors, and ItemLoaders Overview
  8. Chapter 8 : Exporting Data
    1. Output Extracted Data in JSON, CSV, and XML Formats
    2. Overwrite Previous Output
    3. Appending Data to Previous Output
  9. Chapter 9 : Scrapy Item Pipeline
    1. How to Use Scrapy Item Pipelines
    2. Saving Data Locally to Excel (XLSX) Files
    3. Enable Item Pipelines in Settings
    4. MongoDB (Account) Setup
    5. Saving Data to MongoDB
  10. Chapter 10 : Pagination
    1. Extracting Links from href Attributes
    2. Send Request to the Next Page
    3. start_requests() Method
  11. Chapter 11 : Following Links
    1. How to Follow Links
    2. How to Select Data Using Regular Expressions with Scrapy
    3. Setting Up Custom Callback Function
    4. Parse Product Details Page
  12. Chapter 12 : Scraping Tables
    1. HTML Tables
    2. Selecting Tables Data
    3. Extract Data from HTML Tables
  13. Chapter 13 : Logging into Websites
    1. Data Hidden Behind Login Forms
    2. Inspecting HTML Forms and Website Activity with Dev Tools
    3. Logging into Websites with FormRequest
    4. CSRF Protected Login Forms
    5. Extract CSRF Values from Forms
  14. Chapter 14 : Scraping JavaScript Rendered Websites
    1. What Are JavaScript Rendered/Dynamic Websites?
    2. scrapy-playwright Installation
    3. Setting Up Playwright in Scrapy Project
    4. Using Playwright to Render Websites
    5. Scraping Data from Dynamic Websites
  15. Chapter 15 : Scrapy Playwright
    1. Playwright Overview
    2. Playwright Page Object
    3. Logging in with Playwright
    4. Dynamic Websites with Loading Screens
    5. Wait for Selector/Elements Using Page Coroutines
    6. Dynamic Websites with Infinite Scroll
    7. Taking Screenshots of Websites
    8. Rendering Websites to PDF
  16. Chapter 16 : API Endpoints
    1. Identifying API Calls
    2. Requesting Data from API
    3. Extracting Data from API
  17. Chapter 17 : Settings
    1. Scrapy Project Settings
    2. Robots Text
    3. Middleware
    4. Autothrottle Extension
  18. Chapter 18 : User Agents and Proxies
    1. What Are User Agents?
    2. User Agents with Scrapy
    3. What Are Proxies?
    4. Proxies with Scrapy
  19. Chapter 19 : Tips and Tricks
    1. Spider Arguments
    2. Standalone Spiders
    3. Scrapy Shell with bpython
    4. Scrapy Get Versus Extract Method
    5. Logging
  20. Chapter 20 : Project 1: Champions League Table from ESPN.com
    1. Overview
    2. Website Visual Inspection
    3. Finding the Selectors
    4. Building the Spider: Extract Teams Data
    5. Building the Spider: Extract Teams Details
  21. Chapter 21 : Project 2: Amazon Product Rank
    1. Overview
    2. Scraper Visualization
    3. Finding the Selectors
    4. Building the Spider
  22. Chapter 22 : Project 3: Extending Scraper with GUI
    1. Scraper Application
    2. Building the GUI (Application Interface)
    3. Running the Spider from the Application

Product information

  • Title: Web Scraping Tutorial with Scrapy and Python for Beginners
  • Author(s): Rahul Mula
  • Release date: November 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781804615317