© Jay M. Patel 2020
J. M. Patel, Getting Structured Data from the Internet, https://doi.org/10.1007/978-1-4842-6576-5_2

2. Web Scraping in Python Using Beautiful Soup Library

Jay M. Patel (1)
(1) Specrom Analytics, Ahmedabad, India
 

In this chapter, we’ll go through the basic building blocks of web pages, such as HTML and CSS, and demonstrate how to scrape structured information from them using popular Python libraries such as Beautiful Soup and lxml. Later, we’ll build on this and tackle the issues involved in turning our scraper into a full-featured web crawler capable of fetching information from multiple web pages.
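To make the idea of pulling structured data out of HTML concrete, here is a minimal sketch, assuming Beautiful Soup 4 (the `bs4` package) is installed. The HTML snippet, the `extract_products` helper, and the `https://example.com/` base URL are hypothetical and chosen only for illustration; they are not taken from the chapter. The sketch uses Python's built-in `"html.parser"` backend so that lxml is not required.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# A small, made-up HTML page used purely for illustration (hypothetical data).
SAMPLE_HTML = """
<html>
  <head><title>Example Store</title></head>
  <body>
    <h1 class="page-title">Products</h1>
    <ul id="products">
      <li class="product"><a href="/items/1">Widget</a> <span class="price">9.99</span></li>
      <li class="product"><a href="/items/2">Gadget</a> <span class="price">24.50</span></li>
    </ul>
  </body>
</html>
"""


def extract_products(html, base_url):
    """Hypothetical helper: return name, absolute URL, and price for each product."""
    # "html.parser" is Python's built-in parser; "lxml" can be passed instead if installed.
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # CSS selector: every <li class="product"> inside the element with id="products".
    for item in soup.select("ul#products li.product"):
        link = item.find("a")
        price = item.find("span", class_="price")
        products.append(
            {
                "name": link.get_text(strip=True),
                # Resolve relative hrefs so a crawler could fetch them later.
                "url": urljoin(base_url, link["href"]),
                "price": float(price.get_text(strip=True)),
            }
        )
    return products


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    print(soup.title.string)  # Example Store
    for product in extract_products(SAMPLE_HTML, "https://example.com/"):
        print(product)
```

Resolving relative links with `urljoin` is what would let the same parsing code feed a crawler that visits multiple pages; switching the parser argument from `"html.parser"` to `"lxml"` changes only the parsing backend, not the selector code.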

You will also learn about JavaScript and how it is used to insert dynamic content in modern web pages, and we will use Selenium to scrape information ...
