J. M. Patel, Getting Structured Data from the Internet, https://doi.org/10.1007/978-1-4842-6576-5_2

2. Web Scraping in Python Using Beautiful Soup Library

Jay M. Patel¹

(1) Specrom Analytics, Ahmedabad, India

In this chapter, we’ll go through the basic building blocks of web pages, such as HTML and CSS, and demonstrate how to scrape structured information from them using popular Python libraries such as Beautiful Soup and lxml. Later, we’ll build on this and tackle the issues involved in turning our scraper into a full-featured web crawler capable of fetching information from multiple web pages.

You will also learn about JavaScript and how it is used to insert dynamic content in modern web pages, and we will use Selenium to scrape information ...


Getting Structured Data from the Internet: Running Web Crawlers/Scrapers on a Big Data Production Scale by Jay M. Patel
