In this chapter, we’ll go through the basic building blocks of web pages such as HTML and CSS and demonstrate scraping structured information from them using popular Python libraries such as Beautiful Soup and lxml. Later, we’ll expand our knowledge and tackle issues that will make our scraper into a full-featured web crawler capable of fetching information from multiple web pages.
You will also learn about JavaScript and how it is used to insert dynamic content in modern web pages, and we will use Selenium to scrape information ...