Errata for Web Scraping with Python

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date Submitted
Printed Page 178
paragraph begins... FreeGeoIP


The URL is given as http://freegeoip.net but should be http://freegeoip.app


Kevin Brown  May 01, 2021 
Printed Page 88
Code example, line 8 from top of page

The format of the code is as follows:

try:
    for row in rows:
        csvRow = []
        for cell in row.findAll(['td', 'th']):
            csvRow.append(cell.get_text())
            writer.writerow(csvRow) # this is the line with the error

When run, this code writes csvRow to the CSV file once for every cell found in the document. As a result, a single row is written to the CSV file 11 times, each time with one additional cell of information appended to the row.

The code should be formatted as follows:

try:
    for row in rows:
        csvRow = []
        for cell in row.findAll(['td', 'th']):
            csvRow.append(cell.get_text())
        writer.writerow(csvRow) # this is the line with the corrected error

In this version the 6th line has been "untabbed" by one level so that it runs only after the inner for loop has finished, once per row.
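
The difference can be reproduced without a web page. The following is a minimal sketch, not the book's code: it assumes plain lists of strings in place of the parsed table rows, writes to an in-memory buffer instead of a file, and uses hypothetical function names and sample data.

```python
import csv
import io

# Hypothetical stand-in for the parsed table: each row is a list of cell texts.
rows = [["Name", "Year"], ["Alpha", "2001"], ["Beta", "2002"]]


def write_inside_inner_loop(rows):
    """writerow() indented into the inner loop: one output line per cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        csv_row = []
        for cell in row:
            csv_row.append(cell)
            writer.writerow(csv_row)  # runs once per cell
    return buffer.getvalue().splitlines()


def write_after_inner_loop(rows):
    """writerow() at the outer loop level: one output line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        csv_row = []
        for cell in row:
            csv_row.append(cell)
        writer.writerow(csv_row)  # runs once per row
    return buffer.getvalue().splitlines()


if __name__ == "__main__":
    print(len(write_inside_inner_loop(rows)), "lines with the misplaced call")
    print(len(write_after_inner_loop(rows)), "lines with the corrected call")
```

With three rows of two cells each, the first function produces six partial lines and the second produces three complete ones.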

Not a big issue by any means, but it did cost me about 5 minutes of debugging when using that example as a basis for my own scraper!

Anonymous  Apr 12, 2021 
Other Digital Version 1839
middle part

Hi :), I think that instead of inheriting from Website, these classes should inherit from "Webpage", since they are initialized using the data from "Webpage":

class Webpage:
    def __init__(self, name, url, titleTag):
        self.name = name
        self.url = url
        self.titleTag = titleTag

class Product(Website):
    def __init__(self, name, url, titleTag, productNumberTag, priceTag):
        Website.__init__(self, name, url, TitleTag)
        self.productNumberTag = productNumberTag
        self.priceTag = priceTag

class Article(Website):
    def __init__(self, name, url, titleTag, bodyTag, dateTag):
        Website.__init__(self, name, url, titleTag)
        self.bodyTag = bodyTag
        self.dateTag = dateTag
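
One way the suggested correction might look is sketched below. It assumes the base class is meant to be Webpage, as the submitter proposes; this has not been confirmed by the author. The use of super().__init__ is a stylistic choice not taken from the book, and the example values at the bottom are hypothetical.

```python
class Webpage:
    """Base class holding the fields shared by every page type."""

    def __init__(self, name, url, titleTag):
        self.name = name
        self.url = url
        self.titleTag = titleTag


class Product(Webpage):
    """Product page: adds the tags that locate product number and price."""

    def __init__(self, name, url, titleTag, productNumberTag, priceTag):
        # Delegate the shared fields to the base class initializer.
        super().__init__(name, url, titleTag)
        self.productNumberTag = productNumberTag
        self.priceTag = priceTag


class Article(Webpage):
    """Article page: adds the tags that locate body text and date."""

    def __init__(self, name, url, titleTag, bodyTag, dateTag):
        super().__init__(name, url, titleTag)
        self.bodyTag = bodyTag
        self.dateTag = dateTag


# Hypothetical example values, for illustration only.
article = Article("Example", "http://example.com", "h1", "div.body", "span.date")
```

Because both subclasses name the same base class that defines name, url, and titleTag, the NameError that an undefined Website would raise at class-definition time does not occur.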

NOTE: I BOUGHT THE KINDLE VERSION.

Daniel de Jesús Rosas Pérez  Feb 13, 2021 
ePub Page 1514
websites list

Two of the CSS selectors no longer apply. Also, the first URL redirected me to another page, so I redid my CSS selector based on that new page.

I bought the Kindle version of the book, so I can't tell exactly which page I am on; however, I am at position 1514. If you wish to search by content, here is the passage where I found the issue:

websites = []
for ...

While this new method might not seem remarkably simpler than writing a new Python function for each new website at first glance, imagine what happens when you go from a system with 4 website sources to a system with 20 or 200 sources. Each list of strings is relatively easy to write. It doesn’t take up much space. It can be loaded from

Daniel de Jesús Rosas Pérez  Feb 11, 2021