Errata for Data Wrangling with Python


The errata list is a list of errors and their corrections that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version	Location	Description	Submitted By	Date submitted	Date corrected
PDF	Page 49 last paragraph	"However, if our code was in a subfolder called data," Replace the word 'code' with 'data'. Note from the Author or Editor: This should read "However, if our code was in a subfolder called code," instead of "However, if our code was in a subfolder called data,"	Ron B	Feb 17, 2016	Jan 27, 2017
Printed	Page 76 2nd paragraph	Paragraph states: "From this folder, type the following command in your terminal to run the script from the command line: python parse_script.py" I think it is meant to say: "python parse_excel.py" since that is what you called the new python file in the prior paragraph (step # 2): "2. Create a new Python file called parse_excel.py and put in the folder you created. " Note from the Author or Editor: Yes, it should say: python parse_excel.py	Bryan P	Mar 09, 2016	Jan 27, 2017
PDF	Page 94 2nd paragraph	"This code prints the first two lines of the file" Replace 'lines' with 'pages' Note from the Author or Editor: This code prints the first two lines of the file. Please update to: This code prints the first two pages of the file.	Ron B	Feb 17, 2016	Jan 27, 2017
Printed, PDF	Page 94 Last paragraph	Missing sudo command in code: pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515 should read (at least for my Mac): sudo pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515. Note from the Author or Editor: If you are using a virtual environment, you can simply type: pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515. Otherwise, use: sudo pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515	zenzontle	Mar 12, 2016
PDF	Page 94 Warning Text Box	In the Warning box text, the pip install option "--ignoreinstalled" should be "--ignore-installed" (Windows 7, Python 2.7.11, pip 8.1.1). Note from the Author or Editor: Yes, it should be pip install .... --ignore-installed	Anonymous	Mar 29, 2016	Jan 27, 2017
Printed	Page 94 Code line no. 3	After installing slate and pdfminer using the recommended method after an ImportError, it is still not possible to process the PDF in the sample zip. The error message is: PDFSyntaxError: No /Root object! - Is this really a PDF? This was solved at https://stackoverflow.com/questions/11384591/parsing-a-pdf-with-no-root-object-using-pdfminer/11438571 (the last solution on the page): the open statement should use the 'rb' option, so code line 3 should read: with open(pdf, 'rb') as f: (Windows 7, Python 2.7.11, slate 0.3, pdfminer 20110515; PDF book, February 2016: First Edition, Revision History for the First Edition: 2016-02-02 First Release). Note from the Author or Editor: Unsure if this is a Windows-only issue, but regardless, opening as 'rb' should be standard protocol, so let's change it: with open(pdf, 'rb') as f:	Anonymous	Mar 29, 2016	Jan 27, 2017
ePub	Page 116 last paragraph	Part of the paragraph reads: "...can be done simply by running pip install pdftables and pip requests install." Should read "can be done simply by running pip install pdftables and pip install requests." Note from the Author or Editor: As noted, please change: pip requests install to pip install requests	Deb R.H.	May 08, 2016	Jan 27, 2017
Printed	Page 301 2nd paragraph	The URL https://enoughproject.org/take-action brings up https://enoughproject.org/get-involved/take-action, which appears to have a different HTML structure, so some of the code on the following pages returns errors. For instance, on p. 301 the code in paragraphs 15 and 16 throws "AttributeError: object has no attribute 'descendants'". Note from the Author or Editor: Hi there, this is indeed the case! You can find old versions of the website pages and code in the code repository: https://github.com/jackiekazil/data-wrangling. That should allow you to follow along with the book using old copies of the page. Unfortunately, as you probably know from reading the chapter, the web is constantly changing, and this means scraping content is a never-ending job. Hope this helps! -katharine	John Roby	Sep 06, 2017
Printed	Page 399 second code snippet, second line of code, footnoted as '1'	The line `from emojispider.items import EmojispiderItem` should read `from scrapyspider.items import EmojiSpiderItem`, as per the GitHub example: https://github.com/jackiekazil/data-wrangling/blob/master/code/chp12-scraping/scrapyspider/scrapyspider/spiders/emo_spider.py Note from the Author or Editor: This is correct; we should change `from emojispider.items import EmojispiderItem` to `from scrapyspider.items import EmojiSpiderItem`.	Anonymous	Jun 30, 2016	Jan 27, 2017
PDF	Page 439 5th paragraph	The output of the GCC compilers is machine code, not bytecode. GCC does not need to be installed to use the CPython interpreter or the PyPy JIT to turn Python code into bytecode or machine code, respectively. GCC would be needed to compile Cython code to machine code, and Cython is not used anywhere in this book. Note from the Author or Editor: Please update this sentence: "The purpose of GCC (the GNU Compiler Collection) is to take code written in Python and turn it into something your machine can understand—byte code." to the following: "The purpose of GCC (the GNU Compiler Collection) is to take Python libraries with C extensions and turn it into something your machine can understand and execute."	Ron B	Mar 02, 2016	Jan 27, 2017
PDF	Page 445 6th paragraph	Based on the instructions given above, ~/Projects has a single subdirectory called data_wrangling. It contains the 'code' subfolder, while 'envs' is in /home/<user's name>. Note from the Author or Editor: Please update the sentence: "At this point, if we look at the contents of our Projects folder, we should have two empty subfolders called code and envs." to read: "At this point, we have our code folder set up in a special file inside our Projects folder and our virtual environment folder properly set up in our home directory."	Ron B	Mar 02, 2016	Jan 27, 2017
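The Page 49 correction concerns a script stored in a subfolder called code. The sketch below assumes a hypothetical project folder in which code and data are sibling subfolders. Under that layout, a script reaches a data file by climbing one level with `..`. The file name data.csv and the function name are hypothetical.

Relative paths passed to open() are resolved against the current working directory, not the script's location. So this also assumes the script is run from inside the code folder.

```python
import os.path


def path_from_code_to_data(filename):
    """Return the relative path a script in code/ would use to reach data/<filename>.

    Hypothetical helper: assumes code/ and data/ are sibling folders and that the
    script is run with code/ as the current working directory.
    """
    # '..' steps out of code/ into the project folder, then down into data/
    return os.path.join('..', 'data', filename)


# Cross-check against os.path.relpath, measured from the code folder
target = os.path.join('project', 'data', 'data.csv')
start = os.path.join('project', 'code')
print(os.path.relpath(target, start))
print(path_from_code_to_data('data.csv'))
```

Both lines print the same relative path, using the separator of the current operating system.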
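The two Page 94 notes on installing slate and pdfminer reduce to one rule. Inside a virtual environment, plain pip is enough. Otherwise, the author suggests prefixing sudo, which applies to macOS and Linux; Windows has no sudo.

The sketch below only builds the command as a list and does not run it. The function names are hypothetical. The virtual-environment check is a best-effort assumption covering two cases:

- The standard-library venv module, where sys.base_prefix differs from sys.prefix.
- Older virtualenv releases, which set sys.real_prefix.

```python
import sys

PACKAGES = ['slate==0.3', 'pdfminer==20110515']


def in_virtualenv():
    """Best-effort check for an active virtual environment (an assumption, not exhaustive)."""
    # Older virtualenv releases set sys.real_prefix
    if hasattr(sys, 'real_prefix'):
        return True
    # The standard-library venv module makes sys.prefix differ from sys.base_prefix
    return getattr(sys, 'base_prefix', sys.prefix) != sys.prefix


def build_install_command(use_venv):
    """Build (but do not run) the corrected pip command from the Page 94 errata."""
    command = ['pip', 'install', '--upgrade', '--ignore-installed'] + PACKAGES
    if not use_venv:
        # Outside a virtual environment the author suggests a system-wide install via sudo
        command = ['sudo'] + command
    return command


print(' '.join(build_install_command(in_virtualenv())))
```

Note the flag spelling: `--ignore-installed`, with a hyphen and no space after the leading dashes.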
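The Page 94 fix to open the PDF with 'rb' matters because a PDF's cross-reference table records byte offsets. Any newline translation on read shifts those offsets, which is consistent with the PDFSyntaxError reported above. Text mode has this problem in two places:

- On Windows, Python 2's text mode translates '\r\n' to '\n'.
- In Python 3, text mode's default universal-newline handling does the same on every platform.

This minimal, standard-library sketch targets Python 3. The sample bytes are made up and are not a valid PDF.

```python
import os
import tempfile

# Made-up sample bytes with Windows-style line endings; not a valid PDF
SAMPLE = b'%PDF-1.4\r\n1 0 obj\r\n<< /Type /Catalog >>\r\nendobj\r\n'


def read_both_ways(data):
    """Write data to a temporary file, then read it back in binary and in text mode."""
    fd, path = tempfile.mkstemp(suffix='.pdf')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        # Binary mode: the bytes come back exactly as written
        with open(path, 'rb') as f:
            binary = f.read()
        # Text mode: decoded to str, with '\r\n' translated to '\n'
        with open(path, 'r', encoding='latin-1') as f:
            text = f.read()
    finally:
        os.remove(path)
    return binary, text


binary, text = read_both_ways(SAMPLE)
# The text-mode read is shorter by one character per '\r\n' pair
print(len(binary), len(text))
```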
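The Page 301 AttributeError mentioning 'descendants' typically appears when BeautifulSoup's find() returns None because the page structure changed. Since None has no descendants attribute, the failure surfaces later and looks unrelated.

The sketch below is a minimal, library-agnostic guard. The helper require() and the exception name are hypothetical, and the code does not import BeautifulSoup, so it can be tried on its own.

```python
class PageStructureChanged(LookupError):
    """Raised when an element the scraper expects is missing from the page."""


def require(node, description):
    """Return node unchanged, or raise a descriptive error if it is None.

    Hypothetical helper: find()-style lookups return None when nothing matches,
    and using that None later surfaces as a confusing AttributeError.
    """
    if node is None:
        raise PageStructureChanged(
            'Could not find {}; the page structure may have changed'.format(description))
    return node
```

In scraping code, the guard would wrap the lookup, for example `require(soup.find('div'), 'the main content div').descendants`. Here `soup` is a BeautifulSoup object and the tag name is only a placeholder. The error then names the missing element instead of reporting a missing attribute on None.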
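The Page 439 submitter makes the point that CPython turns Python source into bytecode without GCC. You can check this from the interpreter itself using the built-in compile() and the standard-library dis module; no C compiler is involved.

This sketch targets Python 3, where co_code is a bytes object. The exact opcodes printed by dis vary between Python versions, so none are assumed here.

```python
import dis

SOURCE = 'total = 2 + 3'

# compile() runs CPython's own bytecode compiler; no GCC is required
code = compile(SOURCE, '<example>', 'exec')

# The raw bytecode is stored on the code object as bytes
print(type(code.co_code).__name__)  # bytes

# Human-readable disassembly (opcodes differ between Python versions)
dis.dis(code)

# Executing the code object runs the bytecode in the interpreter
namespace = {}
exec(code, namespace)
print(namespace['total'])  # 5
```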
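The Page 445 erratum describes this layout:

- The project code lives under ~/Projects/data_wrangling/code.
- The envs folder for virtual environments sits directly in the home directory.

Below is a minimal pathlib sketch of that layout, following the submitter's description. The function name is hypothetical. The base directory is a parameter, so the layout can be tried in a temporary folder before being created under a real home directory.

```python
from pathlib import Path


def create_layout(home):
    """Create the folder layout described in the Page 445 erratum under home.

    Hypothetical helper; folder names follow the submitter's description.
    """
    home = Path(home)
    code_dir = home / 'Projects' / 'data_wrangling' / 'code'
    envs_dir = home / 'envs'  # virtual environments live directly in the home directory
    for folder in (code_dir, envs_dir):
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
    return code_dir, envs_dir
```

For a real setup, the call would be `create_layout(Path.home())`.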