Errata for Agile Data Science 2.0

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
Printed, Page 34
Paragraph 2

Page 34, paragraph 2, reads: "Once you have installed the AWS CLI, check out ec2.sh (https://github.com/rjurney/Agile_Data_Code_2/blob/master/ec2.sh)."

The file does not exist at that GitHub URL, and it does not appear anywhere else in the book's repository.

The repository now has some instructions for running the code in Jupyter notebooks instead, but they are very vague and not explained in any detail. I posted a question in the GitHub Issues tab of rjurney's repository for this book's code 3 days ago and am still waiting for a reply. Judging from other issue posts, I don't expect to get a response. It sounds like rjurney has moved on to another project, which is fine, but someone at O'Reilly, or rjurney, should at least post an alternative path.

I'm very disappointed I spent money on this book and hope to return it to Amazon.

Kurt Wolter  Jul 20, 2022 
Printed, Page 63
Fourth paragraph, "Next, we need..."

The bash command is printed incorrectly as:
ln -s $PROJECT_HOME/ch02/airflow_setup.py ~/airflow/dags/
Should be:
ln -s $PROJECT_HOME/ch02/airflow_test.py ~/airflow/dags/airflow_test.py
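A minimal Python sketch of the corrected step, assuming Airflow reads DAGs from ~/airflow/dags and that PROJECT_HOME points at a checkout of the book's repository. The helper name `link_dag` is hypothetical; it creates the same symlink as the corrected `ln -s` command, pointing at ch02/airflow_test.py rather than ch02/airflow_setup.py.

```python
from pathlib import Path


def link_dag(project_home: str, dags_dir: str, name: str = "airflow_test.py") -> Path:
    """Rough equivalent of: ln -s $PROJECT_HOME/ch02/<name> <dags_dir>/<name>"""
    target = Path(project_home) / "ch02" / name
    dags = Path(dags_dir).expanduser()
    dags.mkdir(parents=True, exist_ok=True)  # create the DAG folder if it is missing
    link = dags / name
    if link.is_symlink():
        link.unlink()  # replace an old symlink with the same name
    link.symlink_to(target)  # the link stores the path to ch02/<name>
    return link
```

For example, `link_dag(os.environ["PROJECT_HOME"], "~/airflow/dags")`. Only an existing symlink is replaced; if a regular file already has that name, `symlink_to` raises `FileExistsError` rather than overwriting it.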

Ray Brown  Jul 20, 2017 
Printed, Page 72
Output from running test_pymongo.py is shown as Python 2.x output, but the author uses Python 3.5 in the text

The printed output appears to come from an earlier printing of the text and from running the ch02/test_pymongo.py script under Python 2.x, which shows a "u" prefix on the strings.

Ray Brown  Jul 20, 2017 
Printed, Page 72
Second paragraph, "Displaying executives..."

The original text "...Mongo using Pig and MongoStorage. Run..." should read "...Mongo using PySpark and pymongo_spark. Run..."

Ray  Jul 20, 2017 
Printed, Page 91
First paragraph

The commands to download the data, first given on pages 79-80, are repeated on page 91:
# Get openflights data
wget -P /tmp/ \
https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat
mv /tmp/airports.dat data/airports.csv

wget -P /tmp/ \
https://raw.githubusercontent.com/jpatokal/openflights/master/data/airlines.dat
mv /tmp/airlines.dat data/airlines.csv

wget -P /tmp/ \
https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat
mv /tmp/routes.dat data/routes.csv

wget -P /tmp/ \
https://raw.githubusercontent.com/jpatokal/openflights/master/data/countries.dat
mv /tmp/countries.dat data/countries.csv
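The four download blocks follow one pattern: fetch `<name>.dat` from the OpenFlights repository's master/data directory and save it as data/<name>.csv. A minimal standard-library Python sketch of that pattern; `openflights_downloads` and `download_all` are hypothetical helpers. Each URL is built as one unbroken string, since a URL split across a backslash line continuation reaches wget as two separate arguments.

```python
from __future__ import annotations

from pathlib import Path
from urllib.request import urlretrieve

# Raw-file location of the OpenFlights data, kept as a single string.
BASE_URL = "https://raw.githubusercontent.com/jpatokal/openflights/master/data"
DATASETS = ["airports", "airlines", "routes", "countries"]


def openflights_downloads(data_dir: str = "data") -> list[tuple[str, Path]]:
    """Pair each OpenFlights .dat URL with the .csv path it is saved to."""
    return [(f"{BASE_URL}/{name}.dat", Path(data_dir) / f"{name}.csv")
            for name in DATASETS]


def download_all(data_dir: str = "data") -> None:
    """Download each file straight to its final name (requires network access)."""
    Path(data_dir).mkdir(parents=True, exist_ok=True)
    for url, dest in openflights_downloads(data_dir):
        urlretrieve(url, dest)
```

Writing each file directly to its final name makes the separate /tmp and `mv` steps unnecessary, so the data only needs to be fetched once.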

Ray Brown  Jul 20, 2017 
Printed, Page 93
Second-to-last Python line on the page

The python line: "on_time_dataframe = spark.read.parquet('data/trimmed_cast_performance.parquet')" should read "on_time_dataframe = spark.read.parquet('data/on_time_performance.parquet')".

Ray Brown  Jul 21, 2017 
Printed, Page 96
Second paragraph, mongo query line

The line reads:
db.on_time_performance.findOne(
{Carrier: 'DL', FlightDate: '2015-01-01', FlightNum: 478})
It should read:
db.on_time_performance.findOne(
{Carrier: 'DL', FlightDate: '2015-01-01', FlightNum: '478'})
The FlightNum field is stored as a string.
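To see why the numeric query finds nothing, here is a simplified pure-Python stand-in for MongoDB's top-level equality matching (a sketch, not pymongo). Like MongoDB, it does not convert between strings and numbers, so 478 does not match "478". The `matches` helper and the sample document are hypothetical, shaped after the fields in the query.

```python
def matches(document: dict, query: dict) -> bool:
    """Simplified top-level equality match, loosely modeled on a MongoDB filter.

    Values are compared with ==, so a string never equals a number,
    just as MongoDB does not match the string "478" against 478.
    """
    return all(field in document and document[field] == value
               for field, value in query.items())


# Hypothetical record shaped like an on_time_performance document.
flight = {"Carrier": "DL", "FlightDate": "2015-01-01", "FlightNum": "478"}

print(matches(flight, {"Carrier": "DL", "FlightDate": "2015-01-01", "FlightNum": 478}))    # False
print(matches(flight, {"Carrier": "DL", "FlightDate": "2015-01-01", "FlightNum": "478"}))  # True
```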

Ray Brown  Jul 21, 2017 
Printed, Other Digital Version, Page 98
First full paragraph, python code

The line: "'FlightNum': int(flight_num)" should be: "'FlightNum': flight_num".

The source file ch04/web/on_time_flask_template.py needs modification as well.

Ray Brown  Jul 21, 2017 
Other Digital Version, Page 98
Source code for ch04/web/on_time_flask.py

In the source code for ch04/web/on_time_flask.py, the line:
'FlightNum': int(flight_num)
should be:
'FlightNum': flight_num
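A sketch of the corrected query construction, assuming the carrier, date, and flight number arrive as URL path segments through Flask's default string converter. `flight_query` is a hypothetical helper, not the book's code; the point is that `flight_num` is passed through unchanged instead of being wrapped in `int()`.

```python
def flight_query(carrier: str, flight_date: str, flight_num: str) -> dict:
    """Build the MongoDB filter for one flight from URL path segments.

    FlightNum is stored as a string, so the path segment is used as-is;
    converting it with int() would produce a filter that matches nothing.
    """
    return {
        'Carrier': carrier,
        'FlightDate': flight_date,
        'FlightNum': flight_num,
    }
```

With pymongo, the result would be passed to a call such as `db.on_time_performance.find_one(flight_query('DL', '2015-01-01', '478'))`.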

Ray Brown  Jul 21, 2017 
Printed, Page 103
Second paragraph, mongo query line

The mongo query line:
db.on_time_performance.find(
{Origin: 'ATL', Dest: 'SFO', FlightDate: '2015-01-01'}).sort(
{DepTime: 1, ArrTime: 1}) // Slow or broken
is bold only for the "db.on_time_" portion, not for the rest of the query.

Ray Brown  Jul 21, 2017 
Other Digital Version, Page 104
In the source listing for ch04/web/on_time_flask_template.py

Running the original source code for ch04/web/on_time_flask_template.py (as of 20-JUL-2017) causes an error stating that the "display_nav" macro accepts only 3 parameters. The fix belongs in the ch04/web/templates/macros.jnj file in the source code: the parameter list in the declaration of the display_nav macro must include the "query" parameter. Change macros.jnj from:
...
{% macro display_nav(offsets, path, count) -%}
...
to:
...
{% macro display_nav(offsets, path, count, query) -%}
...

The macros.jnj files for chapters 6, 8, and 10 probably need the same fix.

Ray Brown  Jul 27, 2017 
ePub, Page 262
-

Histograms are not bar charts.

Especially when working with variable bin widths, the height of each rectangle must not be the count: widening a bin produces a visibly abnormal height.

The count is represented by the area of each rectangle, so the height should be count / width rather than count.

Think of real buckets holding balls: as you widen a bucket, the height of its stack of balls decreases.
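A minimal plain-Python sketch of the correction: for each bin it computes the count, the width, and the height count / width, so that each rectangle's area (width * height) equals its count. `histogram_heights` is a hypothetical helper, not code from the book. In the example, the last bin holds twice as many values as the others, but it is also twice as wide, so its height matches theirs; drawing the raw count would make it look twice as dense.

```python
from bisect import bisect_right


def histogram_heights(values, edges):
    """Return (count, width, height) for each bin, where height = count / width.

    Bins are half-open [edges[i], edges[i + 1]), except the last bin, which
    also includes its right edge. Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v == edges[-1]:
            counts[-1] += 1  # the right edge belongs to the last bin
        elif edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    return [(c, w, c / w) for c, w in zip(counts, widths)]


# One value per unit of width: the last bin holds 2 values but is 2 units wide.
rows = histogram_heights([0.5, 1.5, 2.5, 3.5], [0, 1, 2, 4])
print(rows)  # [(1, 1, 1.0), (1, 1, 1.0), (2, 2, 1.0)]
print(sum(w * h for _, w, h in rows))  # 4.0
```

NumPy's `numpy.histogram(values, bins=edges, density=True)` goes one step further and also divides by the total count, so the areas sum to 1 instead of to the number of values.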

Cédric Pelvet  Jun 22, 2017 
Printed, Page 325
Index entry for DAGS

The index entry for DAGS reads "DAGS (directed acrylic graphs)". It should be "DAGS (directed acyclic graphs)".

Ray Brown  Jul 21, 2017 
Printed, Page 326
Index entry "directed acrylic graphs"

Index entry for "directed acyclic graphs" reads "directed acrylic graphs" incorrectly.

Ray Brown  Jul 21, 2017