Errata

Foundations for Analytics with Python

Errata for Foundations for Analytics with Python


The errata list is a list of errors, and their corrections, that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version Location Description Submitted By Date submitted Date corrected
Printed
Page 21
code snippet

In the code snippet on page 21, there is an extra </match_word> at the end of line 4 that generates a syntax error. (If </match_word> is removed, the code runs as desired.)

In addition, two closing parentheses are missing. The following code works:

import re

# Print the pattern each time it is found in the string
string = "The quick brown fox jumps over the lazy dog."
string_list = string.split()
pattern = re.compile(r"(?P<match_word>The)", re.I)
print("Output #39:")
for word in string_list:
    if pattern.search(word):
        print("{:s}".format(pattern.search(word).group('match_word')))

Mike Schulte  Oct 13, 2016  Mar 24, 2017
Printed
Page 22
code snippet

In the last line of the code snippet, there are two closing parentheses missing.

Mike Schulte  Oct 13, 2016  Mar 24, 2017
Printed
Page 36
sixth line of code snippet

The sixth line of the code snippet is truncated and, I believe, should read:

ordered_dict1 = sorted(dict_copy.items(), key=lambda item: item[0])
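Note that sorted() returns a list of (key, value) tuples, not a dictionary, so despite its name ordered_dict1 is a list. A small illustration with a hypothetical dict_copy (the book's actual dictionary contents are not shown here):

```python
# Hypothetical dictionary standing in for the book's dict_copy
dict_copy = {'two': 2, 'one': 1, 'three': 3}

# Sort the (key, value) pairs by key: item[0] is the key
ordered_dict1 = sorted(dict_copy.items(), key=lambda item: item[0])
print(ordered_dict1)

# Sorting by value instead uses item[1]
ordered_by_value = sorted(dict_copy.items(), key=lambda item: item[1])
print(ordered_by_value)

# dict() turns the sorted pairs back into a dictionary;
# since Python 3.7, dictionaries preserve insertion order
rebuilt = dict(ordered_dict1)
print(rebuilt)
```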

Mike Schulte  Oct 14, 2016  Mar 24, 2017
Printed
Page 38
2nd paragraph, first for loop code snippet

Refer to "Output #126" in printed (page 38) and "Output #129" in code snippet:

The format statement in the first for loop is missing a closing curly bracket:

print("Output #129:")
for month in y:
print("{!s".format(month))



Ted Ensminger  Nov 01, 2016  Mar 24, 2017
Printed
Page 41
code snippet

The getMean function definition fails unless "else float('nan')" is on the same line as the rest of the return statement. I suspect that the backslash line-continuation character, which would normally join the two lines, was accidentally omitted.
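A sketch of two ways to split the conditional expression across lines. It assumes getMean returns the arithmetic mean of a list and float('nan') for an empty list, as the erratum's "else float('nan')" suggests; the function bodies are inferred, not copied from the book, and get_mean_parenthesized is a hypothetical name.

```python
import math

def getMean(numericValues):
    # Explicit backslash continuation: the conditional expression spans two lines
    return sum(numericValues) / len(numericValues) \
        if len(numericValues) > 0 else float('nan')

def get_mean_parenthesized(numericValues):
    # Alternative: enclosing parentheses allow implicit line joining, no backslash needed
    return (sum(numericValues) / len(numericValues)
            if len(numericValues) > 0 else float('nan'))

print(getMean([2, 3, 4]))
print(math.isnan(getMean([])))
```

The parenthesized form is generally considered more robust, because a stray space after a backslash turns the continuation into a syntax error.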

Anonymous  Oct 17, 2016  Mar 24, 2017
Printed
Page 43
code snippet

The author seems to have slipped back into Python 2 accidentally: the print statements lack the parentheses that Python 3 requires. The code should read:

try:
    result = getMean(my_list2)
except ZeroDivisionError as detail:
    print("Output #142 (Error): {}".format(float('nan')))
    print("Output #142 (Error): {}".format(detail))
else:
    print("Output #142 (The mean is): {}".format(result))
finally:
    print("Output #142 (Finally): The finally block is executed every time")

Anonymous  Oct 17, 2016  Mar 24, 2017
Printed
Page 45
code snippet

The print statement uses Python 2 syntax rather than Python 3 syntax. It should read:

print("Output #143:")
filereader = open(input_file, 'r')
for row in filereader:
    print(row.strip())
filereader.close()
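A with statement closes the file automatically, even if an exception occurs inside the loop, so the explicit close() call is no longer needed. A sketch of that variant; the temporary file and its two lines are stand-ins for the book's input_file, not its actual contents.

```python
import os
import tempfile

# Stand-in for the book's input_file: a small temporary text file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    input_file = tmp.name

print("Output #143:")
rows = []
with open(input_file, 'r') as filereader:
    for row in filereader:
        # strip() removes the trailing newline and surrounding whitespace
        rows.append(row.strip())
        print(row.strip())
# The file is already closed here, without an explicit filereader.close()

os.remove(input_file)
```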

Anonymous  Oct 17, 2016  Mar 24, 2017
Printed
Page 53
last line of code snippet

This is a Python 2-style print statement; Python 3 requires parentheses:

print("Output #146: Output written to file")

Anonymous  Oct 17, 2016  Mar 24, 2017
Printed
Page 97
snippet code line 8

The column names in output_header_list on page 97 (line 8) do not match those in the code snippet file, 10csv_reader_sum...py:

Printed book: output_header_list = ['file_name', 'total_sales', 'average_sales']
Code file:    output_header_list = ['file_name', 'total_cost', 'average_cost']
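A base-Python sketch of a per-file summary that uses the printed header names (total_sales, average_sales) throughout. The in-memory CSV text, the customer names, and the file name 'sales_file.csv' are made up for illustration, and the book's script may compute the statistics differently.

```python
import csv
import io

# Made-up input standing in for one sales CSV file
csv_text = """Customer Name,Sale Amount
Alice,"$1,200.00"
Bob,$900.50
"""

output_header_list = ['file_name', 'total_sales', 'average_sales']

sales = []
reader = csv.DictReader(io.StringIO(csv_text))
for row in reader:
    # Strip the dollar sign and thousands separator before converting to float
    sales.append(float(row['Sale Amount'].strip('$').replace(',', '')))

total_sales = sum(sales)
average_sales = total_sales / len(sales)

# Pair each header name with its value so the two can never drift apart
summary = dict(zip(output_header_list, ['sales_file.csv', total_sales, average_sales]))
print(summary)
```

Building the output row from output_header_list itself is one way to keep variable names and column names in sync.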

Ted Ensminger  Nov 12, 2016  Mar 24, 2017
Printed
Page 99
1st paragraph, code snippet

The pandas code snippet on page 99 that calculates the sum and mean statistics contains two minor bugs that cause Python to raise a 'NameError' exception when the script is run (here, from the Windows command line).

This is the command-line output after running the script (I renamed the script file for my own purposes):

C:\ProjectsPy\Foundations>python pandas_sum_average1.py input_files\ output_files\pandas_output7.csv
Traceback (most recent call last):
  File "pandas_sum_average1.py", line 22, in <module>
    'total_sales': total_sales,
NameError: name 'total_sales' is not defined

C:\ProjectsPy\Foundations>

This is the offending code:

total_cost = pd.DataFrame([float(str(value).strip('$').replace(',','')) \
    for value in data_frame.loc[:, 'Sale Amount']]).sum()

average_cost = pd.DataFrame([float(str(value).strip('$').replace(',','')) \
    for value in data_frame.loc[:, 'Sale Amount']]).mean()

data = {'file_name': os.path.basename(input_file),
        'total_sales': total_sales,
        'average_sales': average_sales}

To fix this snippet, rename the 'total_cost' variable to 'total_sales', and
the 'average_cost' variable to 'average_sales'.

Ted Ensminger  Nov 12, 2016  Mar 24, 2017
Printed
Page 114
Code snippet

The Excel code snippet '4excel_value_meets_condition.py' (page 114) contains a major bug that causes Python to raise a 'TypeError' exception.

Line 20 in the code snippet (page 114), line 36 in the code snippet file, tests whether the value in the 'sale_amount' variable is greater than 1400.0. In the input spreadsheet, the sale amount column (column 3) contains strings with embedded dollar signs and commas.

In Python 3, a string cannot be compared with a floating-point value (1400.0). Note that I created the input file by following the instructions on pages 102 through 103.

This is the command line output after running the script:

C:\ProjectsPy\Foundations>python 4excel_value_meets_condition.py input_files\sales_2013.xlsx output_files\4output.xlsx

Traceback (most recent call last):
  File "4excel_value_meets_condition.py", line 36, in <module>
    if sale_amount > 1400.0:
TypeError: unorderable types: str() > float()

C:\ProjectsPy\Foundations>

This is one solution to correct this problem:

sale = worksheet.cell_value(row_index, sale_amount_column_index)
sale_amount = float(str(sale).strip('$').replace(',',''))

if sale_amount > 1400.0:


This is the command line output after running the corrected script:

C:\ProjectsPy\Foundations>python 4excel_value_meets_condition.py input_files\sales_2013.xlsx output_files\4output.xlsx


C:\ProjectsPy\Foundations>
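A spreadsheet cell may hold either a formatted string such as '$1,425.00' or a plain number, and converting through str() first, as in the fix for this entry, handles both. A sketch of that idea as a reusable helper; parse_sale_amount and the sample cell values are hypothetical.

```python
def parse_sale_amount(value):
    """Convert '$1,425.00', '1425', or 1425.0 to a float (hypothetical helper)."""
    # str() lets numeric cells go through the same cleanup as string cells
    return float(str(value).strip().strip('$').replace(',', ''))

# Sample cell values: currency strings and a plain number
cells = ['$1,200.00', '$1,425.00', 1725.0]
amounts = [parse_sale_amount(cell) for cell in cells]

# Keep only the amounts above the snippet's 1400.0 threshold
above_threshold = [amount for amount in amounts if amount > 1400.0]
print(above_threshold)
```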

Ted Ensminger  Nov 14, 2016  Mar 24, 2017
Printed
Page 116
Code snippet, Line 9

The pandas code snippet 'pandas_value_meets_condition.py' (page 116) contains a major bug that causes pandas to raise a 'ValueError' exception.

Line 9 in the code snippet (page 116), line 24 in the script file, tests whether the values in the 'Sale Amount' column are greater than 1400.0. In the input spreadsheet, the 'Sale Amount' column (column 3) contains strings with embedded dollar signs and commas.

Because of those characters, astype(float) cannot convert the strings to floating-point values, so the comparison with 1400.0 is never made. Note that I created the input file by following the instructions on pages 102 through 103.

This is the command line output after running the script:

C:\ProjectsPy\Foundations>python pandas_value_meets_condition.py input_files\sales_2013.xlsx output_files\pandas_output8.xlsx

script_name: pandas_value_meets_condition.py
input_file: input_files\sales_2013.xlsx
output_file: output_files\pandas_output8.xlsx

0 $1,200.00
1 $1,425.00
2 $1,390.00
3 $1,257.00
4 $1,725.00
5 $1,995.00
Name: Sale Amount, dtype: object
Traceback (most recent call last):
  File "pandas_value_meets_condition.py", line 24, in <module>
    data_frame[data_frame['Sale Amount'].astype(float) > 1400.0]
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2950, in astype
    raise_on_error=raise_on_error, **kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2938, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2890, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 434, in astype
    values=values, **kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 477, in _astype
    values = com._astype_nansafe(values.ravel(), dtype, copy=True)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\common.py", line 1920, in _astype_nansafe
    return arr.astype(dtype)
ValueError: could not convert string to float: '$1,995.00'

C:\ProjectsPy\Foundations>

This is one solution to correct this problem:

# regex=False treats '$' as a literal character (the default in pandas 2.0 and later)
data_frame['Sale Amount'] = data_frame['Sale Amount'].str.replace('$', '', regex=False)
data_frame['Sale Amount'] = data_frame['Sale Amount'].str.replace(',', '', regex=False)
data_frame['Sale Amount'] = data_frame['Sale Amount'].astype(float)

data_frame_value_meets_condition = \
    data_frame[data_frame['Sale Amount'] > 1400.0]


This is the output of the command window after running with the corrected code:

C:\ProjectsPy\Foundations>python pandas_value_meets_condition.py input_files\sales_2013.xlsx output_files\pandas_output8.xlsx

script_name: pandas_value_meets_condition.py
input_file: input_files\sales_2013.xlsx
output_file: output_files\pandas_output8.xlsx

0 1200.0
1 1425.0
2 1390.0
3 1257.0
4 1725.0
5 1995.0
Name: Sale Amount, dtype: float64


C:\ProjectsPy\Foundations>
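In pandas 2.0 and later, Series.str.replace treats its pattern as a literal string by default (regex=False), so a pattern such as r'\$' looks for a backslash followed by a dollar sign and leaves the values unchanged. A sketch that passes the regex argument explicitly so the cleanup behaves the same across versions; the sample Series is made up, and the regex keyword requires pandas 0.23 or later.

```python
import pandas as pd

# Made-up values in the same format as the 'Sale Amount' column
data_frame = pd.DataFrame({'Sale Amount': ['$1,200.00', '$1,425.00', '$1,995.00']})
print(data_frame['Sale Amount'].dtype)

# regex=False: '$' and ',' are removed as literal characters
cleaned = (data_frame['Sale Amount']
           .str.replace('$', '', regex=False)
           .str.replace(',', '', regex=False))
data_frame['Sale Amount'] = cleaned.astype(float)
print(data_frame['Sale Amount'].dtype)

# With a float column, the threshold comparison works directly
data_frame_value_meets_condition = data_frame[data_frame['Sale Amount'] > 1400.0]
print(data_frame_value_meets_condition)
```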

Ted Ensminger  Nov 19, 2016  Mar 24, 2017
Printed
Page 125
Line 24 in the code snippet (page 125), and line 27 in the code snippet file

The base Python code snippet '9excel_value_meets_condition_all_worksheets.py' (page 125) contains a major bug that causes Python to raise a 'TypeError' exception.

Line 24 in the code snippet (page 125), line 27 in the code snippet file, tests whether the value in the 'sale_amount' variable is greater than the threshold value of 2000.0. In the input spreadsheet, the sale amount column (column 3) contains strings with embedded dollar signs and commas.

In Python 3, a string such as '$2,135.00' cannot be compared with a floating-point value (2000.0). Note that I created the input file by following the instructions on pages 102 through 103.

This is the command line output after running the script:

C:\ProjectsPy\Foundations>python 9excel_value_meets_condition_all_worksheets.py input_files\sales_2013.xlsx output_files\9output.xlsx
Traceback (most recent call last):
  File "9excel_value_meets_condition_all_worksheets.py", line 27, in <module>
    if sale_amount > threshold:
TypeError: unorderable types: str() > float()

C:\ProjectsPy\Foundations>


This is one solution to correct this problem:

sale_amount = sale_amount.replace(r"$", "")
sale_amount = sale_amount.replace(r",", "")
sale_amount = float(sale_amount)

if sale_amount > threshold:


This is the output of the command window after running with the corrected code:

C:\ProjectsPy\Foundations>python 9excel_value_meets_condition_all_worksheets.py input_files\sales_2013.xlsx output_files\9output.xlsx


C:\ProjectsPy\Foundations>

Ted Ensminger  Nov 21, 2016  Mar 24, 2017
Printed
Page 126
Line 9 in the code snippet (page 126), and line 9 in the code snippet file

The pandas code snippet 'pandas_value_meets_condition_all_worksheets.py' (page 126) contains a major bug that causes pandas to raise a 'ValueError' exception.

Line 9 in the code snippet (page 126), and line 9 in the code snippet file, tests whether the values in the 'Sale Amount' column are greater than the threshold value of 2000.0. In the input spreadsheet, the 'Sale Amount' column (column 3) contains strings with embedded dollar signs and commas.

A string such as '$2,280.00' cannot be converted to a floating-point value for comparison with 2000.0. Note that I created the input file by following the instructions on pages 102 through 103.

This is the offending code (lines 8 and 9):

for worksheet_name, data in data_frame.items():
    row_output.append(data[data['Sale Amount'].astype(float) > 2000.0])

This is the command line output after running the script:

C:\ProjectsPy\Foundations>python pandas_value_meets_condition_all_worksheets.py input_files\sales_2013.xlsx output_files\pandas_output13.xlsx
Traceback (most recent call last):
  File "pandas_value_meets_condition_all_worksheets.py", line 9, in <module>
    row_output.append(data[data['Sale Amount'].astype(float) > 2000.0])
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2950, in astype
    raise_on_error=raise_on_error, **kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2938, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2890, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 434, in astype
    values=values, **kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 477, in _astype
    values = com._astype_nansafe(values.ravel(), dtype, copy=True)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\common.py", line 1920, in _astype_nansafe
    return arr.astype(dtype)
ValueError: could not convert string to float: '$2,280.00'

C:\ProjectsPy\Foundations>

Base Python raises a 'TypeError' exception when it compares a string with a float. The last line of the traceback above shows that pandas instead raised a 'ValueError' exception because it 'could not convert string to float', so the comparison was never reached.

This is one solution to correct this problem:

for worksheet_name, data in data_frame.items():
    # regex=False treats '$' as a literal character (the default in pandas 2.0 and later)
    data['Sale Amount'] = data['Sale Amount'].str.replace('$', '', regex=False)
    data['Sale Amount'] = data['Sale Amount'].str.replace(',', '', regex=False)
    data['Sale Amount'] = data['Sale Amount'].astype(float)
    print(data['Sale Amount'])  # Dump 'Sale Amount' values
    row_output.append(data[data['Sale Amount'] > 2000.0])


This is the output of the command window after running with the corrected code:

C:\ProjectsPy\Foundations>python pandas_value_meets_condition_all_worksheets.py input_files\sales_2013.xlsx output_files\pandas_output13.xlsx
0 1115.0
1 1367.0
2 2135.0
3 1346.0
4 1560.0
5 1852.0
Name: Sale Amount, dtype: float64
0 1200.0
1 1425.0
2 1390.0
3 1257.0
4 1725.0
5 1995.0
Name: Sale Amount, dtype: float64
0 1350.0
1 1167.0
2 1789.0
3 2042.0
4 1511.0
5 2280.0
Name: Sale Amount, dtype: float64

C:\ProjectsPy\Foundations>
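The loop iterates over data_frame.items() because reading every worksheet (in current pandas, pd.read_excel(..., sheet_name=None)) returns a dictionary that maps each worksheet name to its own DataFrame. The sketch below runs the same filter-and-combine step without a workbook. The dictionary is built by hand with made-up sheet names and values, and pd.concat is an assumption about how the collected rows might be combined, not necessarily what the book does with row_output afterward.

```python
import pandas as pd

# Hand-built stand-in for the dictionary of worksheets read from the workbook
data_frame = {
    'sheet_a': pd.DataFrame({'Sale Amount': ['$1,115.00', '$2,135.00']}),
    'sheet_b': pd.DataFrame({'Sale Amount': ['$2,042.00', '$1,511.00']}),
}

row_output = []
for worksheet_name, data in data_frame.items():
    # Build a cleaned float copy of the column just for the comparison
    amounts = (data['Sale Amount']
               .str.replace('$', '', regex=False)
               .str.replace(',', '', regex=False)
               .astype(float))
    row_output.append(data[amounts > 2000.0])

# Stack the filtered rows from every worksheet into one DataFrame
filtered_rows = pd.concat(row_output, axis=0, ignore_index=True)
print(filtered_rows)
```

Filtering with a cleaned copy leaves the original formatted strings in the output rows; assigning the cleaned values back to data['Sale Amount'], as the fix for this entry does, writes plain numbers instead.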

Ted Ensminger  Nov 21, 2016  Mar 24, 2017
Printed
Page 131
Line 11 in the code snippet

The pandas code snippet 'pandas_value_meets_condition_set_of_worksheets.py' (page 131) contains a major bug that causes pandas to raise a 'ValueError' exception.

Line 11 in the code snippet (page 131), and line 15 in the code snippet file, tests whether the values in the 'Sale Amount' column are greater than threshold (1900.0). In the input spreadsheet, the 'Sale Amount' column (column 3) contains strings with embedded dollar signs and commas.

These strings cannot be converted to floating-point values for comparison with 1900.0. Note that I created the input file by following the instructions on pages 102 through 103.

This is the command line output after running the script:

C:\ProjectsPy\Foundations>python pandas_value_meets_condition_set_of_worksheets.py input_files\sales_2013.xlsx output_files\pandas_output15.xlsx
Traceback (most recent call last):
  File "pandas_value_meets_condition_set_of_worksheets.py", line 15, in <module>
    row_list.append(data[data['Sale Amount'].astype(float) > threshold])
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2950, in astype
    raise_on_error=raise_on_error, **kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2938, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2890, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 434, in astype
    values=values, **kwargs)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 477, in _astype
    values = com._astype_nansafe(values.ravel(), dtype, copy=True)
  File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\common.py", line 1920, in _astype_nansafe
    return arr.astype(dtype)
ValueError: could not convert string to float: '$1,995.00'

C:\ProjectsPy\Foundations>


This is one solution to the problem: add the following code after line 10 in the printed snippet (line 14 in the code file). I added two print statements that show the column's dtype before and after the values are edited:

print("data['Sale Amount'].dtype - before: ", data['Sale Amount'].dtype)  # Print dtype
data['Sale Amount'] = data['Sale Amount'].str.replace('$', '', regex=False)
data['Sale Amount'] = data['Sale Amount'].str.replace(',', '', regex=False)
data['Sale Amount'] = data['Sale Amount'].astype(float)
print("data['Sale Amount'].dtype - after: ", data['Sale Amount'].dtype)  # Print dtype


This is the output of the command window after running with the corrected code:

C:\ProjectsPy\Foundations>python pandas_value_meets_condition_set_of_worksheets.py input_files\sales_2013.xlsx output_files\pandas_output15.xlsx
data['Sale Amount'].dtype - before: object
data['Sale Amount'].dtype - after: float64
data['Sale Amount'].dtype - before: object
data['Sale Amount'].dtype - after: float64

C:\ProjectsPy\Foundations>

Ted Ensminger  Dec 01, 2016  Mar 24, 2017
Printed
Page 136
Line 15 in the code snippet

The base Python code snippet '13excel_concat_data_from_multiple_workbooks.py' (page 136) contains an improperly formatted print statement that causes Python to raise a 'SyntaxError' exception.

Line 15 in the code snippet (page 136), and line 18 in the code snippet file, uses a Python 2-style print statement without the opening and closing parentheses.

This is the offending code:

print os.path.basename(input_file)

This is the corrected code:

print(os.path.basename(input_file))
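os.path.basename follows the path conventions of the operating system that runs the script. The paths in these errata use Windows backslashes, which os.path on Linux or macOS does not treat as separators. A sketch using the standard-library ntpath and posixpath modules (the Windows and POSIX implementations behind os.path) to show the difference; the file name comes from the command lines in this list.

```python
import ntpath
import posixpath

windows_path = r'input_files\sales_2013.xlsx'

# ntpath splits on backslashes, as os.path does on Windows
print(ntpath.basename(windows_path))     # sales_2013.xlsx

# posixpath (os.path on Linux/macOS) finds no separator and returns the whole string
print(posixpath.basename(windows_path))  # input_files\sales_2013.xlsx

# Forward slashes work with both modules
print(posixpath.basename('input_files/sales_2013.xlsx'))  # sales_2013.xlsx
print(ntpath.basename('input_files/sales_2013.xlsx'))     # sales_2013.xlsx
```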

Ted Ensminger  Dec 04, 2016  Mar 24, 2017