Errata

Errata for Foundations for Analytics with Python


The errata list records errors found after the product was released, together with their corrections. If an error was corrected in a later version or reprint, the date of the correction appears in the "Date corrected" column.

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version	Location	Description	Submitted By	Date submitted	Date corrected
Printed	Page 21 code snippet	In the code snippet on page 21, there is an extra </match_word> at the end of line 4 that generates a syntax error. (If </match_word> is removed, the code runs as desired.) In addition, two closing parentheses are missing. The following code works: # Print the pattern each time it is found in the string string = "The quick brown fox jumps over the lazy dog." string_list = string.split() pattern = re.compile(r"(?P<match_word>The)", re.I) print("Output #39:") for word in string_list: if pattern.search(word): print("{:s}".format(pattern.search(word).group('match_word')))	Mike Schulte	Oct 13, 2016	Mar 24, 2017
Printed	Page 22 code snippet	In the last line of the code snippet, there are two closing parentheses missing.	Mike Schulte	Oct 13, 2016	Mar 24, 2017
Printed	Page 36 sixth line of code snippet	The sixth line in the code got truncated and should, I believe, read: ordered_dict1 = sorted(dict_copy.items(), key=lambda item: item[0])	Mike Schulte	Oct 14, 2016	Mar 24, 2017
Printed	Page 38 2nd paragraph, first for loop code snippet	Refer to "Output #126" in the printed book (page 38) and "Output #129" in the code snippet file. The format string in the first for loop is missing a closing curly bracket: print("Output #129:") for month in y: print("{!s".format(month)) The last line should read: print("{!s}".format(month))	Ted Ensminger	Nov 01, 2016	Mar 24, 2017
Printed	Page 41 code snippet	The code for defining the function getMean does not work unless the "else float('nan')" is on the same line as the rest of the return statement. I suspect the backslash that is generally used to indicate this was accidentally omitted.	Anonymous	Oct 17, 2016	Mar 24, 2017
Printed	Page 43 code snippet	The author seems to have slipped back into Python 2 accidentally: the print statements are not written correctly for Python 3. The code should read: try: result = getMean(my_list2) except ZeroDivisionError as detail: print("Output #142 (Error): {}".format(float('nan'))) print("Output #142 (Error): {}".format(detail)) else: print("Output #142 (The mean is): {}".format(result)) finally: print("Output #142 (Finally): The finally block is executed every time")	Anonymous	Oct 17, 2016	Mar 24, 2017
Printed	Page 45 code snippet	The print statement has Python 2 syntax, not Python 3 syntax. Should read: print("Output # 143:") filereader = open(input_file, 'r') for row in filereader: print(row.strip()) filereader.close()	Anonymous	Oct 17, 2016	Mar 24, 2017
Printed	Page 53 last line of code snippet	A Python 2 style print. Parentheses are needed: print("Output #146: Output written to file")	Anonymous	Oct 17, 2016	Mar 24, 2017
Printed	Page 97 snippet code line 8	The output header list column values (page 97, line 8) and the output header list column values in the snippet code file, 10csv_reader_sum...py, do not match: print: output_header_list = ['file_name', 'total_sales', 'average_sales'] file : output_header_list = ['file_name', 'total_cost', 'average_cost']	Ted Ensminger	Nov 12, 2016	Mar 24, 2017
Printed	Page 99 1st paragraph, code snippet	The pandas snippet code that calculates the sum and mean statistics (page 99) contains two minor bugs that, when the script is run from the (Windows) command line, cause Python to throw a 'NameError' exception. This is the text from the command line after running the script (I modified the script file name for my purposes): C:\ProjectsPy\Foundations>python pandas_sum_average1.py input_files\ output_files\pandas_output7.csv Traceback (most recent call last): File "pandas_sum_average1.py", line 22, in <module> 'total_sales': total_sales, NameError: name 'total_sales' is not defined C:\ProjectsPy\Foundations> This is the offending code (bugs): total_cost = pd.DataFrame([float(str(value).strip('$').replace(',','')) \ for value in data_frame.loc[:, 'Sale Amount']]).sum() average_cost = pd.DataFrame([float(str(value).strip('$').replace(',','')) \ for value in data_frame.loc[:, 'Sale Amount']]).mean() data = {'file_name': os.path.basename(input_file), 'total_sales': total_sales, 'average_sales': average_sales} To fix this snippet, rename the 'total_cost' variable to 'total_sales', and the 'average_cost' variable to 'average_sales'.	Ted Ensminger	Nov 12, 2016	Mar 24, 2017
Printed	Page 114 Code snippet	The excel snippet code '4excel_value_meets_condition.py' (page 114) contains a major bug that will cause Python to throw a 'TypeError' exception. Line 20 in the code snippet (page 114) is testing the value in the 'sale_amount' variable for a minimum value of 1400.0. The input file (spreadsheet) contains strings in the sale_amount column which is column 3. The sale amounts contain embedded dollar signs and commas. You can't compare a string against a floating-point value (1400.0). Take note that I created the input file following the instructions on pages 102 through 103. This is the command line output after running the script: C:\ProjectsPy\Foundations>python 4excel_value_meets_condition.py input_files\sales_2013.xlsx output_files\4output.xlsx Traceback (most recent call last): File "4excel_value_meets_condition.py", line 36, in <module> if sale_amount > 1400.0: TypeError: unorderable types: str() > float() C:\ProjectsPy\Foundations> This is one solution to correct this problem: sale = worksheet.cell_value(row_index, sale_amount_column_index) sale_amount = float(str(sale).strip('$').replace(',','')) if sale_amount > 1400.0: This is the command line output after running the corrected script: C:\ProjectsPy\Foundations>python 4excel_value_meets_condition.py input_files\sales_2013.xlsx output_files\4output.xlsx C:\ProjectsPy\Foundations>	Ted Ensminger	Nov 14, 2016	Mar 24, 2017
Printed	Page 116 Code snippet, Line 9	The pandas snippet code 'pandas_value_meets_condition.py' (page 116) contains a major bug that will cause Python to throw a 'ValueError' exception. Line 9 in the code snippet (page 116) is testing the value in the 'sale_amount' variable for a minimum value of 1400.0. The input file (spreadsheet) contains strings in the sale_amount column which is column 3. The sale amounts contain embedded dollar signs and commas. You can't compare a string against a floating-point value (1400.0). Take note that I created the input file following the instructions on pages 102 through 103. This is the command line output after running the script: C:\ProjectsPy\Foundations>python pandas_value_meets_condition.py input_files\sales_2013.xlsx output_files\pandas_output8.xlsx script_name: pandas_value_meets_condition.py input_file: input_files\sales_2013.xlsx output_file: output_files\pandas_output8.xlsx 0 $1,200.00 1 $1,425.00 2 $1,390.00 3 $1,257.00 4 $1,725.00 5 $1,995.00 Name: Sale Amount, dtype: object Traceback (most recent call last): File "pandas_value_meets_condition.py", line 24, in <module> data_frame[data_frame['Sale Amount'].astype(float) > 1400.0] File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2950, in astype raise_on_error=raise_on_error, kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2938, in astype return self.apply('astype', dtype=dtype, kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2890, in apply applied = getattr(b, f)(kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 434, in astype values=values, kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 477, in _astype values = com._astype_nansafe(values.ravel(), dtype, copy=True) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\common.py", line 1920, in _astype_nansafe return arr.astype(dtype) ValueError: could not convert string to float: '$1,995.00' C:\ProjectsPy\Foundations> This is one solution to correct this problem: data_frame['Sale Amount'] = data_frame['Sale Amount'].str.replace(r'\$', '') data_frame['Sale Amount'] = data_frame['Sale Amount'].str.replace(',', '') data_frame['Sale Amount'] = data_frame['Sale Amount'].astype(float) data_frame_value_meets_condition = \ data_frame[data_frame['Sale Amount'].astype(float) > 1400.0] This is the output of the command window after running with the corrected code: C:\ProjectsPy\Foundations>python pandas_value_meets_condition.py input_files\sales_2013.xlsx output_files\pandas_output8.xlsx script_name: pandas_value_meets_condition.py input_file: input_files\sales_2013.xlsx output_file: output_files\pandas_output8.xlsx 0 1200.0 1 1425.0 2 1390.0 3 1257.0 4 1725.0 5 1995.0 Name: Sale Amount, dtype: float64 C:\ProjectsPy\Foundations>	Ted Ensminger	Nov 19, 2016	Mar 24, 2017
Printed	Page 125 Line 24 in the code snippet (page 125), and line 27 in the code snippet file	The base Python snippet code '9excel_value_meets_condition_all_worksheets.py' (page 125) contains a major bug that will cause Python to throw a 'TypeError' exception. Line 24 in the code snippet (page 125), and line 27 in the code snippet file, is testing the value in the 'sale_amount' variable for a minimum threshold value of 2000.0. The input file (spreadsheet) contains strings in the sale_amount column which is column 3. The sale_amount column values contain embedded dollar signs and commas. You can't compare a string ($2,135.00) against a floating-point value (2000.0). Take note that I created the input file following the instructions on pages 102 through 103. This is the command line output after running the script: C:\ProjectsPy\Foundations>python 9excel_value_meets_condition_all_worksheets.py input_files\sales_2013.xlsx output_files\9output.xlsx Traceback (most recent call last): File "9excel_value_meets_condition_all_worksheets.py", line 27, in <module> if sale_amount > threshold: TypeError: unorderable types: str() > float() C:\ProjectsPy\Foundations> This is one solution to correct this problem: sale_amount = sale_amount.replace(r"$", "") sale_amount = sale_amount.replace(r",", "") sale_amount = float(sale_amount) if sale_amount > threshold: This is the output of the command window after running with the corrected code: C:\ProjectsPy\Foundations>python 9excel_value_meets_condition_all_worksheets.py input_files\sales_2013.xlsx output_files\9output.xlsx C:\ProjectsPy\Foundations>	Ted Ensminger	Nov 21, 2016	Mar 24, 2017
Printed	Page 126 Line 9 in the code snippet (page 126), and line 9 in the code snippet file	The Pandas snippet code 'pandas_value_meets_condition_all_worksheets.py' (page 126) contains a major bug that will cause Pandas to throw a 'ValueError' exception. Line 9 in the code snippet (page 126), and line 9 in the code snippet file, is testing the value in the 'Sale Amount' variable for a minimum threshold value of 2000.0. The input file (spreadsheet) contains strings in the 'Sale Amount' column which is column 3. The 'Sale Amount' column values contain embedded dollar signs and commas. You can't compare a string ($2,280.00) against a floating-point value (2000.0). Take note that I created the input file following the instructions on pages 102 through 103. This is the offending code (lines 8 and 9): for worksheet_name, data in data_frame.items(): row_output.append(data[data['Sale Amount'].astype(float) > 2000.0]) This is the command line output after running the script: C:\ProjectsPy\Foundations>python pandas_value_meets_condition_all_worksheets.py input_files\sales_2013.xlsx output_files\pandas_output13.xlsx Traceback (most recent call last): File "pandas_value_meets_condition_all_worksheets.py", line 9, in <module> row_output.append(data[data['Sale Amount'].astype(float) > 2000.0]) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2950, in astype raise_on_error=raise_on_error, kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2938, in astype return self.apply('astype', dtype=dtype, kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2890, in apply applied = getattr(b, f)(kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 434, in astype values=values, kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 477, in _astype values = com._astype_nansafe(values.ravel(), dtype, copy=True) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\common.py", line 1920, in _astype_nansafe return arr.astype(dtype) ValueError: could not convert string to float: '$2,280.00' C:\ProjectsPy\Foundations> Base Python would throw a 'TypeError' exception when it encounters an invalid type comparison. The last line of the Traceback (above) tells you that Pandas threw a 'ValueError' exception because it 'could not convert string to float'; therefore, the comparison test was never reached. This is one solution to correct this problem: for worksheet_name, data in data_frame.items(): data['Sale Amount'] = data['Sale Amount'].str.replace(r'\$', '') data['Sale Amount'] = data['Sale Amount'].str.replace(r',', '') data['Sale Amount'] = data['Sale Amount'].astype(float) print(data['Sale Amount']) # Dump 'Sale Amount' values row_output.append(data[data['Sale Amount'].astype(float) > 2000.0]) This is the output of the command window after running with the corrected code: C:\ProjectsPy\Foundations>python pandas_value_meets_condition_all_worksheets.py input_files\sales_2013.xlsx output_files\pandas_output13.xlsx 0 1115.0 1 1367.0 2 2135.0 3 1346.0 4 1560.0 5 1852.0 Name: Sale Amount, dtype: float64 0 1200.0 1 1425.0 2 1390.0 3 1257.0 4 1725.0 5 1995.0 Name: Sale Amount, dtype: float64 0 1350.0 1 1167.0 2 1789.0 3 2042.0 4 1511.0 5 2280.0 Name: Sale Amount, dtype: float64 C:\ProjectsPy\Foundations>	Ted Ensminger	Nov 21, 2016	Mar 24, 2017
Printed	Page 131 Line 11 in the code snippet	The pandas snippet code 'pandas_value_meets_condition_set_of_worksheets.py' (page 131) contains a major bug that will cause Pandas to throw a 'ValueError' exception. Line 11 in the code snippet (page 131), and line 15 in the code snippet file, is testing the value in the 'sale_amount' column for a minimum 'threshold' value of 1900.0. The input file (spreadsheet) contains strings in the sale_amount column which is column 3. The sale amounts contain embedded dollar signs and commas. You can't compare a string against a floating-point value (1900.0). Take note that I created the input file following the instructions on pages 102 through 103. This is the command line output after running the script: C:\ProjectsPy\Foundations>python pandas_value_meets_condition_set_of_worksheets.py input_files\sales_2013.xlsx output_files\pandas_output15.xlsx Traceback (most recent call last): File "pandas_value_meets_condition_set_of_worksheets.py", line 15, in <module> row_list.append(data[data['Sale Amount'].astype(float) > threshold]) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2950, in astype raise_on_error=raise_on_error, kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2938, in astype return self.apply('astype', dtype=dtype, kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2890, in apply applied = getattr(b, f)(kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 434, in astype values=values, kwargs) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\internals.py", line 477, in _astype values = com._astype_nansafe(values.ravel(), dtype, copy=True) File "C:\Users\Ted\Anaconda3\lib\site-packages\pandas\core\common.py", line 1920, in _astype_nansafe return arr.astype(dtype) ValueError: could not convert string to float: '$1,995.00' C:\ProjectsPy\Foundations> This is one solution to correct this problem (add code after line 10 in print snippet and line 14 in the code file). I added two print statements to print the column's type (dtype). They will show the dtype before and after the data values are edited: print("data['Sale Amount'].dtype - before: ", (data['Sale Amount'].dtype)) # Print dtype data['Sale Amount'] = data['Sale Amount'].str.replace(r'\$', '') data['Sale Amount'] = data['Sale Amount'].str.replace(',', '') data['Sale Amount'] = data['Sale Amount'].astype(float) print("data['Sale Amount'].dtype - after: ", (data['Sale Amount'].dtype)) # Print dtype This is the output of the command window after running with the corrected code: C:\ProjectsPy\Foundations>python pandas_value_meets_condition_set_of_worksheets.py input_files\sales_2013.xlsx output_files\pandas_output15.xlsx data['Sale Amount'].dtype - before: object data['Sale Amount'].dtype - after: float64 data['Sale Amount'].dtype - before: object data['Sale Amount'].dtype - after: float64 C:\ProjectsPy\Foundations>	Ted Ensminger	Dec 01, 2016	Mar 24, 2017
Printed	Page 136 Line 15 in the code snippet	The Base Python snippet code '13excel_concat_data_from_multiple_workbooks.py' (page 136) contains an improperly formatted print statement that will cause Python to throw a 'SyntaxError' exception. Line 15 in the code snippet (page 136), and line 18 in the code snippet file, has a print statement that does not contain the opening and closing parentheses. This is the offending code: print os.path.basename(input_file) This is the corrected code: print(os.path.basename(input_file))	Ted Ensminger	Dec 04, 2016	Mar 24, 2017
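
The page 21 entry turns on a named group, (?P<match_word>...), whose match is read back by name from the match object. As a minimal, self-contained sketch of that corrected pattern (the helper name find_matches is hypothetical and not from the book, and it returns the matches instead of printing them):

```python
import re


def find_matches(text, word="The"):
    """Return each case-insensitive occurrence of `word` found in `text`.

    find_matches is a hypothetical helper, not part of the book's code.
    """
    # (?P<match_word>...) names the group so it can be retrieved by name.
    pattern = re.compile(r"(?P<match_word>{})".format(re.escape(word)), re.I)
    matches = []
    for token in text.split():
        found = pattern.search(token)
        if found:
            matches.append(found.group("match_word"))
    return matches


string = "The quick brown fox jumps over the lazy dog."
print(find_matches(string))
```

Calling search once per word and reusing the match object avoids running the pattern twice, which the erratum's version does for brevity.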
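
Several entries above (pages 114 and 125) share one root cause: sale amounts stored as strings such as '$1,995.00' are compared against a float. The following base-Python sketch shows one way to normalize such values before comparing them. The helper name parse_sale_amount and the sample values are assumptions made for illustration, not code from the book.

```python
def parse_sale_amount(value):
    """Convert a value such as '$1,995.00' or 1995.0 to a float.

    parse_sale_amount is a hypothetical helper name, not from the book.
    """
    # str() lets the same code handle cells that are already numeric.
    cleaned = str(value).strip().replace("$", "").replace(",", "")
    return float(cleaned)


threshold = 1400.0
# Illustrative values only; a real script would read them from the worksheet.
sale_amounts = ["$1,200.00", "$1,425.00", "$1,390.00", 1725.0]
above_threshold = [v for v in sale_amounts if parse_sale_amount(v) > threshold]
print(above_threshold)
```

If you adapt the pandas solutions above to pandas 2.0 or later, note that Series.str.replace treats the pattern as a literal string by default there, so passing '$' (or setting regex=True with r'\$') is likely needed for the dollar sign to be removed.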
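
The entries for pages 41 and 43 concern a return statement broken across two lines and a try/except/else/finally block written with Python 2 print statements. The sketch below assumes a shape for getMean based only on the description above (a conditional expression ending in else float('nan')); the book's exact function body is not quoted in this list, and the empty my_list2 is an illustrative assumption. It wraps the expression in parentheses rather than relying on a trailing backslash.

```python
def getMean(numeric_values):
    """Return the arithmetic mean, or NaN for an empty sequence.

    Assumed shape of the function; the book's exact body may differ.
    """
    # Parentheses let the conditional expression span two lines
    # without a line-continuation backslash.
    return (sum(numeric_values) / len(numeric_values)
            if len(numeric_values) > 0 else float('nan'))


my_list2 = []  # assumed sample input
try:
    result = getMean(my_list2)
except ZeroDivisionError as detail:
    print("Error: {}".format(detail))
else:
    print("The mean is: {}".format(result))
finally:
    print("The finally block is executed every time")
```

With this assumed definition the empty list yields NaN instead of raising, so the else branch runs and the except branch is reached only if the length guard is removed.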