Errata
The errata list is a list of errors and their corrections that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version | Location | Description | Submitted By | Date Submitted | Date Corrected |
---|---|---|---|---|---|
I | 1st paragraph - Running the pipeline in the cloud | In Chapter 4, in the section "Running the pipeline in the cloud", I ran the program df06.py with the specified arguments but received the following error with a stack trace Note from the Author or Editor: | Anonymous | Mar 29, 2018 | Oct 25, 2019 |
I | Running the Pipeline in the Cloud | The section "Running the Pipeline in the Cloud" mentions DataflowPipelineRunner, but I think this should be DataflowRunner. Also, the output after running the program indicates that it ran successfully, but the simevents table is not created. Note from the Author or Editor: | Asish Patel | Mar 29, 2018 | Oct 25, 2019 |
Printed | Page 61 right before Summary | I think your cost estimate is misleadingly off by a few orders of magnitude. Because you need to use the flexible environment of App Engine instead of the standard environment, it cannot currently scale down to zero, and you will be paying for at least one instance continuously (your manual scaling setting also says you will be running one instance at all times). Running at least one instance all the time costs a lot more than using it for 10 minutes a month. You can probably find many Internet discussions about this common billing surprise in the flexible environment. We all wish that flex could scale down to zero instances like standard, but as far as I know GCP does not yet support it, and even if it were supported, you would need to change your scaling settings to get that behavior. Note from the Author or Editor: | Ed Barton | Feb 15, 2018 | Oct 25, 2019 |
Printed | Page 104 2nd paragraph | Please change the sentence from: | Valliappa Lakshmanan | Apr 03, 2018 | Oct 25, 2019 |
Printed | Page 105 2nd paragraph - but primarily in file df02.py | I ran the "install_packages.sh" script. Then, when running df02.py from the command line in the GCS console, I get the following error: | Anonymous | Mar 23, 2019 | Oct 25, 2019 |
Printed | Page 157 SQL | The >= operator is missing from the WHERE clause, which should read DEP_DELAY>=10. The code sample in GitHub is correct. Note from the Author or Editor: | Michael Shearer | Mar 04, 2018 | Oct 25, 2019 |
Printed | Page 202 4th para. | The helpful footnote 17 about quotas on page 240 could be moved to page 202, where the cluster is first resized and quota limits are first encountered. Note from the Author or Editor: | Michael Shearer | Mar 04, 2018 | Oct 25, 2019 |
Printed | Page 218 2nd para | Refers to s0, not x0. Note from the Author or Editor: | Michael Shearer | Mar 11, 2018 | Oct 25, 2019 |
Printed | Page 240 last para. | In Dataproc versions after 1.2, the HDFS web interface port 50070 has been replaced by port 9870. See https://cloud.google.com/dataproc/docs/concepts/accessing/cluster-web-interfaces Note from the Author or Editor: | Michael Shearer | Mar 11, 2018 | Oct 25, 2019 |
Printed | Page 277 Maven launch | The Maven args should include --fullDataset=true, and --project= should be specified. Note from the Author or Editor: | Michael Shearer | Mar 31, 2018 | Oct 25, 2019 |
Printed | Page 344 | You need to specify --project when invoking simulate.py. Note from the Author or Editor: | Michael Shearer | Apr 21, 2018 | Oct 25, 2019 |
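The runner erratum from Asish Patel reflects a rename. DataflowPipelineRunner and DirectPipelineRunner are names from the older Google Cloud Dataflow SDK, while Apache Beam 2.x uses DataflowRunner and DirectRunner. The sketch below is illustrative only. The helpers `normalize_runner` and `runner_args` and the project ID `my-project` are hypothetical and are not part of df06.py. The code only builds the `--runner=`/`--project=` flag strings that Beam's pipeline options accept, so it does not need Beam installed.

```python
from typing import List

# Hypothetical mapping from older Dataflow SDK runner names to the
# Apache Beam 2.x names (an illustration, not an exhaustive list).
LEGACY_RUNNER_NAMES = {
    "DataflowPipelineRunner": "DataflowRunner",
    "DirectPipelineRunner": "DirectRunner",
}


def normalize_runner(name: str) -> str:
    """Return the Beam 2.x runner name; unknown names pass through unchanged."""
    return LEGACY_RUNNER_NAMES.get(name, name)


def runner_args(runner: str, project: str) -> List[str]:
    """Build command-line flags in the --flag=value form Beam parses."""
    return [f"--runner={normalize_runner(runner)}", f"--project={project}"]


if __name__ == "__main__":
    # Passing the old name still yields the current Beam runner flag.
    print(runner_args("DataflowPipelineRunner", "my-project"))
```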
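Ed Barton's cost erratum comes down to simple arithmetic. The sketch below compares instance-hours for an App Engine flexible instance that never scales to zero with 10 minutes of actual use per month. It assumes both scenarios are billed at the same per-instance-hour rate, so the price cancels out and no real pricing figures are needed. It also assumes an average 730-hour month. The function names are hypothetical.

```python
import math

# Average hours in a month: 365 days * 24 h / 12 months = 730 h.
HOURS_PER_MONTH = 365 * 24 / 12


def billed_hours_always_on(min_instances: int = 1) -> float:
    """Instance-hours billed when `min_instances` instances never shut down."""
    return min_instances * HOURS_PER_MONTH


def billed_hours_on_demand(minutes_used: float) -> float:
    """Instance-hours billed if the app could scale to zero when idle."""
    return minutes_used / 60.0


def underestimate_factor(minutes_used: float, min_instances: int = 1) -> float:
    """Ratio of always-on to on-demand hours (the hourly rate cancels out)."""
    return billed_hours_always_on(min_instances) / billed_hours_on_demand(minutes_used)


if __name__ == "__main__":
    factor = underestimate_factor(minutes_used=10)
    print(f"always-on is ~{factor:.0f}x more "
          f"(~{math.log10(factor):.1f} orders of magnitude)")
```

With one instance and 10 minutes of use, the factor is about 4,380, or roughly 3.6 orders of magnitude. That is consistent with the "few orders of magnitude" in the erratum.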
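For the page 157 erratum, the corrected clause is `DEP_DELAY>=10`, and the `>=` includes flights delayed by exactly 10 minutes. The sketch below uses an in-memory SQLite table rather than the book's actual dataset. The table name `flights`, the `FL_NUM` column, and the sample rows are hypothetical. Only the `DEP_DELAY` column name comes from the erratum.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical sample rows: (flight number, departure delay in minutes).
SAMPLE_ROWS = [("AA1", 5.0), ("AA2", 10.0), ("AA3", 25.0)]


def count_rows(rows: Iterable[Tuple[str, float]], where: str) -> int:
    """Load rows into a throwaway table and count those matching `where`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE flights (FL_NUM TEXT, DEP_DELAY REAL)")
        conn.executemany("INSERT INTO flights VALUES (?, ?)", list(rows))
        (n,) = conn.execute(f"SELECT COUNT(*) FROM flights WHERE {where}").fetchone()
        return n
    finally:
        conn.close()


if __name__ == "__main__":
    # The corrected clause keeps the flight delayed exactly 10 minutes.
    print(count_rows(SAMPLE_ROWS, "DEP_DELAY>=10"))
    # A strict > would silently drop that boundary row.
    print(count_rows(SAMPLE_ROWS, "DEP_DELAY>10"))
```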
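The page 344 erratum notes that simulate.py needs `--project`. The sketch below shows the general pattern using the standard-library `argparse` module. It is a hypothetical parser, not the actual simulate.py argument list, and the script's other options are deliberately omitted. When the flag is marked `required=True`, omitting it produces a usage error instead of a confusing failure later in the run.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser illustrating a mandatory --project flag."""
    parser = argparse.ArgumentParser(
        description="Sketch of a script that must be told which GCP project to use")
    # required=True makes argparse exit with status 2 if the flag is missing.
    parser.add_argument("--project", required=True,
                        help="Google Cloud project ID to run against")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse()
    print(f"Using project {args.project}")
```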