Errata for Fundamentals of Data Engineering

The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version  Location  Description  Submitted By  Date Submitted  Date Corrected
Printed
Page Acknowledgments
technical reviewers paragraph

Tod Hanseman should be spelled Tod Hansmann

Joe Reis  Jul 22, 2022  Jul 28, 2023
Page 104, section "Lambda Architecture"
Figure 3-14

Figure 3-14, which illustrates the Lambda architecture, does not match the description in the paragraph above it.
The author says: "In a Lambda architecture (Figure 3-14), you have systems operating independently of each other—batch, streaming, and serving."
The figure shows two streaming systems (the batch system is not shown) and a serving system.

Note from the Author or Editor:
The bottom box that says "stream processing" that attaches to "batch processing" should say "batch processing".

Anonymous  Nov 15, 2022  Jul 28, 2023
Page Acknowledgments - page xix
Upper section

Lior Gavish is mentioned twice

Note from the Author or Editor:
Please remove the second reference to Lior Gavish in the acknowledgments.

Igal drayerman  Feb 10, 2023  Jul 28, 2023
Printed
Page 241, in the "Frequency" subsection
middle image

Figure 7-4 shows ingestion frequencies of data in batch, micro-batch, and real time. The subheadings "frequent" and "semi-frequent" are in the wrong order.

It should be:

batch = semi-frequent
micro-batch = frequent

Joe Reis  Jul 31, 2023
Page 273
Section "Data Definition Language", 2nd paragraph

On page 273, the second paragraph of the Data Definition Language section classifies "UPDATE" as a DDL expression. However, "UPDATE" is typically considered a DML expression.

Note from the Author or Editor:
Thanks for spotting this error.

Divyansh Jain  Nov 16, 2023 
Page 307
l. -3.

The line
"That's it! Now let’s look at ways to view data contextually using satellites."
does not seem to fit in this place.

The line should be just below Table 8-18 and above the "Satellites" paragraph.

Note from the Author or Editor:
This might read better if we move the "That's it! Now let’s look at ways to view data contextually using satellites." sentence to the end of the Link section, after the sentence that says "Note that we're...". This is the sentence right before the satellite portion begins.

HIDEMOTO NAKADA  Jan 14, 2024 
Page 10
Figure 1-3

Figure 1-3 is quite blurry in both the print and Kindle editions of the book

Note from the Author or Editor:
On the one hand, this is somewhat deliberate to emphasize the sheer number of tools in each diagram. On the other hand, it would be nice to at least make the left image with fewer tools more clear. There are higher resolution images available on Matt Turck's website, and I could also email Matt to request original files.

Sergio Ramos-Valverde  Nov 12, 2024 
Page 49
Figure 2-7

Figure 2-7 under DataOps, the first item should be Automation, not Data Governance.
This would bring figure 2-7 in line with the items in figure 2-8.

Note from the Author or Editor:
Replace "Data Governance" with "Automation" in the diagram.

Sky Quintin  May 08, 2024 
Page 101
Final paragraph into the beginning of 102

On page 101, under Figure 3-10, the authors mention the acronym "MPP". Later, on page 102, they mention "CDC". Neither term is defined. Would it be possible to include the definitions of these terms in a future edition?

Note from the Author or Editor:
We define each term the first time it is used, and both acronyms appear in the index. Is this adequate, or should we consider other changes?

Sergio Ramos-Valverde  Nov 27, 2024 
Page 103
Last paragraph

The acronym "HDFS" is used without definition. Is it possible to include what it stands for?

Note from the Author or Editor:
We define HDFS the first time we use the acronym. I don't want to be too repetitive by spelling this out every time. One possible solution is to put HDFS in the index. (Right now, we have an entry for "Hadoop Distributed File System," but not for HDFS.)

Sergio Ramos-Valverde  Nov 27, 2024 
Page 168
2nd paragraph

"...which we discuss at greater length in 'Messages and Streams' on page 167.)" should probably read, "...which we discuss at greater length in 'Message Queues and Event-Streaming Platforms' on page 259.)". In its current form, this is a self-referential breadcrumb, and the preceding paragraphs in the section do not "discuss at greater length," whereas the aforementioned section starting on page 259 does go into more detail. This is a particularly confusing typo due to the section name. Indeed, I did not understand the parenthetical until 100 pages later!

Note from the Author or Editor:
O'Reilly - can we fix this? Thanks.

Adam Shamlian  Nov 16, 2022  Jul 28, 2023
Page 170
Paragraph 4, "Lookups"

"Understand how to leverage for efficient extraction." Feels like it should read "Understand how to leverage them for efficient extraction." or "Understand how to leverage indexes for efficient extraction."

Note from the Author or Editor:
Fixed in Atlas branch mlhousley

Sergio Ramos-Valverde  Dec 05, 2024 
Page 174
Bottom of page

The JSON object printed at the bottom of this page is not formatted properly for some of the nested data. It makes reading and interpreting what this data represents quite difficult.

The two lines after the line starting with "name" ("first" and "last") should have two additional leading spaces. The same applies to the four lines after "favorite_bands".

Joe Reis, co-author, sent me to this link after we discussed this on LinkedIn. I would be more than happy to volunteer to help fix the formatting.

Note from the Author or Editor:
O'Reilly - can we better format this? Thanks.

Brian Armstrong  Dec 12, 2022  Jul 28, 2023
Page 182
3rd paragraph

"from routing messages between microservices ingesting millions of events per second of event data from web, mobile, and IoT applications." It feels like there should be a "to" somewhere in this sentence.

Note from the Author or Editor:
Fixed in Atlas on branch mlhousley.

Sergio Ramos-Valverde  Dec 09, 2024 
Page 184
3rd page paragraph; 1st paragraph in subsection "Topics."

The last sentence of the paragraph reads, "A topic can have zero, one, or multiple producers and customers on most event-streaming platforms."

It should probably read, "A topic can have zero, one, or multiple producers and consumers on most event-streaming platforms."

So, "consumers", not "customers".

Note from the Author or Editor:
Confirmed. Please correct.

L. D. Nicolas May  Aug 27, 2023 
Page 184
"Topics" paragraph

"A topic can have zero, one, or multiple producers and customers on most event-streaming platforms."

Shouldn't "customers" be "consumers"?

"A topic can have zero, one, or multiple producers and consumers on most event-streaming platforms."

Sergio Ramos-Valverde  Dec 09, 2024 
Page 188
second to last paragraph

"...-should extend up and down the entire stack and support the data engineering and lifecycle."

I'm not sure the last "and" is supposed to be there.

Note from the Author or Editor:
Delete the last "and"

Sergio Ramos-Valverde  Dec 09, 2024 
Page 219
First paragraph

In the first paragraph there is a reference to a figure that reads (see Figure 6-3).

It should reference Figure 6-2.

Note from the Author or Editor:
Confirmed

Mike Porter  Sep 03, 2023 
Page 229
1st paragraph

"The storage price goes down from faster/higher performing storage to lower storage"

Shouldn't this be:

"The storage price goes down from faster/higher performing storage to slower storage"

Sergio Ramos-Valverde  Dec 17, 2024 
Page 234
"Software Engineering" paragraph

"Make sure the code you write stores the data correctly and doesn't accidentally cause data, memory leaks, or performance issues."

What is meant by "...cause data"?

Note from the Author or Editor:
Edit: "Make sure that your code stores the data correctly and doesn't accidentally cause memory leaks or performance issues."

Sergio Ramos-Valverde  Dec 17, 2024 
Page 243
Paragraph 2

"The big idea is that rather than relying on asynchronous processing, where a batch process runs for each stage as the input batch closes and certain time conditions are met, each stage of the asynchronous pipeline can process data items as they become available in parallel across the Beam cluster"

Shouldn't the first "asynchronous" be "synchronous"?

Sergio Ramos-Valverde  Dec 18, 2024 
Printed
Page 287
Not sure, got feedback from someone

On page 287, in "if new events arrive for the use", the word "use" should be "user".

Joe Reis  Feb 11, 2023  Jul 28, 2023