Errata for Deciphering Data Architectures

The errata list contains errors, and their corrections, that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is shown in the column titled "Date corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version Location Description Submitted By Date submitted Date corrected
PDF
Page p. 126 of PDF
1st paragraph

Fourth, ELT performs transformations one record at a time, which can also be slow.

should be:

Fourth, ETL performs transformations one record at a time, which can also be slow.

James Serra  Feb 27, 2024  Mar 29, 2024
PDF
Page p. 156 of PDF
last sentence of last paragraph

a data fabric’s advanced governance capabilities help can maintain compliance no matter where the data is.

should be:

a data fabric’s advanced governance capabilities can help maintain compliance no matter where the data is.

James Serra  Feb 27, 2024  Mar 29, 2024
PDF
Page p. 139 of PDF
Under Step 5

business users can analyze it using familiar tools, such as reports and dashboards.

should be:

business users can analyze it using familiar tools, to create such things as reports and dashboards.

James Serra  Feb 28, 2024  Mar 29, 2024
Page Chapter 12 - The Data Lakehouse Architecture
Figure 12-2

Figure 12-2 shows an RDW in a Data Lakehouse Architecture

Note from the Author or Editor:
Figure 12-2 needs to be fixed: "RDW" should be replaced with "Relational Serving Layer (optional)". "Schema on write" should be replaced with "Schema on read". The two icons in the "Schema on write" box, the document and star, should be removed.

Jaap Vermeer  Mar 05, 2024  Mar 29, 2024
Printed
Page back cover
On the bullet list

Add this bullet point at the end:
- Free from product discussions, this book is a timeless resource for years to come

James Serra  Mar 22, 2024
Page 1 of chapter 1 (Big Data)
End of paragraph 2

At the end of paragraph 2: "It had to be scraped and restarted from scratch."

"Scraped" should be "scrapped".

This is from the Amazon Kindle edition of the book.

William Strausser  Aug 17, 2024 
Page 8
Figure 1-3

In Stage 2, "manged" should be "managed".

Andy South  Sep 02, 2024 
Page II. Common Data Architecture Concepts - 8. Dimensional Modeling: Tracking Changes
Type 3

Type 3 SCD is described as follows in the book: Type 3 SCDs create a new record for each change so you can maintain a complete history of the data. This type of SCD is the most complex but also the most flexible. For example, when a customer moves from one US state to another, a Type 3 SCD would store the entire old record and an entire new record, which could include dozens of fields and not just the US state field.

However, it is commonly accepted that Type 3 SCD adds a new field/attribute/column to keep track of historical changes, instead of adding new records as described in the book. A type 3 SCD will add a new column to keep track of the latest change of an attribute only. In contrast, a type 2 SCD will add new records to keep track of the entire history of a dimension.

Note from the Author or Editor:
What it should read:

Type 2
Type 2 SCDs maintain multiple versions of the data: the new data and a record of the old data. Use this type of SCD when you need to track the changes to the data over time and maintain a record of the old data. For example, let’s say that six months ago, a company analyzed its sales data and found New York to be its top-selling state. Now, if some customers have since moved out of New York to New Jersey, rerunning the same report without accounting for these changes would show inaccurately lower sales figures for New York. This would lead to a mistaken perception of historical data showing declining sales in New York, which could influence strategic decisions. So, if a customer moves from one US state to another, the company’s Type 2 SCD would create a new record to store both the old state and the new one.

Type 3
Type 3 SCDs add a new column for each change so you can maintain a limited history of the data. This type of SCD is less complex but also more limited in scope. For example, when a customer moves from one US state to another, a Type 3 SCD would add a new column to store the previous state, while the current state is updated in the existing field.
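The difference between the two corrected descriptions can be illustrated with a minimal Python sketch. It is not taken from the book: the class names, the `valid_from`/`valid_to` date fields, and the single `previous_state` column are assumptions chosen for illustration. In particular, the Type 3 example keeps only one previous value, following the common convention noted in the erratum above.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerType2:
    """One version of a customer row in a Type 2 dimension (hypothetical schema)."""
    customer_id: int
    state: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_type2(rows: List[CustomerType2], customer_id: int,
                new_state: str, change_date: date) -> List[CustomerType2]:
    """Type 2: close the current row and append a new row, so history grows."""
    updated = []
    for row in rows:
        if row.customer_id == customer_id and row.valid_to is None:
            row = replace(row, valid_to=change_date)
        updated.append(row)
    updated.append(CustomerType2(customer_id, new_state, change_date))
    return updated


@dataclass(frozen=True)
class CustomerType3:
    """A customer row in a Type 3 dimension (hypothetical schema)."""
    customer_id: int
    state: str
    previous_state: Optional[str] = None  # assumed: only one prior value is kept


def apply_type3(row: CustomerType3, new_state: str) -> CustomerType3:
    """Type 3: overwrite the current value and keep the old one in a column."""
    return replace(row, state=new_state, previous_state=row.state)


# Type 2: the move from NY to NJ produces a second row; the NY row is preserved.
history = [CustomerType2(1, "NY", date(2023, 1, 1))]
history = apply_type2(history, 1, "NJ", date(2024, 3, 1))

# Type 3: the same move updates one row in place.
current = apply_type3(CustomerType3(1, "NY"), "NJ")
# A second move overwrites previous_state, so the NY value is no longer stored.
later = apply_type3(current, "PA")
```

Under these assumptions, a report over `history` can still attribute older sales to New York, while the Type 3 row can answer only "where is the customer now, and where were they immediately before?"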

Arnould Monteyne  Sep 11, 2024 
Page III. Data Architectures - 13. Data Mesh Foundation
Data Mesh Hype

The link to the Gartner Hype Cycle has expired
(link cannot be posted in the errata)

Note from the Author or Editor:
2023 Gartner Hype Cycle for data management link is at https://www.denodo.com/en/document/analyst-report/gartner-hype-cycle-data-management-2023

Arnould Monteyne  Sep 20, 2024 
Page IV. People, Processes, and Technology - 16. Technologies
Software Frameworks - Databricks - 6th paragraph

"Databricks made Data Lake open source in 2019..."
should be:
"Databricks made >Delta< Lake open source in 2019..."

Note from the Author or Editor:
p. 239 of printed book.

Arnould Monteyne  Sep 26, 2024 
Printed
Page 5
under "Variety" section

"such as logs and CSV" should be "such as logs in CSV"

James Serra  Aug 09, 2024
Printed
Page 21
second paragraph

change "is a transactional storage software layer that runs"
to
"is a transactional storage software layer called Delta Lake that runs"

James Serra  Mar 06, 2024  Mar 29, 2024
Page 33 Summary Chapter 1
4th paragraph of the Summary

The text reads "ELT versus ELT" where it should read "ETL versus ELT":

And in Chapter 9, you will read about data ingestion, with sections on ELT versus ELT, reverse ELT, batch versus real-time processing, and data governance.

Jaap Vermeer  Feb 20, 2024  Mar 29, 2024
Printed
Page 164
First paragraph

Add the following sentence as the second sentence in the first paragraph (right before "This solves six problems..."):

"The RDW is replaced with an optional relational serving layer, described later in this chapter."

James Serra  Mar 06, 2024  Mar 29, 2024
Page 184
Last bullet physical EDW

In a physical EDW, there is only place to control data, so you aren’t duplicating your efforts or your data.

should be:

In a physical EDW, there is only one place to control data, so you aren’t duplicating your efforts or your data.

Jaap Vermeer  Mar 01, 2024  Mar 29, 2024