Errata for Data Pipelines Pocket Reference

The errata list records errors, and their corrections, that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version | Location | Description | Submitted by | Date submitted
1
https://learning.oreilly.com/library/view/data-pipelines-pocket/9781492087823/ch05.html#idm45664246170232

CREATE TABLE all_orders AS
SELECT OrderId,
       OrderStatus,
       LastUpdated,
       ROW_NUMBER() OVER(PARTITION BY OrderId,
                                      OrderStatus,
                                      LastUpdated)
         AS dup_count
FROM Orders;

TRUNCATE TABLE Orders;

-- only insert non-duplicated records
INSERT INTO Orders
SELECT * FROM all_orders
WHERE dup_count = 1;

DROP TABLE all_orders;
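The book's rebuild approach can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch under several assumptions: SQLite stands in for the book's database, DELETE FROM replaces TRUNCATE TABLE (SQLite has no TRUNCATE), and the sample rows are invented. One change from the quoted code is deliberate: all_orders carries the extra dup_count column, so the INSERT names the three original columns instead of using SELECT *, which would supply four values to a three-column table.

```python
import sqlite3


def dedupe_by_rebuild(conn: sqlite3.Connection) -> None:
    """Copy rows with a duplicate counter, empty Orders, re-insert first copies."""
    conn.executescript("""
        CREATE TABLE all_orders AS
        SELECT OrderId,
               OrderStatus,
               LastUpdated,
               ROW_NUMBER() OVER (PARTITION BY OrderId,
                                               OrderStatus,
                                               LastUpdated)
                 AS dup_count
        FROM Orders;

        -- SQLite has no TRUNCATE; DELETE FROM empties the table instead
        DELETE FROM Orders;

        -- all_orders has an extra dup_count column, so list the three
        -- original columns rather than using SELECT *
        INSERT INTO Orders (OrderId, OrderStatus, LastUpdated)
        SELECT OrderId, OrderStatus, LastUpdated
        FROM all_orders
        WHERE dup_count = 1;

        DROP TABLE all_orders;
    """)


# Hypothetical sample data: one order row is loaded twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER, OrderStatus TEXT, LastUpdated TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        (1, "Backordered", "2020-06-01"),
        (1, "Backordered", "2020-06-01"),  # exact duplicate
        (1, "Shipped", "2020-06-09"),
        (2, "Shipped", "2020-07-11"),
    ],
)
dedupe_by_rebuild(conn)
rows = conn.execute("SELECT * FROM Orders ORDER BY OrderId, LastUpdated").fetchall()
print(rows)
```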

Better approach: wrap the following query in a CTE, then delete the rows where dup_count > 1:

SELECT OrderId,
       OrderStatus,
       LastUpdated,
       ROW_NUMBER() OVER(PARTITION BY OrderId,
                                      OrderStatus,
                                      LastUpdated)
         AS dup_count
FROM Orders
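The suggested alternative can be sketched the same way, again assuming SQLite and invented sample rows. Because exact duplicates agree on every column, the DELETE has to tell the copies apart by a row identifier; SQLite's implicit rowid plays that role here, and other engines need their own equivalent (PostgreSQL's ctid, for example). The function name dedupe_in_place is illustrative only.

```python
import sqlite3

# Assumes SQLite: exact duplicates share every column value, so the DELETE
# targets the surplus copies through SQLite's implicit rowid.
DEDUPE_SQL = """
WITH numbered AS (
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (PARTITION BY OrderId,
                                           OrderStatus,
                                           LastUpdated
                              ORDER BY rowid) AS dup_count
    FROM Orders
)
DELETE FROM Orders
WHERE rowid IN (SELECT rid FROM numbered WHERE dup_count > 1)
"""


def dedupe_in_place(conn: sqlite3.Connection) -> int:
    """Keep the lowest-rowid copy of each duplicated row; return rows deleted."""
    before = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
    conn.execute(DEDUPE_SQL)
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
    return before - after


# Hypothetical sample data: one order row is loaded twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER, OrderStatus TEXT, LastUpdated TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        (1, "Backordered", "2020-06-01"),
        (1, "Backordered", "2020-06-01"),  # exact duplicate
        (1, "Shipped", "2020-06-09"),
        (2, "Shipped", "2020-07-11"),
    ],
)
conn.commit()
removed = dedupe_in_place(conn)
rows = conn.execute("SELECT * FROM Orders ORDER BY OrderId, LastUpdated").fetchall()
print(removed, rows)
```

Compared with the rebuild, this removes duplicates in place with no intermediate table, at the cost of depending on an engine-specific row identifier. Where no such identifier is available, the rebuild approach still works because it never needs to distinguish one copy from another.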

Submitted by David Verdejo on Oct 22, 2020

Printed Page 58
1st paragraph

The text below the code on p. 58 refers to parameters (resume_stream, log_pos) that are not passed to the BinLogStreamReader instantiated in the code.
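The parameters the text mentions belong to BinLogStreamReader from the python-mysql-replication package, which also accepts log_file alongside log_pos. A minimal sketch, assuming the pipeline saved the last binlog file name and position from a previous run; the helper build_reader_kwargs and the (log_file, log_pos) tuple format are hypothetical. It only builds the keyword arguments, so it runs without the library or a MySQL server.

```python
from typing import Any, Optional


def build_reader_kwargs(
    mysql_settings: dict[str, Any],
    server_id: int,
    saved_position: Optional[tuple[str, int]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: keyword arguments for BinLogStreamReader.

    saved_position is an assumed (log_file, log_pos) pair persisted by a
    previous run; when it is None, the reader's default start is used.
    """
    kwargs: dict[str, Any] = {
        "connection_settings": mysql_settings,
        "server_id": server_id,
    }
    if saved_position is not None:
        log_file, log_pos = saved_position
        # resume_stream=True asks the reader to start at log_pos
        # rather than at the beginning of log_file
        kwargs.update(resume_stream=True, log_file=log_file, log_pos=log_pos)
    return kwargs


# Placeholder connection settings and a made-up saved position
settings = {"host": "localhost", "port": 3306, "user": "repl"}
fresh = build_reader_kwargs(settings, server_id=100)
resumed = build_reader_kwargs(settings, server_id=100,
                              saved_position=("mysql-bin.000003", 154))
print(fresh)
print(resumed)
```

With python-mysql-replication installed, either dictionary would be passed as BinLogStreamReader(**kwargs). As we understand the library, the reader exposes its current log_file and log_pos as attributes while streaming, which is where a pipeline would read the position to save for the next run.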

Submitted by Anonymous on Nov 26, 2023

Printed, Mobi Page 64
2nd paragraph of Full or Incremental Postgres Table Extraction

The Python library for extracting the data from Postgres should be "psycopg2" whereas the book says "pyscopg2", which is a typo. However, the import in the code is spelled correctly.

Submitted by Kan Ouivirach on Aug 28, 2021