
Errata for Delta Lake: Up and Running


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Color key: serious technical mistake | minor technical mistake | language or formatting error | typo | question | note | update

Version Location Description Submitted by Date submitted
ePub | Chapter 1, Figure 1-2 (Sales dimensional model), in the diagram

The date dimension has a primary key of “DataKey”. I assume this should be “DateKey”.

Anonymous | Nov 01, 2023
ePub | Chapter 2, "Using Delta Lake with PySpark", second paragraph

I'm reading Delta Lake: Up & Running and ran into an error when pip-installing delta-spark in the "Using Delta Lake with PySpark" section of Chapter 2. Below are my steps to reproduce.

Was it the authors' intention to pip install from within the container or from a virtual environment from the host machine?

---

(default) ➜ ~ docker run -it apache/spark /bin/sh
$ pip install delta-spark
WARNING: The directory '/home/spark/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting delta-spark
Downloading delta_spark-3.0.0-py3-none-any.whl (21 kB)
Collecting pyspark<3.6.0,>=3.5.0
Downloading pyspark-3.5.0.tar.gz (316.9 MB)
|████████████████████████████████| 316.9 MB 17.0 MB/s
Collecting importlib-metadata>=1.0.0
Downloading importlib_metadata-7.0.0-py3-none-any.whl (23 kB)
Collecting py4j==0.10.9.7
Downloading py4j-0.10.9.7-py2.py3-none-any.whl (200 kB)
|████████████████████████████████| 200 kB 16.4 MB/s
Collecting zipp>=0.5
Downloading zipp-3.17.0-py3-none-any.whl (7.4 kB)
Building wheels for collected packages: pyspark
Building wheel for pyspark (setup.py) ... done
Created wheel for pyspark: filename=pyspark-3.5.0-py2.py3-none-any.whl size=317425365 sha256=8c412017d551bb56411f0fef58842e27b5f996ebe2d0f6bf0eb781657c5e2acd
Stored in directory: /tmp/pip-ephem-wheel-cache-8oqis16n/wheels/a6/ce/f9/17d82c92f044018df2fe30af63ac043447720d5b2cee39b40f
Successfully built pyspark
Installing collected packages: py4j, pyspark, zipp, importlib-metadata, delta-spark
ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/home/spark'
Check the permissions.

Anonymous | Dec 13, 2023
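A plausible cause, assuming the official apache/spark image, is that the container runs as a non-root user whose home directory is not writable by pip. Two possible workarounds, sketched below; the flags and paths are assumptions about that image, not something stated in the book or in the submission:

```shell
# Workaround sketches (assumptions, not from the book or the submission):
# 1. Run the container as root so pip can write under the home directory:
docker run -it --user root apache/spark /bin/sh
#    then, inside the container:
#      pip install delta-spark

# 2. Or point HOME at a writable location and do a per-user install:
docker run -it -e HOME=/tmp apache/spark /bin/sh
#    then, inside the container:
#      pip install --user delta-spark
```

Either way, pip no longer needs write access to /home/spark, which is what the "[Errno 13] Permission denied" error above complains about.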
PDF | Page 50, section "Creating a Delta Table with SQL DDL", 7th line in the section

It is not clear how one gets to the %sql prompt shown just above the CREATE TABLE statement. Up to this point in the book, one has been working either from the PySpark prompt (>>>) or the Scala prompt (scala>).

This sudden switch is confusing and could use some clarification.

Anonymous | Mar 07, 2024
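One piece of context that may help here (an assumption about the book's environment, since the entry itself does not say): %sql is a notebook cell magic (as in Databricks or Jupyter notebooks), not a shell prompt. Outside a notebook, the same DDL can be run from Spark's interactive SQL shell, or wrapped in spark.sql() at the PySpark prompt:

```shell
# Sketch, assuming a standard Spark installation ($SPARK_HOME is an assumption):
# launch the interactive Spark SQL shell instead of pyspark/spark-shell:
$SPARK_HOME/bin/spark-sql
# alternatively, from the PySpark >>> prompt, wrap the statement:
#   spark.sql("CREATE TABLE ...")
```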
ePub | Page 45, footnote 1

The link to Parquet Viewer points to a page that does not exist (www.parquet-viewer.com).

Anonymous | Nov 09, 2023
PDF | Page 93, first item in the numbered list

The error occurs here:
"1. We are going to MERGE INTO the YellowTaxis Delta table. Notice that we give the
table an alias of source."

The alias should be "target", as shown in the following source code:

"%sql
MERGE INTO taxidb.YellowTaxis AS target
USING YellowTaxiMergeData AS source
ON target.RideId = source.RideId
[...]"

July Mariana de Lima | Nov 25, 2023