Errata

Errata for Cassandra: The Definitive Guide, Second Edition


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version	Location	Description	Submitted by	Date submitted	Date corrected
	NA NA	Because I am reading this on Safari, I can't give the page number or location on the page. In Chapter 4, "The Cassandra Query Language," section "Secondary Indexes," there is the following statement: "Note that for maps in particular, we have the option of indexing either the keys, via the syntax KEYS(addresses), or the values, which is the default. You may not create indexes on both the keys and values of a map." This may have been true for the version used when the book was written, but it is not the case for the most recent version of Cassandra, and it is not clear in which version this changed. I have not verified this personally, but I cite the following exchange from the #cassandra IRC channel: ------------------ <davidmichaelkarr> (David M. Karr) Question about indexes. I found the following statement in "C:TDG": [the passage quoted above] 11:51:41 <davidmichaelkarr> Is that saying you can't create a single index covering both the keys and the values, or is it saying that a particular map column cannot have both its keys and values indexed? 11:52:44 <thobbs> (Tyler Hobbs) davidmichaelkarr: I think it's saying both, but not all of that is necessarily true in recent C* versions 11:52:57 <thobbs> first, you can create an index on key-value pairs 11:53:09 <thobbs> and second, iirc, in 3.0+ you can index both keys and values separately 11:53:27 <thobbs> that might have actually been added in some later 3.x, but I think it's true in 3.0 11:53:36 <davidmichaelkarr> thobbs: Ok, so it's just not accurate for recent versions then. Ok. 11:53:59 <thobbs> right ------------------ Note from the Author or Editor: The change to allow both keys and values to be indexed appears to have been added in the 2.2 release (compare https://docs.datastax.com/en/cql/3.3/cql/cql_using/useIndexColl.html with https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html). Please change the text: "Note that for maps in particular, we have the option of indexing either the keys, via the syntax KEYS(addresses), or the values, which is the default. You may not create indexes on both the keys and values of a map." to "Note that for maps in particular, we have the option of indexing either the keys, via the syntax KEYS(addresses), or the values (which is the default), or both (in Cassandra 2.2 or later)."	David M. Karr	Sep 19, 2016	Apr 07, 2017
	NA NA	I'm not sure this is the right place to report this, but on Safari Books Online, the "URL" button on each page is supposed to produce a URL that returns to the same page, assuming the user is logged in to Safari. This works fine in every other Safari book I have read, and I just retested it with two other books. For this book, however, the generated URLs only return to the table of contents. Note from the Author or Editor: Confirmed that the links from individual pages in Safari Books Online do not work.	David M. Karr	Sep 22, 2016
Printed	Chapter 5, Calculating Size on Disk	The confirmed erratum for the previous section ("Calculating Partition Size") says that the number of hotels should not be considered in the partition size. The subsequent section ("Calculating Size on Disk") also needs to be corrected: the number of hotels should affect neither the partition size limit nor the partition size calculation, because the data for each hotel is stored in a different partition. Note from the Author or Editor: The reader is correct. Since there is a partition for each hotel, the number of hotels is not a factor, and this should flow through the calculations in both sections: Nr = 100 rooms/hotel × 730 days = 73,000 rows, and Partition size = 16 bytes + 0 bytes + 0.51 MB + 0.58 MB = 1.1 MB	Anonymous	Mar 30, 2017	Apr 07, 2017
Printed	Page 2 5th paragraph, first sentence	The sentence reads: "In his 1970 paper "A Relational Model of Data for Large Shared Data Banks," Dr. Edgar F. Codd, also at advanced his theory of the relational model for data while working at IBM's San Jose research laboratory." The "at" before "advanced" appears to be a typo. Note from the Author or Editor: The phrase "also at" should be revised to "also at IBM," so that the sentence reads: "In his 1970 paper "A Relational Model of Data for Large Shared Data Banks," Dr. Edgar F. Codd, also at IBM, advanced his theory of the relational model for data while working at IBM's San Jose research laboratory."	David Maldonado	Apr 23, 2017
PDF	Page 64 1st code block	The following statements don't actually update the row: the last name and its write timestamp remain unchanged. Cassandra silently ignores the UPDATE statement, and SELECT still displays the old value. cqlsh:my_keyspace> UPDATE user USING TIMESTAMP 1434373756626000 SET last_name = 'Boateng' WHERE first_name = 'Mary' ; cqlsh:my_keyspace> SELECT first_name, last_name, WRITETIME(last_name) FROM user WHERE first_name = 'Mary'; cqlsh> SHOW VERSION; [cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4] Note from the Author or Editor: The comment is correct. The issue is that the timestamp provided in the book is by definition in the past relative to the time you run the example, so the update is considered an earlier change than your write. The UPDATE statement is therefore valid and no error message is generated, but its value is superseded by the later write. The text that reads as follows: "To do this, we’ll use the CQL UPDATE command for the first time, using the optional USING TIMESTAMP option:" should be modified to the following: "To do this, we’ll use the CQL UPDATE command for the first time. We'll use the optional USING TIMESTAMP option to manually set a timestamp (note that the timestamp must be later than the one from our SELECT command, or the UPDATE will be ignored):"	Ihor Mochurad	Aug 19, 2016	Apr 07, 2017
Printed	Page 66 tinyint definition	The tinyint Cassandra type is defined as "tinyint: An 8-bit signed integer (as in Java)". Since Java does not have a primitive type called tinyint, for consistency with the other type definitions it would be better to define it as "tinyint: An 8-bit signed integer (equivalent to a Java byte)". Note from the Author or Editor: Good recommendation. Please change as recommended to: tinyint: An 8-bit signed integer (equivalent to a Java byte)	Paolo Baronti	Mar 10, 2018
PDF	Page 70 2nd paragraph of inet block	The collapsed version of the inet value is printed as 2001:db8:85a3:a::8a2e:370:7334. However, when tested on Cassandra 3.10, it is displayed as 2001:db8:85a3::8a2e:370:7334 (with a double colon in the middle). Why is it not 2001:db8:85a3:::8a2e:370:7334 (with a triple colon in the middle)? Note from the Author or Editor: There is an extra "a" in the encoded address that should be removed. The corrected text should read as follows: "... so the preceding value is rendered as follows when read using SELECT: 2001:db8:85a3::8a2e:370:7334."	Antonius Sopian	Apr 24, 2017
PDF	Page 74 3	The book says: "Now that we have defined our address type, we’ll try to use it in our user table, but if you’re using Cassandra 2.1 or earlier, you’ll run into a problem: cqlsh:my_keyspace> ALTER TABLE user ADD addresses map<text, address>; InvalidRequest: code=2200 [Invalid query] message="Non-frozen collections are not allowed inside collections: map<text, address>"" I am seeing this issue in Cassandra 3.7, whereas the book says it occurs only in version 2.1 or earlier. cqlsh:my_keyspace> SHOW VERSION; [cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4] Note from the Author or Editor: This problem still exists in releases through 3.7; see https://issues.apache.org/jira/browse/CASSANDRA-7826. The reference to release 2.1 should be omitted, and the text modified to say: "Now that we have defined our address type, we’ll try to use it in our user table, but we immediately run into a problem"	Ihor Mochurad	Aug 19, 2016	Apr 07, 2017
PDF	Page 75	The CREATE TABLE statement is listed like so: ``` CREATE TABLE my_keyspace.user ( first_name text PRIMARY KEY, addresses map<text, frozen<address>>, emails set<text>, id uuid, last_name text, login_sessions map<timeuuid, int>, phone_numbers list<text>, title text ) WITH bloom_filter_fp_chance = 0.01 AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' AND comment = '' AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction. SizeTieredCompactionStrategy', 'max_threshold': '32'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 0.0 AND speculative_retry = '99.0PERCENTILE'; ``` However, this throws two errors: caching needs to be a map, and the map keys need to be single-quoted. The statement that actually works is the following (note the changes to caching): ``` CREATE TABLE my_keyspace.user ( first_name text PRIMARY KEY, addresses map<text, frozen<address>>, emails set<text>, id uuid, last_name text, login_sessions map<timeuuid, int>, phone_numbers list<text>, title text ) WITH bloom_filter_fp_chance = 0.01 AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'} AND comment = '' AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 0.0 AND speculative_retry = '99.0PERCENTILE'; ``` Note from the Author or Editor: The reader is correct. The line describing the caching settings should be a map with single quotes rather than a string with double quotes: AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}	Raju Gandhi	Feb 08, 2017	Apr 07, 2017
PDF	Page 76 1st code block	cqlsh:my_keyspace> SELECT * FROM user WHERE last_name = 'Nguyen'; InvalidRequest: code=2200 [Invalid query] message="No supported secondary index found for the non primary key columns restrictions" The book says that when attempting to query by last name, the user will see the following error message: code=2200 [Invalid query] message="No supported secondary index found for the non primary key columns restrictions" In reality, the message is slightly different: InvalidRequest: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING" cqlsh:my_keyspace> SHOW VERSION; [cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4] Note from the Author or Editor: Since the book includes many output examples, it’s inevitable that many of the statement formats will undergo slight changes in future versions of Cassandra. The preface of the book should note more explicitly that the examples were run against Cassandra 3.0. Specifically, the section "Cassandra Versions Used in This Book" currently reads: "This book was developed using the Cassandra 3.X series of releases, along with the DataStax Java Driver version 3.0." This statement should be revised to: "This book was developed using Apache Cassandra 3.0 and the DataStax Java Driver version 3.0. The formatting and content of tool output, log files, configuration files, and error messages are as they appear in the 3.0 release, and may change in future releases."	Ihor Mochurad	Aug 19, 2016	Apr 07, 2017
PDF	Page 78 Summary paragraph	The link at the bottom of the page to the full language specification, https://cassandra.apache.org/doc/cql3/CQL.html, is broken; the server responds: "Not Found. The requested URL /doc/cql3/CQL.html was not found on this server." Note from the Author or Editor: The Cassandra team reorganized the documentation available at the Apache site after publication of the book. The new URL for the CQL specification is: http://cassandra.apache.org/doc/latest/cql/index.html	Ihor Mochurad	Aug 19, 2016	Apr 07, 2017
Printed	Page 97 4th paragraph and following formula	The author received the following comment: "Do you think the calculation on page 97, in the section "Calculating Partition Size," is right? I believe the number of values is calculated per partition rather than over all rows. In your calculation you are counting 5,000 hotels, but each hotel is a partition (available_room_by_hotel_date), so I believe your assumption is wrong. Could you please clarify the calculation? The DataStax online video tutorial I referred to also talks about rows per partition rather than total rows." The reader is correct: the number of hotels should not be included when calculating the number of rows per partition for this table, because each hotel has its own partition. Therefore the text should read as follows: "So the number of values for this table is equal to the number of rows. We still need to determine a number of rows. To do this, we make some estimates based on the application we’re designing. Our table is storing a record for each room, in each of our hotels, for every night. Let’s assume that our system will be used to store two years of inventory at a time, and there are 5,000 hotels in our system, with an average of 100 rooms in each hotel. This leads to an estimated number of rows as follows: Nr = 100 rooms/hotel × 730 days = 73,000 rows. This relatively small number of rows per partition is not going to get us in too much trouble, but if we start adding a lot of hotels or don’t manage the size of our inventory well using TTL, we could start having issues. We still might want to look at breaking up this large partition, which we’ll do shortly."	Jeffrey Carpenter	Aug 22, 2016	Apr 07, 2017
Printed	Page 97 1st paragraph	N_pk should be the number of partition key columns instead of the number of primary key columns; otherwise, clustering columns are not considered in the calculation of partition sizes. Note from the Author or Editor: I spoke with Artem Chebotko, who originally created these formulas and updated them to account for the storage format changes that came in Apache Cassandra 3.0 with the new storage engine implementation. Technically the description of the formula is correct, since the number of values is described as the number of cells, which is a specific reference to the storage format. Clustering column values are stored in a row header rather than as cells. This is a change from the pre-3.0 storage format, in which the clustering column values were stored as cells. While the description is technically correct, I see how it can be confusing, and omitting the clustering column values from a calculation of the number of values is potentially not very useful. To include those values as well, multiply the number of rows by the number of clustering columns and add the result to the number of values.	Nick Triller	Aug 24, 2017
Printed	Page 137 1st sentence	The last part of the first sentence "The random partitioner ..." states this partitioner "... is Cassandra's default". This should be replaced by "was Cassandra's default in Cassandra 1.1 and earlier". Note from the Author or Editor: Agree with recommended change as described above.	Anonymous	Dec 28, 2016	Apr 07, 2017
Printed	Page 163 Second code sample	The second line of the code sample contains an error: "MappingManager" is capitalized, which incorrectly references the class MappingManager instead of the variable mappingManager declared in the previous line. The line should read: Mapper<Hotel> hotelMapper = mappingManager.mapper(Hotel.class);	Jeffrey Carpenter	Jun 10, 2017
Printed	Page 186 second full paragraph (the one after the first code sample)	The text regarding lightweight transactions reads: "This command checks to see if there is a record with the partition key, which for this table consists of the hotel_id." This should be clarified: it is the entire primary key, not just the partition key, that must be unique. The lightweight transaction is trying to make sure the row does not exist, and for a multi-row partition this distinction is important. The sentence should be changed to read: "This command checks to see if the row already exists, that is, if there is a record with the same primary key, which for this table consists of the hotel_id."	Jeffrey Carpenter	Jun 16, 2017
Printed	Page 292 middle	On page 292, there is a sentence midway down that reads "As with authentication, the authentication mechanism is pluggable". I believe it should read: "As with authentication, the authorization mechanism is pluggable". Note from the Author or Editor: The comment is correct. The sentence should read: "As with authentication, the authorization mechanism is pluggable".	Steve Halladay	Apr 28, 2017
PDF	Page 297 1st code block	It looks like we are creating a truststore on node 1 and adding to it a certificate generated by node 1. I'm not sure that makes sense: does node 1 need to establish a secure connection with itself? $ keytool -import -v -trustcacerts -alias node1 -file node1.cer -keystore node1.truststore I would change it to: $ keytool -import -v -trustcacerts -alias node1 -file node1.cer -keystore nodeX.truststore where X is the node on which we create a truststore and import the certificate produced by node 1. Note from the Author or Editor: This is a good clarification. Node 1 doesn't need to add its own public cert. I would replace the sentence "Each command looks something like the following:" with the sentence "For example, to add the certificate for node 1 to the truststore for node 2, we would use the command:" and the command should be changed to: $ keytool -import -v -trustcacerts -alias node1 -file node1.cer -keystore node2.truststore	Ihor Mochurad	Sep 03, 2016	Apr 07, 2017
Mobi	Page 2282 3rd paragraph of section "Calculating Partition Size"	I'm reading the MOBI edition, so there are no page numbers. In the 3rd paragraph of the section "Calculating Partition Size," there is a duplicated "of" in the phrase "and the number of of values per row". Note from the Author or Editor: Chapter 5, Pg 97, 1st paragraph: remove the repeated "of" as described above.	Alex Ott	May 15, 2018
Mobi	Page 3441 2nd paragraph of "Startup and JVM Settings"	The page number is the approximate position in the MOBI file. The scripts are called conf/cassandra-env.sh and conf/cassandra-env.ps1, not conf/cassandra.env.sh and conf/cassandra.env.ps1 as in the book. Note from the Author or Editor: Chapter 7, Pg 144, second paragraph in "Startup and JVM Settings": the second sentence should read: "The key file to look at is the environment script conf/cassandra-env.sh (or the conf/cassandra-env.ps1 PowerShell script on Windows)." (please maintain italics on file names conf/cassandra-env.sh and conf/cassandra-env.ps1)	Alex Ott	May 15, 2018
Mobi	Page 5018 The "More on JMX" section	This section first discusses SNMP, but then mentions SMTP instead in the sentence "which may be useful if you are using SMTP monitoring tools such as Nagios or Zenoss". Note from the Author or Editor: Chapter 10, Page 212, paragraph following the figure: remove the reference to SMTP so that the sentence reads: "The JVM also offers management capabilities via Simple Network Management Protocol (SNMP), which may be useful if you are using monitoring tools such as Nagios or Zenoss."	Alex Ott	May 15, 2018
Mobi	Page 7088 "Production environment" item in the "Selecting instances" section	In the sentence about machines for production environments, the book uses MB instead of GB for memory size: "and anywhere from 16 MB to 64 MB of memory". Note from the Author or Editor: Chapter 14, "Selecting Instances", Pg 305, paragraph "Production environments": the sentence should read: "Cassandra nodes in production environments should have CPUs with at least eight cores (although four cores are acceptable for virtual machines), and anywhere from 16 GB to 64 GB of memory."	Alex Ott	May 15, 2018
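
The corrected Chapter 5 figures in the partition-size entries above (Nr = 73,000 rows and a partition size of about 1.1 MB) can be reproduced with a short calculation. The Java class below is a hedged sketch, not code from the book. It uses the number-of-values formula as we understand it from the book, Nv = Nr × (Nc - Npk - Ns) + Ns, and a size estimate built from four terms: partition key bytes, static column bytes, Nr times the bytes of each row's clustering and regular columns, and Nv times the average per-cell metadata. The class and method names are hypothetical. The column layout (hotel_id as the partition key, date and room_number as clustering columns, is_available as the only regular column) and the byte sizes (16, 4, 2, and 1 bytes, plus 8 bytes of metadata per cell) are assumptions, chosen because they reproduce the 0.51 MB and 0.58 MB terms in the author's note.

```java
/**
 * Hypothetical helper reproducing the corrected Chapter 5 partition-size
 * arithmetic. The column layout and byte sizes used in main() are
 * assumptions, not values taken from the book.
 */
public final class PartitionSizeEstimate {
    private PartitionSizeEstimate() {}

    /** Nr: rows in one partition (one hotel) = rooms per hotel x nights stored. */
    public static long rowsPerPartition(long roomsPerHotel, long days) {
        return roomsPerHotel * days;
    }

    /** Nv = Nr * (Nc - Npk - Ns) + Ns, counting cells as the book describes. */
    public static long values(long nr, int nc, int npk, int ns) {
        return nr * (nc - npk - ns) + ns;
    }

    /**
     * St = partition key bytes + static column bytes
     *    + Nr * (clustering + regular column bytes per row)
     *    + Nv * (average per-cell metadata bytes).
     */
    public static long sizeBytes(long partitionKeyBytes, long staticBytes, long nr,
                                 long rowBytes, long nv, long avgCellMetadataBytes) {
        return partitionKeyBytes + staticBytes + nr * rowBytes + nv * avgCellMetadataBytes;
    }

    public static void main(String[] args) {
        // 100 rooms x 730 nights; the number of hotels never enters the calculation.
        long nr = rowsPerPartition(100, 730);
        // Assumed columns: hotel_id (partition key), date and room_number
        // (clustering), is_available (regular): Nc = 4, Npk = 3, Ns = 0.
        long nv = values(nr, 4, 3, 0);
        // Assumed sizes: 16-byte hotel_id; 4-byte date + 2-byte smallint
        // + 1-byte boolean = 7 bytes per row; 8 bytes of metadata per cell.
        long st = sizeBytes(16, 0, nr, 7, nv, 8);
        System.out.printf("Nr = %d, Nv = %d, St = %d bytes (about %.1f MB)%n",
                nr, nv, st, st / 1_000_000.0);
    }
}
```

Under these assumptions the sketch yields 73,000 rows, 73,000 values, and 16 + 511,000 + 584,000 = 1,095,016 bytes, which rounds to the 1.1 MB in the note. It follows the book's cell-based count, so clustering column values are not included in Nv.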

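The inet entry above (page 70) turns on how an IPv6 address is written in compressed form. The sketch below applies the rules recommended by RFC 5952: leading zeros are dropped from each 16-bit group, and the longest run of two or more all-zero groups (the first such run on a tie) is replaced by "::". Because "::" may appear only once in an address, a triple colon never occurs. The class name Ipv6Compress is hypothetical, and we make no claim that Cassandra or cqlsh uses this code. The fully expanded input 2001:0db8:85a3:0000:0000:8a2e:0370:7334 is an assumption about the book's example value, chosen because it matches the 2001:db8:85a3::8a2e:370:7334 output the reader reported from Cassandra 3.10.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of RFC 5952-style IPv6 text compression. */
public final class Ipv6Compress {
    private Ipv6Compress() {}

    /** Compresses a fully expanded IPv6 address written as eight hex groups. */
    public static String compress(String fullAddress) {
        String[] parts = fullAddress.split(":");
        if (parts.length != 8) {
            throw new IllegalArgumentException("expected 8 groups: " + fullAddress);
        }
        // Drop leading zeros from each 16-bit group ("0db8" -> "db8", "0000" -> "0").
        List<String> groups = new ArrayList<>();
        for (String part : parts) {
            if (part.isEmpty() || part.length() > 4) {
                throw new IllegalArgumentException("bad group '" + part + "' in " + fullAddress);
            }
            groups.add(Integer.toHexString(Integer.parseInt(part, 16)));
        }
        // Find the longest run of all-zero groups; the first run wins a tie.
        int bestStart = -1;
        int bestLen = 0;
        int i = 0;
        while (i < groups.size()) {
            if (!groups.get(i).equals("0")) {
                i++;
                continue;
            }
            int j = i;
            while (j < groups.size() && groups.get(j).equals("0")) {
                j++;
            }
            if (j - i > bestLen) {
                bestStart = i;
                bestLen = j - i;
            }
            i = j;
        }
        // A single zero group is never replaced by "::".
        if (bestLen < 2) {
            return String.join(":", groups);
        }
        String head = String.join(":", groups.subList(0, bestStart));
        String tail = String.join(":", groups.subList(bestStart + bestLen, groups.size()));
        return head + "::" + tail;
    }

    public static void main(String[] args) {
        System.out.println(compress("2001:0db8:85a3:0000:0000:8a2e:0370:7334"));
    }
}
```

Running main prints 2001:db8:85a3::8a2e:370:7334. The method expects all eight groups to be written out and throws an IllegalArgumentException for already-compressed input instead of trying to expand it.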