
Errata for Kafka: The Definitive Guide


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version Location Description Submitted By Date submitted Date corrected
Chapter 4. Kafka Consumers: Reading Data from Kafka

In Chapter 4 --> Commits and Offsets --> Commit Current Offset, 2nd paragraph.

"By setting auto.commit.offset=false, offsets will only be committed when the application explicitly chooses to do so."

"auto.commit.offset" should be "enable.auto.commit" ?

Note from the Author or Editor:
(This is page 77 of the PDF).

The comment is correct: "auto.commit.offset" should be "enable.auto.commit"
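
For reference, a minimal sketch of the corrected setting in a manual-commit loop (broker address, group id, and topic are illustrative; imports omitted as in the book's snippets):

```
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092");
props.put("group.id", "CountryCounter");
props.put("key.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer");
props.put("enable.auto.commit", "false"); // the corrected property name

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("customerCountries"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        // process the record here
    }
    consumer.commitSync(); // commit only after the batch has been processed
}
```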

vitojeng  Nov 25, 2017  Mar 30, 2018
ePub
Page Loc 193
Chapter 1, Enter Kafka, 1st paragraph

"distributing streaming platform" should be "distributed streaming platform"

Note from the Author or Editor:
Correct.

Larry McQueary  Dec 02, 2017  Mar 30, 2018
PDF
Page 13
Metrics and logging paragraph

Proper name "Elastisearch" is used. Missing the "c" and should be "Elasticsearch".

David Der  Nov 14, 2017  Mar 30, 2018
PDF
Page 21
last paragraph

The first sentence uses "configurations" when it should be / refers to "parameters" (see second sentence).

Quote of the paragraph:

"There are several broker configurations that should be reviewed when deploying Kafka for any environment other than a standalone broker on a single server. These parameters deal with the basic configuration of the broker, and most of them must be changed to run properly in a cluster with other brokers."

Note from the Author or Editor:
The wording has been changed to "configuration parameters"

Jeffrey 'jf' Lim  Dec 29, 2017  Mar 30, 2018
PDF
Page 23
First paragraph, final sentence

Note that the broker will place a new partition in the path that has the
least number of partitions currently stored in it, not the least amount of disk space
used in the following situations:

The final words "used in the following situations:" seem misplaced. Also, the sentence ends discussion of configuration option "log.dirs" so what are these situations?

Note from the Author or Editor:
The last sentence has been updated to be "Note that the broker will place a new partition in the path that has the least number of partitions currently stored in it, not the least amount of disk space used, so an even distribution of data across multiple directories is not guaranteed."

Paolo Baronti  Nov 27, 2017  Mar 30, 2018
Printed
Page 23
1st and 3rd paragraph

log.dirs is in italics, but should be code format to be consistent with the rest of the config references.

Justin Pihony  Apr 10, 2018  Aug 09, 2019
PDF
Page 25
2nd bullet point

"What is the maximum throughput you expect to achieve when
consuming from a single partition? You will always have, at
most, one consumer reading from a partition, so if you know
that your slower consumer writes the data to a database and
this database never handles more than 50 MB per second from
each thread writing to it, then you know you are limited to
60MB throughput when consuming from a partition"

The throughput amount at the end of the section needs to be changed to 50MB from 60MB

Note from the Author or Editor:
The error report is correct. It should be "limited to 50MB throughput when consuming" to match the number earlier in the paragraph.

Anonymous  Oct 06, 2017  Mar 30, 2018
PDF
Page 25
3rd Paragraph

"You will always have, at most, one consumer reading from a partition" is this strictly true? Should it read "If you will always have..." or mention consumer groups?

Note from the Author or Editor:
This language does appear to be a little unclear, as when using a low-level consumer you could have two consumers reading the same partition (though the vast majority of applications will not do this). I have updated the language to clarify that a single partition must be consumed in its entirety by a single consumer (which applies even when using consumer groups).

Dan Hanley  Nov 05, 2017  Mar 30, 2018
PDF
Page 25
Second item of bulleted list

...if you know that your slower consumer writes the data to a database and this database never handles more than 50 MB per second from each thread writing to it, then you know you are limited to 60MB throughput when consuming from a partition.

Why is it that the 50MB per second database limit turns into a 60MB maximum partition throughput?

Note from the Author or Editor:
This was a typo. Both numbers should be 50 MB/sec.

Paolo Baronti  Nov 27, 2017  Mar 30, 2018
PDF
Page 30
1st paragraph

"...handling X messages per second and a data rate of X megabits per second..." looks like placeholder values. Other errata mentions this also appearing in "section 710" of the kindle edition but I'm confirming here that it's in the pdf on page 30.

Note from the Author or Editor:
Thanks for the note. This has been resolved for the next update

oliver_meyn  Oct 18, 2017  Mar 30, 2018
PDF
Page 30
4th Paragraph

"Otherwise, ephemeral storage (such as the AWS Elastic Block Store) might be sufficient." AWS EBS is not ephemeral. Perhaps rewrite as "Otherwise, ephemeral storage or AWS EBS could be used depending on the level of replication available and resilience required."

Note from the Author or Editor:
The word that should have been used here is "remote". This will be resolved in the next update

Dan Hanley  Nov 05, 2017  Mar 30, 2018
PDF
Page 31
Last line

The word "more" is repeated twice.

Note from the Author or Editor:
Indeed.

"Chapter 6 contains more more information on replication of data"
Should be
"Chapter 6 contains more information on replication of data"

Martin Harrigan  Dec 06, 2017  Mar 30, 2018
Printed
Page 33
3rd paragraph

"It is preferable to reduce the size of the page cache rather than swap."

Shouldn't this be the reverse? The idea up to this point is to use more page cache than swap. Unless I am misreading this?

Note from the Author or Editor:
The text is unclear. The statement is trying to convey that it is better to reduce the amount of page cache available than to use any amount of swap.

Justin Pihony  Apr 07, 2018  Aug 09, 2019
Printed
Page 33
4th paragraph

=vm.dirty_background_ratio should not have the = at the beginning

Justin Pihony  Apr 07, 2018  Aug 09, 2019
Printed
Page 35
1st paragraph

(in which case the realtime option can be used)

realtime should be relatime

Justin Pihony  Apr 07, 2018  Aug 09, 2019
Printed
Page 44
code listing at page bottom

A variable "kafkaProps" is defined as a local variable with modifier "private". An access modifier is not allowed here. It should be removed.

A variable "producer" is initialized, but it is not definied. At P.57, "producer" is initialized with variable declaration. It can replace with:
KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);

Note from the Author or Editor:
1. That depends on context, but I agree it can be confusing in a snippet.

We need to replace:
"private Properties kafkaProps = new Properties();"
with:
"Properties kafkaProps = new Properties();"
on page 44.

2. I don't see a problem in page 57 example.
We create the producer with "Producer<String, Customer> producer = new KafkaProducer<String,
Customer>(props);" and the "props" instance is created in the first line of the example.
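
Put together, the corrected page 44 snippet would look roughly like this (broker addresses are the usual placeholders):

```
Properties kafkaProps = new Properties(); // no access modifier on a local variable
kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
kafkaProps.put("key.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");
kafkaProps.put("value.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);
```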

ueokande  Mar 21, 2018  Mar 30, 2018
Printed
Page 46
End of #3

InterruptException should be code formatted.

Note from the Author or Editor:
Indeed, in callout #3, toward the end we have "InterruptException" that should be in monospace type (like TimeoutException) but isn't.

Justin Pihony  Apr 08, 2018  Aug 09, 2019
PDF
Page 47
Bullet point #2

"returned a nonretriable exceptions" => should be exception instead of exceptions.

Note from the Author or Editor:
Actually, the broker doesn't return exceptions to producers - it returns errors.

So:
"returned a nonretriable exceptions"
Should be:
""returned a non-retriable error"

Martin Harrigan  Dec 06, 2017  Mar 30, 2018
Printed
Page 49
compression.type

"Gzip compression will typically use more CPU but result[s] in better compression ratios,..."

missing "s" as highlighted

Note from the Author or Editor:
Correct, "result" should be "results" in this sentence.

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
Printed
Page 51
max.request.size

"... the largest message you can send is 1MB or the producer can batch 1,000 messaged of size 1 K each into one request."

1. it should be 1024 instead of 1000
2. it should be "1KB each" (missing "K" and no space -- compare "1MB")

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
Printed
Page 52
In Ordering Guarantees section

Setting the retries parameter to nonzero and the max.in.flights.requests.per.session to more than one means that it is possible that the broker will fail to write the first batch of messages, succeed to write the second (which was already in-flight), and then retry the first batch and succeed, thereby reversing the order.

But "max.in.flights.requests.per.session" does not exist in the Kafka documentation. I presume the authors meant "max.in.flight.requests.per.connection".

Note from the Author or Editor:
The report is correct. max.in.flights.requests.per.session should be max.in.flight.requests.per.connection.
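
For illustration, a sketch of a producer configuration that keeps retries without risking reordering (the retry count is illustrative):

```
Properties props = new Properties();
props.put("retries", "3"); // illustrative retry count
// the corrected parameter name; a single in-flight request per connection
// means a retried batch can never overtake a later one
props.put("max.in.flight.requests.per.connection", "1");
```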

Anonymous  Oct 17, 2017  Mar 30, 2018
Printed
Page 53
Both code examples

The source code indentation is quite off.

First example "class Customer"
- first 6 lines are indented too widely
- "this.customerID = ID" and "this.customerName = name" are indented too widely (relative to "public Customer(...) {")
- method "getName()"
-> "return customerName;" should be indented by two spaces (not just one) ?

Second example "class CustomSerializer"
- method "configure(...)"
-> "@Override" indented to widely
-> "// nothing to configure" should be indented by two spaces (not just one) ?

- method "serialize(...)"
-> "@Override" should be after JavaDoc comment ?
-> JavaDoc comment should be formatted differently, each line starting with "*" ?
-> JavaDoc comment has bad line wrapping; maybe make lines shorter and break 'manually'
-> "try {" should be indented by two spaces (not just one) ?
-> "byte[] serializedName;" and "int stringSize;" are indented too widely

(continues on page 54)
- indentation of "if (data.getName)" and corresponding "else" block
- within above mentioned "else" block, relative indentation too wide
- "catch (Exception e)" indented too widely
- "throw new SerializationException(...)" has bad line wrapping; maybe break manually ?

- method "close()"
-> "// nothing to close" indented too widely

Note from the Author or Editor:
The indentation has been updated to a consistent 4 spaces

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
PDF
Page 54
Sample code

serializeName = data.getName().getBytes("UTF-8");

The variable name should be serializedName.

Note from the Author or Editor:
correct. That is a typo.

Paolo Baronti  Nov 27, 2017  Mar 30, 2018
PDF
Page 55
First json code

{"name": "name", "type": "string""},

The final "string"" is ended with two double quote symbols.

Note from the Author or Editor:
Correct. Should be only one quote at the end.

Paolo Baronti  Nov 28, 2017  Mar 30, 2018
PDF
Page 55
Last paragraph

In

"The reading application will contain calls to methods similar to getName(), getId(), and getFaxNumber."

the last method call should include parenthesis for consistency:
getFaxNumber()

Note from the Author or Editor:
Correct. Should be "getFaxNumber()".

Paolo Baronti  Dec 08, 2017  Mar 30, 2018
PDF
Page 57
Sample code

The purpose of the line
int wait = 500;
is unclear.

Note from the Author or Editor:
Leftover :)

The line:
int wait = 500;
Can be removed

Paolo Baronti  Nov 28, 2017  Mar 30, 2018
PDF
Page 57
Code sample

The statement

ProducerRecord<String, Customer> record = new ProducerRecord<>(topic, customer.getId(), customer);

uses "customer.getId()" as the key where "id" is defined to be an integer in the schema.
This is in contrast to the type definition "ProducerRecord<String, Customer>" that requires a String key.

Using customer.getName() would align the code with the schema.

Note from the Author or Editor:
Correct. customer.getId() in the example should be replaced with customer.getName()

Paolo Baronti  Dec 10, 2017  Mar 30, 2018
Printed
Page 57
Code Example

It is not clear how the Avro serializer can handle the class Customer or where the Customer class comes from. Customer is not a plain POJO but a special Avro type class that is generated from the corresponding schema. This should be clarified.

Note from the Author or Editor:
Good point.

We need to add a numbered "callout" on the line that says:
Customer customer = CustomerGenerator.getNext();
in the code example.

The callout explanation should be:

The Customer class is not a regular Java class (POJO) but rather a specialized Avro object, generated from a schema using Avro code generation. The Avro serializer can only serialize Avro objects, not POJOs. Generating Avro classes can be done either using the avro-tools jar or the Avro Maven Plugin, both part of Apache Avro. See the "Apache Avro™ Getting Started (Java)" guide (http://avro.apache.org/docs/current/gettingstartedjava.html) for details on how to generate Avro classes.

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
Printed
Page 57
Last sentence on page

Note that the AvroSerializer can also...

This should be KafkaAvroSerializer

Note from the Author or Editor:
That's right. In callout #1, "AvroSerializer" should be "KafkaAvroSerializer".

Justin Pihony  Apr 09, 2018  Aug 09, 2019
PDF
Page 58
code sample

String email = "example " + nCustomers + "@example.com"

Missing final semicolon:
String email = "example " + nCustomers + "@example.com";

Paolo Baronti  Nov 28, 2017  Mar 30, 2018
PDF
Page 58
code sample

customer.put("id", nCustomer);

Should be
customer.put("id", nCustomers);
since that is how the variable is named in the rest of the code sample.

Paolo Baronti  Nov 28, 2017  Mar 30, 2018
Printed
Page 58
Code Example

The code for the "String schemaString" variable has bad line wrapping and is hard to read. Maybe start the assignment on the line after the variable and use shorter indentation?

String schemaString =
"{\" namespace\" .... " +
".....";

-----

"ProducerRecord data =
new ProducerRecord(...)"

has wide indentation and thus bad line wrapping

Note from the Author or Editor:
The indentation and line wrapping has been fixed

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
PDF
Page 59
2nd last paragraph

The text says: "Here, the key will simply be set to null, which may indicate that a customer name was missing on a form." but the example does not use customer names. Perhaps changing to "Here, the key will simply be set to null." would be clearer?

Note from the Author or Editor:
I can see why "Laboratory Equipment" doesn't really look like a customer name :)

Let's change:
"Here, the key will simply be set to null, which may indicate that a customer name was missing on a form. "
to
"Here, the key will simply be set to null" as suggested by Dan. Missing keys can indicate all kinds of things anyway.

Dan Hanley  Nov 08, 2017  Mar 30, 2018
PDF
Page 59
Both code examples

The first ProducerRecord<Integer, String> should be ProducerRecord<String, String> to match the arguments.

The second ProducerRecord<Integer, String> should be ProducerRecord<String> to match the arguments.

Note from the Author or Editor:
It will be ProducerRecord<String, String> in both examples; both type arguments are required even when the key is null.
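
A sketch of the corrected declaration with the two-argument constructor, which leaves the key null (topic and value are illustrative):

```
// both type parameters are still required even though no key is sent
ProducerRecord<String, String> record =
    new ProducerRecord<>("CustomerCountry", "USA");
```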

Martin Harrigan  Dec 08, 2017  Mar 30, 2018
Printed
Page 59
#5

"...that countains..." should be contains without the u

Justin Pihony  Apr 09, 2018  Aug 09, 2019
PDF
Page 60
Last paragraph

"...resulting in one partition being about twice as large as the rest. " Hard to see how twice as large is derived, Perhaps just say "much larger".

Note from the Author or Editor:
Good point.

Let's change:
"...resulting in one partition being about twice as large as the rest. "
to
"...resulting in one partition being much larger than the rest."

Dan Hanley  Nov 08, 2017  Mar 30, 2018
Printed
Page 61
Code Example

Code indentation is bad.

- should use 2-space indentation to align with other examples
- some lines are too long and wrap badly

Note from the Author or Editor:
The indentation has been consistently set to 4 spaces.

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
Printed
Page 61
Code Example

The return value for the "Banana" customer should be "numPartitions - 1" and not "numPartitions" because partitions are numbered starting at zero.

Note from the Author or Editor:
Good catch!

In the code example, where we say:
return numPartitions;

It should be:
return numPartitions - 1;
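
For context, a minimal sketch of the corrected partitioner (class name assumed; the book hashes with murmur2, simplified here to hashCode, and non-null String keys are assumed):

```
public class BananaPartitioner implements Partitioner {

    public void configure(Map<String, ?> configs) {}

    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if ("Banana".equals(key))
            return numPartitions - 1; // partitions are numbered from zero
        // spread all other keys over the remaining partitions
        return (key.hashCode() & 0x7fffffff) % (numPartitions - 1);
    }

    public void close() {}
}
```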

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
PDF
Page 64
2nd Paragraph

"from all four t1 partitions. " t1 should be capitalised to T1 for consistency.

Note from the Author or Editor:
Correct, "t1" should be "T1".

Dan Hanley  Nov 10, 2017  Mar 30, 2018
PDF
Page 64
Captions on Figures 4-2, 4-3, 4-4 and 4-5

Figure 4-2: There is one consumer group with two consumers; not two consumer groups.

Figure 4-3: There is one consumer group with four consumers; not four consumer groups.

Figure 4-4: There are not more consumer groups than partitions; there are more consumers than partitions. Also, I don't think this means missed messages, just idle consumers.

Figure 4-5: The reference to missing messages is confusing. Both consumer groups will get all messages.

Note from the Author or Editor:
The first 3 figure captions have been corrected to properly refer to a number of consumers in a single group. The last two figure captions have been corrected to not refer to missing messages.

Martin Harrigan  Dec 08, 2017  Mar 30, 2018
PDF
Page 66
Figure 4-5

Both Consumer Groups are labelled as "Consumer Group 1"

Note from the Author or Editor:
Indeed.

Should be, "Consumer Group 1" on top and "Consumer Group 2" on the bottom.

Dan Hanley  Nov 10, 2017  Mar 30, 2018
PDF
Page 68
Sidenote: How Does the Process of Assigning Partitions to Brokers Work?

After deciding on the partition assignment, the consumer leader sends the list of assignments to the GroupCoordinator...

I guess "consumer leader" means group leader (this is how it's been called elsewhere).
Also GroupCoordinator appears in a courier font, as if it were a class name.

Note from the Author or Editor:
Yes. It would be clearer if "consumer leader" was replaced by "consumer group leader".

Paolo Baronti  Nov 27, 2017  Mar 30, 2018
Printed
Page 69
Code Example

Bad line wrapping. Maybe break the line manually

"KafkaConsumer<...> consumer = new KafkaConsumer<...>(props);"

---

Typo: Last sentence in paragraph after code example:

"...which is the name of the consumer group this consumer belong[s] to."

missing "s" as highlighted

Note from the Author or Editor:
Thanks for catching.

1. The last two lines of the code example should be changed to:
KafkaConsumer<String, String> consumer =
new KafkaConsumer<String, String> (props);

2. The last sentence in the paragraph after code example should say "belongs" as Matthias corrected above.

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
Printed
Page 70
1st Code Example

The argument to subscribe is not a String but a Pattern. Thus

consumer.subscribe("test.*");

is wrong and should be

consumer.subscribe(Pattern.compile("test.*"));

Note from the Author or Editor:
Correct.

At the very top of the page:
consumer.subscribe("test.*");
should be replaced with:
consumer.subscribe(Pattern.compile("test.*"));

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
Printed
Page 70
Code Example (Poll Loop)

Code formatting is inconsistent with other examples:
- no newline for "for-loop"; curly brace should be in same line

Why is "partition" in the "log.debug" statement printed as String (ie, "%s") instead of a number like the offset (ie, using "%d")

Note from the Author or Editor:
Regarding the "for-loop", for better or worse, the code style isn't 100% consistent between all examples in the book. Fixing this one example won't make a huge difference, so let's leave it as is.

The second comment is correct. The line
"log.debug("topic = %s, partition = %s, offset = %d,"
Should have been:
log.debug("topic = %s, partition = %d, offset = %d,"

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
Printed
Page 70
code example for the poll loop

`custCountryMap` counts number of records with the record value as a key. Is `custCountryMap` a `Map<String, Integer>`?

The condition of the if-block in the code:

if (custCountryMap.countainsValue(record.value())) {
updatedCount = custCountryMap.get(record.value()) + 1;
}

Should it be

if (custCountryMap.containsKey(record.value())) {
updatedCount = custCountryMap.get(record.value()) + 1;
}

?

Additionally, that can be simplified with the `getOrDefault()` method:

int updatedCount = custCountryMap.getOrDefault(record.value(), 0) + 1;

Note from the Author or Editor:
Good catch. This is correct.

Let's replace:
if (custCountryMap.countainsValue(record.value())) {

With
if (custCountryMap.containsKey(record.value())) {
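
Combining the fix with the suggested simplification, the counting logic would look roughly like this (note the default must be 0 so the increment is correct):

```
String country = record.value();
// default to 0 so the first occurrence counts as 1
int updatedCount = custCountryMap.getOrDefault(country, 0) + 1;
custCountryMap.put(country, updatedCount);
```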

Shin'ya Ueoka  Apr 28, 2018  Aug 09, 2019
Printed
Page 71
Bullet point #3

Bullet point #3 explains what poll() returns. Later it talks about the timeout parameter passed into poll() itself -- this was already covered by bullet point #2 and does not belong to #3.

Note from the Author or Editor:
Correct.
The sentence:
"The poll() method takes a timeout parameter. This
specifies how long it will take poll to return, with or without data. The value is
typically driven by application needs for quick responses—how fast do you want to return control to the thread that does the polling?"

Should be removed from code-comment #3.

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
PDF
Page 72
fetch.min.bytes property description

If a broker receives a request for records from a consumer but the new records amount to fewer bytes than min.fetch.bytes, the broker will wait until ....

Should be fetch.min.bytes instead of min.fetch.bytes

Note from the Author or Editor:
Correct.
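
For illustration, a sketch of how this setting pairs with fetch.max.wait.ms in consumer configuration (values illustrative):

```
// respond to a fetch only once at least 1 KB of new data is available...
props.put("fetch.min.bytes", "1024");
// ...or once this much time has passed, whichever comes first
props.put("fetch.max.wait.ms", "500");
```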

Paolo Baronti  Nov 27, 2017  Mar 30, 2018
PDF
Page 72
Third paragraph

Spelling: recieve => receive

Note from the Author or Editor:
Correct. "recieve" should be "receive".

Martin Harrigan  Dec 13, 2017  Mar 30, 2018
PDF
Page 73
Second paragraph

Typo: heatbeat.interval.ms => heartbeat.interval.ms

Note from the Author or Editor:
Correct.

Martin Harrigan  Dec 13, 2017  Mar 30, 2018
Printed
Page 73
1st paragraph

...(determined by max.message.size...

should be max.message.bytes

Note from the Author or Editor:
`max.message.size` should be `message.max.bytes` (since I refer to the broker config here).

Justin Pihony  Apr 10, 2018  Aug 09, 2019
Printed
Page 73
1st paragraph

max.message.bytes is the topic-level max, so the largest message should come from message.max.bytes if you are referring to the broker.

However it might be worth mentioning both topic and broker configs here?

Note from the Author or Editor:
Let's replace `max.message.size` with `message.max.bytes`.
This doesn't look like the right place to get into topic configuration.

Justin Pihony  Apr 10, 2018  Aug 09, 2019
Printed
Page 73
2nd paragraph

The default for session.timeout.ms is 10s, not 3; 3s is the default for heartbeat.interval.ms.

Note from the Author or Editor:
Good catch. Should be:
"The amount of time a consumer can be out of contact with the brokers while still considered alive defaults to 10 seconds."

Justin Pihony  Apr 10, 2018  Aug 09, 2019
PDF
Page 74
enable.auto.commit property

We discussed the different options for committing offsets earlier in this chapter.

Commit options are discussed in the next section so it should probably be

We'll discuss the different options for committing offsets later in this chapter

Note from the Author or Editor:
This is what happens when you decide to move text around at the last minute :)

The sentence "We discussed the different options for committing offsets earlier in this chapter." should be:
"We'll discuss the different options for committing offsets later in this chapter"

Paolo Baronti  Nov 27, 2017  Mar 30, 2018
Printed
Page 74
auto.offset.reset

Should auto.offset.reset cover the none option?

Note from the Author or Editor:
Good point.
Let's add:
"Setting `auto.offset.reset` to `none` will cause an exception to be thrown when attempting to consume from an invalid offset." after
"the consumer will read all the data in the partition, starting from the very beginning."

Justin Pihony  Apr 10, 2018  Aug 09, 2019
Printed
Page 76
2nd paragraph

The paragraph leading to Automatic Commit ends with a colon. While this seems to be leading to a list of sorts, the previous structure around this hand-off was to end with a period, as a normal paragraph.

Justin Pihony  Apr 10, 2018  Aug 09, 2019
PDF
Page 77
2nd last paragraph

"When rebalance is triggered" for consistency of style could be reworded as "When a rebalance is triggered"

Note from the Author or Editor:
Agree. Should be "a rebalance".

Dan Hanley  Nov 14, 2017  Mar 30, 2018
Printed
Page 77
Code example

Example code uses `printf(...)` with `%s` for partition number instead of `%d`.

Indentation is messed up a little bit, too.

Note from the Author or Editor:
Correct.
In the example:
"partition = %s"
Should be
"partition = %d"

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
PDF
Page 79
Code snippet

public void onComplete(Map<TopicPartition,
OffsetAndMetadata> offsets, Exception exception) {
if (e != null)
log.error("Commit failed for offsets {}", offsets, e);
}

Exception is called exception in the method signature but e in method body.

Note from the Author or Editor:
This is indeed a mistake.

Should be:
"OffsetAndMetadata> offsets, Exception e) {"

Dan Hanley  Nov 14, 2017  Mar 30, 2018
Printed
Page 81
#3

If I go to the Kafka docs there is a bold note:

Note: The committed offset should always be the offset of the next message that your application will read.

This should be explicitly explained in this bullet.

Note from the Author or Editor:
Good point. Let's add:
"The committed offset should always be the offset of the next message that your application will read."
After:
"After reading each record, we update the offsets map with the offset of the next message we expect to process."

Justin Pihony  Apr 11, 2018  Aug 09, 2019
Printed
Page 81
Commit Specific Offset

Quote:

"has offset 5000, you can call commitSync() to commit offset 5000"

If offset 5000 was consumed, one should commit offset 5001.

Note from the Author or Editor:
Correct.

Let's change:
"If you are in the middle of processing a batch of records, and the last message you got from partition 3 in topic “customers” has offset 5000, you can call commitSync() to commit offset 5000 for partition 3 in topic “customers.”"
To:
"If you are in the middle of processing a batch of records, and the last message you got from partition 3 in topic “customers” has offset 5000, you can call commitSync() to commit offset 5001 for partition 3 in topic “customers.”"

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
PDF
Page 82
2nd paragraph

"(e.g., the currentRecords map we used when explaining pause() functionality"

At this point in the book, pause functionality has not yet been discussed. IIRC it is not introduced until chapter six. Is this e.g. required here? If so, should it be better signposted?

Note from the Author or Editor:
Good point. There was a "pause" explanation, which we removed (new versions of the consumer made "pause" less useful and we decided to leave it out).

Let's remove this entire sentence:
"If your consumer maintained a buffer with events that it only processes occasionally (e.g., the currentRecords map we used when explaining pause() functionality), you will want to process the events you accumulated before losing ownership of the partition"

Dan Hanley  Nov 14, 2017  Mar 30, 2018
Printed
Page 83
Code example

`new OffsetAndMetadata(record.offset()+1, "no metadata")`

replace "no metadata" with `null`

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
PDF
Page 84
2nd paragraph

seekToBeginning(TopicPartition tp) and seekToEnd(TopicPartition tp)

Should this be
seekToBeginning(Collection<TopicPartition> tp) and seekToEnd(Collection<TopicPartition> tp)?

Note from the Author or Editor:
Correct.

Paolo Baronti  Nov 27, 2017  Mar 30, 2018
PDF
Page 84
Code sample

currentOffsets.put(new TopicPartition(record.topic(), record.partition()), record.offset());

Should it be

currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
new OffsetAndMetadata(record.offset()+1));

?

Paolo Baronti  Nov 27, 2017  Mar 30, 2018
PDF
Page 85
2nd paragraph

>> Now the only problem is if the RECORD is stored in a database and not in kafka...

should be changed to

>> Now the only problem is if the OFFSET is stored in a database and not in kafka...

The error is indicated in caps

Note from the Author or Editor:
This comment is correct. Instead of "Now the only problem is if the record is stored in a database and not in kafka" page 85 should say "Now the only problem is if the offset is stored in a database and not in kafka".

Ashok Chilakapati  Feb 14, 2018  Mar 30, 2018
PDF
Page 86
Last paragraph

At the end of the first sentence on the last paragraph, the method consumer.wakeup(), is split across two lines into "con" and "sumer.wakeup()", which looks like a formatting issue.

Note from the Author or Editor:
True. The sentence "When you decide to exit the poll loop, you will need another thread to call con
sumer.wakeup()."
Should have a line-break before "consumer" so it will be:
"When you decide to exit the poll loop, you will need another thread to call
consumer.wakeup()."

Ray Chiang  Feb 03, 2018  Mar 30, 2018
Printed
Page 89
Second Code Example

1) Should the check `data.length` check for 16? 8 bytes for `id` and 8 bytes for `nameSize` ?

2) Exception message does not make sense:

"Size of data received by IntegerDeserializer is shorter than expected."

There is no IntegerDeserializer, but a CustomDeserializer is implemented.

3) is the check for `null` and `data.length` required/intended? It's all wrapped with a try-catch-block anyway.

Note from the Author or Editor:
1. Good point.
"if (data.length < 8)"
should be
"if (data.length < 16)"

And "IntegerDeserializer" should be "deserializer".

2. The code example here could definitely be better. But this is outside the scope of errata and this is an example of what one shouldn't do.

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
PDF
Page 90
Sample code at page top

byte[] nameBytes = new Array[Byte](nameSize);

Should probably be

byte[] nameBytes = new byte[nameSize];

Note from the Author or Editor:
On page 90, in the code sample, the line

byte[] nameBytes = new Array[Byte](nameSize);

should actually be

byte[] nameBytes = new byte[nameSize];

Paolo Baronti  Nov 27, 2017  Mar 30, 2018
PDF
Page 90
Sample code at page bottom

consumer.subscribe("customerCountries")

Should it be

consumer.subscribe(Collections.singletonList("customerCountries"))
?

Paolo Baronti  Nov 27, 2017  Mar 30, 2018
PDF
Page 90
code listing at page bottom

props.put("value.deserializer", "org.apache.kafka.common.serialization.CustomerDeserializer");

Package org.apache.kafka.common.serialization should not be used for custom code.
A simpler solution might be
props.put("value.deserializer", CustomerDeserializer.class.getName());

Note from the Author or Editor:
Good catch.

Paolo Baronti  Nov 28, 2017  Mar 30, 2018
PDF
Page 90
Code listing at page top

String nameSize = buffer.getInt();

Variable nameSize was already defined at the top of the method. Besides it should have type int (not String). The line should be replaced with
nameSize = buffer.getInt();

Note from the Author or Editor:
Corrected "String nameSize = buffer.getInt();" to "nameSize = buffer.getInt();"

Paolo Baronti  Nov 28, 2017  Mar 30, 2018
PDF
Page 90
Code listing at page top

name = new String(nameBytes, 'UTF-8');

Java strings are delimited by double quotes. It should be
name = new String(nameBytes, "UTF-8");

Paolo Baronti  Nov 28, 2017  Mar 30, 2018
Printed
Page 90
Sample code at the top of the page (the catch)

The "throw new SerializationException" text is:

"Error when serializing Customer to byte[] "

but this is a deserializer example and so it should be reversed, since you're converting from bytes to a Customer object.

There is also a switching of deserializer/serializer in the line before the second set of example code:

"The consumer code that uses this serializer will look similar to this example:"

Sorry if this is overly pedantic.

Note from the Author or Editor:
Good catch.

Lets replace:
SerializationException("Error when serializing " + "Customer to byte[] " + e);
With
SerializationException("Error when deserializing " + "byte[] to Customer " + e);

Blake Ulmer  Apr 23, 2018  Aug 09, 2019
PDF
Page 91
first line

record.value().getId()

Should be

record.value().getID()

since this is how the method was defined for class Customer on page 89

Note from the Author or Editor:
The typo "record.value().getId()" has been fixed to "record.value().getID()"

Paolo Baronti  Nov 28, 2017  Mar 30, 2018
PDF
Page 91
Sample code at page top

The for loop straddling across page 90 and page 91 should be followed (on page 91) by a commit statement like

consumer.commitSync();

Note from the Author or Editor:
Correct.

At the very top of page 91 there are 4 lines that read:
record.value().getId() + " and
current customer name: " + record.value().getName());
}
}

It should read:
record.value().getId() + " and
current customer name: " + record.value().getName());
}
consumer.commitSync();
}

(Note that the new line should be indented to align with the curly bracket just above it).

Paolo Baronti  Dec 10, 2017  Mar 30, 2018
PDF
Page 91
Sample code at mid page

The property keys "key.serializer" and "value.serializer" should be

"key.deserializer" and "value.deserializer"


Note from the Author or Editor:
Correct.
In this example
"key.serializer" should be "key.deserializer"
and
"value.serializer" should be "value.deserializer"

Paolo Baronti  Dec 10, 2017  Mar 30, 2018
PDF
Page 91
Sample code at mid page

The statement

KafkaConsumer consumer = new KafkaConsumer(createConsumerConfig(brokers, groupId, rl));

uses function createConsumerConfig() to return a set of properties to be passed to the KafkaConsumer constructor.
The function is not actually defined in the code sample, but a suitably populated Properties object is defined in the lines above the statement.
Also, generic types are not used for KafkaConsumer.

Wouldn't
KafkaConsumer<String, Customer> consumer = new KafkaConsumer<>(props);
be better?

Note from the Author or Editor:
Absolutely.

We need to replace:
KafkaConsumer consumer = new
KafkaConsumer(createConsumerConfig(brokers, groupId, url));

With:
KafkaConsumer<String, Customer> consumer =
new KafkaConsumer<>(props);

Paolo Baronti  Dec 10, 2017  Mar 30, 2018
Printed
Page 91
code sample

As others have pointed out, the props keys for the deserializers are wrong (in the example, the props for the serializers are used). Furthermore, to get it working, I had to specify "specific.avro.reader" to true, otherwise I get a ClassCastException from GenericAvroRecord to my generated class. I believe that property should always be set to true when deserializing Avro messages?

Thank you for the fantastic work on this book. It's great!

Note from the Author or Editor:
I think this comment is for an older print of the book. Note that the errors pointed out here are for page 90 (not 91) in the PDF.

The deserializers are correct in the recent version, but the reader is correct that a property is missing.

Let's add:
props.put("specific.avro.reader","true");
In the line after:
props.put("value.deserializer",
"io.confluent.kafka.serializers.KafkaAvroDeserializer");
At the code example on the end of page 90

Luca Pette  Mar 28, 2018  Aug 09, 2019
PDF
Page 95
First paragraph under "Cluster Management"

The last sentence of the paragraph ends with "so they get notified when brokers are added or removed". I think it should be "so that they get notified when brokers are added or removed".

Note from the Author or Editor:
A word was indeed accidentally left out. The sentence "so they get notified when brokers are added or removed" should be "so that they get notified when brokers are added or removed" because the following clause describes the intention of the verb "register".

Ray Chiang  Feb 03, 2018  Mar 30, 2018
Printed
Page 97
Paragraph "Replication"

Second sentence:

"The first sentence in Kafka's documentation..."

The docs might change at any point. Not sure if this should be rephrased?

Note from the Author or Editor:
Yes, this didn't stand the test of time well :)

Let's go with:
Replication is at the heart of Kafka’s architecture. Indeed, Kafka is often described as “a distributed, partitioned, replicated commit log service.”

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
PDF
Page 100
Figure 5-1

The ovals in the bottom right of the figure should be labeled IOThread (not Processor Thread).

Note from the Author or Editor:
In chapter 5, figure 5-1, the top left oval should be labeled "Network Thread". The bottom right oval should be labeled "IO Thread"

Paolo Baronti  Nov 29, 2017  Mar 30, 2018
PDF
Page 104
3rd paragraph

In
"Now when an application calls the commitOffset() client API,"

the commitOffset() seems to be a function call. It appears nowhere else in the book.
Is it intended to mean either commitSync() or commitAsync()?

Note from the Author or Editor:
Yes, I meant something more generic rather than a specific method.

Let's change: "Now when an application calls the commitOffset() client API,"
to:
"Now when an application calls the client API to commit consumer offsets,"

Paolo Baronti  Nov 29, 2017  Mar 30, 2018
PDF
Page 110
4th paragraph

Previous misspelled as pervious

Dan Hanley  Nov 28, 2017  Mar 30, 2018
PDF
Page 120
2nd bullet point in 3rd paragraph

> Now replica 3 is unavailable and replica 0 is back online. Replica 0 only has messages 0-100 but not 100-20

In this section it is assumed that there are three replicas of a partition, and this paragraph discusses unclean leader election. So the unavailable replica must be replica 2, not replica 3.

Note from the Author or Editor:
The description in the error report is correct. It should be "Now replica 2 is unavailable" instead of "replica 3".

Shin'ya Ueoka  Oct 22, 2017  Mar 30, 2018
PDF
Page 123
2nd paragraph

"In conjunction with the min.insync.replica configuration..."

The configuration parameter should be min.insync.replicas. The final 's' is missing

Note from the Author or Editor:
Correct, this should be "min.insync.replicas"

Paolo Baronti  Nov 29, 2017  Mar 30, 2018
PDF
Page 123
4th paragraph

In
"if the broker returns the error code LEADER_NOT_AVAILABLE, the producer can try sending the error again"

the producer can try sending the "message" again (not the error).

Note from the Author or Editor:
Correct.
"if the broker returns the error code LEADER_NOT_AVAILABLE, the producer can try sending the error again"
Should be:
"if the broker returns the error code LEADER_NOT_AVAILABLE, the producer can try sending the message again"

Paolo Baronti  Nov 29, 2017  Mar 30, 2018
PDF
Page 129
3rd paragraph

In
"When writing results to a system like a relational database or Elastic search"

Elastic search should be written as a single word: Elasticsearch

Paolo Baronti  Nov 29, 2017  Mar 30, 2018
Printed
Page 142
Bullet points in "Running Connect"

Each bullet point has two colons instead of one.

Note from the Author or Editor:
Right.

Should be:
"bootstrap.servers:" and "group.id:"

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
PDF
Page 143
2nd paragraph

The paragraph references configuration properties
key.converter.schema.enable
and
value.converter.schema.enable

Should it be

key.converter.schemas.enable
and
value.converter.schemas.enable

?

Note from the Author or Editor:
Yes. Should be:
key.converter.schemas.enable
and
value.converter.schemas.enable

Paolo Baronti  Dec 04, 2017  Mar 30, 2018
Printed
Page 143
Second paragraph

Quote:

"rest.host.name and rest.port Connectors are typically..."

Should this be a bullet point? (note, missing colon)

" - rest.host.name and rest.port: Connectors are typically..."

Note from the Author or Editor:
Oh yeah, this page has some formatting issues:

Paragraph 2 is part of the first bullet point.
Paragraph 3 is its own bullet point.

"key.converter and value.converter::" should be "key.converter and value.converter:"
and
"rest.host.name and rest.port" should be "rest.host.name and rest.port:"

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
Printed
Page 143
Code examples

Prompt shows username (two times):

gwen$ curl ...

I think the user name should be omitted.

Meta comment: examples throughout the whole book are not consistent but use different prompt signs (either $ or #), or there is no prompt sign at all.

Note from the Author or Editor:
Good point.

Let's change the two occurrences of "gwen$" to "$"

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
PDF
Page 144
Connector Example

The following code does not work :

echo '{"name":"load-kafka-config", "config":{"connector.class":"FileStream-
Source" ...

instead the connector.class should be FileStreamSource :

echo '{"name":"load-kafka-config", "config":{"connector.class":"FileStream
Source" ...

My 2 cents ...

Note from the Author or Editor:
Correct. This error repeats twice: in the code block starting with "echo" and the code block immediately below. I believe this is the result of breaking the single word "FileStreamSource" across two lines. Since the result ("FileStream-Source") indeed doesn't work, I suggest breaking the line after the preceding colon (i.e. after "connector.class":) in those two code blocks.

harel  Nov 05, 2017  Mar 30, 2018
PDF
Page 144
3rd shell command

In command

bin/kafka-console-consumer.sh --new --bootstrap-server=localhost:9092 --topic kafka-config-topic --from-beginning

option --new is invalid. It should probably be --new-consumer but since version 0.10.1 it is no longer needed.

Note from the Author or Editor:
Corrected to --new-consumer in both places in the chapter.

Paolo Baronti  Dec 05, 2017  Mar 30, 2018
Printed
Page 144
Code examples

Command output should be removed.

Examples show output from the executed commands. This is inconsistent with other examples. Also, it's confusing because it is not obvious that the command output is shown. The output is also not relevant, because it's not discussed in the text.

Note from the Author or Editor:
We follow the practice of printing the command output throughout the connect chapter to allow the readers to check that they are on the right track. This chapter is more "tutorial style" than others, so I think this helps.

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
Printed
Page 146
First paragraph "MySQL to Elasticsearch"

"...and index it's contents."

Plural indented or should it be "content" ?

Note from the Author or Editor:
Right.

"...and index it's contents."
should be
"...and index its content."

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
Printed
Page 147
First Code Example

Why is commit necessary?

'mysql> commit;'

No transaction was started explicitly, thus, each statement should be its own transaction. Also, the output is

"Query OK, 0 rows affected (0.01 sec)"

This indicates, that there was nothing to do?

Or does MySQL behave differently than I expect?

Minor:

Missing empty line between commands:

mysql> use test;
Database changed
mysql> create table login ...
Query OK,....

Note from the Author or Editor:
These two lines can be removed:

mysql> commit;
Query OK, 0 rows affected (0.01 sec)

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
Printed
Page 148
Last paragraph

"It took multiple attempts to get the connection string right."

What does this mean?

Note from the Author or Editor:
I think this was a copy-paste error.

We should remove the sentence "It took multiple attempts to get the connection string right."

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
Printed
Page 149
Second Code Example

Command output should be removed.

Examples show output from the executed commands. This is inconsistent with other examples. Also, it's confusing because it is not obvious that the command output is shown. The output is also not relevant, because it's not discussed in the text.

Note from the Author or Editor:
Yeah, this makes sense.

Let's remove this:

"
<more stuff>
{"schema":{"type":"struct","fields":
[{"type":"string","optional":true,"field":"username"},
{"type":"int64","optional":true,"name":"org.apache.kafka.connect.data.Timestamp","
version":1,"field":"login_time"}],"optional":false,"name":"login"},"payload":{"
username":"gwenshap","login_time":1476423962000}}
{"schema":{"type":"struct","fields":
[{"type":"string","optional":true,"field":"username"},
{"type":"int64","optional":true,"name":"org.apache.kafka.connect.data.Timestamp","
version":1,"field":"login_time"}],"optional":false,"name":"login"},"payload":{"
username":"tpalino","login_time":1476423981000}}
"

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
Printed
Page 149
Last paragraph

"Because the events in Kafka lack keys, we need to tell the Elasticsearch connector to use the topic name, partition ID, and offsets as the key for each event."

It's unclear how this is done? What config does this refer to?

Note from the Author or Editor:
You are right, this is unclear. Let's add:
"This is done by setting the `key.ignore` configuration to `true`."
After:
"Because the events in Kafka lack keys, we need to tell the Elasticsearch connector to use the topic name, partition ID, and offset as the key for each event. "

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
PDF
Page 152
1st paragraph

In
"choosing the lower of max.tasks configuration and the number of tables"

Should the configuration property be "tasks.max" instead?

Paolo Baronti  Dec 05, 2017  Mar 30, 2018
Printed
Page 152
Paragraph "Tasks"

"After tasks are initialized, the[y] are started..."

Missing 'y'

Note from the Author or Editor:
Good catch.

"After tasks are initialized, the are started..." should be "After tasks are initialized, they are started..."

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
Printed
Page 153
3rd Paragraph

"...it can take a few month to get [it] right"

Missing word.

Note from the Author or Editor:
Thanks.

"...it can take a few month to get right" should be "...it can take a few month to get everything right"

Matthias J Sax  Oct 21, 2018  Aug 09, 2019
PDF
Page 163
4th paragraph

In the paragraph the logical topic "user" is defined to be composed of "SF.users" and "NYC.users" but is later referred to as ".users" (with a leading period) in
"Consumers will need to consume events from .users if they wish to consume..."

Note from the Author or Editor:
True! I wanted to refer to subscribing to regular expression to consume from both topics.

The sentence: "Consumers will need to consume events from .users if they wish to consume all user events."
should be:
"Consumers will need to consume events from *.users if they wish to consume all user events."

Note the asterisk! I think it got chopped by the PDF generator.

Paolo Baronti  Dec 01, 2017  Mar 30, 2018
PDF
Page 166
Figure 8-5

The labels on the topics/partitions are all the same: "Topic A, Partition 0".
I think they should be (from top to bottom):
"Topic A, Partition 0"
"Topic B, Partition 0"
"Topic __conusmer_offsets"

Note from the Author or Editor:
The labels should be, from top to bottom:

Topic A, Partition 0
Topic B, Partition 0
Topic __consumer_offsets, Partition 0

Paolo Baronti  Dec 01, 2017  Mar 30, 2018
Printed
Page 166
Figure

Both clusters are labeled as "Production Kafka Cluster", while I think only the left one should be labeled like this; the right one should be "Failover Kafka Cluster" or similar.

Note from the Author or Editor:
Good catch.

In diagram 8-5, the left box should be "Production Kafka Cluster" and the right one should be "Failover Kafka Cluster".

Matthias J Sax  Nov 09, 2018  Aug 09, 2019
PDF
Page 169
3rd paragraph

The text

"This involves using rack definitions to make sure each partition has replicas in multiple datacenters and the use of min.isr and acks=all to ensure that every write is acknowledged from at least two datacenters."

mentions configuration property "min.isr".
The book never describes such a property.
Is this a shorthand for "min.insync.replicas"?

Note from the Author or Editor:
Correct. "min.isr" should be "min.insync.replicas".

Paolo Baronti  Dec 01, 2017  Mar 30, 2018
Printed
Page 173
1st and 2nd paragraph

Why run MirrorMaker in source/destination cluster?

1st paragraph: The explanation is unclear: "On the other hand, if the events were already consumed and MirrorMaker can't produce them due to network partition, there is always a risk that these events will accidentally get lost by MirrorMaker."

The sentence contradicts page 170 (1st paragraph in "Apache Kafka's MirrorMaker"): "Each consumer consumes events from the topics and partitions it was assigned on the source cluster. Every 60 seconds (by default), the consumer will tell the producer to send all the events it has to Kafka and wait until Kafka acknowledges these events. Then the consumers contact the source Kafka cluster to commit the offsets for those events. This guarantees no data loss (messages are acknowledged by Kafka before offsets are committed to the source) and there is no more than 60 seconds' worth of duplicates if the MirrorMaker process crashes."

2nd paragraph:

"Consumers take a significant performance hit when connecting to Kafka with SSL encryption---much more so then producers."

Why? It's unclear why SSL should be more expensive for consumers than producers.

Note from the Author or Editor:
There are two questions here:
1. Why do we recommend a practice to minimize data loss when we said there shouldn't be any? I think this is fine to leave as is - minimizing risk is good even if theoretically there's no risk.

2. Request to clarify why SSL is more expensive for consumers. That makes sense. Let's replace:
"Consumers take a significant performance hit when connecting to Kafka with SSL encryption---much more so then producers."
with:
"Consumers take a significant performance hit when connecting to Kafka with SSL encryption---much more so than producers. This is because the use of SSL requires copying data for encryption, which means consumers no longer enjoy the performance benefits of the usual zero-copy optimization."

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
Printed
Page 174
Figure 8-7

Both clusters say "Production Kafka Cluster" -- should right hand side cluster say "Mirror/Backup Kafka Cluster" instead?

Note from the Author or Editor:
Oops, yes.
In figure 8-7, the left side box is "Production Kafka Cluster" and the right side should be "Failover Kafka Cluster".

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
Printed
Page 174
First bullet point

Reported lag should be 5 instead of 4.

The committed offset is the next offset to be consumed. Thus, if offset 3 is committed, messages from offset 3 to 7 need to be mirrored, and thus the lag is 5.

Note from the Author or Editor:
Good catch!
We need to replace:
"In the diagram, the real lag is 2, but the
kafka-consumer-groups tool will report a lag of 4 because MirrorMaker
hasn’t committed offsets for more recent messages yet."
With:
"In the diagram, the real lag is 2, but the
kafka-consumer-groups tool will report a lag of 5 because MirrorMaker
hasn’t committed offsets for more recent messages yet."

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
PDF
Page 177
Last paragraph

There are two occurrences of "fetch.max.wait"
It should probably be
"fetch.max.wait.ms"

Paolo Baronti  Dec 01, 2017  Mar 30, 2018
Printed
Page 178/179
Headlines

First headline says: "Uber uReplicator"
Second headline says: "Confluent's Replicator"

For consistency, it should be "Uber's uReplicator" and "Confluent's Replicator" or "Uber uReplicator" and "Confluent Replicator".

Note from the Author or Editor:
Yes, this is correct. I think the common usage is "Uber uReplicator" and "Confluent Replicator".

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
Printed
Page 183
Example Code

The first example code snippet has no #. The second example shows the #.

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
Printed
Page 184
Section Deleting a Topic

The section does not mention that topic deletion is an async operation. Might be good to clarify and discuss the impact.

Note from the Author or Editor:
Added additional text:

Topic deletion is an asynchronous operation. This means that the above command will mark a topic for deletion, but the deletion will not happen immediately. The controller will notify the brokers of the pending deletion as soon as possible (after existing controller tasks complete), and the brokers will then invalidate the metadata for the topic and delete the files from disk. It is highly recommended that operators not delete more than one or two topics at a time, and give those ample time to complete before deleting other topics, due to limitations in the way the controller executes these operations.

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
PDF
Page 186
last paragraph

the '--bootstrap-server' parameter given in the example command to list new consumer groups is wrong. There should be no path after the host:port; it should be the following:

# kafka-consumer-groups.sh --new-consumer --bootstrap-server kafka1.example.com:9092 --list kafka-python-test

In addition, the '--new-consumer' option can now be removed (the script gives a deprecation message about not needing to specify '--new-consumer' when '--bootstrap-server' is given)

Note from the Author or Editor:
Thank you, the argument has been corrected. While --new-consumer is not required in the latest versions, it is required in previous versions.

Jeffrey 'jf' Lim  Jan 01, 2018  Mar 30, 2018
Printed
Page 186
1st paragraph

last sentence: "... for produce[r] or consume[r] clients."

Missing 'r' (two times).

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
Printed
Page 187/188
Table 9-1

Entry:

- log-end-offset: I think that the log-end-offset is one larger. If a topic is empty, log-end-offset should be zero. If there are five produced messages (from offsets 0 to 4), log-end-offset should be 5, not "the offset of the last message produced", which would be 4.

Question:

- owner: is this the same as `client.id`? If not, what does "provided by the group member" mean?

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
PDF
Page 189
2nd paragraph

In the output format string

"/consumers/GROUPNAME/offsets/topic/TOPICNAME/PARTITIONID-0:OFFSET"

the final part should probably be

"PARTITIONID:OFFSET"

(without "-0") since it is actually a format string.

Note from the Author or Editor:
Corrected this typo.

Paolo Baronti  Dec 01, 2017  Mar 30, 2018
Printed
Page 192
Table 9-3. The configurations (keys) for clients

In the description of "producer_bytes_rate", the description should be "... that a single client ID is allowed" instead of "that a singe client ID is allowed".

Note from the Author or Editor:
Thank you. This has been corrected.

adrien ruffie  Feb 10, 2018  Mar 30, 2018
Printed
Page 194
1st paragraph

Does not explain what "ideal leader" is.

Also, "This can be manually ordered using the kafka-preferred-replica-election.sh": unclear to what "this" refers?

Does not explain which partition was elected as leader if the tool is used (the first in the list).

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
PDF
Page 195-198
Script example

The kafka-reassign-partitions.sh example is not very illustrative since it involves reassigning partitions for a single topic named my-topic with 16 partitions each replicated over brokers 0 and 1. The target broker list also consists of brokers 0 and 1. As a consequence the current partition assignment and proposed partition reassignment are exactly the same.

Note from the Author or Editor:
The target broker IDs have been updated to be 2 and 3.

Paolo Baronti  Dec 01, 2017  Mar 30, 2018
Printed
Page 195
Example

Use an example with fewer partitions -- the output is not readable and quite noisy.

The example also seems to work on a 2-node cluster, thus the new assignment is the same as the old. That's confusing.

It's unclear how a single partition could be moved. How does reassignment work for a 5-partition topic if the target broker list is only 2 brokers?

It's also unclear how the tool computes the recommended assignment.

Note from the Author or Editor:
Agreed that this output could be clearer. However, this is more work than we can reasonably update in an errata, and should wait for the next edition.

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
Printed
Page 197
Example

Seems that the tool also writes the old assignment to stdout -- this should be mentioned explicitly.

Why is step 1 required? It seems that one could write the JSON file manually.

Note from the Author or Editor:
It's already noted that the old assignment is written to standard output. Updated the text to make it clearer that a manual partition assignment can be created.
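
As a hedged sketch of the manual route (topic, partition, and broker IDs are hypothetical):

```
# reassign.json, written by hand instead of taking the tool's proposal:
# {"version": 1, "partitions": [
#    {"topic": "my-topic", "partition": 0, "replicas": [2,3]}]}
kafka-reassign-partitions.sh --zookeeper zoo1.example.com:2181 \
  --execute --reassignment-json-file reassign.json
```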

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
Printed
Page 198
Paragraph "Changing Replication Factor"

Why is this not possible via config changes? Some more details would be nice.

Also, why are changing the replication factor and changing the number of partitions "split"? Both seem related, and the same tool can be used to do both. (Instead, adding/removing partitions is covered earlier, on p183.)

"This can be done by creating a JSON object with the format used in the execution step of partition assignment that adds or removes replicas to set the replication factor." -- This sentence is confusing.

Example: `"replicas": [1,2]`
- What if those brokers don't exist?
- It's unclear from the example that one specifies a list of brokers to host a replica. (See the sketch below.)

In general, the term "replica" should be explained at the beginning of this chapter.

Note from the Author or Editor:
Added some text to make this more clear.
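
For illustration, a minimal sketch: the `replicas` field lists the IDs of the brokers that should host the partition, so lengthening the list raises the replication factor (all names and IDs below are hypothetical, and the listed brokers must exist in the cluster):

```
# rf.json raises the replication factor of my-topic partition 0 to 2;
# the first broker in the list is the preferred leader
cat > rf.json <<'EOF'
{"version": 1, "partitions": [
  {"topic": "my-topic", "partition": 0, "replicas": [1,2]}]}
EOF
kafka-reassign-partitions.sh --zookeeper zoo1.example.com:2181 \
  --execute --reassignment-json-file rf.json
```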

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
Printed
Page 200
Examples

First example starts with output

```
Dumping xxx.log
Starting offset: xxx
```

Second example does not. Why? Typo or tool inconsistency?

The second example magically prints the message. How can it do this without a deserializer? What is `payload` -- key? value? Both? If only one, how do I get the other?

Note from the Author or Editor:
The output is as produced by the tool, so the tool is inconsistent. The message data is raw bytes, without keys.
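
For reference, a hedged sketch of dumping a segment with payloads printed (the segment path is hypothetical):

```
# --print-data-log prints each message's raw payload bytes alongside the metadata
kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files /var/lib/kafka/my-topic-0/00000000000000000000.log --print-data-log
```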

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
Printed
Page 201
Example (Replica Verification)

It is unclear how to read the output.

How would an error be reported? If there is a mismatch, what should be done? Similarly for the index check: how does one repair the index?

Note from the Author or Editor:
Agreed that this output could be clearer. However, this is more work than we can reasonably update in an errata, and should wait for the next edition.
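
For context, a minimal sketch of invoking the tool (host and topic pattern are hypothetical); it periodically prints the maximum lag observed across the verified replicas:

```
# Compare message contents across the replicas of all topics matching the pattern
kafka-replica-verification.sh --broker-list kafka1.example.com:9092 \
  --topic-white-list 'my-topic'
```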

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
PDF
Page 202
Last paragraph

The paragraph mentions options --new-consumer and --broker-list.
The former is no longer required as of Kafka 0.10.1.
The latter is actually called --bootstrap-server and is used to specify the Kafka broker to connect to.

Note from the Author or Editor:
"broker-list" has been fixed to "bootstrap-server". The "new-consumer" flag is left, however, as it is required for previous versions.

Paolo Baronti  Dec 01, 2017  Mar 30, 2018
Printed
Page 202-205
Section Consuming and Producing

Console consumer and producer are used in earlier chapters. Why are they introduced this late in the book? This is confusing.

Page 202/203: Console Consumer:

"The required options are described in the following paragraphs." -- Options for what?

Example (page 203):
- Unclear how the tool knows how to deserialize the raw bytes. Also, mention that, by default, only the value is printed.
- Explain that one needs to terminate the consumer.
- Explain the auto-offset behavior.
- Formatting of CONFIGFILE, KEY, VALUE is not consistent:
--> `CONFIGFILE` in italics vs `_CONFIGFILE_` with starting/ending underscores (same for KEY and VALUE)

Page 205: Console Producer:
- mention/explain the default serializer
- explain how to send "end of file" (for beginners): CTRL+D

Note from the Author or Editor:
Serialization - it's explained on p202 at the start of the section that it outputs raw bytes by default
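
For beginners, a hedged sketch of the pair (host and topic are hypothetical; the producer reads stdin until end-of-file, i.e., CTRL+D):

```
# Print both keys and values from the beginning of the topic
kafka-console-consumer.sh --bootstrap-server kafka1.example.com:9092 \
  --topic my-topic --from-beginning --property print.key=true

# Type messages, one per line; finish with CTRL+D
kafka-console-producer.sh --broker-list kafka1.example.com:9092 --topic my-topic
```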

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
PDF
Page 203
3rd paragraph

In

"--consumer.config CONFIGFILE, where _CONFIGFILE_ is the full ..."

and


"--consumer-property KEY=VALUE, where _KEY_ is the configuration option name and _VALUE_ is the value ..."

character '_' is used inconsistently.

Note from the Author or Editor:
The inconsistent formatting (the _ should have been picked up as a formatting directive) has been fixed.

Paolo Baronti  Dec 01, 2017  Mar 30, 2018
Printed
Page 205
First paragraph and command line example

In the paragraph, the "kafka.coordinator.GroupMetadataManager$OffsetsMessageFormatter" class
is quoted, but first, the escape character '\' must be used, like:

kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter


And then the class package is no longer valid (for new Kafka releases); it is now:

kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter

In addition, the 'zookeeper' parameter is no longer valid; I propose replacing it with 'bootstrap-server', for example:

bin/kafka-console-consumer.sh --topic __consumer_offsets --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" --consumer.config ./config/consumer.properties

Note from the Author or Editor:
The example uses single quotes, which does not require escaping the dollar sign. I have added a note that the class name is different in 0.11. The option for using --bootstrap-server vs. --zookeeper is covered elsewhere in the chapter. As of 0.11, --zookeeper is supported, and this book does not cover Kafka 1.0.

Adrien Ruffie  Feb 11, 2018  Mar 30, 2018
Printed
Page 211
2nd paragraph

“Hardcode kafka developer” should probably read “hardcore kafka developer”

Note from the Author or Editor:
The typo has been corrected

Olly Butterfield  Dec 07, 2017  Mar 30, 2018
PDF
Page 216
2nd Bulleted list

The bulleted list references 4 metrics:

Partition count
Leader partition count
All topics bytes in rate
All topics messages in rate

The following lines read:

"Examine these metrics. In a perfectly balanced cluster, the numbers will be even across all brokers in the cluster, as in Table 10-2."

However, Table 10-2 shows the "All topics bytes out rate" metric instead of "All topics messages in rate".

The two should be consistent.

Note from the Author or Editor:
Updated the text and the table to use all five metrics.

Paolo Baronti  Dec 04, 2017  Mar 30, 2018
PDF
Page 221
1st paragraph

The paragraph refers to "network handler threads" and "request handler threads".
Are these the same as the "network threads" and "IO threads" from page 100?
A consistent terminology would help.

Note from the Author or Editor:
"IO" and "request" threads are used alternately in different parts of Kafka. The language has been clarified to note this in both sections. And the language has been updated to consistently use "network threads"

Paolo Baronti  Dec 04, 2017  Mar 30, 2018
Printed
Page 224
Paragraph "All topics messages in"

"the bytes rates" -> singular: byte

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
Printed
Page 227
Table 10-11

Table formatting (i.e., line breaks) could be improved. It is better to break the lines at the comma instead of in the middle of `RequestQueueTimeMs`.

Matthias J Sax  Jan 13, 2019  Aug 09, 2019
PDF
Page 234
Final paragraph

In

"Depending on the number of consumers, inbound network traffic could easily become an order of magnitude larger on outbound traffic."

Shouldn't it be the other way around?

"Depending on the number of consumers, outbound network traffic could easily become an order of magnitude larger on inbound traffic."

Note from the Author or Editor:
The language, while correct, is quite awkward. This has been updated to make the intention clearer.

Paolo Baronti  Dec 04, 2017  Mar 30, 2018
PDF
Page 238
Fist item of bulleted list

In

"It has enough messages to fill a batch based on the max.partition.bytes configuration"

should the configuration property be "batch.size" instead?


Note from the Author or Editor:
The configuration name has been updated to reflect the new producer configurations.
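
For illustration, the producer settings involved might look like this sketch of a properties file (the values shown are the client defaults; batch.size is the option that governs when a batch is full):

```
# producer configuration (sketch)
batch.size=16384   # maximum bytes per partition batch; a full batch is sent immediately
linger.ms=0        # how long to wait for more records before sending a non-full batch
```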

Paolo Baronti  Dec 04, 2017  Mar 30, 2018
PDF
Page 250
3rd paragraph

>> This is a CONTENTIOUS and nonblocking option

should be changed to

>> This is a CONTINUOUS and nonblocking option

Note from the Author or Editor:
Indeed a typo. Stream processing isn't contentious but rather continuous.

Ashok Chilakapati  Feb 14, 2018  Mar 30, 2018
PDF
Page 252
"Mind the Time Zone" note

In

"The entire data pipeline should standardize on a single time zones;"

"zones" should be "zone"

Note from the Author or Editor:
Correct.

Paolo Baronti  Dec 06, 2017  Mar 30, 2018
PDF
Page 254
Figure 11-1

According to the text description and the materialized view on the right side of the figure, the final row of the stream of events on the left side of the figure should be labeled "Sale" and not "Return".

Note from the Author or Editor:
Right!

In the last two rows of the left table, it looks like both "blue shoes, 300" and "green shoes, 299" are "return" events, whereas "blue shoes, 300" is a return but "green shoes, 299" is a sale.

Paolo Baronti  Dec 06, 2017  Mar 30, 2018
Printed
Page 255-256
third paragraph

The paragraph explains two window variants: the tumbling window and the sliding window. Figure 11-2 shows a Tumbling Window and a Hopping Window. The Hopping Window appears for the first time here in this chapter. The paragraph on the previous page should also explain the hopping window.

Note from the Author or Editor:
Good point.

This:
"When the advance interval is equal to the window size, this is sometimes called a tumbling window. When the window moves on every record, this is sometimes called a sliding window"

Should be:
"Windows for which the size is a fixed time interval are called `hopping window`. There are two special cases which have their own names: When the advance interval is equal to the window size, this is called a tumbling window. When the window moves on every record, this is called a sliding window."

Shin'ya Ueoka  Apr 11, 2018  Aug 09, 2019
PDF
Page 256
3rd paragraph

Such applications need to maintain state within the application because each event can be handled independently.
->
Such applications do not need to maintain state within the application because each event can be handled independently.

Note from the Author or Editor:
The feedback is accurate - should be "do not need" instead of "need".

Anonymous  Oct 25, 2017  Mar 30, 2018
Printed
Page 263
1st bullet point

"... this requires that the application examine[s] the event time and discover[s] that it is older..."

two missing "s" as highlighted

Note from the Author or Editor:
Correct. The "s" are missing as pointed out.

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
Printed
Page 268
last paragraph

The sentence in the last paragraph:
"we used the Gson library from Google to generate a JSon serializer and deserializer from our Java object."
"JSon" should be replaced with "JSON", as it is written as "JSON" on other pages. The official notation is also "JSON".

Shin'ya Ueoka  Apr 14, 2018  Aug 09, 2019
Printed
Page 271
Code Example

Indentation is bad.

- lines 2,3,5,6,8,9 should be indented
- formatting is inconsistent
-> "KStream<...> viewsWithProfile = view.leftJoin(..." does not have a line break after "=" while all other lines start a new line after "=".

Note from the Author or Editor:
Yeah, the lack of indentation makes the first part of the example difficult to read.

Lines 2,3,5,6,8,9 in the code example should be indented 4-spaces in.

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
PDF
Page 278
Fraud Detection item

The "Fraud Detection" list item title is formatted differently than the other two ("Customer Service" and "Internet of Things") appearing on the previous page.

Note from the Author or Editor:
The indentation and formatting has been fixed

Paolo Baronti  Dec 08, 2017  Mar 30, 2018
Printed
Page 278
3rd paragraph

Should this paragraph, starting with "In cyber security," be its own bullet point similar to "Customer Service", "Internet of Things", and "Fraud Detection" above?

Note from the Author or Editor:
There are two formatting problems here.

The sub-section starting with "Fraud Detection" (currently a bullet point) should be formatted like the "Customer Service" and "Internet of things" subsections. Then the paragraph starting with "In cyber security" should be part of the "Fraud Detection" subsection.

Matthias J Sax  Dec 30, 2017  Mar 30, 2018
Mobi
Page 710
2nd paragraph of 'memory' section

This is in the kindle edition of the book -- "location 710", not page 710.

It's in the section "Chapter 4: Installing Kafka > Hardware Selection > Memory", and it looks like a placeholder value that was just not filled in later.

"Even a broker that is handing X messages per second and a data rate of X megabits per second can run with a 5GB heap."

My assumption is that those X's were intended to be replaced with numeric values of some sort :)

Note from the Author or Editor:
Thanks for the note! I've added the values that should have been there in an upcoming revision (committed)

Joshua Barratt  Sep 16, 2017  Mar 30, 2018