Errata

Unicode Explained

Errata for Unicode Explained

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
Printed Page 14
end of 2nd paragraph

Currently it says: "image; as so many compromises, it combines". It should likely say: "image; as with so many compromises, it combines".

Anonymous   
Printed Page 21
lines 11-12 supra

"... that character [tverdyj znak, hard sign] is not present in most fonts ..."
This is incorrect. No Russian font would be complete without this
letter, and some very common words use it.
The author may be thinking of pre-Revolutionary orthography, in which
every word ending in a consonant without a soft sign was written with a
final hard sign. That use of the hard sign carried no meaning, and it was
dropped in the post-Revolutionary reform of Russian orthography.
However, the character can appear not only at the END of a word but
also WITHIN a word. In the latter case it is still used,
and its use is mandatory.

Anonymous  Feb 03, 2017 
Printed Page 39
Table 1-2, WGL4 characters, "Classification" column, "Space characters" row;

Space is given as U+0040. Should that not be U+0020 instead, since U+0040 is
COMMERCIAL AT?

Anonymous   
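
The code point names in the page 39 report above can be checked against the Unicode Character Database. A minimal sketch, assuming Python's standard `unicodedata` module is an acceptable reference (the variable names are illustrative only):

```python
import unicodedata

# Look up the formal Unicode names of the two code points in question.
names = {cp: unicodedata.name(chr(cp)) for cp in (0x0020, 0x0040)}

for code_point, name in names.items():
    print(f"U+{code_point:04X} {name}")
```

This prints the name SPACE for U+0020 and COMMERCIAL AT for U+0040, which agrees with the report.
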
Printed Page 47
2nd paragraph

The octet with value 33 in decimal is 00100001 in binary and not 00010001.

Anonymous   
Printed Page 67
4th paragraph (Encoded representation)

The author says "For the @ character, the representation could be the octet 40 (hex) alone-i.e., the bit sequence 00001000." ... Should be bit sequence 01000000, since 40 (hex) is 64 (dec).

Anonymous  Aug 09, 2011 
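
The bit sequences in the page 47 and page 67 reports above can be verified mechanically. The following is a small sketch; `octet_bits` is a hypothetical helper name introduced here, not something from the book:

```python
def octet_bits(value: int) -> str:
    """Return the 8-bit binary representation of an octet value (0..255)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("an octet holds values 0..255 only")
    return format(value, "08b")


print(octet_bits(33))    # decimal 33, as in the page 47 report
print(octet_bits(0x40))  # hex 40 (decimal 64), as in the page 67 report
```

The two lines printed are 00100001 and 01000000, matching the corrections proposed by the submitters.
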
Printed Page 87
1st paragraph

The book says that Alt-0151 and Alt-8211 both produce an em-dash on Windows. Actually, the second one produces an en-dash instead.

Anthony Duncan  Jun 22, 2017 
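
The two numbers in the page 87 report above can be looked up directly. This sketch rests on two assumptions that are not stated in the report: that Alt-0151 selects byte 151 of the Windows-1252 code page, and that 8211 is read as a decimal Unicode code point. Actual Alt-code behavior can vary between Windows applications.

```python
import unicodedata

# Assumption: Alt-0151 selects byte 151 in the Windows-1252 code page.
from_code_page = bytes([151]).decode("cp1252")
# Assumption: 8211 is interpreted as a decimal Unicode code point.
from_code_point = chr(8211)

for label, ch in (("Alt-0151", from_code_page), ("Alt-8211", from_code_point)):
    print(f"{label}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Under those assumptions the first is U+2014 EM DASH and the second is U+2013 EN DASH.
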
Printed Page 122
1st paragraph

0-256 should be 0-255

Anthony Duncan  Jun 22, 2017 
Printed Page 179
3rd paragraph of Surrogates section

The ranges allocated for high and low surrogates exist in the coding space, as U+D800..U+DB7F
and U+DC00..U+DFFF,

This should be (DBFF instead of DB7F):
The ranges allocated for high and low surrogates exist in the coding space, as U+D800..U+DBFF
and U+DC00..U+DFFF,

Eckhard Stein  Dec 09, 2022 
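
One way to sanity-check the ranges in the page 179 report above is to count the surrogate pairs they allow. A sketch, with constant names chosen here for illustration:

```python
# Surrogate ranges as given in the proposed correction (inclusive bounds).
HIGH_SURROGATES = range(0xD800, 0xDBFF + 1)
LOW_SURROGATES = range(0xDC00, 0xDFFF + 1)

# Each high/low combination should address one supplementary code point.
pairs = len(HIGH_SURROGATES) * len(LOW_SURROGATES)
supplementary = 0x10FFFF - 0x10000 + 1

print(len(HIGH_SURROGATES), len(LOW_SURROGATES))
print(pairs == supplementary)
```

With DBFF as the upper bound, 1024 high and 1024 low surrogates give exactly the 1,048,576 code points from U+10000 to U+10FFFF. With DB7F the count would fall short.
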
Printed Page 223
middle of the page

0066 0069
These are not the codes for i and j; U+0066 is f and U+0069 is i.

Anthony Duncan  Jun 22, 2017 
Printed Page 249
link to the "Unicode Collation Algorithm"

On page 249, there is a link to the "Unicode Collation Algorithm": http://www.unicode.org/reports/tr30/
However, the link is wrong. It should be
http://www.unicode.org/reports/tr10/

Anonymous  Mar 08, 2010 
PDF Page 252
2nd paragraph

The paragraph seems to be missing the name of the language that it's referring to. I assume the intended wording was "Not all writing systems make a case distinction, even if they use letters. For example, [in language X] there is no such distinction..."

Stephen Dewey  Aug 12, 2014 
Printed Page 297
Last line of 3rd paragraph from the bottom

The last sentence of the first paragraph of "Some Properties of UTF-16" is "Since it is not a low surrogate, we can know that the previous code point is erroneous data". However the previous code point (a high surrogate) could be correct and the code point in this position could have been corrupted from a low surrogate to a normal code point.

It is still true that only one character will be corrupted.

Anonymous  Apr 19, 2009 
Printed Page 297
3rd paragraph

The conversion from surrogates to UTF-32 supplied:
u=(h-d800)*400+(L-dc00)*10000
(all in hex)
is incorrect.

For example, the surrogate pair given earlier on the page, d835 and dc05, results in:

u = (d835 - d800) * 400 + (dc05 - dc00) * 10000
u = 35 * 400 + 5 * 10000
u = d400 + 50000
u = 5d400

whereas the correct answer (given earlier on the same page) is 1d405.


Anonymous  Sep 27, 2009 
Printed Page 297
almost middle of the page

In the formula for converting a surrogate pair to a code point, it uses a multiplication sign where it ought to use a plus sign.

U = (H - D800) * 400 + (L - DC00) * 10000 // wrong
U = (H - D800) * 400 + L - DC00 + 10000 // working

See also the following link.
https://stackoverflow.com/questions/31282675/how-to-convert-surrogate-pair-to-unicode-scalar-in-swift

Anthony Duncan  Jun 22, 2017 
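
The corrected formula from the page 297 reports above can be tried on the pair D835 DC05. This is a sketch only; `surrogate_pair_to_code_point` is a hypothetical function name, and the cross-check relies on Python's built-in UTF-16 decoder rather than on anything in the book:

```python
def surrogate_pair_to_code_point(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("high surrogate must be in D800..DBFF")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("low surrogate must be in DC00..DFFF")
    # U = (H - D800) * 400 + (L - DC00) + 10000, all in hex.
    return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000


code_point = surrogate_pair_to_code_point(0xD835, 0xDC05)
print(f"{code_point:X}")

# Cross-check against Python's own UTF-16 decoder.
decoded = bytes.fromhex("D835DC05").decode("utf-16-be")
print(ord(decoded) == code_point)
```

The result is 1D405, the value the submitters expected, and it agrees with the decoder.
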
Printed Page 302
5

The text suggests that if data is known or expected to be in the UTF-32 encoding, then the byte order mark should appear as 00 00 FE FF or 00 00 FF FE. To me this seems incorrect: I would expect the value 00 00 FE FF, taken as a 32-bit number, to end up as FF FE 00 00 if the byte order were swapped.

Peter Friend  Jun 12, 2009 
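
The reasoning in the page 302 report above can be compared with the byte order mark constants in Python's standard `codecs` module. A minimal sketch, assuming those constants are an acceptable reference (the variable `swapped` is illustrative):

```python
import codecs

# BOM byte sequences as defined by Python's standard codecs module.
print(codecs.BOM_UTF32_BE.hex(" "))  # big-endian form
print(codecs.BOM_UTF32_LE.hex(" "))  # little-endian form

# Swap the byte order of the big-endian form, as the report reasons.
swapped = codecs.BOM_UTF32_BE[::-1]
print(swapped == codecs.BOM_UTF32_LE)
```

The module gives 00 00 FE FF for big-endian and FF FE 00 00 for little-endian, and reversing the bytes of one yields the other.
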
Printed Page 304
Table 6-3

The row describing "UTF-16LE" contains "As UTF-8, but with Little Endian byte order
fixed". I believe that this was intended to read "As UTF-16...".

Anonymous   
Printed Page 305
4th paragraph

There is an unmatched ")" after UTF-16. Either it should be removed or perhaps a matching "(" should be placed in front of the previous "as".

Anonymous  Apr 19, 2009 
Printed Page 317
on Page 317, the table 6-4

In Chapter 6, section "Auto-Detecting the Encoding", Table 6-4 "Heuristics for detecting Unicode encoding" on page 317 gives the Byte Order
Mark in the UTF-32LE encoding as
UTF-32LE 00 00 FF FE (wrong)
This is not correct. It should rather be:
UTF-32LE FF FE 00 00 (correct)

This can also be verified by looking at the official Unicode site, at
address http://unicode.org/faq/utf_bom.html#bom4

Anonymous  Mar 12, 2010 
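
A BOM-based heuristic of the kind the table in the page 317 report describes could look roughly like the sketch below. It is not the book's table: `sniff_bom` and `BOMS` are hypothetical names, and the byte sequences come from Python's standard `codecs` module, which gives FF FE 00 00 for UTF-32LE.

```python
import codecs

# Longer signatures come first, because the UTF-32LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def sniff_bom(data: bytes):
    """Return a guessed encoding name from a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(bytes.fromhex("fffe0000") + "A".encode("utf-32-le")))
print(sniff_bom(bytes.fromhex("fffe4100")))
```

The order of the checks is a design choice: data that starts with FF FE 00 00 could in principle also be UTF-16LE text beginning with U+0000, so this remains a heuristic rather than a proof.
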
Printed Page 317
Table 6-3

If Wikipedia is to be believed, the octets listed for UTF-EBCDIC are wrong, though I haven't had the opportunity to test it.

https://en.wikipedia.org/wiki/Byte_order_mark#Byte_order_marks_by_encoding

DD 73 73 73 // probably wrong
DD 73 66 73 // probably right

Then, this would look like Ýsfs.

Anthony Duncan  Jun 22, 2017 
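
The rendering mentioned in the report above can be reproduced by reading the four octets as ISO 8859-1 (Latin-1). This sketch checks only that rendering, since it does not use a UTF-EBCDIC codec, and it says nothing about which octet sequence is the right one. The variable name is illustrative.

```python
# The octets the submitter considers probably right, read as Latin-1.
candidate = bytes.fromhex("DD736673")
rendered = candidate.decode("latin-1")
print(rendered)
```
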
Printed Page 385
second to last paragraph

What's an "APL quote"?

Anonymous  Jul 03, 2017