
Errata for Unicode Explained


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version	Location	Description	Submitted by	Date submitted
Printed	Page 14 end of 2nd paragraph	currently it says: "image; as so many compromises, it combines" and it should likely say: "image; as with so many compromises, it combines".	Anonymous
Printed	Page 21 lines 11-12 supra	"... that character [tverdyj znak, hard sign] is not present in most fonts ..." This is incorrect. No Russian font would be complete without this letter, and some very common words use it. The author may be thinking of pre-Revolutionary orthography, in which every word that did not end in a "soft sign" ended in a hard sign. That use of the hard sign had no meaning, and it was dropped in the post-Revolutionary reform of Russian orthography. However, this character can appear not only at the END of a word but also WITHIN a word. In the latter cases, the character is still used and its use is mandatory.	Anonymous	Feb 03, 2017
Printed	Page 39 Table 1-2, WGL4 characters, "Classification" column, "Space characters" row;	space is mentioned as U+0040. Should that not be U+0020 instead, as U+0040 is the COMMERCIAL AT?	Anonymous
Printed	Page 47 2nd paragraph	The octet with value 33 in decimal is 00100001 in binary and not 00010001.	Anonymous
Printed	Page 67 4th paragraph (Encoded representation)	The author says "For the @ character, the representation could be the octet 40 (hex) alone-i.e., the bit sequence 00001000." The bit sequence should be 01000000, since 40 (hex) is 64 (dec).	Anonymous	Aug 09, 2011
Printed	Page 87 1st paragraph	The book says that Alt-0151 and Alt-8211 both produce an em-dash on Windows. Actually, the second one produces an en-dash instead.	Anthony Duncan	Jun 22, 2017
Printed	Page 122 1st paragraph	0-256 should be 0-255	Anthony Duncan	Jun 22, 2017
Printed	Page 179 3rd paragraph of Surrogates section	Currently it says: "The ranges allocated for high and low surrogates exist in the coding space, as U+D800..U+DB7F and U+DC00..U+DFFF". The high surrogate range should end at DBFF instead of DB7F: "The ranges allocated for high and low surrogates exist in the coding space, as U+D800..U+DBFF and U+DC00..U+DFFF".	Eckhard Stein	Dec 09, 2022
Printed	Page 223 middle of the page	0066 0069 are not the codes for i and j. U+0066 is f and U+0069 is i; the codes for i and j are 0069 006A.	Anthony Duncan	Jun 22, 2017
Printed	Page 249 link to the "Unicode Collation Algorithm"	on page 249, there is a link to the "Unicode Collation Algorithm" http://www.unicode.org/reports/tr30/ However, the link is wrong. It should be http://www.unicode.org/reports/tr10/	Anonymous	Mar 08, 2010
PDF	Page 252 2nd paragraph	The paragraph seems to be missing the name of the language that it's referring to. I assume the intended wording was "Not all writing systems make a case distinction, even if they use letters. For example, [in language X] there is no such distinction..."	Stephen Dewey	Aug 12, 2014
Printed	Page 297 Last line of 3rd paragraph from the bottom	The last sentence of the first paragraph of "Some Properties of UTF-16" is "Since it is not a low surrogate, we can know that the previous code point is erroneous data". However the previous code point (a high surrogate) could be correct and the code point in this position could have been corrupted from a low surrogate to a normal code point. It is still true that only one character will be corrupted.	Anonymous	Apr 19, 2009
Printed	Page 297 3rd paragraph	The conversion from surrogates to UTF-32 is given as u = (h - D800) * 400 + (L - DC00) * 10000 (all in hex), which is incorrect. For example, the surrogate pair given earlier on the page, D835 and DC05, would result in: u = (D835 - D800) * 400 + (DC05 - DC00) * 10000 = 35 * 400 + 5 * 10000 = D400 + 50000 = 5D400, while the correct answer (given earlier on the same page) is 1D405.	Anonymous	Sep 27, 2009
Printed	Page 297 almost middle of the page	In the formula for converting a surrogate pair to a code point, it uses a multiplication sign where it ought to use a plus sign. U = (H - D800) * 400 + (L - DC00) * 10000 // wrong U = (H - D800) * 400 + L - DC00 + 10000 // working See also the following link. https://stackoverflow.com/questions/31282675/how-to-convert-surrogate-pair-to-unicode-scalar-in-swift	Anthony Duncan	Jun 22, 2017
Printed	Page 302 5	The text suggests that if data is known or expected to be in UTF-32 encoding then the byte order mark should appear as 00 00 FE FF or 00 00 FF FE. To me this seems incorrect, as I would expect the value 00 00 FE FF taken as a 32 bit number would end up as FF FE 00 00 if the byte order was swapped.	Peter Friend	Jun 12, 2009
Printed	Page 304 Table 6-3	The row describing "UTF-16LE" contains "As UTF-8, but with Little Endian byte order fixed". I believe that this was intended to read "As UTF-16...".	Anonymous
Printed	Page 305 4th paragraph	There is an unmatched ")" after UTF-16. Either it should be removed or perhaps a matching "(" should be placed in front of the previous "as".	Anonymous	Apr 19, 2009
Printed	Page 317 Table 6-4	In Chapter 6, Section "Auto-Detecting the Encoding", on page 317, Table 6-4 "Heuristics for detecting Unicode encoding" states that the byte order mark in UTF-32LE encoding is 00 00 FF FE. This is not correct. It should be FF FE 00 00. This can be verified on the official Unicode site at http://unicode.org/faq/utf_bom.html#bom4	Anonymous	Mar 12, 2010
Printed	Page 317 Table 6-3	If Wikipedia is to be believed, the octets listed for UTF-EBCDIC are wrong, though I haven't had opportunity to test it. https://en.wikipedia.org/wiki/Byte_order_mark#Byte_order_marks_by_encoding DD 73 73 73 // probably wrong DD 73 66 73 // probably right Then, this would look like Ýsfs.	Anthony Duncan	Jun 22, 2017
Printed	Page 385 second to last paragraph	What's an "APL quote"?	Anonymous	Jul 03, 2017
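
Several of the reports above come down to arithmetic that is easy to check mechanically. The following is a minimal sketch in Python, not taken from the book: the helper names `octet_bits`, `surrogates_to_code_point`, and `bom_bytes` are hypothetical, and the surrogate formula is the corrected form proposed in the page 297 reports. It assumes Python 3.8 or later for `bytes.hex(" ")`.

```python
# Sanity checks for some of the numeric errata reported above.
# All function names are hypothetical helpers, not taken from the book.


def octet_bits(value: int) -> str:
    """Return the 8-bit binary string for a single octet (0-255)."""
    if not 0 <= value <= 255:
        raise ValueError("an octet must be in the range 0-255")
    return format(value, "08b")


def surrogates_to_code_point(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("high surrogate out of range")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("low surrogate out of range")
    # Corrected formula: the low surrogate offset and 0x10000 are added.
    return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000


def bom_bytes(encoding: str) -> bytes:
    """Encode U+FEFF in a fixed-byte-order encoding to see its BOM octets."""
    return "\ufeff".encode(encoding)


print(octet_bits(33))                                  # page 47: 00100001
print(octet_bits(0x40))                                # page 67: 01000000
print(hex(surrogates_to_code_point(0xD835, 0xDC05)))   # page 297: 0x1d405
print(bom_bytes("utf-32-be").hex(" "))                 # 00 00 fe ff
print(bom_bytes("utf-32-le").hex(" "))                 # ff fe 00 00
print(bytes([151]).decode("cp1252") == "\u2014")       # 0151 is an em dash: True
print(chr(8211) == "\u2013")                           # 8211 is an en dash: True
```

The checks cover only the reports whose claims are purely numeric (pages 47, 67, 87, 297, 302, and 317). They say nothing about what the printed text actually contains.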