Book description
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. There are hundreds of different encoding systems for mapping characters to numbers, but Unicode promises a single mapping. Unicode enables a single software product or website to be targeted across multiple platforms, languages and countries without re-engineering. It's no wonder that industry giants like Apple, Hewlett-Packard, IBM and Microsoft have all adopted Unicode.
Containing everything you need to understand Unicode, this comprehensive reference from O'Reilly guides you in detail through the complex world of characters. For starters, it explains how to identify and classify characters, whether they're common, uncommon, or exotic. It then shows you how to type them, use their properties, and process character data in a robust manner.
The book is divided into three distinct parts. The first few chapters provide a tutorial presentation of Unicode and character data. They give you a firm grasp of the terminology you need to refer to the various components, including character sets, fonts and encodings, glyphs, and character repertoires.
The middle section offers more detailed information about using Unicode and other character codes. It explains the principles and methods of defining character codes, describes some of the widely used codes, and presents code conversion techniques. It also discusses properties of characters, collation and sorting, line breaking rules and Unicode encodings. The final four chapters cover more advanced material, such as programming to support Unicode.
You simply can't afford to be without the nuggets of valuable information detailed in Unicode Explained.
Table of contents
- Table of Contents
- Preface
- Part I. Working with Characters
- Chapter 1. Characters as Data
- Introduction to Characters and Unicode
- What’s in a Character?
- Why Do We Need to Know About Characters?
- Characters as Units of Text
- Characters Versus Images
- Processing of Characters
- Giving Identity to Characters
- Unicode Definitions of Characters
- Definitions of Characters Elsewhere
- What’s in a Name?
- Should We Be Strict About the Meanings of Characters?
- Ambiguity Among Characters
- How Do I Find My Character?
- Which Characters Does Each Language Use?
- Variation of Writing Systems
- Glyphs and Fonts
- Definitions of Character Repertoires
- Numbering Characters
- Encoding Characters as Octet Sequences
- Working with Encodings
- Working with Fonts
- Summaries
- Chapter 2. Writing Characters
- Method Varieties
- Keyboard Variation and Settings
- Virtual Keyboards
- Program Commands
- Character Maps
- Replacements on the Fly
- Special Techniques
- Escape Sequences
- Specialized Editors
- Exercise
- Chapter 3. Character Sets and Encodings
- Part II. A Systematic Look at Unicode
- Chapter 4. The Structure of Unicode
- Design Principles
- Versions of Unicode
- Coding Space
- Unicode Terms
- Guide to the Unicode Standard
- Unicode and Fonts
- Criticism of Unicode
- Questions and Answers
- Chapter 5. Properties of Characters
- Character Classification
- An Overview of Properties
- Compositions and Decompositions
- Normalization
- Case Properties
- Collation and Sorting
- Text Boundaries
- Directionality
- Line-Breaking Properties
- Unicode Conformance Requirements
- Effects on Choosing Characters
- Chapter 6. Unicode Encodings
- Unicode Encodings in General
- UTF-32 and UCS-4
- UTF-16 and UCS-2
- UTF-8
- Byte Order
- Conversions Between Unicode Encodings
- Other Encodings
- Auto-Detecting the Encoding
- Choosing an Encoding
- Part III. Advanced Unicode Topics
- Chapter 7. Characters and Languages
- Writing Systems and IT
- Character Requirements of Languages
- Transliteration and Transcription
- Language Metadata
- Languages and Fonts
- Chapter 8. Character Usage
- Basics of Character Usage
- ASCII (Basic Latin)
- Names of ASCII Characters
- Alphanumeric Characters
- Parentheses
- Other Graphic Characters
- Ampersand & (U+0026)
- Apostrophe ' (U+0027)
- Asterisk * (U+002A)
- Circumflex accent ^ (U+005E)
- Colon : (U+003A)
- Comma , (U+002C)
- Dollar sign $ (U+0024)
- Commercial at @ (U+0040)
- Equals sign = (U+003D)
- Exclamation mark ! (U+0021)
- Full stop “.” (U+002E)
- Grave accent ` (U+0060)
- Greater-than sign > (U+003E)
- Hyphen-minus “-” (U+002D)
- Less-than sign < (U+003C)
- Low line _ (U+005F)
- Number sign # (U+0023)
- Percent sign % (U+0025)
- Plus sign + (U+002B)
- Question mark ? (U+003F)
- Quotation mark " (U+0022)
- Reverse solidus \ (U+005C)
- Semicolon ; (U+003B)
- Solidus / (U+002F)
- Space “ ” (U+0020)
- Tilde ~ (U+007E)
- Vertical line | (U+007C)
- ASCII Control Characters (C0 Controls)
- Latin-1 Supplement (ISO 8859-1)
- Other Latin Letters
- Other European Alphabetic Scripts
- Diacritic Marks
- Letterlike Symbols
- General Punctuation
- Line Structure Control
- Mathematical and Technical Symbols
- Other Blocks
- Chapter 9. The Character Level and Above
- Levels of Text Representation and Processing
- Plain Text, Rich Text, and Markup
- Example: Nonbreaking Hyphen
- Example: Formatting in Word Processing
- Example: HTML Markup and CSS
- Linear Text Versus Mathematical Notations
- Unicode and Mathematics
- Characters Outside the Repertoire
- Selecting the Appropriate Level of Expression
- Subscripts and Superscripts
- Characters and Accessibility
- Characters and Markup
- Media Types for Text
- Chapter 10. Characters in Internet Protocols
- Information About Encoding
- Characters in MIME
- Media Types
- Character Encoding (“charset”) Information
- MIME Headers
- Troubleshooting Examples
- Character Encoding on the Web
- Headers in HTTP
- Specifying the encoding in HTTP headers
- Which encodings can be used?
- HTTP versus HTML
- Checking the HTTP headers
- Server configuration
- Using a meta tag
- Resolution of conflicts
- The effect of XHTML
- Heuristics of detecting encoding
- Which encoding should I use?
- Avoiding the encoding problem
- The “Unicode Encoded” logo
- Content Negotiation and Multilingual Sites
- Characters in Protocol Headers
- Characters in Domain Names and URLs
- Chapter 11. Characters in Programming
- Characters in Computer Languages
- Character and String Data
- The Preparedness Principle
- Character Input and Output
- Processing Form Data
- Identifiers, Patterns, and Regular Expressions
- International Components for Unicode (ICU)
- Using Locales
- Appendix. Tables for Writing Characters
- Index
Product information
- Title: Unicode Explained
- Author(s):
- Release date: June 2006
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9780596101213