Grapheme Clusters

Unicode 3.2 introduces the term “grapheme cluster.” The concept itself isn't all that new: Unicode 3.2 formalizes an idea that was already in use, giving it a more specific definition, some related character properties, and a new name.

A grapheme cluster is a sequence of one or more Unicode code points that should be treated as a single unit by various processes:

  • Text-editing software should generally allow placement of the cursor only at grapheme cluster boundaries. Clicking the mouse on a piece of text should place the insertion point at the nearest grapheme cluster boundary, and the arrow keys should move forward and back one grapheme cluster at a time.

  • Text-rendering software ...
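
The cursor behavior described in the first bullet can be sketched in a few lines of Python. This is a rough illustration under a loud simplifying assumption, not the formal Unicode definition: a cluster is taken to be one character plus any combining marks (general categories Mn, Mc, Me) that follow it, and CR LF is kept together. The real rules rely on dedicated character properties and cover more cases (Hangul syllables, for example). The function names are hypothetical, chosen only for this example.

```python
import unicodedata


def is_extending(ch: str) -> bool:
    # ASSUMPTION: any combining mark attaches to the character before it.
    # The formal definition uses dedicated properties, not general category.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def cluster_boundaries(text: str) -> list:
    """Return the code point offsets where a cursor may be placed."""
    boundaries = [0]
    for i in range(1, len(text)):
        if text[i - 1] == "\r" and text[i] == "\n":
            continue  # CR LF is treated as a single unit
        if is_extending(text[i]):
            continue  # never split a combining mark from its base
        boundaries.append(i)
    if text:
        boundaries.append(len(text))
    return boundaries


def next_boundary(text: str, pos: int) -> int:
    # Right-arrow key: move to the first boundary after pos.
    return min((b for b in cluster_boundaries(text) if b > pos),
               default=len(text))


def previous_boundary(text: str, pos: int) -> int:
    # Left-arrow key: move to the last boundary before pos.
    return max((b for b in cluster_boundaries(text) if b < pos), default=0)


def nearest_boundary(text: str, pos: int) -> int:
    # Mouse click: snap an arbitrary offset to the closest boundary.
    return min(cluster_boundaries(text), key=lambda b: abs(b - pos))
```

For the string `"e\u0301a"` (an e followed by a combining acute accent, then an a), `next_boundary` called with position 0 skips over the accent and lands at offset 2, so the arrow key never leaves the cursor between the letter and its accent.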
