Regex Metacharacters, Modes, and Constructs
The metacharacters and metasequences shown here represent most available types of regular expression constructs and their most common syntax. However, syntax and availability vary by implementation.
Character representations
Many implementations provide shortcuts to represent characters that may be difficult to input. (See Mastering Regular Expressions [MRE], pp. 115–118.)
- Character shorthands:
Most implementations have specific shorthands for the alert, backspace, escape character, form feed, newline, carriage return, horizontal tab, and vertical tab characters. For example, \n is often a shorthand for the newline character, which is usually LF (012 octal) but can sometimes be CR (015 octal), depending on the operating system. Confusingly, many implementations use \b to mean both backspace and word boundary (the position between a “word” character and a nonword character). For these implementations, \b means backspace in a character class (a set of possible characters to match in the string) and word boundary elsewhere.
- Octal escape:
\num
Represents a character corresponding to a two- or three-digit octal number. For example, \015\012 matches an ASCII CR/LF sequence.
- Hex and Unicode escapes:
\xnum, \x{num}, \unum, \Unum
Represent characters corresponding to hexadecimal numbers. Four-digit and larger hex numbers can represent the range of Unicode characters. For example, \x0D\x0A matches an ASCII CR/LF sequence.
- Control characters:
\cchar
Corresponds to ASCII control characters encoded ...
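As one concrete illustration, the sketch below tries these character representations in Python's re module, assuming Python 3.7 or later. Python is just one implementation: it accepts \a, \b, \f, \n, \r, \t, \v, octal, \x, \u, and \U escapes, but it has no \e escape, no Perl-style \x{num} form, and no \c control escape, so \c is not shown. A leading 0 makes Python read \015 as octal rather than as a backreference. The helper compiles() is a hypothetical name used only for this demonstration.

```python
import re

# Shorthands: in a raw string, "\t" and "\n" reach the regex engine unchanged,
# and the engine translates them to TAB (0x09) and LF (0x0A).
assert re.fullmatch(r"a\tb\n", "a\tb\n")

# \b outside a character class is a word boundary...
words = re.findall(r"\bcat\b", "cat concat cat.")
print(words)  # ['cat', 'cat']

# ...but inside a class, [\b] is the backspace character (0x08).
assert re.fullmatch(r"[\b]", "\x08")

# Octal and hex escapes both locate the CR/LF pair in this string.
text = "line one\r\nline two"
octal_pos = re.search(r"\015\012", text).start()
hex_pos = re.search(r"\x0D\x0A", text).start()
print(octal_pos, hex_pos)  # 8 8

# \u takes exactly four hex digits and \U exactly eight (str patterns only).
assert re.fullmatch(r"caf\u00e9", "caf\u00e9")
assert re.fullmatch(r"\U0001F600", "\U0001F600")


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Syntax varies: Python's re rejects \e and the Perl-style \x{...} form.
print(compiles(r"\e"), compiles(r"\x{0D}"))  # False False
```

The same patterns may behave differently elsewhere; Perl and PCRE, for example, accept \e and \x{num}, so check your implementation's documentation before relying on a particular escape.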