|
Sequence
|
Meaning
|
|---|---|
\a |
Alert (bell).
|
\b |
Backspace; supported only in character class.
|
\e |
ESC character,
x1B. |
\n |
Newline;
x0A on Unix and Windows,
x0D on Mac OS 9. |
\r |
Carriage return;
x0D on Unix and Windows,
x0A on Mac OS 9. |
\f |
Form feed,
x0C. |
\t |
Horizontal tab,
x09. |
\octal |
Character specified by a two- or three-digit octal code.
|
\xhex |
Character specified by a one- or two-digit hexadecimal code.
|
\x{hex} |
Character specified by any hexadecimal code.
|
\cchar |
Named control character.
|
\N{name} |
A named character specified in the Unicode standard or listed in
PATH_TO_PERLLIB/unicode/Names.txt. Requires
use charnames ':full'. |
|
Class
|
Meaning
|
|---|---|
[...] |
A single character listed or contained in a listed range.
|
[^...] |
A single character not listed and not contained within a listed range.
|
[:class:] |
POSIX-style character class valid only within a regex character class.
|
|
.
|
Any character except newline (unless single-line mode,
/s). |
\C |
One byte; however, this may corrupt a Unicode character stream.
|
\X |
Base character followed by any number of Unicode combining characters.
|
\w |
Word character,
\p{IsWord}. |
\W |
Non-word character,
\P{IsWord}. |
\d |
Digit character,
\p{IsDigit}. |
\D |
Non-digit character,
\P{IsDigit}. |
\s |
Whitespace character,
\p{IsSpace}. |
\S |
Non-whitespace character, |
java.util.regex package. Although there are
competing packages available for previous versions of Java, Sun's is
poised to become the standard. Sun's package uses a
Traditional NFA match engine. For an explanation of the rules behind
a Traditional NFA engine, see Section 1.2.
java.util.regex supports the
metacharacters and metasequences
listed in Table 1-10 through Table 1-14. For expanded definitions of each
metacharacter, see Section 1.2.1.
|
Sequence
|
Meaning
|
|---|---|
\a |
Alert (bell).
|
\b |
Backspace,
x08, supported only in character class. |
\e |
ESC character,
x1B. |
\n |
Newline,
x0A. |
\r |
Carriage return,
x0D. |
\f |
Form feed,
x0C. |
\t |
Horizontal tab,
x09. |
\0octal |
Character specified by a one-, two-, or three-digit octal code.
|
\xhex |
Character specified by a two-digit hexadecimal code.
|
\uhex |
Unicode character specified by a four-digit hexadecimal code.
|
\cchar |
Named control character.
|
|
Class
|
Meaning
|
|---|---|
[...] |
A single character listed or contained in a listed range.
|
[^...] |
A single character not listed and not contained within a listed
range.
|
|
.
|
Any character, except a line terminator (unless
DOTALL mode). |
\w |
Word character,
[a-zA-Z0-9_]. |
\W |
Non-word character,
[^a-zA-Z0-9_]. |
\d |
Digit,
[0-9]. |
\D |
Non-digit,
[^0-9]. |
\s |
Whitespace character,
[ \t\n\f\r\x0B]. |
\S |
Non-whitespace character,
[^ \t\n\f\r\x0B]. |
\p{prop} |
Character contained by given POSIX character class, Unicode property,
or Unicode block.
|
\P{prop} |
Character not contained by given POSIX character class, Unicode
property, or Unicode block.
|
|
Sequence
|
Meaning
|
|---|---|
^ |
Start of string, or after any newline if in
MULTILINE mode. |
\A |
Beginning of string, in any match mode.
|
$ |
End of string, or before any newline if in
|
|
Sequence
|
Meaning
|
|---|---|
\a |
Alert (bell),
x07. |
\b |
Backspace,
x08, supported only in character class. |
\e |
ESC character,
x1B. |
\n |
Newline,
x0A. |
\r |
Carriage return,
x0D. |
\f |
Form feed,
x0C. |
\t |
Horizontal tab,
x09. |
\v |
Vertical tab,
x0B. |
\0octal |
Character specified by a two-digit octal code.
|
\xhex |
Character specified by a two-digit hexadecimal code.
|
\uhex |
Character specified by a four-digit hexadecimal code.
|
\cchar |
Named control character.
|
|
Class
|
Meaning
|
|---|---|
[...] |
A single character listed or contained within a listed range.
|
[^...] |
A single character not listed and not contained within a listed range.
|
|
.
|
Any character, except a line terminator (unless single-line mode,
s). |
\w |
Word character,
[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}] or
[a-zA-Z_0-9] in ECMAScript
mode. |
\W |
Non-word character,
[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}] or
[^a-zA-Z_0-9] in ECMAScript
mode. |
\d |
Digit,
\p{Nd} or [0-9] in
ECMAScript mode. |
\D |
Non-digit,
\P{Nd} or [^0-9] in
ECMAScript mode. |
\s |
Whitespace character,
[ \f\n\r\t\v\x85\p{Z}] or
[ \f\n\r\t\v] in ECMAScript
mode. |
\S |
Non-whitespace character,
[^ \f\n\r\t\v\x85\p{Z}]
or [^ \f\n\r\t\v] in ECMAScript
mode. |
\p{prop} |
Character contained by given Unicode block or property.
|
\P{prop} |
Character not contained by given Unicode block or property.
|
re module. The re module uses a
Traditional NFA match engine. For an explanation of the rules behind
an NFA engine, see Section 1.2.
The version described here is the re included
with Python 2.2, although the module has been available in similar
form since Python 1.5.
The re module supports the metacharacters and
metasequences listed in Table 1-21 through
Table 1-25. For expanded definitions of each
metacharacter, see Section 1.2.1.
|
Sequence
|
Meaning
|
|---|---|
\a |
Alert (bell),
x07. |
\b |
Backspace,
x08, supported only in character class. |
\n |
Newline,
x0A. |
\r |
Carriage return,
x0D. |
\f |
Form feed,
x0C. |
\t |
Horizontal tab,
x09. |
\v |
Vertical tab,
x0B. |
\octal |
Character specified by up to three octal digits.
|
\xhh |
Character specified by a two-digit hexadecimal code.
|
\uhhhh |
Character specified by a four-digit hexadecimal code.
|
\Uhhhhhhhh |
Character specified by an eight-digit hexadecimal code.
|
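The escapes in the table above can be checked with a short sketch. It assumes a Python 3 interpreter, where the re module itself understands \uhhhh and \Uhhhhhhhh in string patterns; the Python 2.2 version described here may behave differently. The helper name matches_whole is hypothetical and exists only for this illustration.

```python
import re

def matches_whole(pattern, text):
    """Return True when the pattern matches the entire text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# \x41 is "A" (hex), \u0042 is "B" (four hex digits), \103 is "C" (octal).
hex_unicode_octal = matches_whole(r"\x41\u0042\103", "ABC")

# An eight-digit \U escape reaches characters beyond U+FFFF.
astral = matches_whole(r"\U0001F600", "\U0001F600")

# Inside a character class, \b is the backspace character x08.
backspace_in_class = re.search(r"[\b]", "a\x08b") is not None

# Outside a character class, \b is a word-boundary assertion instead.
boundary_hit = re.search(r"\bcat\b", "a cat sat") is not None
boundary_miss = re.search(r"\bcat\b", "concat") is None
```

Raw strings (the r prefix) keep the backslashes intact so that the re module, rather than the Python string parser, interprets each escape.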
|
Class
|
Meaning
|
|---|---|
[...] |
Any character listed or contained within a listed range.
|
[^...] |
Any character that is not listed and is not contained within a listed
range.
|
|
.
|
Any character, except a newline (unless
DOTALL
mode). |
\w |
Word character,
[a-zA-Z0-9_] (unless
LOCALE or UNICODE mode). |
\W |
Non-word character,
[^a-zA-Z0-9_] (unless
LOCALE or UNICODE mode). |
\d |
Digit character,
[0-9]. |
\D |
Non-digit character,
[^0-9]. |
\s |
Whitespace character,
[ \t\n\r\f\v]. |
\S |
Non-whitespace character,
[^ \t\n\r\f\v]. |
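As a rough illustration of the class shorthands above, the following sketch runs each one against a small sample string. It assumes Python 3, where string patterns are Unicode-aware by default, so the re.ASCII flag is used to get the ASCII-only definitions listed in the table. The sample strings are invented for the example.

```python
import re

text = "id_42:\tok\n"

words = re.findall(r"\w+", text)        # runs of word characters
digits = re.findall(r"\d+", text)       # runs of digits
spaces = re.findall(r"\s", text)        # individual whitespace characters
non_spaces = re.findall(r"\S+", text)   # runs of non-whitespace characters

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented letters split words.
accented = "na\u00efve caf\u00e9"
ascii_words = re.findall(r"\w+", accented, re.ASCII)
unicode_words = re.findall(r"\w+", accented)

# The dot skips newlines unless DOTALL mode is set.
dot_default = re.findall(r"a.b", "a\nb")
dot_all = re.findall(r"a.b", "a\nb", re.DOTALL)
```

Here ascii_words comes out as three fragments, while unicode_words keeps both words whole, which is the practical difference between the ASCII and Unicode definitions of \w.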
|
Sequence
|
Meaning
|
|---|---|
^ |
Start of string, or after any newline if in
MULTILINE match mode. |
\A |
Start of search string, in all match modes.
|
$ |
End of search string or before a string-ending newline, or before any
newline in
MULTILINE match mode. |
\Z |
End of string, in any match mode. |
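The effect of MULTILINE mode on ^ and $ is easiest to see on a two-line string. This is a minimal sketch assuming Python 3; the sample text is made up for the illustration.

```python
import re

text = "one\ntwo\n"

# Without MULTILINE, ^ matches only at the very start of the string.
start_default = re.findall(r"^\w+", text)

# With MULTILINE, ^ also matches after each newline.
start_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# \A stays pinned to the start of the string even in MULTILINE mode.
start_absolute = re.findall(r"\A\w+", text, re.MULTILINE)

# Without MULTILINE, $ matches at the end or before a string-ending newline.
end_default = re.findall(r"\w+$", text)

# With MULTILINE, $ also matches before every embedded newline.
end_multiline = re.findall(r"\w+$", text, re.MULTILINE)
```

Note that end_default still finds the last word even though the string ends with a newline, because $ is allowed to match just before that final newline.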
|
Sequence
|
Meaning
|
|---|---|
\a |
Alert (bell),
x07. |
\b |
Backspace,
x08, supported only in character class. |
\e |
ESC character,
x1B. |
\n |
Newline,
x0A. |
\r |
Carriage return,
x0D. |
\f |
Form feed,
x0C. |
\t |
Horizontal tab,
x09. |
\octal |
Character specified by a three-digit octal code.
|
\xhex |
Character specified by a one- or two-digit hexadecimal code.
|
\x{hex} |
Character specified by any hexadecimal code.
|
\cchar |
Named control character.
|
|
Class
|
Meaning
|
|---|---|
[...] |
A single character listed or contained in a listed range.
|
[^...] |
A single character not listed and not contained within a listed range.
|
[:class:] |
POSIX-style character class valid only within a regex character class.
|
|
.
|
Any character except newline (unless single-line mode,
/s). |
\C |
One byte; however, this may corrupt a Unicode character stream.
|
\w |
Word character,
[a-zA-Z0-9_]. |
\W |
Non-word character,
[^a-zA-Z0-9_]. |
\d |
Digit character,
[0-9]. |
\D |
Non-digit character,
[^0-9]. |
\s |
Whitespace character,
[\n\r\f\t ]. |
\S |
Non-whitespace character,
[^\n\r\f\t ]. |
|
Sequence
|
Meaning
|
|---|
preg routines. PHP also
provides POSIX-style regular expressions, but these do not offer
additional benefit in power or speed. The preg
routines use a Traditional NFA match engine. For an explanation of
the rules behind an NFA engine, see Section 1.2.|
Sequence
|
Meaning
|
|---|---|
\a |
Alert (bell),
x07. |
\b |
Backspace,
x08, supported only in character class. |
\e |
ESC character,
x1B. |
\n |
Newline,
x0A. |
\r |
Carriage return,
x0D. |
\f |
Form feed,
x0C. |
\t |
Horizontal tab,
x09. |
\octal |
Character specified by a three-digit octal code.
|
\xhex |
Character specified by a one- or two-digit hexadecimal code.
|
\x{hex} |
Character specified by any hexadecimal code.
|
\cchar |
Named control character.
|
|
Class
|
Meaning
|
|---|---|
[...] |
A single character listed or contained within a listed range.
|
[^...] |
A single character not listed and not contained within a listed range.
|
[:class:] |
POSIX-style character class valid only within a regex character class.
|
|
.
|
Any character except newline (unless single-line
mode,
/s). |
\C |
One byte; however, this may corrupt a Unicode character stream.
|
\w |
Word character,
[a-zA-Z0-9_]. |
\W |
Non-word character,
[^a-zA-Z0-9_]. |
\d |
Digit character,
[0-9]. |
\D |
Non-digit character,
[^0-9]. |
\s |
Whitespace character,
[\n\r\f\t ]. |
\S |
Non-whitespace character,
[^\n\r\f\t ]. |
|
Sequence
|
Meaning
|
|---|---|
^ |
Start of string, or after any newline if in multiline match mode,
/m. |
\A |
Start of search string, in all match modes.
|
$ |
End of search string or before a string-ending newline, or before any
newline if in multiline match mode, |
|
Sequence
|
Meaning
|
|---|---|
|
Vim only
| |
\b |
Backspace,
x08. |
\e |
ESC character,
x1B. |
\n |
Newline,
x0A. |
\r |
Carriage return,
x0D. |
\t |
Horizontal tab,
x09. |
|
Class
|
Meaning
|
|---|---|
[...] |
Any character listed or contained within a listed range.
|
[^...] |
Any character that is not listed or contained within a listed range.
|
[:class:] |
POSIX-style character class valid only within a character class.
|
|
.
|
Any character except newline (unless
/s mode). |
|
Vim only
| |
\w |
Word character,
[a-zA-Z0-9_]. |
\W |
Non-word character,
[^a-zA-Z0-9_]. |
\a |
Letter character,
[a-zA-Z]. |
\A |
Non-letter character,
[^a-zA-Z]. |
\h |
Head of word character,
[a-zA-Z_]. |
\H |
Not the head of a word character,
[^a-zA-Z_]. |
\d |
Digit character,
[0-9]. |
\D |
Non-digit character,
[^0-9]. |
\s |
Whitespace character,
[ \t]. |
\S |
Non-whitespace character,
[^ \t]. |
\x |
Hex digit,
[a-fA-F0-9]. |
\X |
Non-hex digit,
[^a-fA-F0-9]. |
\o |
Octal digit,
[0-7]. |
\O |
Non-octal digit,
[^0-7]. |
\l |
Lowercase letter,
[a-z]. |
\L |
Non-lowercase letter,
[^a-z]. |
\u |
Uppercase letter,
[A-Z]. |
\U |
Non-uppercase letter,
[^A-Z]. |
\i |
Identifier character defined by
isident. |
\I |
Any non-digit identifier character.
|
\k |
Keyword character defined by
iskeyword, often set
by language modes. |
\K |
Any non-digit keyword character.
|
\f |
Filename character defined by
isfname. Operating
system dependent. |
\F |
Any non-digit filename character.
|
\p |
Printable character defined by
isprint, usually
x20-x7E. |
\P |
Any non-digit printable character.
|
|
Sequence
|
Meaning
|
|---|---|
\0 |
Null character,
\x00. |
\b |
Backspace,
\x08, supported only in character class. |
\n |
Newline,
\x0A. |
\r |
Carriage return,
\x0D. |
\f |
Form feed,
\x0C. |
\t |
Horizontal tab,
\x09. |
\v |
Vertical tab,
\x0B. |
\xhh |
Character specified by a two-digit hexadecimal code.
|
\uhhhh |
Character specified by a four-digit hexadecimal code.
|
\cchar |
Named control character.
|
|
Class
|
Meaning
|
|---|---|
[...] |
A single character listed or contained within a listed range.
|
[^...] |
A single character not listed and not contained within a listed range.
|
|
.
|
Any character except a line terminator,
[^\x0A\x0D\u2028\u2029]. |
\w |
Word character,
[a-zA-Z0-9_]. |
\W |
Non-word character,
[^a-zA-Z0-9_]. |
\d |
Digit character,
[0-9]. |
\D |
Non-digit character,
[^0-9]. |
\s |
Whitespace character.
|
\S |
Non-whitespace character.
|
|
Sequence
|
Meaning
|
|---|---|
^ |
Start of string, or after any newline if in multiline match mode,
/m. |
$ |
End of search string, or before any
newline if in multiline match mode,
/m. |
\b |
Word boundary.
|
\B |
Not-word-boundary.
|
(?=...) |
Positive lookahead.
|
(?!...) |
Negative lookahead.
|
|
Modifier
|
Meaning
|
|---|---|
|
Sequence
|
Meaning
|
Tool
|
|---|---|---|
\a |
Alert (bell).
|
awk, sed
|
\b |
Backspace; supported only in character class.
|
awk
|
\f |
Form feed.
|
awk, sed
|
\n |
Newline (line feed).
|
awk, sed
|
\r |
Carriage return.
|
awk, sed
|
\t |
Horizontal tab.
|
awk, sed
|
\v |
Vertical tab.
|
awk, sed
|
\ooctal |
A character specified by a one-, two-, or three-digit octal code.
|
sed
|
\octal |
A character specified by a one-, two-, or three-digit octal code.
|
awk
|
\xhex |
A character specified by a two-digit hexadecimal code.
|
awk, sed
|
\ddecimal |
A character specified by a one-, two-, or three-digit decimal code.
|
awk, sed
|
\cchar |
A named control character (e.g.,
\cC is Control-C). |
awk, sed
|
\b |
Backspace.
|
awk
|
\metacharacter |
Escape the metacharacter so that it literally represents itself.
|
awk, sed, egrep
|
|
Class
|
Meaning
|
Tool
|
|---|---|---|
[...] |
Matches any single character listed or contained within a listed
range.
|
awk, sed, egrep
|
[^...] |
Matches any single character that is not listed or contained within a
listed range.
|
awk, sed, egrep
|
|
.
|
Matches any single character, except newline.
|
awk, sed, egrep
|
\w |
Matches an ASCII word character,
[a-zA-Z0-9_]. |
egrep, sed
|
\W |
Matches a character that is not an ASCII word character,
[^a-zA-Z0-9_]. |
egrep, sed
|
[:prop:] |
Matches any character in the POSIX character class.
|
awk, sed |