2.3. Match One of Many Characters
Problem
Create one regular expression to match all common misspellings of calendar, so you can find this word in a document without having to trust the author’s spelling ability. Allow an a or e to be used in each of the vowel positions. Create another regular expression to match a single hexadecimal character. Create a third regex to match a single character that is not a hexadecimal character.
The problems in this recipe are used to explain an important and commonly used regex construct called a character class.
Solution
Calendar with misspellings
c[ae]l[ae]nd[ae]r
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Hexadecimal character
[a-fA-F0-9]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Nonhexadecimal character
[^a-fA-F0-9]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
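As a minimal sketch of how the hexadecimal and nonhexadecimal classes behave, the following Python snippet applies both to a short string. The sample text and the variable names are illustrative assumptions, not part of the recipe.

```python
import re

# The two character classes from the solution, compiled as-is.
HEX_CHAR = re.compile(r"[a-fA-F0-9]")
NON_HEX_CHAR = re.compile(r"[^a-fA-F0-9]")

# Hypothetical sample text, chosen only for illustration.
sample = "0x1F, ok"

# Each match is exactly one character, because a character class
# consumes a single character per match.
hex_chars = HEX_CHAR.findall(sample)
non_hex_chars = NON_HEX_CHAR.findall(sample)

print(hex_chars)      # characters that are hexadecimal digits
print(non_hex_chars)  # everything else, including the space and comma
```

Every character of the sample lands in exactly one of the two lists, since the second class is the negation of the first.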
Discussion
The notation using square brackets is called a character class. A character class matches a single character out of a list of possible characters. The three classes in the first regex match either an a or an e. They do so independently. When you test calendar against this regex, the first character class matches a, the second e, and the third a.
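To illustrate that the three classes match independently, here is a small Python sketch. The candidate word list is made up for the example, and fullmatch is used so that each word is tested as a whole.

```python
import re
from itertools import product

# The calendar regex from the solution.
CALENDAR = re.compile(r"c[ae]l[ae]nd[ae]r")

# Hypothetical candidate spellings; "calindar" uses a vowel
# that the character classes do not allow.
candidates = ["calendar", "calender", "celendar",
              "calandar", "celender", "calindar"]
matched = [word for word in candidates if CALENDAR.fullmatch(word)]

# Because each class chooses a or e on its own, the regex accepts
# every combination of the two vowels in the three positions.
spellings = ["c{}l{}nd{}r".format(*vowels)
             for vowels in product("ae", repeat=3)]
all_accepted = all(CALENDAR.fullmatch(s) for s in spellings)

print(matched)
print(len(spellings), all_accepted)
```

Three independent two-way choices give eight accepted spellings in total, one of which is the correct spelling.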
Inside a character class, only four characters have a special function: \, ^, -, and ]. If you’re using Java or .NET, the opening bracket [ is also a metacharacter inside character classes.
A backslash always escapes the character ...
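The Python sketch below shows one way to put the four special characters into a character class as literals by escaping each with a backslash. It assumes Python's re module, and the sample strings are invented for illustration. Other flavors may allow or require different escaping.

```python
import re

# A class containing a literal backslash, caret, hyphen, and
# closing bracket, each escaped with a backslash.
SPECIALS = re.compile(r"[\\\^\-\]]")

# Hypothetical sample containing all four, plus an opening bracket.
sample = r"a\b^c-d]e[f"
found = SPECIALS.findall(sample)

# Characters that are metacharacters outside a class, such as the
# dot and the star, are plain literals inside one.
literals = re.findall(r"[.*+?]", "a.b*c")

print(found)
print(literals)
```

The opening bracket in the sample is not matched, because it was not listed in the class.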