2.3. Match One of Many Characters
Problem
Create one regular expression to match all common misspellings of calendar, so you can find this word in a document without having to trust the author’s spelling ability. Allow an a or e to be used in each of the vowel positions. Create another regular expression to match a single hexadecimal character. Create a third regex to match a single character that is not a hexadecimal character.
Solution
Calendar with misspellings
c[ae]l[ae]nd[ae]r
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Hexadecimal character
[a-fA-F0-9]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Nonhexadecimal character
[^a-fA-F0-9]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
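As a quick check, the three regexes can be tried in Python, one of the flavors listed above. This is a minimal sketch: the sample strings and variable names are made up for illustration and are not part of the recipe.

```python
import re

# The three regexes from the Solution, unchanged.
CALENDAR = re.compile(r"c[ae]l[ae]nd[ae]r")
HEX_CHAR = re.compile(r"[a-fA-F0-9]")
NON_HEX_CHAR = re.compile(r"[^a-fA-F0-9]")

# Illustrative sample text (an assumption, not from the book).
text = "My calender and my celandar disagree with the calendar."
calendar_hits = CALENDAR.findall(text)

# Each character class matches exactly one character per match.
sample = "0x1F, g"
hex_hits = HEX_CHAR.findall(sample)
non_hex_hits = NON_HEX_CHAR.findall(sample)

print(calendar_hits)  # ['calender', 'celandar', 'calendar']
print(hex_hits)       # ['0', '1', 'F']
print(non_hex_hits)   # ['x', ',', ' ', 'g']
```

Note that the negated class also matches the comma and the space: it matches any single character that is not a hexadecimal digit, not just letters.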
Discussion
The notation using square brackets is called a character class. A character class matches a single character out of a list of possible characters. The three classes in the first regex match either an a or an e. They do so independently. When you test calendar against this regex, the first character class matches a, the second e, and the third a.
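Because the three classes choose independently, the first regex accepts 2 × 2 × 2 = 8 spellings. The following Python sketch enumerates them to illustrate the point. The two rejected words are arbitrary examples chosen here, not taken from the book.

```python
import itertools
import re

CALENDAR = re.compile(r"c[ae]l[ae]nd[ae]r")

# Build every combination of a/e in the three vowel positions.
variants = [
    "c{}l{}nd{}r".format(*vowels)
    for vowels in itertools.product("ae", repeat=3)
]

# fullmatch requires the whole word to match, not just a substring.
matched = [word for word in variants if CALENDAR.fullmatch(word)]

# A class matches exactly one character from its list, so a different
# vowel or an extra vowel makes the match fail.
rejected = [
    word for word in ("calindar", "caalendar")
    if CALENDAR.fullmatch(word)
]

print(len(matched))  # 8
print(rejected)      # []
```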
Outside character classes, a dozen punctuation characters are metacharacters. Inside a character class, only four characters have a special function: \, ^, -, and ]. If you’re using Java or .NET, the opening bracket [ is also a metacharacter inside character classes. All other characters are literals and simply add themselves to the character class. ...
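The difference between the two sets of metacharacters can be seen in a small Python sketch. It covers Python's re module only, so other flavors (notably Java and .NET, as mentioned above) may behave differently. The test strings are illustrative assumptions.

```python
import re

# Characters that are metacharacters outside a class are plain
# literals inside one, so no escaping is needed here.
punctuation = re.compile(r"[.$*+?]")
punctuation_hits = punctuation.findall("a.b$c*")

# The four characters with a special function inside a class can be
# included as literals by escaping each with a backslash.
specials = re.compile(r"[\\\^\-\]]")
special_hits = specials.findall(r"a\b^c-d]e")

print(punctuation_hits)  # ['.', '$', '*']
print(special_hits)      # ['\\', '^', '-', ']']
```

Escaping all four is the conservative choice: it stays correct regardless of where each character ends up within the class.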