2.6. Match Whole Words
Problem
Create a regex that matches cat in My cat is brown, but not in category or bobcat. Create another regex that matches cat in staccato, but not in any of the three previous subject strings.
Solution
Word boundaries
\bcat\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Nonboundaries
\Bcat\B
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
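As a quick check of both solutions, the following minimal sketch runs them against the four subject strings from the problem statement, using Python's standard re module. The variable names (subjects, whole_word, inside_word, results) are chosen for illustration only and are not part of the recipe.

```python
import re

# The four subject strings from the problem statement.
subjects = ["My cat is brown", "category", "bobcat", "staccato"]

# Raw strings keep the backslashes intact for the regex engine.
whole_word = re.compile(r"\bcat\b")   # word boundaries on both sides
inside_word = re.compile(r"\Bcat\B")  # nonboundaries on both sides

# For each subject, record whether each regex finds a match anywhere in it.
results = {
    s: (bool(whole_word.search(s)), bool(inside_word.search(s)))
    for s in subjects
}

for s, (as_word, inside) in results.items():
    print(f"{s!r}: whole word={as_word}, inside word={inside}")
```

Only My cat is brown should satisfy the first regex, and only staccato should satisfy the second; category and bobcat should satisfy neither, because each has a word boundary on one side of cat.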
Discussion
Word boundaries
The regular expression token ‹\b› is called a word boundary. It matches at the start or the end of a word. By itself, it results in a zero-length match. ‹\b› is an anchor, just like the tokens introduced in the previous section.
Strictly speaking, ‹\b› matches in these three positions:
Before the first character in the subject, if the first character is a word character
After the last character in the subject, if the last character is a word character
Between two characters in the subject, where one is a word character and the other is not a word character
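The three positions listed above can be made visible by collecting every offset at which ‹\b› matches on its own. This is a small sketch in Python, assuming Python 3.7 or later and ASCII-only subjects; the helper name boundary_positions is hypothetical.

```python
import re

def boundary_positions(subject):
    """Return every offset in subject where the zero-length \\b matches."""
    # Each match is empty, so start() is simply the offset of the boundary.
    return [m.start() for m in re.finditer(r"\b", subject)]

print(boundary_positions("My cat!"))  # [0, 2, 3, 6]
print(boundary_positions("cat"))      # [0, 3]
print(boundary_positions("!"))        # []
```

In My cat!, offset 0 is a boundary because the first character is a word character, offsets 2 and 3 sit on either side of the space, and offset 6 lies between t and the exclamation point. The end of the string (offset 7) is not a boundary, because the last character is not a word character.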
None of the flavors discussed in this book have separate tokens for matching only before or only after a word. Unless you wanted to create a regex that consists of nothing but a word boundary, these aren’t needed. The tokens before or after the ‹\b› in your regular expression will determine where ‹\b› can match. The ‹\b› in ‹\bx› and ‹!\b› could match only at the start of a word. The ‹\b› in ‹x\b› and ‹\b!› could match only at the end of a word. ‹x\bx› and ‹!\b!› can never match anywhere. ...
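To see how the neighboring tokens restrict where the boundary can match, the sketch below tries all six regexes from the paragraph above against one made-up subject string. The subject and the variable names are assumptions chosen for illustration, and Python's re module stands in for any of the flavors listed.

```python
import re

# A made-up subject mixing the word character x with the nonword character !.
subject = "x! !x xx !!"

patterns = [r"\bx", r"!\b", r"x\b", r"\b!", r"x\bx", r"!\b!"]

# Map each regex to the starting offsets of all of its matches.
starts = {p: [m.start() for m in re.finditer(p, subject)] for p in patterns}

for p, found in starts.items():
    print(f"{p:6} -> {found}")
```

The first two regexes match only where a word begins (offsets 0, 4, and 6 for ‹\bx›, offset 3 for ‹!\b›), the next two only where a word ends (offsets 0, 4, and 7 for ‹x\b›, offset 1 for ‹\b!›), and the last two find nothing, since a boundary cannot occur between two word characters or between two nonword characters.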