2.6. Match Whole Words
Problem
Create a regex that matches cat in My cat is brown, but not in category or bobcat. Create another regex that matches cat in staccato, but not in any of the three previous subject strings.
Solution
Word boundaries
\bcat\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Nonboundaries
\Bcat\B
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
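As a quick check, the two solutions can be run against the four subject strings from the problem statement. The following is a minimal sketch in Python (one of the listed flavors); the variable names are chosen for illustration only.

```python
import re

# The two regexes from the Solution section.
whole_word = re.compile(r"\bcat\b")
inside_word = re.compile(r"\Bcat\B")

# The four subject strings from the Problem section.
subjects = ["My cat is brown", "category", "bobcat", "staccato"]

# For each subject, record whether each regex finds a match anywhere in it.
whole_hits = [bool(whole_word.search(s)) for s in subjects]
inside_hits = [bool(inside_word.search(s)) for s in subjects]

for s, w, i in zip(subjects, whole_hits, inside_hits):
    print(f"{s!r}: word boundaries={w}, nonboundaries={i}")
```

Only the first subject should satisfy the word-boundary regex, and only staccato should satisfy the nonboundary regex.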
Discussion
Word boundaries
The regular expression token ‹\b› is called a word boundary. It matches at the start or the end of a word. By itself, it results in a zero-length match. ‹\b› is an anchor, just like the tokens introduced in the previous section.
Strictly speaking, ‹\b› matches in these three positions:
Before the first character in the subject, if the first character is a word character
After the last character in the subject, if the last character is a word character
Between two characters in the subject, where one is a word character and the other is not a word character
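The three positions listed above can be made visible by asking the regex engine where ‹\b› matches on its own. This is a small sketch in Python, assuming Python 3.7 or later; `boundary_positions` is a hypothetical helper name, not something from the book.

```python
import re

def boundary_positions(subject):
    """Return the string offsets at which the zero-length token \\b matches."""
    return [m.start() for m in re.finditer(r"\b", subject)]

# Offsets 0 and 6: start and end of the subject, both next to word characters.
# Offsets 2 and 3: on either side of the space between "My" and "cat".
print(boundary_positions("My cat"))

# A subject with no word characters has no word boundaries at all.
print(boundary_positions("!!"))
```

Each reported offset is a position between characters, not a character itself, which is why every match has length zero.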
To run a “whole words only” search using a regular expression, place the word between two word boundaries, as we did with ‹\bcat\b›. The first ‹\b› requires the ‹c› to occur at the very start of the string, or after a nonword character. The second ‹\b› requires the ‹t› to occur at the very end of the string, or before a nonword character.
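The same construction can be applied to any search word by wrapping it in two word boundaries. The sketch below is one way to do that in Python; `find_whole_word` is a hypothetical helper, and it assumes the search word itself begins and ends with a word character, since otherwise ‹\b› would not sit where intended.

```python
import re

def find_whole_word(word, text):
    """Return the start offsets of every whole-word occurrence of word in text.

    Assumption: word starts and ends with a word character (letter, digit,
    or underscore). re.escape guards against regex metacharacters in word.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [m.start() for m in pattern.finditer(text)]

# Matches the standalone "cat" at the start and at the end,
# but skips "category" and "bobcat".
print(find_whole_word("cat", "cat, category, bobcat, a cat"))
```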
Line break characters are nonword characters. ‹\b› will match after a line break if the line break is immediately followed by a word character. ...
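To illustrate the behavior around line breaks, the following Python sketch searches a three-line subject with ‹\bcat\b›. The subject string is made up for this example, and no multiline option is set.

```python
import re

# Three lines: only the middle line consists of the whole word "cat".
text = "bobcat\ncat\ncategory"

# Collect the start offset of each whole-word match.
matches = [m.start() for m in re.finditer(r"\bcat\b", text)]

# The only match starts at offset 7, right after the first line break.
print(matches)
```

The line breaks on either side of the middle line act as nonword characters, so both boundaries are satisfied there without any anchors or options.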