5.4. Find All Except a Specific Word
Problem
You want to use a regular expression to match any complete word except cat. Catwoman, vindicate, and other words that merely contain the letters “cat” should be matched, just not cat.
Solution
A negative lookahead can help you rule out specific words, and is key to this next regex:
\b(?!cat\b)\w+
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
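As a rough illustration, the regex can be tried in Python, one of the flavors listed above. This is a minimal sketch: the helper name words_except_cat and the sample sentence are made up for the example, and re.IGNORECASE stands in for the "Case insensitive" option.

```python
import re

# The recipe's regex, compiled with the case-insensitive option.
NOT_CAT = re.compile(r"\b(?!cat\b)\w+", re.IGNORECASE)

def words_except_cat(text):
    """Return every complete word in text except the standalone word 'cat'.

    Hypothetical helper for illustration only.
    """
    return NOT_CAT.findall(text)

sample = "The cat and Catwoman vindicate a CAT"
print(words_except_cat(sample))
# prints ['The', 'and', 'Catwoman', 'vindicate', 'a']
```

Both cat and CAT are skipped because of the case-insensitive option, while Catwoman and vindicate are still returned because they only contain the letters.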
Discussion
Although a negated character class (written as ‹[^⋯]›) makes it easy to match anything except a specific character, you can’t just write ‹[^cat]› to match anything except the word cat. ‹[^cat]› is a valid regex, but it matches any character except c, a, or t. Hence, although ‹\b[^cat]+\b› would avoid matching the word cat, it wouldn’t match the word time either, because it contains the forbidden letter t. The regular expression ‹\b[^c][^a][^t]\w*› is no good either, because it would reject any word with c as its first letter, a as its second letter, or t as its third. Furthermore, that doesn’t restrict the first three letters to word characters, and it only matches words with at least three characters since none of the negated character classes are optional.
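The shortcomings described above can be checked with a short Python sketch. The word list and the accepted helper are assumptions made for this example. Each word is tested with fullmatch so that a pattern only counts as accepting a word when it matches the whole word.

```python
import re

# Hypothetical test words, chosen to expose each pattern's weakness.
WORDS = ["cat", "time", "cow", "bat", "hi", "dog"]

PATTERNS = {
    "negated class": re.compile(r"\b[^cat]+\b"),
    "per-position": re.compile(r"\b[^c][^a][^t]\w*"),
    "lookahead": re.compile(r"\b(?!cat\b)\w+"),
}

def accepted(pattern, words=WORDS):
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if pattern.fullmatch(w)]

for name, pattern in PATTERNS.items():
    print(f"{name}: {accepted(pattern)}")
# negated class: ['hi', 'dog']
# per-position: ['time', 'dog']
# lookahead: ['time', 'cow', 'bat', 'hi', 'dog']
```

The negated class rejects every word containing c, a, or t. The per-position pattern rejects cow and bat because of a single letter and rejects hi for being shorter than three characters. Only the lookahead version excludes cat alone.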
With all that in mind, let’s take another look at how the regular expression shown at the beginning of this recipe solved the problem:
\b         # Assert position at a word boundary.
(?!        # Not followed by:
  cat      #   Match "cat".
  \b       #   Assert position at a word boundary.
)          # End the negative lookahead.
\w+        ...