2.17. Match One of Two Alternatives Based on a Condition
Problem
Create a regular expression that matches a comma-delimited list of the words one, two, and three. Each word can occur any number of times in the list, and the words can occur in any order, but each word must appear at least once.
Solution
\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}(?(1)|(?!))(?(2)|(?!))(?(3)|(?!))
Regex options: None
Regex flavors: .NET, PCRE, Perl, Python
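As a minimal sketch of how the conditional solution might be used from Python's `re` module, the following wraps the regex in a helper. The names `LIST_WITH_ALL_WORDS` and `find_complete_list` are made up for this illustration and are not part of the recipe.

```python
import re

# The conditional solution, copied verbatim. (?!) is an empty negative
# lookahead that can never match, so each conditional rejects the match
# attempt when its capturing group never participated.
LIST_WITH_ALL_WORDS = re.compile(
    r"\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}"
    r"(?(1)|(?!))(?(2)|(?!))(?(3)|(?!))"
)


def find_complete_list(text):
    """Return the first list that contains all three words, or None."""
    match = LIST_WITH_ALL_WORDS.search(text)
    return match.group(0) if match else None
```

A list such as `three,one,two,one` should be returned whole, while `one,one,one` should yield `None` because the second and third groups never match.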
Java, JavaScript, and Ruby do not support conditionals. When programming in these languages (or any other language), you can use the regular expression without the conditionals, and write some extra code to check if each of the three capturing groups matched something.
\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
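The "extra code" for the version without conditionals could look roughly like the sketch below. It is written in Python only for illustration, and the names `WORD_LIST` and `find_complete_list_checked` are hypothetical. It relies on Python keeping a group's capture from an earlier iteration of the repeated group; an engine that resets captures on every iteration would need a different check.

```python
import re

# The solution without conditionals, copied verbatim.
WORD_LIST = re.compile(r"\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}")


def find_complete_list_checked(text):
    """Return the first list in which all three groups participated."""
    # The regex alone also accepts lists such as "one,one,one", so every
    # candidate match is checked in code: a group that never took part
    # in the match is reported as None.
    for match in WORD_LIST.finditer(text):
        if all(group is not None for group in match.groups()):
            return match.group(0)
    return None
```

Looping over `finditer` rather than testing only the first match lets the helper skip an incomplete list and still find a complete one later in the same text.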
Discussion
.NET, PCRE, Perl, and Python support conditionals using numbered capturing groups. ‹(?(1)then|else)› is a conditional that checks whether the first capturing group has already matched something. If it has, the regex engine attempts to match ‹then›. If the capturing group has not participated in the match attempt thus far, the ‹else› part is attempted.
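A small Python sketch can make the then/else behavior concrete. The pattern here is a toy chosen for illustration: an optional `a` in group 1, a literal `b`, and then `c` if group 1 participated or `d` if it did not. The names `CONDITIONAL` and `matches_fully` are hypothetical.

```python
import re

# (a)?      optionally capture "a" into group 1
# b         a literal "b"
# (?(1)c|d) "c" if group 1 participated in the match, otherwise "d"
CONDITIONAL = re.compile(r"(a)?b(?(1)c|d)")


def matches_fully(text):
    """Return True if the whole string matches the toy pattern."""
    return CONDITIONAL.fullmatch(text) is not None
```

Under this pattern `abc` and `bd` should match in full, while `abd` and `bc` should not.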
The parentheses, question mark, and vertical bar are all part of the syntax for the conditional. They don’t have their usual meaning. You can use any kind of regular expression for the ‹then› and ‹else› parts. The only restriction is that if you want to use alternation for one of the parts, you have to use a group ...
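To illustrate the grouping restriction in one flavor, the Python sketch below wraps an alternation in a noncapturing group inside the then part, and shows that Python's `re` module refuses to compile a conditional with a third, ungrouped branch. The pattern and the names `GROUPED` and `compiles` are made up for this example; other flavors may react differently to the ungrouped form.

```python
import re

# The then part is the alternation x|y, wrapped in a noncapturing group
# so that its vertical bar is not read as the then/else separator.
GROUPED = re.compile(r"(a)?(?(1)(?:x|y)|z)")


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False
```

With the group in place, `ax`, `ay`, and `z` should match in full, whereas `compiles(r"(a)?(?(1)x|y|z)")` should return `False`.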