2.8. Match One of Several Alternatives
Problem
Create a regular expression that, when applied repeatedly to the text “Mary, Jane, and Sue went to Mary's house”, will match Mary, Jane, Sue, and then Mary again. Further match attempts should fail.
Solution
Mary|Jane|Sue
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Discussion
The vertical bar, or pipe symbol, splits the regular expression into multiple alternatives. ‹Mary|Jane|Sue› matches Mary, or Jane, or Sue with each match attempt. Only one name matches each time, but a different name can match each time.
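As a minimal sketch of applying the regex repeatedly, the following uses Python's standard `re` module on the subject text from the problem statement. The variable names (`subject`, `pattern`, `matches`, `further`) are chosen here for illustration only.

```python
import re

subject = "Mary, Jane, and Sue went to Mary's house"
pattern = re.compile(r"Mary|Jane|Sue")

# finditer() resumes each search where the previous match ended,
# which corresponds to applying the regex repeatedly to the text.
matches = [(m.group(), m.start()) for m in pattern.finditer(subject)]
for name, start in matches:
    print(f"{name!r} found at index {start}")

# One more attempt, starting right after the last match, finds nothing.
last_name, last_start = matches[-1]
further = pattern.search(subject, last_start + len(last_name))
print("further match:", further)
```

Each match is one of the three names, and the names come back in the order they occur in the text, not in the order they are listed in the regex.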
All regular expression flavors discussed in this book use a regex-directed engine. The engine is simply the software that makes the regular expression work. Regex-directed[2] means that all possible permutations of the regular expression are attempted at each character position in the subject text, before the regex is attempted at the next character position.
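The position-by-position behavior can be imitated with a toy trace. This is only a sketch under strong assumptions: it handles nothing but literal alternatives, and the helper `trace_alternation` is hypothetical, not part of any regex library. A real engine does far more work than this.

```python
def trace_alternation(alternatives, subject, start=0):
    """Try each literal alternative, left to right, at each position.

    Hypothetical helper for illustration only. Returns the first
    (position, alternative) that matches, plus a log of every attempt.
    """
    attempts = []
    for pos in range(start, len(subject) + 1):
        for alt in alternatives:
            matched = subject.startswith(alt, pos)
            attempts.append((pos, alt, matched))
            if matched:
                return (pos, alt), attempts
    return None, attempts


subject = "Mary, Jane, and Sue went to Mary's house"

# Resume at index 4, the first comma, right after the initial "Mary".
result, attempts = trace_alternation(["Mary", "Jane", "Sue"], subject, start=4)
for pos, alt, matched in attempts:
    print(f"position {pos}: try {alt!r} -> {'match' if matched else 'fail'}")
print("result:", result)
```

In this sketch, all three alternatives are tried and rejected at one position before the trace moves on to the next position.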
When you apply ‹Mary|Jane|Sue› to “Mary, Jane, and Sue went to Mary's house”, the match Mary is immediately found at the start of the string.
When you apply the same regex to the remainder of the string (for example, by clicking “Find Next” in your text editor), the regex engine attempts to match ‹Mary› at the first comma in the string. That fails. Then, it attempts to match ‹Jane› at the same position, which also fails. Attempting to match ‹Sue› at the comma fails, too. Only then does the regex engine advance to the next character in the string. Starting at ...