2.8. Match One of Several Alternatives
Problem
Create a regular expression that, when applied repeatedly to the text “Mary, Jane, and Sue went to Mary's house”, will match Mary, Jane, Sue, and then Mary again. Further match attempts should fail.
Solution
Mary|Jane|Sue
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
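As a minimal sketch of the solution in use, the snippet below applies the regex repeatedly to the subject text from the problem statement. Python is chosen here only because it is one of the listed flavors; the variable names are illustrative.

```python
import re

# The solution regex: three literal alternatives separated by the pipe.
pattern = re.compile(r"Mary|Jane|Sue")
subject = "Mary, Jane, and Sue went to Mary's house"

# finditer applies the regex repeatedly, resuming after each match,
# and stops once no further match can be found.
matches = [m.group() for m in pattern.finditer(subject)]
print(matches)  # ['Mary', 'Jane', 'Sue', 'Mary']
```

The list ends after the second Mary because every later match attempt fails.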
Discussion
The vertical bar, or pipe symbol, splits the regular expression into multiple alternatives. ‹Mary|Jane|Sue› matches Mary, or Jane, or Sue with each match attempt. Only one name matches each time, but a different name can match each time.
All regular expression flavors discussed in this book use a regex-directed engine. The engine is simply the software that makes the regular expression work. Regex-directed[3] means that all possible permutations of the regular expression are attempted at each character position in the subject text, before the regex is attempted at the next character position.
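The search order described above can be sketched as a toy model. This is not how any real regex engine is implemented: it handles only literal alternatives, and `find_first` is a hypothetical helper introduced for illustration. It does show the key point, that every alternative is tried at one position before the search moves to the next character.

```python
def find_first(alternatives, subject, start=0):
    """Return (position, matched_text) for the first match at or after start, or None."""
    for pos in range(start, len(subject)):
        # Try every alternative, left to right, at this position...
        for alt in alternatives:
            if subject.startswith(alt, pos):
                return pos, alt
        # ...and only advance to the next character once all have failed.
    return None


subject = "Mary, Jane, and Sue went to Mary's house"
alternatives = ["Mary", "Jane", "Sue"]

found = []
start = 0
while True:
    hit = find_first(alternatives, subject, start)
    if hit is None:
        break
    pos, text = hit
    found.append((pos, text))
    start = pos + len(text)  # resume right after the previous match

print(found)  # [(0, 'Mary'), (6, 'Jane'), (16, 'Sue'), (28, 'Mary')]
```

Resuming at `pos + len(text)` mirrors applying the regex to the remainder of the string after each match.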
When you apply ‹Mary|Jane|Sue› to “Mary, Jane, and Sue went to Mary's house”, the match Mary is immediately found at the start of the string.
When you apply the same regex to the remainder of the string (for example, by clicking “Find Next” in your text editor), the regex engine attempts to match ‹Mary› at the first comma in the string. That fails. Then, it attempts to match ‹Jane› at the same position, which also fails. Attempting to match ‹Sue› at the comma fails, too. Only then does the regex engine advance to the next character in the string. Starting at ...