2.4. Match Any Character
This recipe explains the ins and outs of the dot metacharacter.
Problem
Match a quoted character. Provide one solution that allows any single character, except a line break, between the quotes. Provide another that truly allows any character, including line breaks.
Solution
Any character except line breaks
'.'
Regex options: None (the “dot matches line breaks” option must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
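Here is a minimal Python sketch of the first solution. It assumes Python's re module can stand in for the flavors listed above, and the helper name find_quoted_chars is hypothetical. Python's dot, without options, excludes only the line feed (\n), so a carriage return between the quotes still matches. Exactly which characters count as a "line break" therefore varies by flavor.

```python
import re

# Solution 1: without the "dot matches line breaks" option, the dot
# matches any single character except a line break.
QUOTED_CHAR = re.compile(r"'.'")


def find_quoted_chars(subject):
    """Hypothetical helper: return every single character between single quotes."""
    return QUOTED_CHAR.findall(subject)


print(find_quoted_chars("'a' and 'b'"))  # ["'a'", "'b'"]
print(find_quoted_chars("'\n'"))         # [] : the dot refuses the line feed
print(find_quoted_chars("'\r'"))         # ["'\r'"] : Python excludes only \n
```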
Any character including line breaks
'.'
Regex options: Dot matches line breaks
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
'[\s\S]'
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
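The next Python sketch, under the same assumption, compares both ways of matching any character including line breaks. In Python, the "dot matches line breaks" option is the re.DOTALL flag, which can also be written inline as (?s). The character class [\s\S] needs no option, because every character is either whitespace or not. The variable names are illustrative only.

```python
import re

# A quoted line feed: the quotes surround a single "\n" character.
subject = "x = '\n'"

# Default: the dot does not match the line feed, so there is no match.
plain = re.compile(r"'.'")

# "Dot matches line breaks": the re.DOTALL flag (inline equivalent: (?s)).
dotall = re.compile(r"'.'", re.DOTALL)

# No options: [\s\S] matches any whitespace or non-whitespace character,
# which together cover every character, line breaks included.
any_char = re.compile(r"'[\s\S]'")

for name, regex in [("plain", plain), ("dotall", dotall), ("any_char", any_char)]:
    print(name, regex.search(subject) is not None)
# plain False
# dotall True
# any_char True
```

The character class version is useful when you cannot control which options a regex is compiled with. It is also the only choice in a flavor that lacks a "dot matches line breaks" mode.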
Discussion
Any character except line breaks
The dot is one of the oldest and simplest regular expression features. Its meaning has always been to match any single character.
There is, however, some confusion as to what any character truly means. The oldest tools for working with regular expressions processed files line by line, so there was never an opportunity for the subject text to include a line break. The programming languages discussed in this book process the subject text as a whole, no matter how many line breaks you put into it. If you want true line-by-line processing, you have to write a bit of code that splits the subject into an array of lines and applies the regex to each line in the array. Recipe 3.21 in the next chapter shows how to do this.
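Below is a minimal Python sketch of the line-by-line approach described above. It is not the code from Recipe 3.21. The helper matches_per_line is hypothetical. It relies on str.splitlines(), which strips line terminators such as \n, \r\n, and \r, so the regex never sees a line break at all.

```python
import re


def matches_per_line(pattern, subject):
    """Hypothetical helper: apply pattern to each line of subject separately.

    Returns a list of (line_number, matched_text) tuples; line numbers start at 1.
    """
    regex = re.compile(pattern)
    results = []
    # splitlines() removes the line terminators, so no line contains a line break.
    for number, line in enumerate(subject.splitlines(), start=1):
        for match in regex.finditer(line):
            results.append((number, match.group()))
    return results


text = "first 'a'\nsecond 'b' and 'c'\r\nthird"
print(matches_per_line(r"'.'", text))
# [(1, "'a'"), (2, "'b'"), (2, "'c'")]
```

Because every line is matched on its own, even a pattern compiled with (?s) cannot match across a line break in this setup.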
Larry Wall, the developer of Perl, wanted Perl to retain ...