Character Classes
The [...] construct lists a set of characters (a character class); exactly one character from the set matches at that position in the pattern.
Brackets are often used when capitalization is uncertain in a match:
/[tT]here/
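A minimal sketch of the class above, run against a few sample words (the word list is illustrative data only). It shows that [tT] covers exactly one character position, so only the first letter may vary in case:

```perl
use strict;
use warnings;

# Sample words (illustrative data only).
my @words = ('There', 'there', 'where', 'THERE');

# [tT] matches exactly one character, either 't' or 'T';
# the rest of the pattern ("here") must match exactly.
my @matches = grep { /[tT]here/ } @words;

print "matched: @matches\n";    # matched: There there
```

'THERE' fails because only the first letter has a class. To ignore case across the whole pattern, the /i modifier is the usual tool.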
A dash (-) may be used to indicate a range of characters in a character class:
/[a-zA-Z]/;  # Match any single letter
/[0-9]/;     # Match any single digit
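The sketch below applies both range classes to some single-character samples (sample data chosen for illustration). It also contrasts an unanchored class, which succeeds if any one character fits, with an anchored ^[0-9]+$ pattern, which requires every character to fit:

```perl
use strict;
use warnings;

# Single-character samples (illustrative data only).
my @samples = ('a', 'Z', '7', '-', '_');

my @letters = grep { /[a-zA-Z]/ } @samples;    # a Z
my @digits  = grep { /[0-9]/ }    @samples;    # 7

# Unanchored, a class matches if ANY character in the string fits;
# anchoring with ^ and $ requires every character to be a digit.
my $all_digits = '2024' =~ /^[0-9]+$/ ? 1 : 0;    # 1
my $mixed      = 'v2'   =~ /^[0-9]+$/ ? 1 : 0;    # 0

print "letters: @letters\ndigits: @digits\n";
print "all_digits=$all_digits mixed=$mixed\n";
```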
To put a literal dash in the list, escape it with a backslash (\-) or place it first or last inside the brackets.
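To make the difference concrete, this sketch compares a range with two ways of writing a literal dash. The token list is hypothetical sample data, and each pattern is anchored so it tests one whole token:

```perl
use strict;
use warnings;

# One-character tokens (illustrative data only).
my @tokens = ('a', 'x', '-', 'z');

my @range    = grep { /^[a-z]$/ }  @tokens;    # range a..z:          a x z
my @escaped  = grep { /^[a\-z]$/ } @tokens;    # 'a', '-', or 'z':    a - z
my @trailing = grep { /^[az-]$/ }  @tokens;    # trailing dash is literal too

print "range:    @range\n";
print "escaped:  @escaped\n";
print "trailing: @trailing\n";
```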
By placing a ^ as the first element in the brackets, you create a negated character class; that is, it matches any character not in the list. For example:
/[^A-Z]/;  # Matches any character other than an uppercase letter
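A short sketch of the negated class on a sample string (the text is illustrative only). It counts the characters that are not uppercase letters, strips them to leave only the capitals, and shows that a ^ placed anywhere but first in the brackets is an ordinary character:

```perl
use strict;
use warnings;

my $text = 'Hello World 42';    # sample text (illustrative only)

# [^A-Z] matches one character that is NOT an uppercase ASCII letter,
# including spaces and digits. With /g in list context, every match is returned.
my @others = $text =~ /[^A-Z]/g;
my $count  = scalar @others;     # 12

# Delete every non-uppercase character, leaving only the capitals.
(my $capitals = $text) =~ s/[^A-Z]//g;    # 'HW'

# Only a leading ^ negates; here it is a literal caret inside the class.
my $literal_caret = '^' =~ /^[A^]$/ ? 1 : 0;    # 1

print "non-uppercase characters: $count\n";
print "capitals only: $capitals\n";
print "literal caret matched: $literal_caret\n";
```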
Some common character classes have their own predefined escape sequences for convenience:
Code | Matches |
---|---|
\d | A digit, same as [0-9] |
\D | A nondigit, same as [^0-9] |
\w | A word character (alphanumeric), same as [a-zA-Z_0-9] |
\W | A non-word character, [^a-zA-Z_0-9] |
\s | A whitespace character, same as [ \t\n\r\f] |
\S | A non-whitespace character, [^ \t\n\r\f] |
\C | Match a character (byte) (removed in Perl 5.24) |
\pP | Match P-named (Unicode) property |
\PP | Match non-P |
\X | Match extended Unicode sequence |
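The sketch below runs several of these escapes over one sample record (the record is made-up data). Note that single-letter properties can be written as \pP, while longer property names such as Lu (uppercase letter) need braces, as in \p{Lu}:

```perl
use strict;
use warnings;

# An illustrative tab-separated record (sample data only).
my $record = "id=42\tname=Ann_Lee";

my @digits = $record =~ /\d/g;        # each digit: 4 2
my @words  = $record =~ /\w+/g;       # runs of word chars: id 42 name Ann_Lee
my @spaces = $record =~ /\s/g;        # whitespace: the single tab
my @upper  = $record =~ /\p{Lu}/g;    # Unicode "uppercase letter" property: A L

# \W+ matches runs of non-word characters ('=' and the tab here).
(my $squeezed = $record) =~ s/\W+/ /g;    # 'id 42 name Ann_Lee'

print "digits:   @digits\n";
print "words:    @words\n";
print "spaces:   ", scalar(@spaces), "\n";
print "upper:    @upper\n";
print "squeezed: $squeezed\n";
```

The underscore counts as a word character, which is why Ann_Lee stays in one piece.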
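While Perl provides the lc() and uc() functions, which convert words or characters to lowercase or uppercase (useful, for example, when checking or normalizing case), you can do the same inside double-quoted strings and patterns with these escape sequences: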
Code | Matches |
---|---|
\l | Lowercase the next character |
\u | Uppercase the next character |
\L | Lowercase until \E |
\U | Uppercase until \E |
\Q | Disable pattern metacharacters until \E |
\E | End case modification or metacharacter quoting |
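A minimal sketch of these escapes, first in double-quoted strings and then in patterns, which interpolate the same way. The variable values are illustrative. The \Q case shows why quoting matters: without it, the '.' in the interpolated price matches any character:

```perl
use strict;
use warnings;

my $word = 'Perl';    # sample values (illustrative only)
my $name = 'mARY';

my $lower_first = "\l$word";        # \l: lowercase the next character -> 'perl'
my $proper      = "\u\L$name\E";    # \L lowercases, \u uppercases the first -> 'Mary'
my $shout       = "\U$word\E!";     # \U ... \E: uppercase the span -> 'PERL!'

# In a pattern, \L lowercases the interpolated text: the pattern becomes ^perl
my $starts_with = 'perl rules' =~ /^\L$word\E/ ? 1 : 0;    # 1

# \Q ... \E disables metacharacters ('.' and '+' here) in interpolated text.
my $price    = '3.50+tax';
my $unquoted = 'Total: 3x50tax'  =~ /$price/     ? 1 : 0;    # 1: '.' matched 'x'
my $quoted   = 'Total: 3x50tax'  =~ /\Q$price\E/ ? 1 : 0;    # 0
my $exact    = 'Total: 3.50+tax' =~ /\Q$price\E/ ? 1 : 0;    # 1

print "$lower_first $proper $shout\n";
print "starts_with=$starts_with unquoted=$unquoted quoted=$quoted exact=$exact\n";
```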
These elements match ...