Use the REGEXP
operator and a regular expression
pattern, described in this section. Or use a SQL pattern, described
in Recipe 4.7.
SQL patterns (see Recipe 4.7) are likely to be
implemented by other database systems, so they’re
reasonably portable beyond MySQL. On the other hand,
they’re somewhat limited. For example, you can
easily write a SQL pattern %abc%
to find strings
that contain abc
, but you cannot write a single
SQL pattern to identify strings that contain any of the characters
a
, b
, or c
.
Nor can you match string content based on character types such as
letters or digits. For such operations, MySQL supports another type
of pattern matching operation based on regular expressions and the
REGEXP
operator (or
NOT
REGEXP
to reverse the
sense of the match).[24]
REGEXP
matching uses a different set of pattern elements than
%
and _
(neither of which is
special in regular expressions):
Pattern |
What the pattern matches |
---|---|
Beginning of string | |
End of string | |
Any single character | |
Any character listed between the square brackets | |
|
Any character not listed between the square brackets |
Alternation; matches any of the patterns
| |
Zero or more instances of preceding element | |
One or more instances of preceding element | |
| |
|
You may already be familiar with these regular expression pattern characters, because many of them are the same as those used by vi, grep, sed, and other Unix utilities that support regular expressions. Most of them are used also in the regular expressions understood by Perl, PHP, and Python. (For example, Chapter 10 discuss pattern matching in Perl scripts.) For Java, the Jakarta ORO or Regexp class libraries provide matching capabilities that use these characters as well.
The previous section on SQL patterns showed how to match substrings at the beginning or end of a string, or at an arbitrary or specific position within a string. You can do the same things with regular expressions:
Strings that begin with a particular substring:
mysql>
SELECT name FROM metal WHERE name REGEXP '^co';
+--------+ | name | +--------+ | copper | +--------+Strings that end with a particular substring:
mysql>
SELECT name FROM metal WHERE name REGEXP 'er$';
+--------+ | name | +--------+ | copper | | silver | +--------+Strings that contain a particular substring at any position:
mysql>
SELECT name FROM metal WHERE name REGEXP 'er';
+---------+ | name | +---------+ | copper | | mercury | | silver | +---------+Strings that contain a particular substring at a specific position:
mysql>
SELECT name FROM metal WHERE name REGEXP '^..pp';
+--------+ | name | +--------+ | copper | +--------+
In addition, regular expressions have other capabilities and can perform kinds of matches that SQL patterns cannot. For example, regular expressions can contain character classes, which match any character in the class:
To write a character class, list the characters you want the class to match inside square brackets. Thus, the pattern
[abc]
matches eithera
,b
, orc
.Classes may indicate ranges of characters by using a dash between the beginning and end of the range.
[a-z]
matches any letter,[0-9]
matches digits, and[a-z0-9]
matches letters or digits.To negate a character class (“match any character but these”), begin the list with a
^
character. For example,[^0-9]
matches anything but digits.
MySQL’s regular expression capabilities also support POSIX character classes. These match specific character sets, as described in the following table.
POSIX class |
What the class matches |
---|---|
|
Alphabetic and numeric characters |
|
Alphabetic characters |
|
Whitespace (space or tab characters) |
|
Control characters |
|
Digits |
|
Graphic (non-blank) characters |
|
Lowercase alphabetic characters |
|
Graphic or space characters |
|
Punctuation characters |
|
Space, tab, newline, carriage return |
|
Uppercase alphabetic characters |
|
Hexadecimal digits ( |
POSIX classes are intended for use within character classes, so you use them within square brackets. The following expression matches values that contain any hexadecimal digit character:
mysql> SELECT name, name REGEXP '[[:xdigit:]]' FROM metal;
+----------+----------------------------+
| name | name REGEXP '[[:xdigit:]]' |
+----------+----------------------------+
| copper | 1 |
| gold | 1 |
| iron | 0 |
| lead | 1 |
| mercury | 1 |
| platinum | 1 |
| silver | 1 |
| tin | 0 |
+----------+----------------------------+
Regular expressions can contain alternations. The syntax looks like this:
alternative1
|alternative2
|...
An alternation
is similar to a character class in the sense that it matches if any
of the alternatives match. But unlike a character class, the
alternatives are not limited to single characters—they can be
strings or even patterns. For example, the following alternation
matches strings that begin with a vowel or end with
er
:
mysql> SELECT name FROM metal WHERE name REGEXP '^[aeiou]|er$';
+--------+
| name |
+--------+
| copper |
| iron |
| silver |
+--------+
Parentheses may be used to group alternations. For example, if you want to match strings that consist entirely of digits or entirely of letters, you might try this pattern, using an alternation:
mysql> SELECT '0m' REGEXP '^[[:digit:]]+|[[:alpha:]]+$';
+-------------------------------------------+
| '0m' REGEXP '^[[:digit:]]+|[[:alpha:]]+$' |
+-------------------------------------------+
| 1 |
+-------------------------------------------+
But as the query result shows, the pattern doesn’t
work. That’s because the ^
groups
with the first alternative, and the $
groups with
the second alternative. So the pattern actually matches strings that
begin with one or more digits, or strings that end with one or more
letters. However, if you group the alternatives within parentheses,
the ^
and $
will apply to both
of them and the pattern will act as you expect:
mysql> SELECT '0m' REGEXP '^([[:digit:]]+|[[:alpha:]]+)$';
+---------------------------------------------+
| '0m' REGEXP '^([[:digit:]]+|[[:alpha:]]+)$' |
+---------------------------------------------+
| 0 |
+---------------------------------------------+
Unlike SQL
pattern matches, which are successful only if the pattern matches the
entire comparison value, regular expressions are successful if the
pattern matches anywhere within the value. The following two pattern
matches are equivalent in the sense that each one succeeds only for
strings that contain a b
character, but the first
is more efficient because the pattern is simpler:
'abc' REGEXP 'b' 'abc' REGEXP '^.*b.*$'
Regular expressions do not match NULL
values. This
is true both for REGEXP
and for
NOT
REGEXP
:
mysql> SELECT NULL REGEXP '.*', NULL NOT REGEXP '.*';
+------------------+----------------------+
| NULL REGEXP '.*' | NULL NOT REGEXP '.*' |
+------------------+----------------------+
| NULL | NULL |
+------------------+----------------------+
The fact that a regular expression matches a string if the pattern is
found anywhere in the string means you must take care not to
inadvertently specify a pattern that matches the empty string. If you
do, it will match any non-NULL
value at all. For
example, the pattern a*
matches any number of
a
characters, even none. If your goal is to match
only strings containing nonempty sequences of a
characters, use a+
instead. The
+
requires one or more instances of the preceding
pattern element for a match.
As with SQL pattern matches performed using LIKE
,
regular expression matches performed with REGEXP
sometimes are equivalent to substring comparisons. The
^
and $
metacharacters
serve much the same purpose as LEFT( )
or
RIGHT( )
, at least if you’re
looking for literal strings:
Pattern match |
Substring comparison |
---|---|
|
|
|
|
For non-literal strings, it’s typically not possible to construct an equivalent substring comparison. For example, to match strings that begin with any nonempty sequence of digits, you can use this pattern match:
str REGEXP '^[0-9]+'
That is something that LEFT( )
cannot do (and
neither can LIKE
, for that matter).
[24] RLIKE
is a
synomym for REGEXP
. This is for mSQL (miniSQL)
compatibility and may make it easier to port queries from mSQL to
MySQL.
Get MySQL Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.