Controlling Case Sensitivity in String Comparisons

Problem

A string comparison is case sensitive when you don’t want it to be, or vice versa.

Solution

Alter the case sensitivity of the strings.

Discussion

The examples in previous sections were performed without regard to lettercase. But sometimes you need to make sure a string operation is case sensitive that would not otherwise be, or vice versa. This section describes how to do that for ordinary comparisons. Recipe 4.11 covers case sensitivity in pattern-matching operations.

String comparisons in MySQL are not case sensitive by default:

mysql> SELECT name, name = 'lead', name = 'LEAD' FROM metal;
+----------+---------------+---------------+
| name     | name = 'lead' | name = 'LEAD' |
+----------+---------------+---------------+
| copper   |             0 |             0 |
| gold     |             0 |             0 |
| iron     |             0 |             0 |
| lead     |             1 |             1 |
| mercury  |             0 |             0 |
| platinum |             0 |             0 |
| silver   |             0 |             0 |
| tin      |             0 |             0 |
+----------+---------------+---------------+

The lack of case sensitivity also applies to relative ordering comparisons:

mysql> SELECT name, name < 'lead', name < 'LEAD' FROM metal;
+----------+---------------+---------------+
| name     | name < 'lead' | name < 'LEAD' |
+----------+---------------+---------------+
| copper   |             1 |             1 |
| gold     |             1 |             1 |
| iron     |             1 |             1 |
| lead     |             0 |             0 |
| mercury  |             0 |             0 |
| platinum |             0 |             0 |
| silver   |             0 |             0 |
| tin      |             0 |             0 |
+----------+---------------+---------------+

If you’re familiar with the ASCII collating order, you know that lowercase letters have higher ASCII codes than uppercase letters, so the results in the second comparison column of the preceding query may surprise you. Those results reflect that string ordering is done by default without regard for lettercase, so A and a both are considered lexically less than B.

String comparisons are case sensitive only if at least one of the operands is a binary string. To control case sensitivity in string comparisons, use the following techniques:

  • To make a string comparison case sensitive that normally would not be, cast (convert) one of the strings to binary form by using the BINARY keyword. It doesn’t matter which of the strings you make binary. As long as one of them is, the comparison will be case sensitive:

    mysql> SELECT name, name = BINARY 'lead', BINARY name = 'LEAD' FROM metal;
    +----------+----------------------+----------------------+
    | name     | name = BINARY 'lead' | BINARY name = 'LEAD' |
    +----------+----------------------+----------------------+
    | copper   |                    0 |                    0 |
    | gold     |                    0 |                    0 |
    | iron     |                    0 |                    0 |
    | lead     |                    1 |                    0 |
    | mercury  |                    0 |                    0 |
    | platinum |                    0 |                    0 |
    | silver   |                    0 |                    0 |
    | tin      |                    0 |                    0 |
    +----------+----------------------+----------------------+

    BINARY is available as a cast operator as of MySQL 3.23.

  • To make a string comparison not case sensitive that normally would be, convert both strings to the same lettercase using UPPER( ) or LOWER( ):

    mysql> SELECT UPPER('A'), UPPER('b'), UPPER('A') < UPPER('b');
    +------------+------------+-------------------------+
    | UPPER('A') | UPPER('b') | UPPER('A') < UPPER('b') |
    +------------+------------+-------------------------+
    | A          | B          |                       1 |
    +------------+------------+-------------------------+
    mysql> SELECT LOWER('A'), LOWER('b'), LOWER('A') < LOWER('b');
    +------------+------------+-------------------------+
    | LOWER('A') | LOWER('b') | LOWER('A') < LOWER('b') |
    +------------+------------+-------------------------+
    | a          | b          |                       1 |
    +------------+------------+-------------------------+

The same principles can be applied to string comparison functions. For example, STRCMP( ) takes two string arguments and returns -1, 0, or 1, depending on whether the first string is lexically less than, equal to, or greater than the second. Up through MySQL 4.0.0, STRCMP( ) is case sensitive; it always treats its arguments as binary strings, regardless of their actual type:

mysql> SELECT STRCMP('Abc','abc'), STRCMP('abc','abc'), STRCMP('abc','Abc');
+---------------------+---------------------+---------------------+
| STRCMP('Abc','abc') | STRCMP('abc','abc') | STRCMP('abc','Abc') |
+---------------------+---------------------+---------------------+
|                  -1 |                   0 |                   1 |
+---------------------+---------------------+---------------------+

However, as of MySQL 4.0.1, STRCMP( ) is not case sensitive:

mysql> SELECT STRCMP('Abc','abc'), STRCMP('abc','abc'), STRCMP('abc','Abc');
+---------------------+---------------------+---------------------+
| STRCMP('Abc','abc') | STRCMP('abc','abc') | STRCMP('abc','Abc') |
+---------------------+---------------------+---------------------+
|                   0 |                   0 |                   0 |
+---------------------+---------------------+---------------------+

To preserve the pre-4.0.1 behavior, make one of the arguments a binary string:

mysql> SELECT STRCMP(BINARY 'Abc','abc'), STRCMP(BINARY 'abc','Abc');
+----------------------------+----------------------------+
| STRCMP(BINARY 'Abc','abc') | STRCMP(BINARY 'abc','Abc') |
+----------------------------+----------------------------+
|                         -1 |                          1 |
+----------------------------+----------------------------+

By the way, take special note of the fact that zero and nonzero return values from STRCMP( ) indicate equality and inequality. This differs from the = comparison operator, which returns zero and nonzero for inequality and equality.

To avoid surprises in string comparisons, know the general rules that determine whether or not a string is binary:

  • Any literal string, string expression, or string column can be made binary by preceding it with the BINARY keyword. If BINARY is not present, the following rules apply.

  • A string expression is binary if any of its constituent strings is binary, otherwise not. For example, the result returned by this CONCAT( ) expression is binary because its second argument is binary:

    CONCAT('This is a ',BINARY 'binary',' string')
  • A string column is case sensitive or not depending on the column’s type. The CHAR and VARCHAR types are not case sensitive by default, but may be declared as BINARY to make them case sensitive. ENUM, SET, and TEXT columns are not case sensitive. BLOB columns are case sensitive. (See the table in Recipe 4.1.)

In summary, comparisons are case sensitive if they involve a binary literal string or string expression, or a CHAR BINARY, VARCHAR BINARY, or BLOB column. Comparisons are not case sensitive if they involve only non-binary literal strings or string expressions, or CHAR, VARCHAR, ENUM, SET, or TEXT columns.

ENUM and SET columns are not case sensitive. Furthermore, because they are stored internally as numbers, you cannot declare them case sensitive in the table definition by adding the BINARY keyword. However, you can still use the BINARY keyword before ENUM or SET values in comparisons to produce a case sensitive operation.

If you find that you’ve declared a column using a type that is not suitable for the kind of comparisons for which you typically use it, use ALTER TABLE to change the type. Suppose you have a table in which you store news articles:

CREATE TABLE news
(
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    article BLOB NOT NULL,
    PRIMARY KEY (id)
);

Here the article column is declared as a BLOB, which is a case-sensitive type. Should you wish to convert the column so that it is not case sensitive, you can change the type from BLOB to TEXT using either of these ALTER TABLE statements:

ALTER TABLE news MODIFY article TEXT NOT NULL;
ALTER TABLE news CHANGE article article TEXT NOT NULL;

Prior to MySQL 3.22.16, ALTER TABLE ... MODIFY is unavailable, in which case you can use only ALTER TABLE ... CHANGE. See Chapter 8 for more information.

Get MySQL Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.