Escaping Characters

Problem

You need to output a string with certain characters (quotes, commas, etc.) escaped. For instance, you’re producing a format string for sprintf and want to convert literal % signs into %%.

Solution

Use a substitution to backslash or double each character to be escaped.

# backslash
$var =~ s/([CHARLIST])/\\$1/g;

# double
$var =~ s/([CHARLIST])/$1$1/g;

Discussion

$var is the variable to be altered. CHARLIST is a list of characters to escape and can contain backslash escapes like \t and \n. If you have just one character to escape, omit the brackets and the capture, and write the character directly:

$string =~ s/%/%%/g;
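As a minimal sketch of the sprintf case from the Problem statement, the following doubles the % signs in a piece of literal text before embedding it in a format string. The variable names ($label, $escaped, $out) and the sample text are illustrative assumptions, not part of the recipe.

```perl
use strict;
use warnings;

# Literal text that happens to contain a percent sign (sample data).
my $label = "100% done";

# Copy the text, then double each % so sprintf treats it as a literal.
(my $escaped = $label) =~ s/%/%%/g;

# The only real conversion left in the format is %d.
my $out = sprintf("$escaped: %d items", 3);
print "$out\n";    # 100% done: 3 items
```

Without the substitution, sprintf would try to interpret the "% d" in the label as a conversion of its own.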

The following lets you do escaping when preparing strings to submit to the shell. (In practice, you would need to escape more than just ' and " to make any arbitrary string safe for the shell. Getting the list of characters right is so hard, and the risks if you get it wrong are so great, that you’re better off using the list form of system and exec to run programs, shown in Section 16.2. They avoid the shell altogether.)

$string = q(Mom said, "Don't do that.");
$string =~ s/(['"])/\\$1/g;

We had to use two backslashes in the replacement because the replacement section of a substitution is read as a double-quoted string, and to get one backslash, you need to write two. Here’s a similar example for VMS DCL, where you need to double every quote to get one through:

$string = q(Mom said, "Don't do that.");
$string =~ s/(['"])/$1$1/g;
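To make the difference between the two substitutions above concrete, this sketch applies each one to a copy of the same string and prints the results. The names $backslashed and $doubled are introduced here only for illustration.

```perl
use strict;
use warnings;

my $string = q(Mom said, "Don't do that.");

# Backslash each quote character.
(my $backslashed = $string) =~ s/(['"])/\\$1/g;

# Double each quote character, as in the VMS DCL example.
(my $doubled = $string) =~ s/(['"])/$1$1/g;

print "$backslashed\n";    # Mom said, \"Don\'t do that.\"
print "$doubled\n";        # Mom said, ""Don''t do that.""
```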

Microsoft command interpreters are harder to work with. In DOS and Windows, COMMAND.COM recognizes double quotes but not single ones, has no clue what to do with backquotes, and requires a backslash to make a double quote a literal. Almost any of the free or commercial Unix-like shell environments for Windows will improve this depressing situation.
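Given how much quoting rules differ between shells, here is a rough sketch of the list form of system recommended earlier, which hands each argument to the program without shell parsing. It assumes a Unix-like system, and it runs the current perl interpreter ($^X) as a stand-in for whatever program you actually want to call. The @cmd and $status names are illustrative.

```perl
use strict;
use warnings;

my $string = q(Mom said, "Don't do that.");

# List form: no shell is involved, so the quotes in $string need no escaping.
# $^X is the path of the running perl, used here only as a convenient command.
my @cmd = ($^X, '-e', 'print "$ARGV[0]\n"', $string);

my $status = system(@cmd);
die "command failed with status $?" if $status != 0;
```

The child process receives $string as a single argument, quotes and all, which is why no substitution is needed here.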

Because we’re using character classes in the regular expressions, we can use - to define a range, and ^ at the start to negate. This escapes every character that isn’t in the range A through Z, including lowercase letters, digits, and whitespace:

$string =~ s/([^A-Z])/\\$1/g;
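A small worked example shows how aggressive the negated class is. The sample string is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

my $string = "A-b C";

# Everything except uppercase A through Z gets a backslash in front of it.
$string =~ s/([^A-Z])/\\$1/g;

print "$string\n";    # A\-\b\ C
```

Only the A and the C survive untouched; the hyphen, the lowercase b, and the space are all escaped.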

If you want to escape all non-word characters, use the \Q and \E string metacharacters or the quotemeta function. For example, these are equivalent:

$string = "this \Qis a test!\E";
$string = "this is\\ a\\ test\\!";
$string = "this " . quotemeta("is a test!");
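The equivalence claimed above can be checked directly. This sketch stores each form in its own variable (the names are hypothetical) and compares them.

```perl
use strict;
use warnings;

my $interpolated = "this \Qis a test!\E";                # \Q ... \E inside a string
my $by_hand      = "this is\\ a\\ test\\!";              # backslashes written out
my $via_function = "this " . quotemeta("is a test!");    # quotemeta function

print "$interpolated\n";    # this is\ a\ test\!

print "all three match\n"
    if $interpolated eq $by_hand && $by_hand eq $via_function;
```

Note that the spaces and the exclamation mark are escaped, while the letters are left alone.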

See Also

The s/// operator in perlre(1) and perlop(1) and the “Pattern Matching” section of Chapter 2 of Programming Perl; the quotemeta function in perlfunc(1) and Chapter 3 of Programming Perl; the discussion of HTML escaping in Section 19.1; Section 19.6 for how to avoid having to escape strings to give the shell.
