He multiplieth words without knowledge.
Many programming languages force you to work at an uncomfortably low level. You think in lines, but your language wants you to deal with pointers. You think in strings, but it wants you to deal with bytes. Such a language can drive you to distraction. Don’t despair, though—Perl isn’t a low-level language; lines and strings are easy to handle.
Perl was designed for text manipulation. In fact, Perl can manipulate text in so many ways that they can’t all be described in one chapter. Check out other chapters for recipes on text processing. In particular, see Chapter 6 and Chapter 8, which discuss interesting techniques not covered here.
Perl’s fundamental unit for working with data is the scalar, that is, single values stored in single (scalar) variables. Scalar variables hold strings, numbers, and references. Array and hash variables hold lists or associations of scalars, respectively. References are used for referring to other values indirectly, not unlike pointers in low-level languages. Numbers are usually stored in your machine’s double-precision floating-point notation. Strings in Perl may be of any length (within the limits of your machine’s virtual memory) and contain any data you care to put there—even binary data containing null bytes.
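As a small illustration of the data types just described, the sketch below stores a string, a number, and a reference in scalars and then collects them in an array and a hash. The variable names and sample values are made up for the example.

```perl
use strict;
use warnings;

# Scalars: a string, a number, and a reference to another scalar.
my $string = "binary\0data";   # strings may contain null bytes
my $number = 3.5;              # numbers are normally stored as doubles
my $ref    = \$number;         # a reference refers to another value indirectly

# Arrays hold ordered lists of scalars; hashes hold key/value associations.
my @list  = ($string, $number, $ref);
my %count = (apples => 3, pears => 5);

print length($string), "\n";    # 11: the null byte counts as one character
print $$ref + 1, "\n";          # 4.5: an extra $ dereferences the reference
print scalar(@list), "\n";      # 3 elements in the array
print $count{apples}, "\n";     # 3
```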
A string is not an array of bytes: You cannot use array subscripting on a string to address one of its characters; use substr for that. Like all data types in Perl, strings grow and shrink on demand. They get reclaimed by Perl’s garbage collection system when they’re no longer used, typically when the variables holding them go out of scope or when the expression they were used in has been evaluated. In other words, memory management is already taken care of for you, so you don’t have to worry about it.
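A minimal sketch of addressing characters with substr rather than array subscripts, and of a string growing on demand. The sample text is arbitrary.

```perl
use strict;
use warnings;

my $string = "Perl Cookbook";

# substr(EXPR, OFFSET, LENGTH) extracts part of a string; offsets start at 0.
my $first = substr($string, 0, 1);    # "P"
my $word  = substr($string, 5, 4);    # "Cook"
my $last  = substr($string, -1);      # "k": negative offsets count from the end

# Strings grow on demand; appending needs no manual allocation.
$string .= " recipes";

print "$first $word $last\n";         # P Cook k
print length($string), "\n";          # 21
```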
A scalar value is either defined or undefined. If defined, it may hold a string, number, or reference. The only undefined value is undef. All other values are defined, even 0 and the empty string. Definedness is not the same as Boolean truth, though; to check whether a value is defined, use the defined function. Boolean truth has a specialized meaning, tested with operators like && and || or in an if or while block’s test condition.
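To make the distinction concrete, here is a short sketch that reports definedness and Boolean truth separately. The helper name describe is hypothetical, introduced only for this example.

```perl
use strict;
use warnings;

# Report definedness and Boolean truth separately for one value.
# (describe is a made-up helper, not a built-in.)
sub describe {
    my ($value) = @_;
    my $defined = defined($value) ? "defined" : "undefined";
    my $truth   = $value          ? "true"    : "false";
    return "$defined, $truth";
}

print describe(undef), "\n";    # undefined, false
print describe(0),     "\n";    # defined, false
print describe(""),    "\n";    # defined, false
print describe("abc"), "\n";    # defined, true
```

Testing an undefined value for truth, as describe does, only asks whether it is true or false, so it does not produce a warning.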
Two defined strings are false: the empty string ("") and a string of length one containing the digit zero ("0"). This second one may surprise you, but Perl does this because of its on-demand conversion between strings and numbers. The numbers 0., 0.00, and 0.0000000 are all false when unquoted but are not false in strings (the string "0.00" is true, not false). All other defined values (e.g., "false", 15, and \$x) are true.
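The rules above can be checked directly. This sketch runs each of the values mentioned through a small helper; the name truth is an assumption made for the example.

```perl
use strict;
use warnings;

# Map a value to the word "true" or "false" under Perl's truth rules.
# (truth is a made-up helper for this illustration.)
sub truth { return $_[0] ? "true" : "false" }

my $x = 42;
my @checks = (
    [ 'empty string',   truth("")      ],   # false
    [ 'string "0"',     truth("0")     ],   # false
    [ 'number 0.00',    truth(0.00)    ],   # false: numerically zero
    [ 'string "0.00"',  truth("0.00")  ],   # true: not the one-character "0"
    [ 'string "false"', truth("false") ],   # true: any other nonempty string
    [ 'number 15',      truth(15)      ],   # true
    [ 'reference',      truth(\$x)     ],   # true: references are always true
);

printf "%-16s %s\n", @$_ for @checks;
```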
The undef value behaves like the empty string ("") when used as a string, 0 when used as a number, and the null reference when used as a reference. But in all these cases, it’s false. Using an undefined value where Perl expects a defined value will trigger a run-time warning message on STDERR if you’ve used the -w flag. Merely asking whether something is true or false does not demand a particular value, so this is exempt from a warning. Some operations do not trigger warnings when used on variables holding undefined values. These include the autoincrement and autodecrement operators, ++ and --, and the addition and catenation assignment operators, += and .=.
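The following sketch shows which uses of an undefined variable stay quiet and which one warns. It makes two assumptions: it enables warnings with the use warnings pragma rather than the -w flag, and it installs a $SIG{__WARN__} handler so the warnings can be counted instead of printed to STDERR.

```perl
use strict;
use warnings;    # assumption: the pragma stands in for the -w flag here

# Collect warnings in an array instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my ($count, $total, $text);    # all three start out undefined

$count++;            # undef treated as 0, no warning
$total += 10;        # undef treated as 0, no warning
$text  .= "abc";     # undef treated as "", no warning

my $quiet = @warnings;          # warnings raised so far

my $missing;
my $sum  = $missing + 1;        # plain addition on undef does warn
my $loud = @warnings - $quiet;  # warnings raised by that addition

print "exempt operations: $quiet warning(s)\n";
print "plain addition:    $loud warning(s)\n";
```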
Specify strings in your program either with single quotes, double quotes, the quote-like operators q// and qq//, or “here documents.” Single quotes are the simplest form of quoting. The only special characters are ' to terminate the string, \' to quote a single quote in the string, and \\ to quote a backslash in the string:
$string = '\n';                     # two characters, \ and an n
$string = 'Jon \'Maddog\' Orwant';  # literal single quotes
Double quotes interpolate variables (but not function calls; see Section 1.10 to find how to do this) and expand a lot of backslashed shortcuts: "\n" becomes a newline, "\033" becomes the character with octal value 33, "\cJ" becomes a Ctrl-J, and so on. The full list of these is given in the perlop(1) manpage.
$string = "\n";                     # a "newline" character
$string = "Jon \"Maddog\" Orwant";  # literal double quotes
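As a rough side-by-side of the two quoting styles, the sketch below puts the same text in single and double quotes and inspects two of the backslash shortcuts. The variable names are illustrative, and the character codes assume an ASCII platform.

```perl
use strict;
use warnings;

my $name = "Maddog";

my $single = 'Hello, $name\n';   # no interpolation: literal $name, then \ and n
my $double = "Hello, $name\n";   # $name is interpolated; \n becomes a newline

my $esc   = "\033";   # octal 33 is decimal 27, the ESC character
my $ctrlj = "\cJ";    # Ctrl-J is character 10, the ASCII linefeed

print $single, "\n";             # Hello, $name\n
print $double;                   # Hello, Maddog
print ord($esc), "\n";           # 27
print ord($ctrlj), "\n";         # 10
```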
The q// and qq// regexp-like quoting operators let you use alternate delimiters for single- and double-quoted strings. For instance, if you want a literal string that contains single quotes, it’s easier to write this than to escape the single quotes with backslashes:
$string = q/Jon 'Maddog' Orwant/; # literal single quotes
You can use the same character as delimiter, as we do with / here, or you can balance the delimiters if you use parentheses or paren-like characters:
$string = q[Jon 'Maddog' Orwant];   # literal single quotes
$string = q{Jon 'Maddog' Orwant};   # literal single quotes
$string = q(Jon 'Maddog' Orwant);   # literal single quotes
$string = q<Jon 'Maddog' Orwant>;   # literal single quotes
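The examples above all use q//. As a complementary sketch, here is qq// with balanced delimiters, which interpolates like double quotes while leaving embedded double quotes unescaped. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $who = "Maddog";

# q{} behaves like single quotes: nothing is interpolated.
my $plain = q{Jon '$who' Orwant};

# qq{} behaves like double quotes: $who is interpolated,
# and the embedded double quotes need no backslashes.
my $interp = qq{Jon "$who" Orwant};

# Balanced delimiters may nest inside the string.
my $nested = q(balanced (nested) parens are fine);

print "$plain\n";     # Jon '$who' Orwant
print "$interp\n";    # Jon "Maddog" Orwant
print "$nested\n";    # balanced (nested) parens are fine
```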
“Here documents” are borrowed from the shell. They are a way to quote a large chunk of text. The text can be interpreted as single-quoted, double-quoted, or even as commands to be executed, depending on how you quote the terminating identifier. Here we double-quote two lines with a here document:
$a = <<"EOF";
This is a multiline here document
terminated by EOF on a line by itself
EOF
Note there’s no semicolon after the terminating EOF. Here documents are covered in more detail in Section 1.11.
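To show how quoting the terminating identifier changes interpretation, this sketch writes the same line as a double-quoted and as a single-quoted here document. The variable names and text are illustrative only.

```perl
use strict;
use warnings;

my $user = "world";

# Double-quoted terminator: variables are interpolated.
my $double = <<"EOF";
Hello, $user
EOF

# Single-quoted terminator: the text is taken literally.
my $single = <<'EOF';
Hello, $user
EOF

print $double;    # Hello, world
print $single;    # Hello, $user
```

In both cases the string includes the newline at the end of each line of text, and the terminator must appear on a line by itself.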
A warning for non-Western programmers: Perl doesn’t currently directly support multibyte characters (expect Unicode support in 5.006), so we’ll be using the terms byte and character interchangeably.