References are absolutely essential for creating complex data structures. Since the next chapter is devoted solely to this topic, we will not say more here. This section lists the other advantages of Perl’s support for indirection and memory management.
When you pass more than one array or hash to a subroutine, Perl merges all of them into the @_ array available within the subroutine. The only way to avoid this merger is to pass references to the input arrays or hashes. Here’s an example that adds elements of one array to the corresponding elements of the other:
    @array1 = (1, 2, 3); @array2 = (4, 5, 6, 7);
    AddArrays (\@array1, \@array2);   # Passing the arrays by reference.
    print "@array1 \n";

    sub AddArrays {
        my ($rarray1, $rarray2) = @_;
        $len2 = @$rarray2;            # Length of array2
        for ($i = 0 ; $i < $len2 ; $i++) {
            $rarray1->[$i] += $rarray2->[$i];
        }
    }
In this example, two array references are passed to AddArrays, which dereferences them, determines the length of the second array, and adds each of its elements to the corresponding element of the first.
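The merging described above can be seen directly by counting what arrives in @_. The following is a minimal sketch, not code from the original text; the helper names count_args and count_each are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments arrived in @_.
sub count_args { return scalar @_; }

# Hypothetical helper: reports the length of each array passed by reference.
sub count_each {
    my ($rfirst, $rsecond) = @_;
    return (scalar @$rfirst, scalar @$rsecond);
}

my @array1 = (1, 2, 3);
my @array2 = (4, 5, 6, 7);

my $flat     = count_args(@array1, @array2);    # both arrays merge into one list
my @separate = count_each(\@array1, \@array2);  # array boundaries are preserved

print "flattened: $flat arguments\n";           # flattened: 7 arguments
print "by reference: @separate\n";              # by reference: 3 4
```

When the arrays are passed directly, the subroutine sees seven scalars and cannot tell where one array ended and the next began. When references are passed, it sees exactly two.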
Using references, you can efficiently pass large amounts of data to and from a subroutine.
However, passing references to scalars typically turns out not to be an optimization at all. I have often seen code like this, in which the programmer has intended to minimize copying while reading lines from a file:
    while ($ref_line = GetNextLine()) {
        .....
        .....
    }

    sub GetNextLine () {
        my $line = <F> ;
        exit(0) unless defined($line);
        .....
        return \$line;    # Return by reference, to avoid copying
    }
GetNextLine returns the line by reference to avoid copying. You might be surprised how little effect this strategy has on overall performance, because most of the time is taken by reading the file and subsequently working on $line. Meanwhile, the user of GetNextLine is forced to deal with indirections ($$ref_line) instead of the more straightforward buffer $line.[11]
Incidentally, you can use the standard library module called Benchmark to time and compare different code implementations, like this:
    use Benchmark;
    timethis (100, "GetNextLine()");  # Call GetNextLine 100 times, and
                                      # time it
The module defines a subroutine called timethis that takes a piece of code, runs it as many times as you tell it to, and prints out the elapsed time. We’ll cover the use statement in Chapter 6.
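To compare the by-value and by-reference strategies side by side, the Benchmark module also provides timethese, which times several pieces of code in one call. The sketch below is only illustrative: by_value and by_reference are hypothetical stand-ins for a real line reader, and the 65-character string is an assumed typical line length. Timings vary by machine, so no expected output is shown.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

my $text = "x" x 65;    # assumed: roughly one line of text

# Hypothetical stand-ins for a line reader.
sub by_value     { my $line = $text; return $line;  }   # returns a copy
sub by_reference { my $line = $text; return \$line; }   # returns a reference

# Time each candidate over 100_000 calls. The 'none' style suppresses
# the per-test report so the results can be formatted below.
my $results = timethese(100_000, {
    by_value     => sub { my $l = by_value();     length $l  },
    by_reference => sub { my $r = by_reference(); length $$r },
}, 'none');

for my $name (sort keys %$results) {
    print "$name: ", Benchmark::timestr($results->{$name}), "\n";
}
```

timethese returns a reference to a hash of Benchmark objects keyed by test name, which is why the loop dereferences $results before formatting each entry.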
So far, we have created references to previously existing variables. Now we will learn to create references to “anonymous” data structures—that is, values that are not associated with a variable.
To create an anonymous array, use square brackets instead of parentheses:
    $ra = [ ];            # Creates an empty, anonymous array
                          # and returns a reference to it
    $ra = [1,"hello"];    # Creates an initialized anonymous array
                          # and returns a reference to it
This notation not only allocates anonymous storage, it also returns a reference to it, much as malloc(3) returns a pointer in C.
What happens if you use parentheses instead of square brackets? Recall again that Perl evaluates the right side as a comma-separated expression and returns the value of the last element; $ra contains the value “hello”, which is likely not what you are looking for.
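The difference between the two bracketing styles is easy to check. This is a small sketch under strict and warnings, which the original examples do not use; the variable names $not_a_ref and $kind are introduced here for illustration only.

```perl
use strict;
use warnings;

# Square brackets: an anonymous array, and a reference to it.
my $ra = [1, "hello"];

# Parentheses: the comma operator in scalar context keeps only the last value.
my $not_a_ref;
{
    no warnings 'void';            # silence "useless use" diagnostics, if any
    $not_a_ref = (1, "hello");
}

my $kind = ref($ra);               # "ARRAY"
push @$ra, "world";                # the anonymous array grows like any other

print "$kind holding ", scalar(@$ra), " elements: @$ra\n";
print "parentheses gave: $not_a_ref\n";
```

This prints "ARRAY holding 3 elements: 1 hello world" followed by "parentheses gave: hello".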
To create an anonymous hash, use braces instead of square brackets:
    $rh = { };                           # Creates an empty hash and returns a
                                         # reference to it
    $rh = {"k1", "v1", "k2", "v2"};      # A populated anonymous hash
Both these notations are easy to remember since they represent the bracketing characters used by the two datatypes—brackets for arrays and braces for hashes. Contrast this to the way you’d normally create a named hash:
    # An ordinary hash uses the % prefix and is initialized with a list
    # within parentheses
    %hash = ("flock" => "birds", "pride" => "lions");

    # An anonymous hash is a list contained within curly braces.
    # The result of the expression is a scalar reference to that hash.
    $rhash = {"flock" => "birds", "pride" => "lions"};
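Once you hold a reference to an anonymous hash, you read and modify it through the reference. The following sketch reuses the $rhash data from the example above; the added "school" entry and the variable names $pride and @groups are illustrative assumptions.

```perl
use strict;
use warnings;

my $rhash = { "flock" => "birds", "pride" => "lions" };

$rhash->{"school"} = "fish";           # add a key through the reference
my $pride  = $rhash->{"pride"};        # look up a single value
my @groups = sort keys %$rhash;        # dereference the whole hash

print "A pride of $pride\n";           # A pride of lions
print "Groups: @groups\n";             # Groups: flock pride school
```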
What about dynamically allocated scalars? It turns out that Perl doesn’t have any notation for doing something like this, presumably because you almost never need it. If you really do, you can use the following trick: create a reference to an existing variable, and then let the variable pass out of scope.
    {
        my $a = "hello world";  # 1
        $ra = \$a;              # 2
    }
    print "$$ra \n";            # 3
The my operator tags a variable as private (or localizes it, in Perl-speak). You can use the local operator instead, but there is a subtle yet very important difference between the two that we will clarify in Chapter 3. For this example, both work equally well.
Now, $ra is a global variable that refers to the local variable $a (not the keyword local). Normally, $a would be deleted at the end of the block, but since $ra continues to refer to it, the memory allocated for $a is not thrown away. Of course, if you reassign $ra to some other value, this space is deallocated before $ra is prepared to accept the new value.
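The same trick can be wrapped in a subroutine, so that each call hands back a reference to a fresh scalar. This is a sketch only; new_scalar is a hypothetical helper name, not something Perl provides.

```perl
use strict;
use warnings;

# Hypothetical helper: each call creates a new private $value and returns
# a reference to it. The scalar survives because the reference still
# points to it after the subroutine returns.
sub new_scalar {
    my $value = shift;
    return \$value;
}

my $r1 = new_scalar("hello world");
my $r2 = new_scalar("goodbye");

$$r1 .= "!";                  # modifies only the first scalar

print "$$r1\n";               # hello world!
print "$$r2\n";               # goodbye
```

The two references point to distinct storage, so changing one scalar leaves the other untouched.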
You can create references to constant scalars like this:
    $r = \10;
    $rs = \"hello";
Constants are statically allocated and anonymous.
A reference variable does not care to know or remember whether it points to an anonymous value or to an existing variable’s value. This is identical to the way pointers behave in C.
We have seen how a reference refers to some other entity, including other references (which are just ordinary scalars). This means that we can have multiple levels of references, like this:
    $a = 10;
    $ra = \$a;       # reference to $a's value.
    $rra = \$ra;     # reference to a reference to $a's value
    $rrra = \$rra;   # reference to a reference to a reference ...
Now we’ll dereference these. The following statements all yield the same value (that of $a):
    print $a;        # prints 10. The following statements print the same.
    print $$ra;      # $a seen from one level of indirection.
    print $$$rra;    # replace ra with {$rra} : still referring
                     # to $a's value
    print $$$$rrra;  # ... and so on.
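The built-in ref function reports what kind of thing a reference points to, which helps when several levels of indirection are in play. In this sketch the variable $value stands in for $a (a name chosen here to avoid clashing with sort's special variables under strict), and @kinds is an illustrative name.

```perl
use strict;
use warnings;

my $value = 10;
my $ra    = \$value;    # reference to a scalar
my $rra   = \$ra;       # reference to a reference
my $rrra  = \$rra;      # reference to a reference to a reference

my @kinds = (ref($ra), ref($rra), ref($rrra));
print "@kinds\n";       # SCALAR REF REF

$$$$rrra = 20;          # assign through three levels of indirection
print "$value\n";       # 20
```

Every level reaches the same storage, so an assignment through $rrra is visible in $value itself.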
Incidentally, this example illustrates a convention known to Microsoft Windows programmers as “Hungarian notation.”[12] Each variable name is prefixed by its type (“r” for reference, “rh” for reference to a hash, “i” for integer, “d” for double, and so on). Something like the following would immediately trigger some suspicion:
$$rh_collections[0] = 10; # RED FLAG : 'rh' being used as an array?
You have a variable called $rh_collections, which is presumably a reference to a hash because of its naming convention (the prefix rh), but you are using it instead as a reference to an array. Sure, Perl will alert you to this by raising a run-time exception (“Not an ARRAY reference at - line 2.”). But it is easier to check the code while you are writing it than to painstakingly exercise all the code paths during the testing phase to rule out the possibility of run-time errors.
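The run-time exception mentioned above can be trapped with eval, and the mismatch can be detected beforehand with ref. The sketch below assumes a hash reference with made-up contents; $error and $is_array are names introduced for illustration.

```perl
use strict;
use warnings;

my $rh_collections = { books => 3 };    # really a hash reference

# Check the reference type before using it as an array.
my $is_array = ref($rh_collections) eq 'ARRAY' ? 1 : 0;

# Or attempt the access and trap the exception Perl raises.
my $error = '';
my $ok = eval { $$rh_collections[0] = 10; 1 };
$error = $@ unless $ok;

print "is an array reference: $is_array\n";
print "trapped: $error";
```

The trapped message begins with "Not an ARRAY reference", followed by the file name and line number where the access was attempted.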
Earlier, while discussing precedence, we showed that $$rarray[1] is actually the same as ${$rarray}[1]. It wasn’t entirely by accident that we chose braces to denote the grouping. It so happens that there is a more general rule.
The braces signify a block of code, and Perl doesn’t care what you put in there as long as it yields a reference of the required type. Something like {$rarray} is a straightforward expression that yields a reference readily. By contrast, the following example calls a subroutine within the block, which in turn returns a reference:
    sub test {
        return \$a;      # returns a reference to a scalar variable
    }
    $a = 10;
    $b = ${test()};      # Calls a subroutine test within the block, which
                         # yields a reference to $a
                         # This reference is dereferenced
    print $b;            # prints "10"
To summarize, a block that yields a reference can occur wherever the name of a variable can occur. Instead of $a, you can have ${$ra} or ${$array[1]} (assuming $array[1] has a reference to $a), for example.
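The same rule applies to arrays: a block that yields an array reference can stand where an array name would. Here is a brief sketch; primes_ref and the data it returns are hypothetical.

```perl
use strict;
use warnings;

my @primes = (2, 3, 5, 7);

# Hypothetical subroutine that yields an array reference.
sub primes_ref { return \@primes; }

my $first = ${ primes_ref() }[0];        # element access through a block
my $count = scalar @{ primes_ref() };    # the whole array, in scalar context
my @copy  = @{ primes_ref() };           # the whole array, in list context

print "first: $first, count: $count, all: @copy\n";
```

This prints "first: 2, count: 4, all: 2 3 5 7". In each case the block is evaluated first, and its result is then dereferenced according to the leading $ or @.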
Recall that a block can have any number of statements inside it, and the last expression evaluated inside that block represents its result value. Unless you want to be a serious contender for the Obfuscated Perl contest, avoid using blocks containing more than two expressions while using the general dereferencing rule stated above.
While we are talking about obfuscation, it is worth talking about a very insidious way of including executable code within strings. Normally, when Perl sees a string such as "$a", it does variable interpolation. But you now know that "a" can be replaced by a block as long as it returns a reference to a scalar, so something like this is completely acceptable, even within a string:
print "${foo()}";
Replace foo() by system('/bin/rm *') and you have an unpleasant Trojan horse:
print "${system('/bin/rm *')}"
Perl treats it like any other function and trusts system to return a reference to a scalar. The parameters given to system do their damage before Perl has a chance to figure out that system doesn’t return a scalar reference.
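A harmless way to convince yourself that code really does run during interpolation is to count the calls. In this sketch, tracked is a hypothetical subroutine whose only side effect is incrementing a counter; nothing destructive is executed.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical subroutine with a visible but harmless side effect.
sub tracked {
    $calls++;
    my $text = "side effect";
    return \$text;              # a scalar reference, as interpolation expects
}

my $message = "result: ${tracked()}";   # the call happens while the string is built

print "$message\n";             # result: side effect
print "calls: $calls\n";        # calls: 1
```

The counter shows that the subroutine ran simply because the string was constructed, which is exactly what makes this form dangerous when any part of the code comes from an untrusted source.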
Moral of the story: Be very careful of strings that you get from untrusted sources. Use the taint-mode option (invoke Perl as perl -T) or the Safe module that comes with the Perl distribution. Please see the Perl documentation for taint checking, and see the index for some pointers to the Safe module.
[11] The operative word here is “typically.” Most applications deal with lines 60-70 bytes long.
[12] After Charles Simonyi who started this convention at Microsoft. This convention is a topic of raging debates on the Internet; people either love it or hate it. Apparently, even at Microsoft, the systems folks use it, while the application folks don’t. In a language without enforced type checking such as Perl, I recommend using it where convenient.