References can be copied and passed around like any other scalar. At any given time, Perl knows the number of references to a particular data item. Perl can also create references to anonymous data structures (structures that do not have explicit names) and create references automatically as needed to fulfill certain kinds of operations. Let’s look at copying references and how it affects scoping and memory usage.
Chapter 3 explored how to take a reference to an array @skipper and place it into a new scalar variable:
my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
my $reference_to_skipper = \@skipper;
You can then copy the reference or take additional references, and they’d all refer to the same thing and be interchangeable:
my $second_reference_to_skipper = $reference_to_skipper;
my $third_reference_to_skipper = \@skipper;
At this point, you have four different ways to access the data contained in @skipper:
@skipper
@$reference_to_skipper
@$second_reference_to_skipper
@$third_reference_to_skipper
Perl tracks how many ways the data can be accessed through a mechanism called reference counting. The original name counts as one, and each additional reference that was taken (including copies of references) also counts as one. The total number of references to the array of provisions is now four.
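As a rough illustration of the counting described above, the following sketch peeks at Perl's internal reference count using the core B module. The helper name refcount is invented for this example. The act of measuring creates temporary references of its own, so the absolute numbers should not be trusted; only the differences between readings are meaningful.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper (not part of the chapter): report the internal
# reference count of the array behind an array reference. Measuring adds
# temporary references, so compare readings with each other only.
sub refcount {
    my ($array_ref) = @_;
    return B::svref_2object($array_ref)->REFCNT;
}

my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
my $baseline = refcount(\@skipper);

# Taking a reference adds one to the count.
my $reference_to_skipper = \@skipper;
my $after_one = refcount(\@skipper);

# Copying a reference adds one more.
my $second_reference_to_skipper = $reference_to_skipper;
my $after_two = refcount(\@skipper);

# Overwriting a reference with undef removes one.
$reference_to_skipper = undef;
my $after_drop = refcount(\@skipper);

print "after taking a reference:  +", $after_one  - $baseline, "\n";
print "after copying it:          +", $after_two  - $baseline, "\n";
print "after undef-ing the first: +", $after_drop - $baseline, "\n";
```

Always measuring with the same expression, refcount(\@skipper), keeps the measurement overhead constant, which is why the differences can be compared.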
You can add and remove references as you wish, and as long as the reference count doesn’t hit zero, the array is maintained in memory and is still accessible via any of the other access paths. For example, you might have a temporary reference:
check_provisions_list(\@skipper)
When this subroutine begins executing, a fifth reference to the data is created and copied into @_ for the subroutine. The subroutine is free to create additional copies of that reference, which Perl notes as needed. Typically, when the subroutine returns, all such references are discarded automatically, and you’re back to four references again.
You can kill off each reference by using the variable for something other than a reference to the value of @skipper. For example, you can assign undef to the variable:
$reference_to_skipper = undef;
Or, maybe just let the variable go out of scope:
my @skipper = ...;

{
    ...
    my $ref = \@skipper;
    ...
    ...
} # $ref goes out of scope at this point
In particular, a reference held in a subroutine’s private (lexical) variable goes away at the end of the subroutine.
Whether the value is changed or the variable itself goes away, Perl notes it as an appropriate reduction in the number of references to the data.
Perl recycles the memory for the array only when all references (including the name of the array) go away. In this case, memory is reclaimed only after @skipper goes out of scope and every other reference taken to @skipper has been removed or changed to another value. Such memory is available to Perl for other data later in this program invocation but generally will not be returned to the operating system for use by other processes.
Typically, all references to a variable are removed before the variable itself. But what if one of the references outlives the variable name? For example, consider this code:
my $ref;

{
    my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
    $ref = \@skipper;

    print "$ref->[2]\n"; # prints jacket\n
}

print "$ref->[2]\n"; # still prints jacket\n
Immediately after the @skipper array is declared, you have one reference to the five-element list. After $ref is initialized, you’ll have two, down to the end of the block. When the block ends, the @skipper name disappears. However, this was only one of the two ways to access the data! Thus, the five-element list is not removed from memory, and $ref is still pointing to that data.
At this point, the five-element list is contained within an anonymous array, which is a fancy term for an array without a name.
Until the value of $ref is changed, or $ref itself disappears, you can still continue to use all the dereferencing strategies you used prior to when the name of the array disappeared. In fact, it’s still a fully functional array that you can shrink or grow just as you do any other Perl array:
push @$ref, "sextant"; # add a new provision
print "$ref->[-1]\n"; # prints sextant\n
You can even increase the reference count at this point:
my $copy_of_ref = $ref;
or equivalently:
my $copy_of_ref = \@$ref;
The data remains alive until the last reference is destroyed:
$ref = undef; # not yet...
$copy_of_ref = undef; # poof!
The data remains alive until the last reference is destroyed, even if that reference is contained within a larger active data structure. Suppose an array element is itself a reference. Recall the example from Chapter 3:
my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
my @skipper_with_name = ("The Skipper", \@skipper);
my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
my @professor_with_name = ("The Professor", \@professor);
my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
my @gilligan_with_name = ("Gilligan", \@gilligan);
my @all_with_names = (
    \@skipper_with_name,
    \@professor_with_name,
    \@gilligan_with_name,
);
Imagine for a moment that the intermediate variables are all part of a subroutine:
my @all_with_names;

sub initialize_provisions_list {
    my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
    my @skipper_with_name = ("The Skipper", \@skipper);
    my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
    my @professor_with_name = ("The Professor", \@professor);
    my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
    my @gilligan_with_name = ("Gilligan", \@gilligan);
    @all_with_names = ( # set global
        \@skipper_with_name,
        \@professor_with_name,
        \@gilligan_with_name,
    );
}

initialize_provisions_list( );
The value of @all_with_names is set to contain three references. Inside the subroutine, references to named arrays are first placed into other named arrays. Eventually, the values end up in the global @all_with_names. However, as the subroutine returns, the names for the six arrays disappear. Each array has had one other reference taken to it, making the reference count temporarily two, and then back to one as the name is removed. Because the reference count is not yet zero, the data continues to live on, although it is now referenced only by elements of @all_with_names.
Rather than assign the global variable, you can rewrite this as:
sub get_provisions_list {
    my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
    my @skipper_with_name = ("The Skipper", \@skipper);
    my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
    my @professor_with_name = ("The Professor", \@professor);
    my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
    my @gilligan_with_name = ("Gilligan", \@gilligan);
    return (
        \@skipper_with_name,
        \@professor_with_name,
        \@gilligan_with_name,
    );
}

my @all_with_names = get_provisions_list( );
Here, you create the value that will eventually be stored into @all_with_names as the last expression evaluated in the subroutine. A three-element list is returned and assigned. As long as the named arrays within the subroutine have had at least one reference taken of them, and it is still part of the return value, the data remains alive.[18]
If the references in @all_with_names are altered or discarded, the reference count for the corresponding arrays is reduced. If that means the reference count has become zero (as in this example), those arrays themselves are also eliminated. Because these arrays also contain a reference (such as the reference to @skipper), the count for that referenced array is also reduced by one. Again, that reduces the reference count to zero, freeing that memory as well, in a cascading effect.
Removing the top of a tree of data generally removes all the data contained within. The exception is when additional copies are made of the references of the nested data. For example, if you copied Gilligan’s provisions:
my $gilligan_stuff = $all_with_names[2][1];
then when you remove @all_with_names, you still have one live reference to what was formerly @gilligan, and the data from there downward remains alive.
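The cascade, and the exception for copied references, can be observed with a small sketch. It is not from the chapter: it builds the tree with the anonymous array constructors introduced later in this chapter, and it uses weaken from the core Scalar::Util module purely as a probe, relying on the fact that a weak reference does not add to the reference count and becomes undef once its data is freed. The probe variable name is invented for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @all_with_names = (
    [ "The Skipper",   [ qw(blue_shirt hat jacket preserver sunscreen) ] ],
    [ "The Professor", [ qw(sunscreen water_bottle slide_rule batteries radio) ] ],
    [ "Gilligan",      [ qw(red_shirt hat lucky_socks water_bottle) ] ],
);

# A weak reference does not keep the Skipper's provisions alive; Perl
# sets it to undef when that array is freed, so it acts as a probe.
my $skipper_probe = $all_with_names[0][1];
weaken($skipper_probe);

# An ordinary (strong) copy of the reference to Gilligan's provisions.
my $gilligan_stuff = $all_with_names[2][1];

@all_with_names = ( );    # remove the top of the tree

my $skipper_freed  = !defined $skipper_probe;
my $gilligan_alive = defined $gilligan_stuff && @$gilligan_stuff == 4;

print "Skipper's provisions freed: ",  ($skipper_freed  ? "yes" : "no"), "\n";
print "Gilligan's provisions alive: ", ($gilligan_alive ? "yes" : "no"), "\n";
```

Both lines should report yes: nothing else referred to the Skipper's data, while the strong copy in $gilligan_stuff kept Gilligan's provisions alive.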
The bottom line is simply: Perl does the right thing. If you still have a reference to data, you still have the data.
Reference counting as a way to manage memory has been around for a long time. The downside of reference counting is that it breaks when the data structure is not a directed acyclic graph, that is, when some parts of the structure point back in to other parts in a looping way. For example, suppose each of two data structures contains a reference to the other (see Figure 4-1):
my @data1 = qw(one won);
my @data2 = qw(two too to);
push @data2, \@data1;
push @data1, \@data2;
Figure 4-1. When the references in a data structure form a loop, Perl’s reference-counting system may not be able to recognize and recycle the no-longer-needed memory space
At this point, there are two names for the data in @data1: @data1 itself and @{$data2[3]}, and two names for the data in @data2: @data2 itself and @{$data1[2]}. You’ve created a loop. In fact, you can access won with an infinite number of names, such as $data1[2][3][2][3][2][3][1].
What happens when these two array names go out of scope? Well, the reference count for the two arrays goes down from two to one. But not zero! And because it’s not zero, Perl thinks there might still be a way to get to the data, even though there isn’t! Thus, you’ve created a memory leak. Ugh. (A memory leak in a program causes the program to consume more and more memory over time.)
At this point, you’re right to think that example is contrived. Of course you would never make a looped data structure in a real program! Actually, programmers often make these loops as part of doubly-linked lists, linked rings, or a number of other data structures. The key is that Perl programmers rarely do so because the most important reasons to use those data structures don’t apply in Perl. If you’ve used other languages, you may have noticed programming tasks that are comparatively easy in Perl. For example, it’s easy to sort a list of items or to add or remove items, even in the middle of the list. Those tasks are difficult in some other languages, and using a looped data structure is a common way to get around the language’s limitations.
Why mention it here? Well, even Perl programmers sometimes copy an algorithm from another programming language. There’s nothing inherently wrong with doing this, although it would be better to decide why the original author used a “loopy” data structure and recode the algorithm to use Perl’s strengths. Perhaps a hash should be used instead, or perhaps the data should go into an array that will be sorted later.
A future version of Perl is likely to use garbage collection in addition to or instead of reference counting. Until then, you must be careful not to create circular references, or if you do, break the circle before the variables go out of scope. For example, the following code doesn’t leak:
{
    my @data1 = qw(one won);
    my @data2 = qw(two too to);
    push @data2, \@data1;
    push @data1, \@data2;
    ... use @data1, @data2 ...
    # at the end:
    @data1 = ( );
    @data2 = ( );
}
You have eliminated the reference to @data2 from within @data1, and vice versa. Now each array has only one reference (its name), and that count returns to zero at the end of the block. In fact, you can clear out either one and not the other, and it still works nicely. Chapter 10 shows how to create weak references, which can help with many of these problems.
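As a preview, and only as a sketch under stated assumptions, the following compares a looped structure that leaks with one whose loop contains a weak link. It uses weaken from the core Scalar::Util module, holds the arrays through scalar references rather than named arrays, and uses extra weak references (the hypothetical $leaky_probe and $fixed_probe) to observe whether the data was freed, since a weak reference becomes undef when its data goes away.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my ($leaky_probe, $fixed_probe);

{
    my $data1 = [ qw(one won) ];
    my $data2 = [ qw(two too to) ];
    push @$data2, $data1;
    push @$data1, $data2;          # a loop of two strong references

    $leaky_probe = $data1;
    weaken($leaky_probe);          # probe only; does not affect the count
}
# The two arrays still refer to each other, so neither was freed.

{
    my $data1 = [ qw(one won) ];
    my $data2 = [ qw(two too to) ];
    push @$data2, $data1;
    push @$data1, $data2;
    weaken($data1->[2]);           # the link back to $data2 no longer counts

    $fixed_probe = $data1;
    weaken($fixed_probe);
}
# With one link weakened, both arrays were freed at the end of the block.

print "plain loop:    ", (defined $leaky_probe ? "leaked" : "freed"), "\n";
print "weakened loop: ", (defined $fixed_probe ? "leaked" : "freed"), "\n";
```

Weakening one link is enough because it breaks the cycle of strong references; the remaining strong link is released when the array that holds it is freed.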
In the get_provisions_list routine earlier, you created a half dozen array names that were used only so that you could take a reference to them immediately afterward. When the subroutine exited, the array names all went away, but the references remained.
While creating temporarily named arrays would work in the simplest cases, creating such names becomes more complicated as the data structures become more detailed. You’d have to keep thinking of names of arrays just so you can forget them shortly thereafter.
You can reduce the namespace clutter by narrowing down the scope of the various array names. Rather than letting them be declared within the scope of the subroutine, you can create a temporary block:
my @skipper_with_name;

{
    my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
    @skipper_with_name = ("The Skipper", \@skipper);
}
At this point, the second element of @skipper_with_name is a reference to the array formerly known as @skipper. However, the name is no longer relevant.
This is a lot of typing to simply say “the second element should be a reference to an array containing these elements.” You can create such a value directly using the anonymous array constructor, which is Yet Another Use for square brackets:
my $ref_to_skipper_provisions = [ qw(blue_shirt hat jacket preserver sunscreen) ];
The square brackets take the value within (evaluated in a list context); establish a new, anonymous array initialized to those values; and (here’s the important part) return a reference to that array. It’s as if you said:
my $ref_to_skipper_provisions;

{
    my @temporary_name =
        ( qw(blue_shirt hat jacket preserver sunscreen) );
    $ref_to_skipper_provisions = \@temporary_name;
}
Here you don’t need to come up with a temporary name, and you don’t need the extra noise of the temporary block. The result of a square-bracketed anonymous array constructor is an array reference, which fits wherever a scalar variable fits.
Now you can use it to construct the larger list:
my $ref_to_skipper_provisions =
    [ qw(blue_shirt hat jacket preserver sunscreen) ];
my @skipper_with_name = ("The Skipper", $ref_to_skipper_provisions);
Of course, you didn’t actually need that scalar temporary, either. You can put a scalar reference to an array as part of a larger list:
my @skipper_with_name = (
    "The Skipper",
    [ qw(blue_shirt hat jacket preserver sunscreen) ]
);
Now let’s walk through this. You’ve declared @skipper_with_name, the first element of which is the Skipper’s name string, and the second element is an array reference, obtained by placing the five provisions into an array and taking a reference to it. So @skipper_with_name is only two elements long, just as before.
Don’t confuse the square brackets with the parentheses here. They each have their distinct purpose. If you replace the square brackets with parentheses, you end up with a six-element list. If you replace the outer parentheses (on the first and last lines) with square brackets, you construct an anonymous array that’s two elements long and then take the reference to that array as the only element of the ultimate @skipper_with_name array.[19]
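The three variations just described can be compared side by side. This is an illustrative sketch; the variable names @nested, @flattened, and @over_wrapped are invented for the comparison.

```perl
use strict;
use warnings;

my @provisions = qw(blue_shirt hat jacket preserver sunscreen);

# Square brackets inside parentheses: a name plus one array reference.
my @nested = ( "The Skipper", [ @provisions ] );

# Parentheses inside parentheses: the inner list is simply flattened.
my @flattened = ( "The Skipper", ( @provisions ) );

# Square brackets on the outside: a single element, which is a
# reference to the two-element anonymous array.
my @over_wrapped = [ "The Skipper", [ @provisions ] ];

print scalar(@nested), "\n";          # 2
print scalar(@flattened), "\n";       # 6
print scalar(@over_wrapped), "\n";    # 1
print $over_wrapped[0][1][2], "\n";   # jacket, one level deeper than intended
```

The extra index needed in the last line is the kind of surplus indirection the footnote warns about.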
So, in summary, the syntax:
my $fruits;

{
    my @secret_variable = ('pineapple', 'papaya', 'mango');
    $fruits = \@secret_variable;
}
can be simply replaced with:
my $fruits = ['pineapple', 'papaya', 'mango'];
Does this work for more complicated structures? Yes! Any time you need an element of a list to be a reference to an array, you can create that reference with an anonymous array constructor. In fact, you can also nest them in your provisions list:
sub get_provisions_list {
    return (
        ["The Skipper", [qw(blue_shirt hat jacket preserver sunscreen)] ],
        ["The Professor", [qw(sunscreen water_bottle slide_rule batteries radio)] ],
        ["Gilligan", [qw(red_shirt hat lucky_socks water_bottle)] ],
    );
}

my @all_with_names = get_provisions_list( );
Walking through this from the outside in, you have a return value of three elements. Each element is an array reference, pointing to an anonymous two-element array. The first element of each array is a name string, while the second element is a reference to an anonymous array of varying lengths naming the provisions—all without having to come up with temporary names for any of the intermediate layers.
To the caller of this subroutine, the return value is identical to the previous version. However, from a maintenance point of view, the reduced clutter of not having all the intermediate names saves screen and brain space.
You can create a reference to an empty anonymous array using an empty anonymous array constructor. For example, if you add one “Mrs. Howell” to that fictional travel list, as someone who has packed rather light, you’d simply insert:
["Mrs. Howell", [ ] ],
This is a single element of the larger list. This item is a reference to an array with two elements, the first of which is the name string, and the second of which is itself a reference to an empty anonymous array. The array is empty because Mrs. Howell hasn’t packed anything for this trip.
Similar to creating an anonymous array, you can also create an anonymous hash. Consider the crew roster from Chapter 3:
my %gilligan_info = (
    name => 'Gilligan',
    hat => 'White',
    shirt => 'Red',
    position => 'First Mate',
);
my %skipper_info = (
    name => 'Skipper',
    hat => 'Black',
    shirt => 'Blue',
    position => 'Captain',
);
my @crew = (\%gilligan_info, \%skipper_info);
The variables %gilligan_info and %skipper_info are just temporaries, needed to create the hashes for the final data structure. You can construct the reference directly with the anonymous hash constructor, which is Yet Another Meaning for curly braces, as you’ll see. Replace this:
my $ref_to_gilligan_info;

{
    my %gilligan_info = (
        name => 'Gilligan',
        hat => 'White',
        shirt => 'Red',
        position => 'First Mate',
    );
    $ref_to_gilligan_info = \%gilligan_info;
}
with the anonymous hash constructor:
my $ref_to_gilligan_info = {
    name => 'Gilligan',
    hat => 'White',
    shirt => 'Red',
    position => 'First Mate',
};
The value between the open and closing curly braces is an eight-element list. The eight-element list becomes a four-element anonymous hash (four key-value pairs). A reference to this hash is taken and returned as a single scalar value, which is placed into the scalar variable. Thus, you can rewrite the roster creation as:
my $ref_to_gilligan_info = {
    name => 'Gilligan',
    hat => 'White',
    shirt => 'Red',
    position => 'First Mate',
};
my $ref_to_skipper_info = {
    name => 'Skipper',
    hat => 'Black',
    shirt => 'Blue',
    position => 'Captain',
};
my @crew = ($ref_to_gilligan_info, $ref_to_skipper_info);
As before, you can now avoid the temporary variables and insert the values directly into the top-level list:
my @crew = (
    {
        name => 'Gilligan',
        hat => 'White',
        shirt => 'Red',
        position => 'First Mate',
    },
    {
        name => 'Skipper',
        hat => 'Black',
        shirt => 'Blue',
        position => 'Captain',
    },
);
Note the use of trailing commas on the lists when the element is not immediately next to the closing brace, bracket, or parenthesis. This is a nice style element to adopt because it allows for easy maintenance. Lines can be added quickly, rearranged, or commented out without destroying the integrity of the list.
Now @crew is identical to the value it had before, but you no longer need to invent names for the intermediate data structures. As before, the @crew variable contains two elements, each of which is a reference to a hash containing keyword-based information about a particular crew member.
The anonymous hash constructor always evaluates its contents in a list context and then constructs a hash from key/value pairs, just as if you had assigned that list to a named hash. A reference to that hash is returned as a single value that fits wherever a scalar fits.
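To make the shape of such a roster concrete, here is a short sketch that walks a list of anonymous hashes and pulls out individual fields with the arrow notation. The loop variable $crewmember and the @positions array are names chosen for this illustration.

```perl
use strict;
use warnings;

my @crew = (
    {
        name     => 'Gilligan',
        hat      => 'White',
        shirt    => 'Red',
        position => 'First Mate',
    },
    {
        name     => 'Skipper',
        hat      => 'Black',
        shirt    => 'Blue',
        position => 'Captain',
    },
);

# Each element of @crew is a hash reference, so use -> to reach a value.
for my $crewmember (@crew) {
    printf "%-8s is the %s\n", $crewmember->{name}, $crewmember->{position};
}

# Collect one field from every crew member.
my @positions = map { $_->{position} } @crew;
print "Positions: @positions\n";
```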
Now, a word from our parser: because blocks and anonymous hash constructors both use curly braces in roughly the same places in the syntax tree, the compiler has to make ad hoc determinations about which of the two you mean. If the compiler ever decides incorrectly, you might need to provide a hint to get what you want. To show the compiler that you want an anonymous hash constructor, put a plus sign before the opening curly brace: +{ ... }. To be sure to get a block of code, just put a semicolon (representing an empty statement) at the beginning of the block: {; ... }.
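One place where these hints commonly matter is map, which accepts either a block or an expression as its first argument. The sketch below is an illustration rather than part of the chapter's running example; the names @names, @records, and %is_crew are invented.

```perl
use strict;
use warnings;

my @names = ('Gilligan', 'Skipper');

# +{ ... } forces an anonymous hash constructor. map is then given an
# expression, so a comma is required before the list.
my @records = map +{ name => $_ }, @names;

# {; ... } forces a block. The block returns a key/value pair for each
# name, and the pairs are assigned to a plain hash.
my %is_crew = map {; "$_" => 1 } @names;

print "Record for: $records[1]{name}\n";
print "Gilligan is crew\n" if $is_crew{Gilligan};
```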
Let’s look again at the provisions list. Suppose you were reading the data from a file, in the format:
The Skipper
  blue_shirt
  hat
  jacket
  preserver
  sunscreen
Professor
  sunscreen
  water_bottle
  slide_rule
Gilligan
  red_shirt
  hat
  lucky_socks
  water_bottle
Provisions are indented with some whitespace, following a nonindented line with the person’s name. Let’s construct a hash of provisions. The keys of the hash will be the person’s name, and the value will be an array reference to an array containing a list of provisions.
Initially, you might gather the data using a simple loop:
my %provisions;
my $person;

while (<>) {
    if (/^(\S.*)/) { # a person's name (no leading whitespace)
        $person = $1;
        $provisions{$person} = [ ] unless exists $provisions{$person};
    } elsif (/^\s+(\S.*)/) { # a provision
        die "No person yet!" unless defined $person;
        push @{ $provisions{$person} }, $1;
    } else {
        die "I don't understand: $_";
    }
}
First, you declare the variables for the resulting hash of provisions and the current person. For each line that is read, determine if it’s a person or a provision. If it’s a person, remember the name and create the hash element for that person. The unless exists test ensures that you won’t delete someone’s provision list if his list is split in two places in the data file. For example, suppose that “The Skipper” and “ sextant” (note the leading whitespace) are at the end of the data file in order to list an additional data item.
The key is the person’s name, and the value is initially a reference to an empty anonymous array. If the line is a provision, push it to the end of the correct array, using the array reference.
This code works fine, but it actually says more than it needs to. Why? Because you can leave out the line that initializes the hash element’s value to a reference to an empty array:
my %provisions;
my $person;

while (<>) {
    if (/^(\S.*)/) { # a person's name (no leading whitespace)
        $person = $1;
        ## $provisions{$person} = [ ] unless exists $provisions{$person};
    } elsif (/^\s+(\S.*)/) { # a provision
        die "No person yet!" unless defined $person;
        push @{ $provisions{$person} }, $1;
    } else {
        die "I don't understand: $_";
    }
}
What happens when you try to store that blue shirt for the Skipper? While looking at the second line of input, you’ll end up with this effect:
push @{ $provisions{"The Skipper"} }, "blue_shirt";
At this point, $provisions{"The Skipper"} doesn’t exist, but you’re trying to use it as an array reference. To resolve the situation, Perl automatically inserts a reference to a new empty anonymous array into the variable and continues the operation. In this case, the reference to the newly created empty array is dereferenced, and you push the blue shirt to the provisions list.
This process is called autovivification. Any nonexisting variable, or a variable containing undef, which is dereferenced while looking for a variable location (technically called an lvalue context), is automatically stuffed with the appropriate reference to an empty item, and the operation is allowed to proceed.
This is actually the same behavior you’ve probably been using in Perl all along. Perl creates new variables as needed. Before that statement, $provisions{"The Skipper"} didn’t exist, so Perl created it. Then @{ $provisions{"The Skipper"} } didn’t exist, so Perl created it as well.
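To try the shorter loop without a separate data file, the following sketch reads a shortened version of the sample data from an in-memory filehandle (an assumption made for the demonstration; the chapter's code reads from <>). The data deliberately lists “The Skipper” twice to show that a split list ends up in a single array.

```perl
use strict;
use warnings;

# Shortened sample data; note the second "The Skipper" entry at the end.
my $data = <<'END_OF_DATA';
The Skipper
  blue_shirt
  hat
Gilligan
  red_shirt
The Skipper
  sextant
END_OF_DATA

# Open the string as if it were a file.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my %provisions;
my $person;

while (<$fh>) {
    if (/^(\S.*)/) {             # a person's name (no leading whitespace)
        $person = $1;
    } elsif (/^\s+(\S.*)/) {     # a provision
        die "No person yet!" unless defined $person;
        # The first push for a person autovivifies the anonymous array.
        push @{ $provisions{$person} }, $1;
    } else {
        die "I don't understand: $_";
    }
}
close $fh;

for my $name (sort keys %provisions) {
    print "$name: @{ $provisions{$name} }\n";
}
```

Because no element is ever assigned directly, a repeated name cannot wipe out an existing list; each push simply appends to whatever array is already there.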
For example, this works:
my $not_yet; # new undefined variable
@$not_yet = (1, 2, 3);
Here, you dereference the value $not_yet as if it were an array reference. But since it’s initially undef, Perl acts as if you had said:
my $not_yet;
$not_yet = [ ]; # inserted through autovivification
@$not_yet = (1, 2, 3);
In other words, an initially empty array becomes an array of three elements.
This autovivification also works for multiple levels of assignment:
my $top;
$top->[2]->[4] = "lee-lou";
Initially, $top contains undef, but because it is dereferenced as if it were an array reference, Perl inserts a reference to an empty anonymous array into $top. The third element (index value 2) is then accessed, which causes Perl to grow the array to be three elements long. That element is also undef, so it is stuffed with a reference to another empty anonymous array. We then spin out along that newly created array, setting the fifth element to lee-lou.
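The resulting shape can be checked directly. This small sketch repeats the assignment and then inspects what autovivification built; the variable names other than $top are invented for the check.

```perl
use strict;
use warnings;

my $top;
$top->[2]->[4] = "lee-lou";

my $outer_type   = ref($top);             # the kind of reference now in $top
my $outer_length = scalar(@$top);         # grown to reach index 2
my $inner_length = scalar(@{ $top->[2] }); # grown to reach index 4

print "$outer_type\n";                    # ARRAY
print "$outer_length\n";                  # 3
print "$inner_length\n";                  # 5

# The elements that were skipped over are still undef.
print defined($top->[0]) ? "defined\n" : "undef\n";   # undef
```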
Autovivification also works for hash references. If a variable containing undef is dereferenced as if it were a hash reference, a reference to an empty anonymous hash is inserted, and the operation continues.
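A minimal sketch of the hash case follows, using an invented $host_info variable and an invented owner key. It also shows a side effect worth knowing about: merely testing a nested element with exists is enough to autovivify the intermediate hash, even though nothing is stored at the innermost level.

```perl
use strict;
use warnings;

# Dereferencing undef as a hash reference creates the hashes on demand.
my $host_info;
$host_info->{"professor.hut"}{owner} = "The Professor";
print ref($host_info), "\n";                      # HASH

# Side effect: looking through a missing key creates that key.
my %total_bytes;
my $found = exists $total_bytes{"ginger.girl.hut"}{"professor.hut"};

print $found ? "found\n" : "not found\n";         # not found
print exists $total_bytes{"ginger.girl.hut"}
    ? "but the source host entry now exists\n"
    : "and no source host entry was created\n";
```

If that side effect is unwanted, test the outer key first, for example with exists $total_bytes{$source} before looking inside it.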
One place this comes in very handy is in a typical data reduction task. For example, let’s say the Professor gets an island-area network up and running (perhaps using Coco-Net or maybe Vines), and now wants to track the traffic from host to host. He begins logging the number of bytes transferred to a log file, giving the source host, the destination host, and the number of transferred bytes:
professor.hut gilligan.crew.hut 1250
professor.hut lovey.howell.hut 910
thurston.howell.hut lovey.howell.hut 1250
professor.hut lovey.howell.hut 450
professor.hut laser3.copyroom.hut 2924
ginger.girl.hut professor.hut 1218
ginger.girl.hut maryann.girl.hut 199
...
Now the Professor wants to produce a summary of the source host, the destination host, and the total number of transferred bytes for the day. Tabulating the data is as simple as:
my %total_bytes;

while (<>) {
    my ($source, $destination, $bytes) = split;
    $total_bytes{$source}{$destination} += $bytes;
}
Let’s see how this works on the first line of data. You’ll be executing:
$total_bytes{"professor.hut"}{"gilligan.crew.hut"} += 1250;
Because %total_bytes is initially empty, the first key of professor.hut is not found, but it establishes an undef value for the dereferencing as a hash reference. (Keep in mind that an implicit arrow is between the two sets of curly braces here.) Perl sticks in a reference to an empty anonymous hash in that element, which then is immediately extended to include the element with a key of gilligan.crew.hut. Its initial value is undef, which acts like a zero when you add 1250 to it, and the result of 1250 is inserted back into the hash.
Any later data line that contains this same source host and destination host will re-use that same value, adding more bytes to the running total. But each new destination host extends a hash to include a new initially undef byte count, and each new source host uses autovivification to create a destination host hash. In other words, Perl does the right thing, as always.
Once you’ve processed the file, it’s time to display the summary. First, you determine all the sources:
for my $source (keys %total_bytes) {
...
Now, you should get all destinations. The syntax for this is a bit tricky. You want all keys of the hash, resulting from dereferencing the value of the hash element, in the first structure:
for my $source (keys %total_bytes) {
    for my $destination (keys %{ $total_bytes{$source} }) {
....
For good measure, you should probably sort both lists to be consistent:
for my $source (sort keys %total_bytes) {
    for my $destination (sort keys %{ $total_bytes{$source} }) {
        print "$source => $destination:",
            " $total_bytes{$source}{$destination} bytes\n";
    }
    print "\n";
}
This is a typical data-reduction report generation strategy. Simply create a hash-of-hashrefs (perhaps nested even deeper, as you’ll see later), using autovivification to fill in the gaps in the upper data structures as needed, and then walk through the resulting data structure to display the results.
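Putting the pieces together, the sketch below runs the tabulation and the report over the seven sample log lines shown earlier. Holding the lines in an array named @log_lines instead of reading them from <> is an assumption made so the example is self-contained.

```perl
use strict;
use warnings;

# The sample log lines from the text, held in memory for the example.
my @log_lines = (
    "professor.hut gilligan.crew.hut 1250",
    "professor.hut lovey.howell.hut 910",
    "thurston.howell.hut lovey.howell.hut 1250",
    "professor.hut lovey.howell.hut 450",
    "professor.hut laser3.copyroom.hut 2924",
    "ginger.girl.hut professor.hut 1218",
    "ginger.girl.hut maryann.girl.hut 199",
);

# Tabulate: autovivification creates each inner hash on first use.
my %total_bytes;
for (@log_lines) {
    my ($source, $destination, $bytes) = split;
    $total_bytes{$source}{$destination} += $bytes;
}

# Report: walk the two-level structure in sorted order.
for my $source (sort keys %total_bytes) {
    for my $destination (sort keys %{ $total_bytes{$source} }) {
        print "$source => $destination:",
            " $total_bytes{$source}{$destination} bytes\n";
    }
    print "\n";
}
```

The two lines from professor.hut to lovey.howell.hut (910 and 450 bytes) are folded into a single entry of 1360 bytes, which is the running-total behavior described above.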
The answers for all exercises can be found in Section A.3.
Without running it, can you see what’s wrong with this piece of a program? If you can’t see the problem after a minute or two, see whether trying to run it will give you a hint of how to fix it.
my %passenger_1 = {
    name => 'Ginger',
    age => 22,
    occupation => 'Movie Star',
    real_age => 35,
    hat => undef,
};

my %passenger_2 = {
    name => 'Mary Ann',
    age => 19,
    hat => 'bonnet',
    favorite_food => 'corn',
};

my @passengers = (\%passenger_1, \%passenger_2);
The Professor’s data file (mentioned earlier in this chapter) is available as coconet.dat in the files you can download from the O’Reilly web site. There may be comment lines (beginning with a pound sign); be sure to skip them. (That is, your program should skip them. You might find a helpful hint if you read them!)
Modify the code from the chapter so that each source machine’s portion of the output shows the total number of bytes from that machine. List the source machines in order from most to least data transferred. Within each group, list the destination machines in order from most to least data transferred to that target from the source machine.
The result should be that the machine that sent the most data will be the first source machine in the list, and the first destination should be the machine to which it sent the most data. The Professor can use this printout to reconfigure the network for efficiency.
[18] Compare this with having to return an array from a C function. Either a pointer to a static memory space must be returned, making the subroutine nonreentrant, or a new memory space must be malloc’ed, requiring the caller to know to free the data. Perl just does the right thing.
[19] In classrooms, we’ve seen that too much indirection (or not enough indirection) tends to contribute to the most common mistakes made when working with references.