O'Reilly    
 Published on O'Reilly (http://oreilly.com/)
 See this if you're having trouble printing code examples


Data Structures: Chapter 9 - Programming Perl, Third Edition

by Jon Orwant, Larry Wall, Tom Christiansen
Programming Perl, Third Edition book cover

This excerpt is from Programming Perl, Third Edition. Programming Perl is not just a book about Perl; it is also a unique introduction to the language and its culture, as one might expect only from its authors. This third edition has been expanded to cover Version 5.6 of Perl. New topics include threading, the compiler, Unicode, and other features that have been added or improved since the previous edition.

buy button

Perl provides for free many of the data structures that you have to build yourself in other programming languages. The stacks and queues that budding computer scientists learn about are both just arrays in Perl. When you push and pop (or shift and unshift) an array, it's a stack; when you push and shift (or unshift and pop) an array, it's a queue. And many of the tree structures in the world are built only to provide fast, dynamic access to a conceptually flat lookup table. Hashes, of course, are built into Perl, and provide fast, dynamic access to a conceptually flat lookup table, only without the mind-numbingly recursive data structures that are claimed to be beautiful by people whose minds have been suitably numbed already.

But sometimes you want nested data structures because they most naturally model the problem you're trying to solve. So Perl lets you combine and nest arrays and hashes to create arbitrarily complex data structures. Properly applied, they can be used to create linked lists, binary trees, heaps, B-trees, sets, graphs, and anything else you can devise. See Mastering Algorithms with Perl (O'Reilly, 1999), the Perl Cookbook (O'Reilly, 1998), or CPAN, the central repository for all such modules. But simple combinations of arrays and hashes may be all you ever need, so they're what we'll talk about in this chapter.

Arrays of Arrays

There are many kinds of nested data structures. The simplest kind to build is an array of arrays, also called a two-dimensional array or a matrix. (The obvious generalization applies: an array of arrays of arrays is a three-dimensional array, and so on for higher dimensions.) It's reasonably easy to understand, and nearly everything that applies here will also be applicable to the fancier data structures that we'll explore in subsequent sections.

Creating and Accessing a Two-Dimensional Array

Here's how to put together a two-dimensional array:

# Assign a list of array references to an array.
@AoA = (
         [ "fred", "barney" ],
         [ "george", "jane", "elroy" ],
         [ "homer", "marge", "bart" ],
);

print $AoA[2][1];   # prints "marge"

The overall list is enclosed by parentheses, not brackets, because you're assigning a list and not a reference. If you wanted a reference to an array instead, you'd use brackets:

# Create an reference to an array of array references.
$ref_to_AoA = [
    [ "fred", "barney", "pebbles", "bamm bamm", "dino", ],
    [ "homer", "bart", "marge", "maggie", ],
    [ "george", "jane", "elroy", "judy", ],
];

print $ref_to_AoA->[2][3];   # prints "judy"

Remember that there is an implied -> between every pair of adjacent braces or brackets. Therefore these two lines:

$AoA[2][3]
$ref_to_AoA->[2][3]

are equivalent to these two lines:

$AoA[2]->[3]
$ref_to_AoA->[2]->[3]

There is, however, no implied -> before the first pair of brackets, which is why the dereference of $ref_to_AoA requires the initial ->. Also remember that you can count backward from the end of an array with a negative index, so:

$AoA[0][-2]

is the next-to-last element of the first row.

Growing Your Own

Those big list assignments are well and good for creating a fixed data structure, but what if you want to calculate each element on the fly, or otherwise build the structure piecemeal?

Let's read in a data structure from a file. We'll assume that it's a plain text file, where each line is a row of the structure, and each line consists of elements delimited by whitespace. Here's how to proceed:[1]

while (<>) {
    @tmp = split;           # Split elements into an array.
    push @AoA, [ @tmp ];    # Add an anonymous array reference to @AoA.
}

Of course, you don't need to name the temporary array, so you could also say:

while (<>) {
    push @AoA, [ split ];
}

If you want a reference to an array of arrays, you can do this:

while (<>) {
    push @$ref_to_AoA, [ split ];
}

Both of those examples add new rows to the array of arrays. What about adding new columns? If you're just dealing with two-dimensional arrays, it's often easiest to use simple assignment:[2]

for $x (0 .. 9) {                       # For each row…
    for $y (0 .. 9) {                   # For each column…
        $AoA[$x][$y] = func($x, $y);    # …set that cell
    }
}

for $x ( 0..9 ) {                       # For each row…
    $ref_to_AoA->[$x][3] = func2($x);   # …set the fourth column
}

It doesn't matter in what order you assign the elements, nor does it matter whether the subscripted elements of @AoA are already there or not; Perl will gladly create them for you, setting intervening elements to the undefined value as need be. (Perl will even create the original reference in $ref_to_AoA for you if it needs to.) If you just want to append to a row, you have to do something a bit funnier:

# Append new columns to an existing row.
push @{ $AoA[0] }, "wilma", "betty";

Notice that this wouldn't work:

push $AoA[0], "wilma", "betty";  # WRONG!

That won't even compile, because the argument to push must be a real array, not just a reference to an array. Therefore, the first argument absolutely must begin with an @ character. What comes after the @ is somewhat negotiable.

Access and Printing

Now let's print the data structure. If you only want one element, this is sufficient:

print $AoA[3][2];

But if you want to print the whole thing, you can't just say:

print @AoA;         # WRONG

It's wrong because you'll see stringified references instead of your data. Perl never automatically dereferences for you. Instead, you have to roll yourself a loop or two. The following code prints the whole structure, looping through the elements of @AoA and dereferencing each inside the print statement:

for $row ( @AoA ) {
    print "@$row\n";
}

If you want to keep track of subscripts, you might do this:

for $i ( 0 .. $#AoA ) {
    print "row $i is: @{$AoA[$i]}\n";
}

or maybe even this (notice the inner loop):

for $i ( 0 .. $#AoA ) {
    for $j ( 0 .. $#{$AoA[$i]} ) {
        print "element $i $j is $AoA[$i][$j]\n";
    }
}

As you can see, things are getting a bit complicated. That's why sometimes it's easier to use a temporary variable on your way through:

for $i ( 0 .. $#AoA ) {
    $row = $AoA[$i];
    for $j ( 0 .. $#{$row} ) {
        print "element $i $j is $row->[$j]\n";
    }
}

Slices

If you want to access a slice (part of a row) of a multidimensional array, you're going to have to do some fancy subscripting. The pointer arrows give us a nice way to access a single element, but no such convenience exists for slices. You can always extract the elements of your slice one-by-one with a loop:

@part = ();
for ($y = 7; $y < 13; $y++) {
    push @part, $AoA[4][$y];
}

This particular loop could be replaced with an array slice:

@part = @{ $AoA[4] } [ 7..12 ];

If you want a two-dimensional slice, say, with $x running from 4..8 and $y from 7..12, here's one way to do it:

@newAoA = ();
for ($startx = $x = 4; $x <= 8; $x++) {
    for ($starty = $y = 7; $y <= 12; $y++) {
        $newAoA[$x - $startx][$y - $starty] = $AoA[$x][$y];
    }
}

In this example, the individual values within our destination two-dimensional array, @newAoA, are assigned one by one, taken from a two-dimensional subarray of @AoA. An alternative is to create anonymous arrays, each consisting of a desired slice of an @AoA subarray, and then put references to these anonymous arrays into @newAoA. We would then be writing references into @newAoA (subscripted once, so to speak) instead of subarray values into a twice-subscripted @newAoA. This method eliminates the innermost loop:

for ($x = 4; $x <= 8; $x++) {
    push @newAoA, [ @{ $AoA[$x] } [ 7..12 ] ];
}

Of course, if you do this often, you should probably write a subroutine called something like extract_rectangle. And if you do it very often with large collections of multidimensional data, you should probably use the PDL (Perl Data Language) module, available from CPAN.

Common Mistakes

As mentioned earlier, Perl arrays and hashes are one-dimensional. In Perl, even "multidimensional" arrays are actually one-dimensional, but the values along that dimension are references to other arrays, which collapse many elements into one. If you print these values out without dereferencing them, you will get the stringified references rather than the data you want. For example, these two lines:

@AoA = ( [2, 3], [4, 5, 7], [0] );
print "@AoA";

result in something like:

ARRAY(0x83c38) ARRAY(0x8b194) ARRAY(0x8b1d0)

On the other hand, this line displays 7:

print $AoA[1][2];

When constructing an array of arrays, remember to compose new references for the subarrays. Otherwise, you will just create an array containing the element counts of the subarrays, like this:

for $i (1..10) {
    @array = somefunc($i);
    $AoA[$i] = @array;       # WRONG!
}

Here @array is being accessed in a scalar context, and therefore yields the count of its elements, which is dutifully assigned to $AoA[$i]. The proper way to assign the reference will be shown in a moment.

After making the previous mistake, people realize they need to assign a reference, so the next mistake people naturally make involves taking a reference to the same memory location over and over again:

for $i (1..10) {
    @array = somefunc($i);
    $AoA[$i] = \@array;      # WRONG AGAIN!
}

Every reference generated by the second line of the for loop is the same, namely, a reference to the single array @array. Yes, this array changes on each pass through the loop, but when everything is said and done, $AoA contains 10 references to the same array, which now holds the last set of values assigned to it. print @{$AoA[1]} will reveal the same values as print @{$AoA[2]}.

Here's a more successful approach:

for $i (1..10) {
    @array = somefunc($i);
    $AoA[$i] = [ @array ];   # RIGHT!
}

The brackets around @array create a new anonymous array, into which the elements of @array are copied. We then store a reference to that new array.

A similar result--though more difficult to read--would be produced by:

for $i (1..10) {
    @array = somefunc($i);
    @{$AoA[$i]} = @array;
}

Since $AoA[$i] needs to be a new reference, the reference springs into existence. Then, the preceding @ dereferences this new reference, with the result that the values of @array are assigned (in list context) to the array referenced by $AoA[$i]. You might wish to avoid this construct for clarity's sake.

But there is a situation in which you might use it. Suppose @AoA is already an array of references to arrays. That is, you've made assignments like:

$AoA[3] = \@original_array;

And now suppose that you want to change @original_array (that is, you want to change the fourth row of $AoA) so that it refers to the elements of @array. This code will work:

@{$AoA[3]} = @array;

In this case, the reference itself does not change, but the elements of the referenced array do. This overwrites the values of @original_array.

Finally, the following dangerous-looking code actually works fine:

for $i (1..10) {
    my @array = somefunc($i);
    $AoA[$i] = \@array;
}

That's because the lexically scoped my @array variable is created afresh on each pass through the loop. So even though it looks as though you've stored the same variable reference each time, you haven't. This is a subtle distinction, but the technique can produce more efficient code, at the risk of misleading less-enlightened programmers. (It's more efficient because there's no copy in the final assignment.) On the other hand, if you have to copy the values anyway (which the first assignment in the loop is doing), then you might as well use the copy implied by the brackets and avoid the temporary variable:

for $i (1..10) {
    $AoA[$i] = [ somefunc($i) ];
}

In summary:

$AoA[$i] = [ @array ];   # Safest, sometimes fastest
$AoA[$i] = \@array;      # Fast but risky, depends on my-ness of array
@{ $AoA[$i] } = @array;  # A bit tricky

Once you've mastered arrays of arrays, you'll want to tackle more complex data structures. If you're looking for C structures or Pascal records, you won't find any special reserved words in Perl to set these up for you. What you get instead is a more flexible system. If your idea of a record structure is less flexible than this, or if you'd like to provide your users with something more opaque and rigid, then you can use the object-oriented features detailed in Chapter 12.

Perl has just two ways of organizing data: as ordered lists stored in arrays and accessed by position, or as unordered key/value pairs stored in hashes and accessed by name. The best way to represent a record in Perl is with a hash reference, but how you choose to organize such records will vary. You might want to keep an ordered list of these records that you can look up by number, in which case you'd use an array of hash references to store the records. Or, you might wish to look the records up by name, in which case you'd maintain a hash of hash references. You could even do both at once, with pseudohashes.

In the following sections, you will find code examples detailing how to compose (from scratch), generate (from other sources), access, and display several different data structures. We first demonstrate three straightforward combinations of arrays and hashes, followed by a hash of functions and more irregular data structures. We end with a demonstration of how these data structures can be saved. These examples assume that you have already familiarized yourself with the explanations set forth earlier in this chapter.

Hashes of Arrays

Use a hash of arrays when you want to look up each array by a particular string rather than merely by an index number. In our example of television characters, instead of looking up the list of names by the zeroth show, the first show, and so on, we'll set it up so we can look up the cast list given the name of the show.

Because our outer data structure is a hash, we can't order the contents, but we can use the sort function to specify a particular output order.

Composition of a Hash of Arrays

You can create a hash of anonymous arrays as follows:

# We customarily omit quotes when the keys are identifiers.
%HoA = (
    flintstones    => [ "fred", "barney" ],
    jetsons        => [ "george", "jane", "elroy" ],
    simpsons       => [ "homer", "marge", "bart" ],
);

To add another array to the hash, you can simply say:

$HoA{teletubbies} = [ "tinky winky", "dipsy", "laa-laa", "po" ];

Generation of a Hash of Arrays

Here are some techniques for populating a hash of arrays. To read from a file with the following format:

flintstones: fred barney wilma dino
jetsons:     george jane elroy
simpsons:    homer marge bart

you could use either of the following two loops:

while ( <> ) {
    next unless s/^(.*?):\s*//;
    $HoA{$1} = [ split ];
}

while ( $line = <> ) {
    ($who, $rest) = split /:\s*/, $line, 2;
    @fields = split ' ', $rest;
    $HoA{$who} = [ @fields ];
}

If you have a subroutine get_family that returns an array, you can use it to stuff %HoA with either of these two loops:

for $group ( "simpsons", "jetsons", "flintstones" ) {
    $HoA{$group} = [ get_family($group) ];
}

for $group ( "simpsons", "jetsons", "flintstones" ) {
    @members = get_family($group);
    $HoA{$group} = [ @members ];
}

You can append new members to an existing array like so:

push @{ $HoA{flintstones} }, "wilma", "pebbles";

Access and Printing of a Hash of Arrays

You can set the first element of a particular array as follows:

$HoA{flintstones}[0] = "Fred";

To capitalize the second Simpson, apply a substitution to the appropriate array element:

$HoA{simpsons}[1] =~ s/(\w)/\u$1/;

You can print all of the families by looping through the keys of the hash:

for $family ( keys %HoA ) {
    print "$family: @{ $HoA{$family} }\n";
}

With a little extra effort, you can add array indices as well:

for $family ( keys %HoA ) {
    print "$family: ";
    for $i ( 0 .. $#{ $HoA{$family} } ) {
        print " $i = $HoA{$family}[$i]";
    }
    print "\n";
}

Or sort the arrays by how many elements they have:

for $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) {
    print "$family: @{ $HoA{$family} }\n"
}

Or even sort the arrays by the number of elements and then order the elements ASCIIbetically (or to be precise, utf8ically):

# Print the whole thing sorted by number of members and name.
for $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) {
    print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n";
}

Arrays of Hashes

An array of hashes is useful when you have a bunch of records that you'd like to access sequentially, and each record itself contains key/value pairs. Arrays of hashes are used less frequently than the other structures in this chapter.

Composition of an Array of Hashes

You can create an array of anonymous hashes as follows:

@AoH = (
    {
       husband  => "barney",
       wife     => "betty",
       son      => "bamm bamm",
    },
    {
       husband => "george",
       wife    => "jane",
       son     => "elroy",
    },    {
       husband => "homer",
       wife    => "marge",
       son     => "bart",
    },
  );

To add another hash to the array, you can simply say:

push @AoH, { husband => "fred", wife => "wilma", daughter => "pebbles" };

Generation of an Array of Hashes

Here are some techniques for populating an array of hashes. To read from a file with the following format:

husband=fred friend=barney

you could use either of the following two loops:

while ( <> ) {
    $rec = {};
    for $field ( split ) {
        ($key, $value) = split /=/, $field;
        $rec->{$key} = $value;
    }
    push @AoH, $rec;
}

while ( <> ) {
    push @AoH, { split /[\s=]+/ };
}

If you have a subroutine get_next_pair that returns key/value pairs, you can use it to stuff @AoH with either of these two loops:

while ( @fields = get_next_pair() ) {
    push @AoH, { @fields };
}

while (<>) {
    push @AoH, { get_next_pair($_) };
}

You can append new members to an existing hash like so:

$AoH[0]{pet} = "dino";
$AoH[2]{pet} = "santa's little helper";

Access and Printing of an Array of Hashes

You can set a key/value pair of a particular hash as follows:

$AoH[0]{husband} = "fred";

To capitalize the husband of the second array, apply a substitution:

$AoH[1]{husband} =~ s/(\w)/\u$1/;

You can print all of the data as follows:

for $href ( @AoH ) {
    print "{ ";
    for $role ( keys %$href ) {
         print "$role=$href->{$role} ";
    }
    print "}\n";
}

and with indices:

for $i ( 0 .. $#AoH ) {
    print "$i is { ";
    for $role ( keys %{ $AoH[$i] } ) {
         print "$role=$AoH[$i]{$role} ";
    }
    print "}\n";
}

Hashes of Hashes

A multidimensional hash is the most flexible of Perl's nested structures. It's like building up a record that itself contains other records. At each level, you index into the hash with a string (quoted when necessary). Remember, however, that the key/value pairs in the hash won't come out in any particular order; you can use the sort function to retrieve the pairs in whatever order you like.

Composition of a Hash of Hashes

You can create a hash of anonymous hashes as follows:

%HoH = (
    flintstones => {
        husband   => "fred",
        pal       => "barney",
    },
    jetsons => {
        husband   => "george",
        wife      => "jane",
        "his boy" => "elroy",  # Key quotes needed.
    },
    simpsons => {
        husband   => "homer",
        wife      => "marge",
        kid       => "bart",
    },
);

To add another anonymous hash to %HoH, you can simply say:

$HoH{ mash } = {
    captain  => "pierce",
    major    => "burns",
    corporal => "radar",
};

Generation of a Hash of Hashes

Here are some techniques for populating a hash of hashes. To read from a file with the following format:

flintstones: husband=fred pal=barney wife=wilma pet=dino

you could use either of the following two loops:

while ( <> ) {
    next unless s/^(.*?):\s*//;
    $who = $1;
    for $field ( split ) {
        ($key, $value) = split /=/, $field;
        $HoH{$who}{$key} = $value;
    }
}

while ( <> ) {
    next unless s/^(.*?):\s*//;
    $who = $1;
    $rec = {};
    $HoH{$who} = $rec;
    for $field ( split ) {
        ($key, $value) = split /=/, $field;
        $rec->{$key} = $value;
    }
}

If you have a subroutine get_family that returns a list of key/value pairs, you can use it to stuff %HoH with either of these three snippets:

for $group ( "simpsons", "jetsons", "flintstones" ) {
    $HoH{$group} = { get_family($group) };
}

for $group ( "simpsons", "jetsons", "flintstones" ) {
    @members = get_family($group);
    $HoH{$group} = { @members };
}

sub hash_families {
    my @ret;
    for $group ( @_ ) {
        push @ret, $group, { get_family($group) };
    }
    return @ret;
}
%HoH = hash_families( "simpsons", "jetsons", "flintstones" );

You can append new members to an existing hash like so:

%new_folks = (
    wife => "wilma",
    pet  => "dino";
);
for $what (keys %new_folks) {
    $HoH{flintstones}{$what} = $new_folks{$what};
}

Access and Printing of a Hash of Hashes

You can set a key/value pair of a particular hash as follows:

$HoH{flintstones}{wife} = "wilma";

To capitalize a particular key/value pair, apply a substitution to an element:

$HoH{jetsons}{'his boy'} =~ s/(\w)/\u$1/;

You can print all the families by looping through the keys of the outer hash and then looping through the keys of the inner hash:

for $family ( keys %HoH ) {
    print "$family: ";
    for $role ( keys %{ $HoH{$family} } ) {
         print "$role=$HoH{$family}{$role} ";
    }
    print "\n";
}

In very large hashes, it may be slightly faster to retrieve both keys and values at the same time using each (which precludes sorting):

while ( ($family, $roles) = each %HoH ) {
    print "$family: ";
    while ( ($role, $person) = each %$roles ) {
        print "$role=$person ";
    }
    print "\n";
}

(Unfortunately, it's the large hashes that really need to be sorted, or you'll never find what you're looking for in the printout.) You can sort the families and then the roles as follows:

for $family ( sort keys %HoH ) {
    print "$family: ";
    for $role ( sort keys %{ $HoH{$family} } ) {
         print "$role=$HoH{$family}{$role} ";
    }
    print "\n";
}

To sort the families by the number of members (instead of ASCIIbetically (or utf8ically)), you can use keys in a scalar context:

for $family ( sort { keys %{$HoH{$a}} <=> keys %{$HoH{$b}} } keys %HoH ) {
    print "$family: ";
    for $role ( sort keys %{ $HoH{$family} } ) {
         print "$role=$HoH{$family}{$role} ";
    }
    print "\n";
}

To sort the members of a family in some fixed order, you can assign ranks to each:

$i = 0;
for ( qw(husband wife son daughter pal pet) ) { $rank{$_} = ++$i }

for $family ( sort { keys %{$HoH{$a}} <=> keys %{$HoH{$b}} } keys %HoH ) {
    print "$family: ";
    for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
        print "$role=$HoH{$family}{$role} ";
    }
    print "\n";
}

Hashes of Functions

When writing a complex application or network service in Perl, you might want to make a large number of commands available to your users. Such a program might have code like this to examine the user's selection and take appropriate action:

if    ($cmd =~ /^exit$/i)     { exit }
elsif ($cmd =~ /^help$/i)     { show_help() }
elsif ($cmd =~ /^watch$/i)    { $watch = 1 }
elsif ($cmd =~ /^mail$/i)     { mail_msg($msg) }
elsif ($cmd =~ /^edit$/i)     { $edited++; editmsg($msg); }
elsif ($cmd =~ /^delete$/i)   { confirm_kill() }
else {
    warn "Unknown command: `$cmd'; Try `help' next time\n";
}

You can also store references to functions in your data structures, just as you can store references to arrays or hashes:

%HoF = (                           # Compose a hash of functions
    exit    =>  sub { exit },
    help    =>  \&show_help,
    watch   =>  sub { $watch = 1 },
    mail    =>  sub { mail_msg($msg) },
    edit    =>  sub { $edited++; editmsg($msg); },
    delete  =>  \&confirm_kill,
);

if   ($HoF{lc $cmd}) { $HoF{lc $cmd}->() }   # Call function
else { warn "Unknown command: `$cmd'; Try `help' next time\n" }

In the second to last line, we check whether the specified command name (in lowercase) exists in our "dispatch table", %HoF. If so, we invoke the appropriate command by dereferencing the hash value as a function and pass that function an empty argument list. We could also have dereferenced it as &{ $HoF{lc $cmd} }(), or, as of the 5.6 release of Perl, simply $HoF{lc $cmd}().

More Elaborate Records

So far, what we've seen in this chapter are simple, two-level, homogeneous data structures: each element contains the same kind of referent as all the other elements at that level. It certainly doesn't have to be that way. Any element can hold any kind of scalar, which means that it could be a string, a number, or a reference to anything at all. The reference could be an array or hash reference, or a pseudohash, or a reference to a named or anonymous function, or an object. The only thing you can't do is to stuff multiple referents into one scalar. If you find yourself trying to do that, it's a sign that you need an array or hash reference to collapse multiple values into one.

In the sections that follow, you will find code examples designed to illustrate many of the possible types of data you might want to store in a record, which we'll implement using a hash reference. The keys are uppercase strings, a convention sometimes employed (and occasionally unemployed, but only briefly) when the hash is being used as a specific record type.

Composition, Access, and Printing of More Elaborate Records

Here is a record with six disparate fields:

$rec = {
    TEXT      => $string,
    SEQUENCE  => [ @old_values ],
    LOOKUP    => { %some_table },
    THATCODE  => \&some_function,
    THISCODE  => sub { $_[0] ** $_[1] },
    HANDLE    => \*STDOUT,
};

The TEXT field is a simple string, so you can just print it:

print $rec->{TEXT};

SEQUENCE and LOOKUP are regular array and hash references:

print $rec->{SEQUENCE}[0];
$last = pop @{ $rec->{SEQUENCE} };

print $rec->{LOOKUP}{"key"};
($first_k, $first_v) = each %{ $rec->{LOOKUP} };

THATCODE is a named subroutine and THISCODE is an anonymous subroutine, but they're invoked identically:

$that_answer = $rec->{THATCODE}->($arg1, $arg2);
$this_answer = $rec->{THISCODE}->($arg1, $arg2);

With an extra pair of braces, you can treat $rec->{HANDLE} as an indirect object:

print { $rec->{HANDLE} } "a string\n";

If you're using the FileHandle module, you can even treat the handle as a regular object:

use FileHandle;
$rec->{HANDLE}->autoflush(1);
$rec->{HANDLE}->print("a string\n");

Composition, Access, and Printing of Even More Elaborate Records

Naturally, the fields of your data structures can themselves be arbitrarily complex data structures in their own right:

%TV = (
    flintstones => {
        series   => "flintstones",
        nights   => [ "monday", "thursday", "friday" ],
        members  => [
            { name => "fred",    role => "husband", age  => 36, },
            { name => "wilma",   role => "wife",    age  => 31, },
            { name => "pebbles", role => "kid",     age  =>  4, },
        ],
    },    jetsons     => {
        series   => "jetsons",
        nights   => [ "wednesday", "saturday" ],
        members  => [
            { name => "george",  role => "husband", age  => 41, },
            { name => "jane",    role => "wife",    age  => 39, },
            { name => "elroy",   role => "kid",     age  =>  9, },
        ],
    },

    simpsons    => {
        series   => "simpsons",
        nights   => [ "monday" ],
        members  => [
            { name => "homer", role => "husband", age => 34, },
            { name => "marge", role => "wife",    age => 37, },
            { name => "bart",  role => "kid",     age => 11, },
        ],
    },
);

Generation of a Hash of Complex Records

Because Perl is quite good at parsing complex data structures, you might just put your data declarations in a separate file as regular Perl code, and then load them in with the do or require built-in functions. Another popular approach is to use a CPAN module (such as XML::Parser) to load in arbitrary data structures expressed in some other language (such as XML).

You can build data structures piecemeal:

$rec = {};
$rec->{series} = "flintstones";
$rec->{nights} = [ find_days() ];

Or read them in from a file (here, assumed to be in field=value syntax):

@members = ();
while (<>) {
     %fields = split /[\s=]+/;
     push @members, { %fields };
}
$rec->{members} = [ @members ];

And fold them into larger data structures keyed by one of the subfields:

$TV{ $rec->{series} } = $rec;

You can use extra pointer fields to avoid duplicate data. For example, you might want a "kids" field included in a person's record, which might be a reference to an array containing references to the kids' own records. By having parts of your data structure refer to other parts, you avoid the data skew that would result from updating the data in one place but not in another:

for $family (keys %TV) {
    my $rec = $TV{$family};   # temporary pointer
    @kids = ();
    for $person ( @{$rec->{members}} ) {
        if ($person->{role} =~ /kid|son|daughter/) {
            push @kids, $person;
        }
    }
    # $rec and $TV{$family} point to same data!
    $rec->{kids} = [ @kids ];
}

The $rec->{kids} = [ @kids ] assignment copies the array contents--but they are merely references to uncopied data. This means that if you age Bart as follows:

$TV{simpsons}{kids}[0]{age}++;            # increments to 12

then you'll see the following result, because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2] both point to the same underlying anonymous hash table:

print $TV{simpsons}{members}[2]{age};     # also prints 12

Now, to print the entire %TV structure:

for $family ( keys %TV ) {
    print "the $family";
    print " is on ", join (" and ", @{ $TV{$family}{nights} }), "\n";
    print "its members are:\n";
    for $who ( @{ $TV{$family}{members} } ) {
        print " $who->{name} ($who->{role}), age $who->{age}\n";
    }
    print "children: ";
    print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
    print "\n\n";
}

Saving Data Structures

If you want to save your data structures for use by another program later, there are many ways to do it. The easiest way is to use Perl's Data::Dumper module, which turns a (possibly self-referential) data structure into a string that can be saved externally and later reconstituted with eval or do.

use Data::Dumper;
$Data::Dumper::Purity = 1;       # since %TV is self-referential
open (FILE, "> tvinfo.perldata") or die "can't open tvinfo: $!";
print FILE Data::Dumper->Dump([\%TV], ['*TV']);
close FILE                       or die "can't close tvinfo: $!";

A separate program (or the same program) can then read in the file later:

open (FILE, "< tvinfo.perldata") or die "can't open tvinfo: $!";
undef $/;                        # read in file all at once
eval <FILE>;                     # recreate %TV
die "can't recreate tv data from tvinfo.perldata: $@" if $@;
close FILE                       or die "can't close tvinfo: $!";
print $TV{simpsons}{members}[2]{age};

or simply:

do "tvinfo.perldata"            or die "can't recreate tvinfo: $! $@";
print $TV{simpsons}{members}[2]{age};

Many other solutions are available, with storage formats ranging from packed binary (very fast) to XML (very interoperable). Check out a CPAN mirror near you today!



[1] Here as in other chapters, we omit (for clarity) the my declarations that you would ordinarily put in. In this example, you'd normally write my @tmp = split.

[2] As with the temp assignment earlier, we've simplified; the loops in this chapter would likely be written for my $x in real code.

If you enjoyed this excerpt, buy a copy of Programming Perl, Third Edition

Copyright © 2009 O'Reilly Media, Inc.