For the purposes of this chapter, I’m going to label as “dynamic
subroutines” anything I don’t explicitly name by typing sub some_name
or that doesn’t exist until runtime.
Perl is extremely flexible in letting me figure out the code as I go along,
and I can even have code that writes code. I’m going to lump a bunch of
different subroutine topics in this chapter just because there’s no good
home for them apart from each other.
We first showed anonymous subroutines in Learning Perl when we showed user-defined sorting, although we didn’t tell you that they were anonymous subroutines. In Intermediate Perl we used them to create closures, work with map and grep, and a few other things. I’ll pick up where Intermediate Perl left off to show just how powerful they can be. With any of these tricks, not knowing everything ahead of time can be very liberating.
I can store anonymous subroutines in variables. They don’t actually execute until I tell them to. Instead of storing values, I store behavior. This anonymous subroutine adds its first two arguments and returns the result, but it won’t do that until I execute it. I merely define the subroutine and store it in $add_sub:
my $add_sub = sub { $_[0] + $_[1] };
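To run the code later, I dereference the variable with an arrow and an argument list, just as with any other code reference. A tiny example (the values here are only for illustration):

my $sum = $add_sub->( 2, 3 );   # runs the stored behavior; $sum is now 5
print "$sum\n";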
This way, I can decide what to do simply by choosing the variable
that has the behavior that I want. A simple-minded program might do this
with a series of if-elsif
tests and
branches because it needs to hardcode a branch for each possible
subroutine call. Here I create a little calculator to handle basic
arithmetic. It takes three arguments on the command line and does the
calculation. Each operation gets its own branch of code:
#!/usr/bin/perl
# basic-arithmetic.pl
use strict;

while( 1 ) {
	my( $operator, @operand ) = get_line();

	if(    $operator eq '+' ) { add( @operand )      }
	elsif( $operator eq '-' ) { subtract( @operand ) }
	elsif( $operator eq '*' ) { multiply( @operand ) }
	elsif( $operator eq '/' ) { divide( @operand )   }
	else {
		print "No such operator [$operator]!\n";
		last;
		}
	}

print "Done, exiting...\n";

sub get_line {
	# This could be a lot more complicated, but this isn't the point
	print "\nprompt> ";

	my $line = <STDIN>;
	$line =~ s/^\s+|\s+$//g;

	( split /\s+/, $line )[1,0,2];
	}

sub add      { print $_[0] + $_[1] }
sub subtract { print $_[0] - $_[1] }
sub multiply { print $_[0] * $_[1] }
sub divide   { print $_[1] ? $_[0] / $_[1] : 'NaN' }
Those branches are really just the same thing; they take the two
operands, perform a calculation, and print the result. The only thing that
differs in each branch is the subroutine name. If I want to add more
operations, I have to add more nearly identical branches of code. Not only
that, I have to add the code to the while
loop, obscuring the intent of the loop. If
I decide to do things a bit differently, I have to change every branch.
That’s just too much work.
I can turn that on its head so I don’t have a long series of
branches to code or maintain. I want to extract the subroutine name from
the branches so I can make one block of code that works for all operators.
Ideally, the while
loop wouldn’t change
and would just deal with the basics of getting the data and sending them
to the right subroutine:
while( 1 ) {
	my( $operator, @operand ) = get_line();

	my $some_sub = ....;

	print $some_sub->( @operand );
	}
Now the subroutine is just something stored in the variable $some_sub, so I have to decide how to get the right anonymous subroutine in there. I could use a dispatch table (a hash that stores the anonymous subroutines), and then select the subroutines by their keys. In this case, I use the operator symbol as the key. I can also catch bad input because I know which operators are valid: they are the keys of the hash.
My processing loop stays the same even if I add more operators. I
also label the loop REPL
(for
Read-Evaluate-Print), and I’ll use that label later when I want to control
the looping from one of my subroutines:
#!/usr/bin/perl
use strict;

use vars qw( %Operators );

%Operators = (
	'+' => sub { $_[0] + $_[1] },
	'-' => sub { $_[0] - $_[1] },
	'*' => sub { $_[0] * $_[1] },
	'/' => sub { $_[1] ? eval { $_[0] / $_[1] } : 'NaN' },
	);

REPL: while( 1 ) {
	my( $operator, @operand ) = get_line();

	my $some_sub = $Operators{ $operator };

	unless( defined $some_sub ) {
		print "Unknown operator [$operator]\n";
		last;
		}

	print $Operators{ $operator }->( @operand );
	}

print "Done, exiting...\n";

sub get_line {
	print "\nprompt> ";

	my $line = <STDIN>;
	$line =~ s/^\s+|\s+$//g;

	( split /\s+/, $line )[1,0,2];
	}
If I want to add more operators, I just add new entries to the hash.
I can add completely new operators, such as the %
operator for modulus, or the x
operator as a synonym for the *
multiplication operator:
use vars qw( %Operators );

%Operators = (
	'+' => sub { $_[0] + $_[1] },
	'-' => sub { $_[0] - $_[1] },
	'*' => sub { $_[0] * $_[1] },
	'/' => sub { eval { $_[0] / $_[1] } || 'NaN' },
	'%' => sub { $_[0] % $_[1] },
	);

$Operators{ 'x' } = $Operators{ '*' };
That’s fine and it works, but maybe I have to change my program so
that instead of the normal algebraic notation I use Reverse Polish
Notation (where the operands come first and the operator comes last).
That’s easy to handle because I just change the way I pick the anonymous
subroutine. Instead of looking at the middle argument, I look at the last
argument. That all happens in my get_line
subroutine. I rearrange that a bit and
everything else stays the same:
sub get_line {
	print "\nprompt> ";

	my $line = <STDIN>;
	$line =~ s/^\s+|\s+$//g;

	my @list = split /\s+/, $line;

	unshift( @list, pop @list );

	@list;
	}
Now that I’ve done that, I can make a little change to handle more than just binary operators. If I want to handle something that takes more than two arguments, I do the same thing I just did: take the last argument and use it as the operator, then pass the rest of the arguments to the subroutine. I don’t really have to change anything other than adding a new operator. I define a " operator that finds the maximum value of all the arguments I pass to it, much like the max function from List::Util. This is similar to the example we showed in Learning Perl to show that Perl doesn’t care how many arguments I pass to a subroutine:
%Operators = (
	# ... same stuff as before
	'"' => sub {
			my $max = shift;
			foreach ( @_ ) { $max = $_ if $_ > $max }
			$max
			},
	);
I can also handle a single operand because my code doesn’t really care how many there are, and a list of one element is just as good as any other list. Here’s the reason that I actually wrote this program. I often need to convert between number bases, or from Unix time to a time I can read:
%Operators = (
	# ... same stuff as before
	'dh' => sub { sprintf "%x", $_[0] },
	'hd' => sub { sprintf "%d", hex $_[0] },
	't'  => sub { scalar localtime( $_[0] ) },
	);
Finally, how about an operator that works with zero arguments? It’s just a degenerate case of what I already have. My previous programs didn’t have a way to stop; if I used them, I’d have to interrupt the program. Now I can add my q operator, which really isn’t an operator but a way to stop the program. I cheat a little by using last to break out of the while loop.[36] I could do anything I like, though, including exit straight away. In this case, I use last with the loop label I gave to the while:
%Operators = (
	# ... same stuff as before
	'q' => sub { last REPL },
	);
If I need more operators, I simply add them to the hash with a reference to the subroutine that implements them. I don’t have to add any logic or change the structure of the program. I just have to describe the additional feature (although the description is in code).
In the last section I stored my anonymous subroutines in a variable, but a subroutine is really just another slot in the typeglob (see Chapter 8). I can store subroutines there, too. When I assign an anonymous subroutine to a typeglob, Perl figures out that it should go in the CODE slot. After that, I use the subroutine just as if I had defined it with a name:
print "Foo is defined before\n" if defined( &foo ); *foo = sub { print "Here I am!\n" }; foo(); print "Foo is defined afterward\n" if defined( &foo );
This can be useful if I need to replace some code in another module as I’ll do in Chapter 10. I don’t want to edit the other module. I’ll leave it as it is and replace the single definition I need to change. Since subroutines live in the symbol table, I can just use the full package specification to replace a subroutine:
#!/usr/bin/perl

package Some::Module;

sub bar { print "I'm in " . __PACKAGE__ . "\n" }

package main;

Some::Module::bar();

*Some::Module::bar = sub { print "Now I'm in " . __PACKAGE__ . "\n" };

Some::Module::bar();
If I run this under warnings, Perl catches my suspicious activity and complains because I really shouldn’t be doing this without a good reason:
$ perl -w replace_sub.pl
I'm in Some::Module
Subroutine Some::Module::bar redefined at replace_sub.pl line 11.
Now I'm in main
I change the code a bit to get around that warning. Instead of
turning off all warnings, I isolate that bit of code with a naked block
and turn off any warnings in the redefine
class:
{
no warnings 'redefine';

*Some::Module::bar = sub { print "Now I'm in " . __PACKAGE__ . "\n" };
}
Although I did this with an existing subroutine definition, I can do it without a previous declaration, too. With a little modification my main package defines the new subroutine quux in Some::Module:
package Some::Module;
# has no subroutines

package main;

{
no warnings 'redefine';

*Some::Module::quux = sub { print "Now I'm in " . __PACKAGE__ . "\n" };
}

Some::Module::quux();
See anything familiar? If I change it around it might look a bit more like something you’ve seen before as a trick to import symbols into another namespace. You’ve probably been doing this same thing for quite a while without even knowing about it:
package Some::Module;

sub import {
	*main::quux = sub { print "I came from " . __PACKAGE__ . "\n" };
	}

package main;

Some::Module->import();

quux();
This is the same thing that the Exporter module does to take definitions in one package and put them into another. It’s only slightly more complicated than this because Exporter figures out who’s calling it and does some work to look in @EXPORT and @EXPORT_OK. Other than that, it’s a bunch of monkey programming around an assignment to a typeglob.
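Here’s a stripped-down sketch of that idea. The module and subroutine names (My::Greeter, hello) are made up for this example, and the real Exporter consults @EXPORT and @EXPORT_OK and handles many more cases:

package My::Greeter;

our @EXPORT = qw(hello);

sub hello { print "Hello from " . __PACKAGE__ . "\n" }

sub import {
	my $class  = shift;
	my $caller = caller;   # who said "use My::Greeter"?

	no strict 'refs';
	foreach my $name ( @EXPORT ) {
		*{ $caller . '::' . $name } = \&{ $class . '::' . $name };
		}
	}

package main;

My::Greeter->import;   # "use My::Greeter" would call this for me
hello();               # now defined in main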
In the previous section, I replaced the definition of a valid subroutine name with an anonymous subroutine. I fiddled with the symbol table to make things happen. Now, I’m going to move from fiddling to abuse.
A symbolic reference, or reference to the symbol table, uses a string to choose the name of the variable and what looks like a dereference to access it:
my $name = 'foo';

my $value_in_foo = ${ $name };   # $foo
This normally isn’t a good idea, so much so that
strict
prohibits it. Adding use strict
to my
example, I get a fatal error:
use strict;

my $name = 'foo';

my $value_in_foo = ${ $name };   # $foo
It’s the refs
portion of
strict
that causes the problem:
Can't use string ("foo") as a SCALAR ref while "strict refs" in use at program.pl line 3.
I can get around that by turning off the refs
portion temporarily:
use strict;

{
no strict 'refs';

my $name = 'foo';

my $value_in_foo = ${ $name };   # $foo
}
I could also just not turn on the refs portion of strict, but it’s better to turn it off only when I need it and let Perl catch unintended uses:
use strict qw(subs vars); # no 'refs'
For dynamic subroutine tricks, I want to store the subroutine name in a variable, and then turn it into a subroutine.
First, I put the name foo into the scalar $good_name. I then dereference it as a typeglob reference so I can assign my anonymous subroutine to it. Since $good_name isn’t a reference, Perl uses its value as a symbolic reference. The value becomes the name of the typeglob Perl should look at and affect. When I assign my anonymous subroutine to *{ $good_name }, I’m creating an entry in the symbol table for the current package for a subroutine named &foo. It also works with the full package specification, so I can create &Some::Module::foo, too:
#!/usr/bin/perl
use strict;

{
no strict 'refs';

my $good_name = "foo";
*{ $good_name } = sub { print "Hi, how are you?\n" };

my $remote_name = "Some::Module::foo";
*{ $remote_name } = sub { print "Hi, are you from Maine?\n" };
}

foo();                 # no problem
Some::Module::foo();   # no problem
I can be even more abusive, though, and this is something that I shouldn’t ever do, at least not in any code that does something useful or important. Save this for an Obfuscated Perl Contest.
By putting the name in a variable I can get around Perl’s variable
naming convention. Normally, I have to start a variable name with a letter
or an underscore and follow it with letters, underscores, or digits. Now I
get around all that to create the subroutine with the name <=>
by using a symbolic reference:
{
no strict 'refs';

my $evil_name = "<=>";
*{ $evil_name } = sub { print "How did you ever call me?\n" };

# <=>()   yeah, that's not gonna happen

*{ $evil_name }{CODE}->();

&{ $evil_name }();   # Another way ;-)
}
I still can’t use my illegal subroutine in the normal way, so I have to look in its typeglob or use another symbolic reference.
In my Data::Constraint module, I needed to provide a way to validate a value in such a way that the user could build up complex requirements easily and without writing code. The validation would be a matter of configuration, not programming.
Instead of applying a validation routine to a set of values, I turned it around to apply a list of subroutines to a value. Each particular value would have its own combination of validation routines, and I’d validate each value separately (although probably still in some sort of loop). Each subroutine is a constraint on the value.
I start by defining some subroutines to check a value. I don’t know ahead of time what the values will represent or which constraints the user will place on it. I’ll make some general subroutines that the programmer can combine in any way she likes. Each subroutine returns true or false:
my %Constraints = (
	is_defined     => sub { defined $_[0] },
	not_empty      => sub { length $_[0] > 0 },
	is_long        => sub { length $_[0] > 8 },
	has_whitespace => sub { $_[0] =~ m/\s/ },
	no_whitespace  => sub { $_[0] !~ m/\s/ },
	has_digit      => sub { $_[0] =~ m/\d/ },
	only_digits    => sub { $_[0] !~ m/\D/ },
	has_special    => sub { $_[0] =~ m/[^a-z0-9]/ },
	);
The %Constraints hash now serves as a library of validation routines that I can use. Once defined, I figure out how I want to use them.
For example, I want to write a password checker that looks for at least eight characters, no whitespace, at least one digit, and at least one special character. Since I’ve stored the subroutines in a hash, I just pull out the ones I need and pass the candidate password to each one:
chomp( my $password = <STDIN> );

my $fails = grep {
	! $Constraints{ $_ }->( $password )
	} qw( is_long no_whitespace has_digit has_special );
I use grep in scalar context so it returns the number of items for which its block returns true. Since I really want the number of items that return false, I negate the return value of the subroutine call to make false turn into true, and vice versa. If $fails is anything but zero, I know that something didn’t pass.
The benefit comes when I want to apply this to many different values, each of which might have their own constraints. The technique is the same, but I have to generalize it a bit more:
my $fails = grep {
	! $Constraints{ $_ }->( $input{$key} )
	} @constraint_names;
From there parameter checking is simply configuration:
password      is_long  no_whitespace  has_digit  has_special
employee_id   not_empty  only_digits
last_name     not_empty
I specify that configuration however I like and load it into my program. It is especially useful for nonprogrammers who need to change the behavior of the application. They don’t need to touch any code. If I store that in a file, I read in the lines and build a data structure to hold the names and the constraints that go with them. Once I have that set up, I access everything in the right way to do the same thing I did in the previous example:
my %Config;

while( <CONFIG> ) {
	chomp;
	my( $key, @constraints ) = split;
	$Config{$key} = \@constraints;
	}

my %input = get_input();   # pretend that does something

my @failed;

foreach my $key ( keys %input ) {
	my $failed = grep {
		! $Constraints{ $_ }->( $input{$key} )
		} @{ $Config{$key} };

	push @failed, $key if $failed;
	}

print "These values failed: @failed\n";
My code to check them is small and constant no matter how many input parameters I have or the particular requirements for each of them.
This is the basic idea behind Data::Constraint, although it does more work to set up the situation and return a list of the constraints the value did not meet. I could change this up a little to return a list of the constraints that failed:
my @failed = grep {
	$Constraints{ $_ }->( $value ) ? () : $_
	} @constraint_names;
Much in the same way that I went through a list of constraints in the previous example, I might want to build a processing pipeline. I do the same thing: decide which subroutines to include and then iterate through that list, applying in turn each subroutine to the value.
I can normalize a value by deciding which transformations I should perform. I store all of the transformations as subroutines in %Transformations and then list the ones I want to use in @process. After that, I read in lines of input and apply each subroutine to the line:
#!/usr/bin/perl
# sub-pipeline.pl

my %Transformations = (
	lowercase           => sub { $_[0] = lc $_[0] },
	uppercase           => sub { $_[0] = uc $_[0] },
	trim                => sub { $_[0] =~ s/^\s+|\s+$//g },
	collapse_whitespace => sub { $_[0] =~ s/\s+/ /g },
	remove_specials     => sub { $_[0] =~ s/[^a-z0-9\s]//ig },
	);

my @process = qw( remove_specials lowercase collapse_whitespace trim );

while( <STDIN> ) {
	foreach my $step ( @process ) {
		$Transformations{ $step }->( $_ );
		print "Processed value is now [$_]\n";
		}
	}
I might even combine this sort of thing with the constraint checking I did in the previous section. I’ll clean up the value before I check its validity. The input and processing code is very short and should stay that way. The complexity is outside of the flow of the data.
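A sketch of that combination might look like this, reusing the %Transformations and %Constraints hashes from the earlier examples (the particular cleanup steps and constraint names are just illustrative choices):

my @cleanup = qw( remove_specials lowercase collapse_whitespace trim );
my @checks  = qw( not_empty no_whitespace );

chomp( my $value = <STDIN> );

# normalize first, modifying $value in place through @_ aliasing ...
$Transformations{ $_ }->( $value ) foreach @cleanup;

# ... then validate the cleaned-up value
my $fails = grep { ! $Constraints{ $_ }->( $value ) } @checks;

print $fails ? "[$value] failed $fails checks\n" : "[$value] is fine\n";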
This section isn’t really like the previous two, but I always
think of it when I talk about these techniques. As we told you in
Intermediate Perl, I can use a scalar variable in the
place of a method name as long as the value is a simple scalar (so, no
references or other oddities). This works just fine as long as the object
can respond to the foo
method:
my $method_name = 'foo';

$object->$method_name;
If I want to run a chain of methods on an object, I can just go
through the list of method names like I did for the anonymous subroutines.
It’s not really the same thing to Perl, but for the programmer it’s the
same sort of thinking. I go through the method names using map
to get all of the values that I want:
my $isbn = Business::ISBN->new( '0596101058' );

my( $country, $publisher, $item ) = map { $isbn->$_ }
	qw( country_code publisher_code article_code );
I don’t have parallel code where I have to type the same thing many times. Again, the code to extract the values I need is very short and the complexity of choosing and listing the methods I need happens away from the important parts of the code flow.
Because subroutine references are scalars, I can pass them as arguments to other subroutines:
my $nameless_sub = sub { ... };

foo( $nameless_sub );
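On the receiving end, the subroutine treats its argument as a code reference and calls it whenever it likes. Here’s a minimal sketch (foo and its messages are made up for this example):

sub foo {
	my $code = shift;      # the code reference comes in like any other scalar

	print "About to run the code I was given\n";

	$code->( @_ );         # pass along the rest of my arguments
	}

foo( sub { print "Got args: @_\n" }, 1, 2, 3 );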
But I don’t want to pass these things as scalars; I want to do the fancy things that sort, map, and grep do by using inline blocks:
my @odd_numbers = grep { $_ % 2 } 0 .. 100;

my @squares = map { $_ * $_ } 0 .. 100;

my @sorted = sort { $a <=> $b } qw( 1 5 2 0 4 7 );
To work this little bit of magic, I need to use Perl’s subroutine prototypes. Someone may have told you that prototypes are as useless as they are evil, but in this case I need them to tell Perl that the naked block of code represents a subroutine.
As an example, I want to write something that reduces a list to a single value according to the block of code that I give it. Graham Barr does this in List::Util with the reduce function, which takes a list and turns it into a single value according to the subroutine I give it. This snippet turns a list of numbers into its sum:
use List::Util qw(reduce);

my $sum = reduce { $a + $b } @list;
The reduce function is a well-known way to process a list, and you’ll see it in many other languages. To seed the operation, it takes the first two arguments off of the list and computes the result according to the inline subroutine. After that, it takes the result and the next element of the list and repeats the computation, doing that until it has gone through all of the elements of the list.
As with map, grep, and sort, I don’t put a comma after the inline subroutine argument to reduce. To get this to work, though, I need to use Perl’s subroutine prototypes to tell the subroutine to expect an inline subroutine.
The List::Util module implements its functions in XS to make them really speedy, but in case I can’t load the XS stuff for some reason, Graham has a pure Perl backup:
package List::Util;

sub reduce (&@) {
	my $code = shift;
	no strict 'refs';

	return shift unless @_ > 1;

	use vars qw($a $b);

	my $caller = caller;
	local(*{$caller."::a"}) = \my $a;
	local(*{$caller."::b"}) = \my $b;

	$a = shift;
	foreach (@_) {
		$b = $_;
		$a = &{$code}();
		}

	$a;
	}
In his prototype, Graham specifies (&@). The & tells Perl that the first argument is a subroutine, and the @ says the rest is a list. The perlsub documentation has the list of prototype symbols and their meanings, but this is all I need here.
The rest of reduce works like sort by putting two elements into the package variables $a and $b. Graham defines the lexical variables with those names, and immediately assigns to the typeglobs for $a and $b in the calling package by using symbolic references. After that the values of $a and $b are the lexical versions. When he calls the subroutine argument, &{$code}(), that code looks at its package variables, which are the ones in effect when I wrote the subroutine. Got that? Inside reduce, I’m using the lexical versions, but inside $code, I’m using the package versions from the calling package. That’s why Graham made them aliases of each other.
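Using it feels just like using sort: my block sees the running value in $a and the next element in $b. A couple of examples, with arbitrary numbers chosen only for illustration:

use List::Util qw(reduce);

my @list = ( 4, 8, 15, 16, 23, 42 );

my $sum = reduce { $a + $b }           @list;   # 108
my $max = reduce { $a > $b ? $a : $b } @list;   # 42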
I can get rid of the $a and $b global variables, too. To do that, I can use @_ instead:
my $count = reduce { $_[0] + $_[1] } @list;
Since @_ is one of Perl’s special variables that always lives in the main:: package, I don’t have to worry about the calling package. I also don’t have to worry about putting the list elements in variables. I can play with @_ directly. I call the anonymous subroutine with the first two elements in @_ and put the result back into @_. I keep doing that until @_ has only one element, which I finally return:
sub reduce(&@) {
	my $sub = shift;

	while( @_ > 1 ) {
		unshift @_, $sub->( shift, shift );
		}

	return $_[0];
	}
So far this has only worked with flat lists. What if I wanted to do a similar thing with a complex data structure? In my Object::Iterate module, I created versions of map and grep that I can use with arbitrary data structures in objects. I call my versions imap and igrep:[37]
use Object::Iterate;

my @filtered    = igrep {...} $object;

my @transformed = imap {...} $object;
I use the same prototype magic I used before, although this time the second argument is a scalar because I’m working with an object instead of a list. I use the prototype (&$):
use Carp qw(croak);

sub igrep (&$) {
	my $sub    = shift;
	my $object = shift;

	$object->_check_object;

	my @output = ();

	while( $object->__more__ ) {
		local $_ = $object->__next__;
		push @output, $_ if $sub->();
		}

	$object->__final__ if $object->can( '__final__' );

	wantarray ? @output : scalar @output;
	}

sub _check_object {
	croak( "iterate object has no __next__ method" )
		unless eval { $_[0]->can( '__next__' ) };

	croak( "iterate object has no __more__ method" )
		unless eval { $_[0]->can( '__more__' ) };

	$_[0]->__init__ if eval { $_[0]->can( '__init__' ) };

	return 1;
	}
In igrep, I put the inline subroutine argument into $sub and the object argument into $object. Object::Iterate works by relying on the object to provide methods to get the next elements for the iteration. I ensure that the object can respond to those methods by calling _check_object, which returns true if the object has the right methods.
The __more__ method lets igrep know if there are any more elements to process. If there are more elements to process, igrep uses the __next__ method to get the next element from the object. No matter what I’ve done to store the data in my object, igrep doesn’t worry about it because it makes the object figure it out.
Once I have an element, I assign it to $_, just like the normal versions of map and grep do. Inside my inline subroutine, I use $_ as the current element.
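The imap side works the same way, except that it collects whatever the block returns instead of testing it. A rough sketch along the lines of igrep above (simplified; the real module does a bit more work, such as the __final__ call):

sub imap (&$) {
	my( $sub, $object ) = @_;

	$object->_check_object;

	my @output = ();

	while( $object->__more__ ) {
		local $_ = $object->__next__;
		push @output, $sub->();       # collect the block's results
		}

	wantarray ? @output : scalar @output;
	}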
Here’s a short example using my Netscape::Bookmarks module. I want to walk through its tree of categories and links to check all of the links. Once I get my $bookmarks object, I use it with igrep. Inside the inline subroutine, I use the check_link function from my HTTP::SimpleLinkChecker module to get the HTTP status of the link. If it’s 200, the link is okay, but since I want the bad links, I igrep for the ones that aren’t 200. Finally, I print the number of bad links along with the list of links:
#!/usr/bin/perl
# bookmark-checker.pl

use HTTP::SimpleLinkChecker qw(check_link);
use Netscape::Bookmarks;
use Object::Iterate qw(igrep);

my $bookmarks = Netscape::Bookmarks->new( $ARGV[0] );

die "Did not get Bookmarks object!" unless ref $bookmarks;

my @bad_links = igrep { 200 != check_link($_) } $bookmarks;

{
local $/ = "\n\t";
print "There are " . @bad_links . " bad links$/@bad_links\n";
}
The magic happens later in the program where I define the special methods to work with Object::Iterate. I create a scope where I can define some methods in Netscape::Bookmarks::Category and provide a scope for the lexical variable @links. My __more__ method simply returns the number of elements in @links, and __next__ returns the first element in @links. I could have been fancier and had __next__ walk through the data structure instead of using __init__ to get them all at once, but that would take a lot more room on the page. No matter what I decide to do, I just have to follow the interface for Object::Iterate:
{
package Netscape::Bookmarks::Category;

my @links = ();

sub __more__ { scalar @links }

sub __next__ { shift @links }

sub __init__ {
	my $self = shift;

	my @categories = ( $self );

	while( my $category = shift @categories ) {
		push @categories, $category->categories;
		push @links, map { $_->href } $category->links;
		}

	print "There are " . @links . " links\n";
	}
}
When Perl can’t find a method on a module or anywhere in its inheritance tree, it goes back to the original class and looks for the special subroutine AUTOLOAD. As a catchall, Perl sets the package variable $AUTOLOAD to the name of the method for which it was looking and passes AUTOLOAD the same parameter list. After that, it’s up to me what I want to do.
To define a method based on AUTOLOAD, I first have to figure out what the method name should be. Perl puts the full package specification in $AUTOLOAD, and I usually only need the last part, which I can extract with a regular expression:
if( $AUTOLOAD =~ m/::(\w+)$/ ) {
	# stuff with $1
	}
In some code, you’ll also see this as a substitution that discards everything but the method name. This has the disadvantage of destroying the original value of $AUTOLOAD, which I might want later:
$AUTOLOAD =~ s/.*:://; # destructive, not preferred
Once I have the method name, I can do anything I like. Since I can assign to typeglobs to define a named subroutine (as I promised in Chapter 8), I might as well do that. I use $AUTOLOAD, which still has its original value with the full package specification, as a symbolic reference. Since $AUTOLOAD is not a reference, Perl interprets its typeglob dereference to mean that it should define the variable with that name, access the typeglob, and make the assignment:
*{$AUTOLOAD} = sub { ... };
If $AUTOLOAD is Foo::bar, this turns into:
*{'Foo::bar'} = sub { ... };
That one line sets the right package, defines the subroutine name without defining the code that goes with it, and finally assigns the anonymous subroutine. If I were to code that myself ahead of time, my code would look like this:
{
package Foo;

sub bar;

*bar = sub { ... }
}
Once I’ve defined the subroutine, I want to run it with the original arguments I tried to pass to the method name. However, I want to make it look as if AUTOLOAD had nothing to do with it, and I don’t want AUTOLOAD to be in the call stack. This is one of the few places where I should use a goto. This replaces AUTOLOAD in the subroutine stack and runs the new subroutine I’ve just defined. By using an ampersand in front of the name and nothing on the other side, Perl uses the current @_ for the argument list of my subroutine call:[38]
goto &{$AUTOLOAD};
In Chapter 14 of Intermediate Perl, we use AUTOLOAD to define subroutines on the fly. We look in $AUTOLOAD. If the method name is the same as something in @elements, we create an anonymous subroutine to return the value for the hash element with that key. We assign that anonymous subroutine to the typeglob with that name. That’s a symbolic reference, so we wrap a naked block around it to limit the scope of our no strict 'refs'. Finally, once we’ve made the typeglob assignment we use goto to redispatch the method call to the subroutine we just defined. In effect, it’s as if the subroutine definition was always there, and the next time I call that method Perl doesn’t have to look for it:
sub AUTOLOAD {
	my @elements = qw(color age weight height);

	our $AUTOLOAD;

	if ($AUTOLOAD =~ /::(\w+)$/ and grep $1 eq $_, @elements) {
		my $field = ucfirst $1;
		{
		no strict 'refs';
		*{$AUTOLOAD} = sub { $_[0]->{$field} };
		}
		goto &{$AUTOLOAD};
		}

	if ($AUTOLOAD =~ /::set_(\w+)$/ and grep $1 eq $_, @elements) {
		my $field = ucfirst $1;
		{
		no strict 'refs';
		*{$AUTOLOAD} = sub { $_[0]->{$field} = $_[1] };
		}
		goto &{$AUTOLOAD};
		}

	die "$_[0] does not understand $AUTOLOAD\n";
	}
One of my favorite uses of AUTOLOAD comes from the Hash::AsObject module by Paul Hoffman. He does some fancy magic in his AUTOLOAD routine so I can access a hash’s values with its keys, as I normally would, or as an object with methods named for the keys:
use Hash::AsObject;

my $hash = Hash::AsObject->new;

$hash->{foo} = 42;        # normal access to a hash reference

print $hash->foo, "\n";   # as an object

$hash->bar( 137 );        # set a value
It can even handle multilevel hashes:
$hash->{baz}{quux} = 149;

$hash->baz->quux;
The trick is that $hash is really just a normal hash reference that’s blessed into a package. When I call a method on that blessed reference, it doesn’t exist, so Perl ends up in Hash::AsObject::AUTOLOAD. Since it’s a pretty involved bit of code to handle lots of special cases, I won’t show it here, but it does basically the same thing I did in the previous section by defining subroutines on the fly.
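The core idea, stripped of all of Hash::AsObject’s special-case handling, fits in a few lines. This sketch is my own simplification, not Paul’s code, and the package name Tiny::HashObject is made up; it treats any unknown method as a key in the blessed hash:

package Tiny::HashObject;   # a made-up name for this sketch

sub new { bless {}, $_[0] }

sub AUTOLOAD {
	our $AUTOLOAD;
	my $self = shift;

	return if $AUTOLOAD =~ /::DESTROY$/;   # don't treat DESTROY as a key

	my( $key ) = $AUTOLOAD =~ m/::(\w+)$/;

	$self->{$key} = shift if @_;           # a method with an argument sets the value
	$self->{$key};                         # either way, return it
	}

package main;

my $hash = Tiny::HashObject->new;

$hash->{foo} = 42;
print $hash->foo, "\n";     # 42

$hash->bar( 137 );
print $hash->{bar}, "\n";   # 137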
Autosplitting is another variation on the AUTOLOAD
technique, but I haven’t seen it used as much as it used to be. Instead of
defining subroutines dynamically, AutoSplit
takes a
module and parses its subroutine definitions and stores each subroutine in
its own file. It loads a subroutine’s file only when I call that
subroutine. In a complicated API with hundreds of subroutines I don’t have
to make Perl compile every subroutine when I might just want to use a
couple of them. Once I load the subroutine, Perl does not have to compile
it again in the same program. Basically, I defer compilation until I need
it.
To use AutoSplit, I place my subroutine definitions after the __END__ token so Perl does not parse or compile them. I tell AutoSplit to take those definitions and separate them into files:
$ perl -e 'use AutoSplit; autosplit( "MyModule.pm", "auto_dir", 0, 1, 1 )'
I usually don’t need to split a file myself, though, since ExtUtils::MakeMaker takes care of that for me in the build process. After the module is split, I’ll find the results in one of the auto directories in the Perl library path. Each of the .al files holds a single subroutine definition:
$ ls ./site_perl/5.8.4/auto/Text/CSV
_bite.al        combine.al      fields.al   parse.al    string.al
autosplit.ix    error_input.al  new.al      status.al   version.al
To load the method definitions when I need them, I use the AUTOLOAD method provided by AutoLoader and typically use it as a typeglob assignment. It knows how to find the right file, load it, parse and compile it, and then define the subroutine:
use AutoLoader;

*AUTOLOAD = \&AutoLoader::AUTOLOAD;
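Putting the pieces together, a module set up for autosplitting might look something like this skeleton (the package and subroutine names are placeholders, and the auto/ files still have to be generated by AutoSplit during the build):

package My::BigModule;
use strict;

use AutoLoader;
*AUTOLOAD = \&AutoLoader::AUTOLOAD;

# small, always-needed subroutines go here

1;

__END__

# Everything after __END__ is not compiled at load time.
# AutoSplit puts each of these subroutines into its own .al file,
# and AutoLoader compiles one only when something calls it.

sub rarely_used {
	print "You finally called me!\n";
	}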
You may have already run into AutoSplit at work. If you’ve ever seen an error message like this, you’ve witnessed AutoLoader looking for the missing method in a file. It doesn’t find the file, so it reports that it can’t locate the file. The Text::CSV module uses AutoLoader, so when I load the module and call an undefined method on the object, I get the error:
$ perl -MText::CSV -e '$q = Text::CSV->new; $q->foobar'
Can't locate auto/Text/CSV/foobar.al in @INC ( ... ).
This sort of error almost always means that I’m using a method name that isn’t part of the interface.
I can use subroutine references to represent behavior as data, and I can use the references like any other scalar.
The documentation for prototypes is in the perlsub documentation.
Mark Jason Dominus also used the function names imap and igrep to do the same thing I did, although his discussion of iterators in Higher-Order Perl is much more extensive. See http://hop.perl.plover.com/. I talk about my version in “The Iterator Design Pattern” in The Perl Review 0.5 (September 2002), which you can get for free online: http://www.theperlreview.com/Issues/The_Perl_Review_0_5.pdf. Mark Jason’s book covers functional programming in Perl by composing new functions out of existing ones, so it’s entirely devoted to fancy subroutine magic.
Randy Ray writes about autosplitting modules in The Perl Journal number 6. For the longest time it seemed that this was my favorite article on Perl and the one that I’ve read the most times.
Nathan Torkington’s “CryptoContext” appears in The Perl Journal number 9 and the compilation The Best of The Perl Journal: Computer Science & Perl Programming.
[36] Normally, exiting a subroutine by using next, last, or redo is not a good thing. That doesn’t mean it’s a bad thing, but it’s odd enough to have its own warning in perldiag.
[37] I think Mark Jason Dominus used these names before I did, but I don’t think I was reading his Higher-Order Perl mailing list when I came up with the names. In a footnote to my “Iterator Design Pattern” article in The Perl Review 0.5, I seem to think it was a coincidence. We were both thinking about iterators at that point, although I was thinking about how cool design patterns are and he was thinking how stupid they are. We were probably both right.
[38] Nathan Torkington talks about this in “CryptoContext” in The Perl Journal number 9.