Although I don’t normally deal with typeglobs or the symbol table, I need to understand them for the tricks I’ll use in later chapters. I’ll lay the foundation for advanced topics including dynamic subroutines and jury-rigging code in this chapter.
Symbol tables organize and store Perl’s package (global) variables, and I can affect the symbol table through typeglobs. By messing with Perl’s variable bookkeeping I can do some powerful things. You’re probably already getting the benefit of some of these tricks without even knowing it.
Before I get too far, I want to review the differences between package and lexical variables. The symbol table tracks the package variables, but not the lexical variables. When I fiddle with the symbol table or typeglobs, I’m dealing with package variables. Package variables are also known as global variables since they are visible everywhere in the program.
In Learning Perl and Intermediate Perl, we used lexical variables whenever possible. We declared lexical variables with my and those variables could only be seen inside their scope. Since lexical variables have limited reach, I didn’t need to know all of the program to avoid a variable name collision. Lexical variables are a bit faster too since Perl doesn’t have to deal with the symbol table.
Lexical variables have a limited scope, and they only affect that part of the program. This little snippet declares the variable name $n twice in different scopes, creating two different variables that do not interfere with each other:
    my $n = 10; # outer scope

    my $square = square( 15 );

    print "n is $n, square is $square\n";

    sub square { my $n = shift; $n ** 2; }
This double use of $n is not a problem. The declaration inside the subroutine is a different scope and gets its own version that masks the other version. At the end of the subroutine, its version of $n disappears as if it never existed. The outer $n is still 10.
Package variables are a different story. Doing the same thing with package variables stomps on the previous definition of $n:
    $n = 10;

    my $square = square( 15 );

    print "n is $n, square is $square\n";

    sub square { $n = shift; $n ** 2; }
Perl has a way to deal with the double use of package variables, though. The local built-in temporarily moves the current value, 10, out of the way until the end of the scope, and the entire program sees the new value, 15, until the scope of local ends:
    $n = 10;

    my $square = square( 15 );

    print "n is $n, square is $square\n";

    sub square { local $n = shift; $n ** 2; }
We showed the difference in Intermediate Perl. The local version changes everything including the parts outside of its scope while the lexical version only works inside its scope. Here’s a small program that demonstrates it both ways. I define the package variable $global, and I want to see what happens when I use the same variable name in different ways. To watch what happens, I use the show_me subroutine to tell me what it thinks the value of $global is. I’ll call show_me before I start, then subroutines that do different things with $global. Remember that show_me is outside of the lexical scope of any other subroutine:
    #!/usr/bin/perl
    # not strict clean, yet, but just wait

    $global = "I'm the global version";

    show_me('At start');
    lexical();
    localized();
    show_me('At end');

    sub show_me {
        my $tag = shift;
        print "$tag: $global\n"
    }
The lexical subroutine starts by defining a lexical variable also named $global. Within the subroutine, the value of $global is obviously the one I set. However, when it calls show_me, the code jumps out of the subroutine. Outside of the subroutine, the lexical variable has no effect. In the output, the line I tagged with From lexical() shows I'm the global version:
    sub lexical {
        my $global = "I'm in the lexical version";
        print "In lexical(), \$global is --> $global\n";
        show_me('From lexical()');
    }
Using local is completely different since it deals with the package version of the variable. When I localize a variable name, Perl sets aside its current value for the rest of the scope. The new value I assign to the variable is visible throughout the entire program until the end of the scope. When I call show_me, even though I jump out of the subroutine, the new value for $global that I set in the subroutine is still visible:
    sub localized {
        local $global = "I'm in the localized version";
        print "In localized(), \$global is --> $global\n";
        show_me('From localized');
    }
The output shows the difference. The value of $global starts off with its original version. In lexical(), I give it a new value but show_me can’t see it; show_me still sees the global version. In localized(), the new value sticks even in show_me. However, after I’ve called localized(), $global comes back to its original value:
    At start: I'm the global version
    In lexical(), $global is --> I'm in the lexical version
    From lexical(): I'm the global version
    In localized(), $global is --> I'm in the localized version
    From localized: I'm in the localized version
    At end: I'm the global version
Hold that thought for a moment because I’ll use it again after I introduce typeglobs.
No matter which part of my program I am in or which package I am in, I can always get to the package variables as long as I preface the variable name with the full package name. Going back to my lexical(), I can see the package version of the variable even when that name is masked by a lexical variable of the same name. I just have to add the full package name to it, $main::global:
    sub lexical {
        my $global = "I'm in the lexical version";
        print "In lexical(), \$global is --> $global\n";
        print "The package version is still --> $main::global\n";
        show_me('From lexical()');
    }
The output shows that I have access to both:
    In lexical(), $global is --> I'm in the lexical version
    The package version is still --> I'm the global version
That’s not the only thing I can do, however. If, for some odd reason, I have a package variable with the same name as a lexical variable that’s currently in scope, I can use our (introduced in Perl 5.6) to tell Perl to use the package variable for the rest of the scope:
    sub lexical {
        my $global = "I'm in the lexical version";
        our $global;
        print "In lexical with our, \$global is --> $global\n";
        show_me('In lexical()');
    }
Now the output shows that I don’t ever get to see the lexical version of the variable:
    In lexical with our, $global is --> I'm the global version
It seems pretty silly to use our that way since it masks the lexical version for the rest of the subroutine. If I only need the package version for part of the subroutine, I can create a scope just for it so I can use it for that part and let the lexical version take the rest:
    sub lexical {
        my $global = "I'm in the lexical version";

        {
        our $global;
        print "In the naked block, our \$global is --> $global\n";
        }

        print "In lexical, my \$global is --> $global\n";
        print "The package version is still --> $main::global\n";
        show_me('In lexical()');
    }
Now the output shows all of the possible ways I can use $global:
    In the naked block, our $global is --> I'm the global version
    In lexical, my $global is --> I'm in the lexical version
    The package version is still --> I'm the global version
Each package has a special hash-like data structure called the symbol table, which comprises all of the typeglobs for that package. It’s not a real Perl hash, but it acts like it in some ways, and its name is the package name with two colons on the end.
This isn’t a normal Perl hash, but I can look in it with the keys operator. Want to see all of the symbol names defined in the main package? I simply print all the keys for this special hash:
    #!/usr/bin/perl

    foreach my $entry ( keys %main:: ) {
        print "$entry\n";
    }
I won’t show the output here because it’s rather long, but when I look at it, I have to remember that those are the variable names without the sigils. When I see the identifier _, I have to remember that it has references to the variables $_, @_, and so on. Here are some special variable names that Perl programmers will recognize once they put a sigil in front of them:
    /   "   ARGV   INC   ENV   $   -   0   @
If I look in another package, I don’t see anything because I haven’t defined any variables yet:
    #!/usr/bin/perl

    foreach my $entry ( keys %Foo:: ) {
        print "$entry\n";
    }
If I define some variables in package Foo, I’ll then be able to see some output:
    #!/usr/bin/perl

    package Foo;

    @n      = 1 .. 5;
    $string = "Hello Perl!\n";
    %dict   = ( 1 => 'one' );   # a list, not an anonymous hash

    sub add { $_[0] + $_[1] }

    foreach my $entry ( keys %Foo:: ) {
        print "$entry\n";
    }
The output shows a list of the identifier names without any sigils attached. The symbol table stores the identifier names:
    n
    add
    string
    dict
These are just the names, not the variables I defined, and from this output I can’t tell which variables I’ve defined. To do that, I can use the name of the variable in a symbolic reference, which I’ll cover in Chapter 9:
    #!/usr/bin/perl

    foreach my $entry ( keys %main:: ) {
        print "-" x 30, "Name: $entry\n";

        print "\tscalar is defined\n" if defined ${$entry};
        # defined() on an array or hash is a fatal error as of Perl 5.22,
        # so check whether the aggregate has any elements instead
        print "\tarray has elements\n" if @{$entry};
        print "\thash has elements\n"  if %{$entry};
        print "\tsub is defined\n"     if defined &{$entry};
    }
I can use the other hash operators on these hashes, too. I can delete all of the variables with the same name. In the next program, I define the variables $n and $m then assign values to them. I call show_foo to list the variable names in the Foo package, which I use because it doesn’t have all of the special symbols that the main package does:
    #!/usr/bin/perl
    # show_foo.pl

    package Foo;

    $n = 10;
    $m = 20;

    show_foo( "After assignment" );

    delete $Foo::{'n'};
    delete $Foo::{'m'};

    show_foo( "After delete" );

    sub show_foo {
        print "-" x 10, $_[0], "-" x 10, "\n";

        print "\$n is $n\n\$m is $m\n";

        foreach my $name ( keys %Foo:: ) {
            print "$name\n";
        }
    }
The output shows me that the symbol table for Foo:: has entries for the names n and m, as well as for show_foo. Those are all of the variable names I defined: two scalars and one subroutine. After I use delete, the entries for n and m are gone:
    ----------After assignment----------
    $n is 10
    $m is 20
    show_foo
    n
    m
    ----------After delete----------
    $n is 10
    $m is 20
    show_foo
By default, Perl variables are global variables, meaning that I can access them from anywhere in the program as long as I know their names. Perl keeps track of them in the symbol table, which is available to the entire program. Each package has a list of defined identifiers just like I showed in the previous section. Each identifier has a pointer (although not in the C sense) to a slot for each variable type. There are also two bonus slots for the variables NAME and PACKAGE, which I’ll use in a moment. The following shows the relationship between the package, identifier, and type of variable:
    Package    Identifier          Type     Variable

                           +------> SCALAR - $bar
                           |
                           +------> ARRAY  - @bar
                           |
                           +------> HASH   - %bar
                           |
    Foo:: -----> bar ------+------> CODE   - &bar
                           |
                           +------> IO     - file and dir handle
                           |
                           +------> GLOB   - *bar
                           |
                           +------> FORMAT - format names
                           |
                           +------> NAME
                           |
                           +------> PACKAGE
There are seven variable types. The three common ones are the SCALAR, ARRAY, and HASH, but Perl also has CODE for subroutines (Chapter 9 covers subroutines as data), IO for file and directory handles, FORMAT for format names, and GLOB for the whole thing. Once I have the glob I can get a reference to a particular variable of that name by accessing the right entry. To access the scalar portion of the *bar typeglob, I access that part almost like a hash access. Typeglobs are not hashes though; I can’t use the hash operators on them and I can’t add more keys:
    $foo = *bar{SCALAR};   # a reference to $bar
    $baz = *bar{ARRAY};    # a reference to @bar
I can’t even use these typeglob accesses as lvalues:
    *bar{SCALAR} = 5;
I’ll get a fatal error:
    Can't modify glob elem in scalar assignment ...
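The slot itself isn’t an lvalue, but the reference it hands back still points at the real variable, so I can change the variable through that reference. Here’s a minimal sketch of that idea. The names $bar and @bar are only for illustration, and I’ve declared them with our so the example runs under strict:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# illustrative package variables; the names are arbitrary
our $bar = 'original';
our @bar = ( 1, 2, 3 );

# each typeglob access returns a reference to the real variable
my $scalar_ref = *bar{SCALAR};
my $array_ref  = *bar{ARRAY};

# changing the data through the references changes $bar and @bar
${$scalar_ref} = 'changed';
push @{$array_ref}, 4;

print "Scalar is <$bar>, array is <@bar>\n";
```

The typeglob access only gives me a way to reach the variable. The assignment still goes through an ordinary dereference.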
I can assign to a typeglob as a whole, though, and Perl will figure out the right place to put the value. I’ll show that in “Aliasing,” later in this chapter.
I also get two bonus entries in the typeglob, PACKAGE and NAME, so I can always tell from which variable I got the glob. I don’t think this is terribly useful, but maybe I’ll be on a Perl Quiz Show someday:
    #!/usr/bin/perl
    # typeglob-name-package.pl

    $foo = "Some value";
    $bar = "Another value";

    who_am_i( *foo );
    who_am_i( *bar );

    sub who_am_i {
        local $glob = shift;

        print "I'm from package " . *{$glob}{PACKAGE} . "\n";
        print "My name is " . *{$glob}{NAME} . "\n";
    }
Although this probably has limited usefulness, at least outside of any debugging, the output tells me more about the typeglobs I passed to the function:
    I'm from package main
    My name is foo
    I'm from package main
    My name is bar
I don’t know what sorts of variable these are even though I have the name. The typeglob represents all variables of that name. To check for a particular type of variable, I’d have to use the defined trick I used earlier:
    my $name = *{$glob}{NAME};
    print "Scalar $name is defined\n" if defined ${$name};
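I can also ask the typeglob itself which slots are in use instead of going through a symbolic reference. The perlref documentation says that *foo{THING} returns undef if that THING hasn’t been used yet, except for SCALAR, which always returns a reference. The following is only a sketch of that approach: slots_in_use and the variables named list are hypothetical names I made up for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical variables: an array and a subroutine share the name "list"
our @list = ( 1, 2, 3 );
sub list { 'I am a subroutine' }

# hypothetical helper: report which of the ARRAY, HASH, and CODE slots
# of a typeglob hold something. I skip SCALAR because that slot always
# returns a reference, even if I never used the scalar.
sub slots_in_use {
    my $glob = shift;    # a reference to a typeglob
    return grep { defined( *{$glob}{$_} ) } qw(ARRAY HASH CODE);
}

print "list has: @{[ slots_in_use( \*list ) ]}\n";
```

Since I pass a reference to the typeglob and never build a variable name from a string, this version works with strict turned on.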
I can alias variables by assigning one typeglob to another. In this example, all of the variables with the identifier bar become nicknames for all of the variables with the identifier foo once Perl assigns the *foo typeglob to the *bar typeglob:
    #!/usr/bin/perl

    $foo = "Foo scalar";
    @foo = 1 .. 5;
    %foo = qw(One 1 Two 2 Three 3);
    sub foo { "I'm a subroutine!" }

    *bar = *foo; # typeglob assignment

    print "Scalar is <$bar>, array is <@bar>\n";
    print 'Sub returns <', bar(), ">\n";

    $bar = 'Bar scalar';
    @bar = 6 .. 10;
    print "Scalar is <$foo>, array is <@foo>\n";
When I change either the variables named bar or foo, the other is changed too because they are actually the same thing with different names.
I don’t have to assign an entire typeglob. If I assign a reference to a typeglob, I only affect that part of the typeglob that the reference represents. Assigning the scalar reference \$scalar to the typeglob *foo only affects the SCALAR part of the typeglob. In the next line, when I assign a \@array to the typeglob, the array reference only affects the ARRAY part of the typeglob. Having done that, I’ve made *foo a Frankenstein’s monster of values I’ve taken from other variables:
    #!/usr/bin/perl

    $scalar = 'foo';
    @array  = 1 .. 5;

    *foo = \$scalar;
    *foo = \@array;

    print "Scalar foo is $foo\n";
    print "Array foo is @foo\n";
This feature can be quite useful when I have a long variable name but I want to use a different name for it. This is essentially what the Exporter module does when it imports symbols into my namespace. Instead of using the full package specification, I have it in my current package. Exporter takes the variables from the exporting package and assigns to the typeglob of the importing package:
    package Exporter;

    sub import {
        my $pkg     = shift;
        my $callpkg = caller($ExportLevel);

        # ...

        *{"$callpkg\::$_"} = \&{"$pkg\::$_"} foreach @_;
    }
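To see the same idea without the rest of Exporter, here’s a stripped-down sketch of an import method that installs subroutines into the calling package. This is not Exporter’s real code. The package Local::Util and its double subroutine are hypothetical, and I call import from a BEGIN block only because everything lives in one file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Local::Util;    # hypothetical package for this sketch

sub import {
    my $class  = shift;
    my $caller = caller;    # the package that asked for the import

    no strict 'refs';       # the glob names are built from strings
    foreach my $name ( @_ ) {
        # assign a code reference to the caller's typeglob, which
        # fills in only its CODE slot
        *{"${caller}::$name"} = \&{"${class}::$name"};
    }
}

sub double { 2 * $_[0] }

package main;

# with a separate module file, "use Local::Util qw(double)" would do this
BEGIN { Local::Util->import( 'double' ) }

print "Twice 21 is ", double( 21 ), "\n";
```

Because I assign a code reference rather than a whole typeglob, any scalar, array, or hash named double in the importing package is left alone.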
Before Perl 5.6 introduced filehandle references, if I had to pass a subroutine a filehandle I’d have to use a typeglob. This is the most likely use of typeglobs that you’ll see in older code. For instance, the CGI module can read its input from a filehandle I specify, rather than using STDIN:
    use CGI;

    open FH, $cgi_data_file or die "Could not open $cgi_data_file: $!";

    CGI->new( *FH ); # can't new( FH ), need a typeglob
This also works with references to typeglobs:
    CGI->new( \*FH ); # a reference to the typeglob works too
Again, this is the older way of doing things. The newer way involves a scalar that holds the filehandle reference:
    use CGI;

    open my( $fh ), $cgi_data_file or die "Could not open $cgi_data_file: $!";

    CGI->new( $fh );
In the old method, the filehandles were package variables so they couldn’t be lexical variables. Passing them to a subroutine, however, was a problem. What name do I use for them in the subroutine? I don’t want to use another name already in use because I’ll overwrite its value. I can’t use local with a filehandle either:
    local( FH ) = shift; # won't work.
That line of code gives a compilation error:
    Can't modify constant item in local ...
I have to use a typeglob instead. Perl figures out to assign the IO portion of the FH typeglob:
    local( *FH ) = shift; # will work.
Once I’ve done that, I use the filehandle FH just like I would in any other situation. It doesn’t matter to me that I got it through a typeglob assignment. Since I’ve localized it, any filehandle of that name anywhere in the program uses my new value, just as in my earlier local example. Nowadays, just use filehandle references, $fh, and leave this stuff to the older code (unless I’m dealing with the special filehandles STDOUT, STDERR, and STDIN).
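A subroutine written for filehandle references handles both styles: I can pass it a lexical filehandle or a reference to a typeglob such as \*STDOUT. In this sketch, write_line is a hypothetical helper, and I write to an in-memory string so the example doesn’t touch any files:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper: print a line to whatever filehandle it gets
sub write_line {
    my( $fh, $message ) = @_;
    print {$fh} "$message\n";
}

# a lexical filehandle that writes to a string in memory
open my( $fh ), '>', \my $buffer
    or die "Could not open in-memory filehandle: $!";

write_line( $fh,      'sent to the lexical filehandle' );
write_line( \*STDOUT, 'sent to STDOUT through a typeglob reference' );

close $fh;
print "The buffer holds: $buffer";
```

The subroutine never needs to know which kind of filehandle it received, so I don’t have to localize any package filehandle inside it.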
Using typeglob assignment, I can give anonymous subroutines a name. Instead of dealing with a subroutine dereference, I can deal with a named subroutine.
The File::Find module takes a callback function to select files from a list of directories:
    use File::Find;

    find( \&wanted, @dirs );

    sub wanted { ... }
In File::Find::Closures, I have several functions that each return two closures I can use with File::Find. That way, I can run common find tasks without recreating the &wanted function I need:
    package File::Find::Closures;
    use File::Spec::Functions qw(canonpath);   # supplies canonpath

    sub find_by_name {
        my %hash  = map { $_, 1 } @_;
        my @files = ();

        (
        sub { push @files, canonpath( $File::Find::name ) if exists $hash{$_} },
        sub { wantarray ? @files : [ @files ] }
        )
    }
I use File::Find::Closures by importing the generator function I want to use, in this case find_by_name, and then use that function to create two anonymous subroutines: one for find and one to use afterward to get the results:
    use File::Find;
    use File::Find::Closures qw( find_by_name );

    my( $wanted, $get_file_list ) = find_by_name( 'index.html' );

    find( $wanted, @directories );

    foreach my $file ( $get_file_list->() ) {
        ...
    }
Perhaps I don’t want to use subroutine references, for whatever reasons. I can assign the anonymous subroutines to typeglobs. Since I’m assigning references, I only affect the subroutine entry in the typeglob. After the assignment I can then do the same thing I did with filehandles in the last section, but this time with named subroutines. After I assign the return values from find_by_name to the typeglobs *wanted and *get_file_list, I have subroutines with those names:
    ( *wanted, *get_file_list ) = find_by_name( 'index.html' );

    find( \&wanted, @directories );

    foreach my $file ( get_file_list() ) {
        ...
    }
In Chapter 9, I’ll use this trick with AUTOLOAD to define subroutines on the fly or to replace existing subroutine definitions.
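Combining typeglob assignment with local, which I showed earlier in this chapter, gives me a way to replace a subroutine only for the duration of a scope. This is a sketch of the general technique rather than anything from File::Find::Closures, and greeting and polite_greeting are hypothetical names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub greeting { 'Hello' }

sub polite_greeting {
    # localize the whole typeglob, then fill its CODE slot with an
    # anonymous subroutine; the original comes back at the end of scope
    no warnings 'redefine';
    local *greeting = sub { 'Good day' };

    return greeting();
}

print "Inside:  ", polite_greeting(), "\n";
print "Outside: ", greeting(), "\n";
```

Just as with the localized scalar, the replacement is visible to any code called while polite_greeting is running, and the original definition returns once that scope ends.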
The symbol table is Perl’s accounting system for package variables, and typeglobs are the way I access them. In some cases, such as passing a filehandle to a subroutine, I can’t get away from the typeglob because I can’t take a reference to a filehandle package variable. To get around some of these older limitations in Perl, programmers used typeglobs to get to the variables they needed. That doesn’t mean that typeglobs are outdated, though. Modules that perform magic, such as Exporter, use them without me even knowing about it. To do my own magic, typeglobs turn out to be quite handy.
Chapters 10 and 12 of Programming Perl, Third Edition, by Larry Wall, Tom Christiansen, and Jon Orwant describe symbol tables and how Perl handles them internally.
Phil Crow shows some symbol table tricks in “Symbol Table Manipulation” for Perl.com: http://www.perl.com/pub/a/2005/03/17/symtables.html.
Randal Schwartz talks about scopes in his Unix Review column for May 2003: http://www.stonehenge.com/merlyn/UnixReview/col46.html.