Chapter 8. Symbol Tables and Typeglobs

Although I don’t normally deal with typeglobs or the symbol table, I need to understand them for the tricks I’ll use in later chapters. I’ll lay the foundation for advanced topics including dynamic subroutines and jury-rigging code in this chapter.

Symbol tables organize and store Perl’s package (global) variables, and I can affect the symbol table through typeglobs. By messing with Perl’s variable bookkeeping I can do some powerful things. You’re probably already getting the benefit of some of these tricks without even knowing it.

Package and Lexical Variables

Before I get too far, I want to review the differences between package and lexical variables. The symbol table tracks the package variables, but not the lexical variables. When I fiddle with the symbol table or typeglobs, I’m dealing with package variables. Package variables are also known as global variables since they are visible everywhere in the program.

In Learning Perl and Intermediate Perl, we used lexical variables whenever possible. We declared lexical variables with my and those variables could only be seen inside their scope. Since lexical variables have limited reach, I didn’t need to know all of the program to avoid a variable name collision. Lexical variables are a bit faster too since Perl doesn’t have to deal with the symbol table.

Lexical variables have a limited scope, and they only affect that part of the program. This little snippet declares the variable name $n twice in different scopes, creating two different variables that do not interfere with each other:

my $n = 10; # outer scope

my $square = square( 15 );

print "n is $n, square is $square\n";

sub square { my $n = shift; $n ** 2; }

This double use of $n is not a problem. The declaration inside the subroutine is a different scope and gets its own version that masks the other version. At the end of the subroutine, its version of $n disappears as if it never existed. The outer $n is still 10.

Package variables are a different story. Doing the same thing with package variables stomps on the previous definition of $n:

$n = 10;

my $square = square( 15 );

print "n is $n, square is $square\n";

sub square { $n = shift; $n ** 2; }

Perl has a way to deal with the double use of package variables, though. The local built-in temporarily moves the current value, 10, out of the way until the end of the scope, and the entire program sees the new value, 15, until the scope of local ends:

$n = 10;

my $square = square( 15 );

print "n is $n, square is $square\n";

sub square { local $n = shift; $n ** 2; }

We showed the difference in Intermediate Perl. The local version changes everything including the parts outside of its scope while the lexical version only works inside its scope. Here’s a small program that demonstrates it both ways. I define the package variable $global, and I want to see what happens when I use the same variable name in different ways. To watch what happens, I use the show_me subroutine to tell me what it thinks the value of $global is. I’ll call show_me before I start, then subroutines that do different things with $global. Remember that show_me is outside of the lexical scope of any other subroutine:

#!/usr/bin/perl

# not strict clean, yet, but just wait
$global = "I'm the global version";

show_me('At start');
lexical();
localized();
show_me('At end');

sub show_me
        {
        my $tag = shift;

        print "$tag: $global\n"
        }

The lexical subroutine starts by defining a lexical variable also named $global. Within the subroutine, the value of $global is obviously the one I set. However, when it calls show_me, the code jumps out of the subroutine. Outside of the subroutine, the lexical variable has no effect. In the output, the line I tagged with From lexical() shows I'm the global version:

sub lexical
        {
        my $global = "I'm in the lexical version";
        print "In lexical(), \$global is --> $global\n";
        show_me('From lexical()');
        }

Using local is completely different since it deals with the package version of the variable. When I localize a variable name, Perl sets aside its current value for the rest of the scope. The new value I assign to the variable is visible throughout the entire program until the end of the scope. When I call show_me, even though I jump out of the subroutine, the new value for $global that I set in the subroutine is still visible:

sub localized
        {
        local $global = "I'm in the localized version";
        print "In localized(), \$global is --> $global\n";
        show_me('From localized');
        }

The output shows the difference. The value of $global starts off with its original version. In lexical(), I give it a new value but show_me can’t see it; show_me still sees the global version. In localized(), the new value sticks even in show_me. However, after I’ve called localized(), $global comes back to its original value:

At start: I'm the global version
In lexical(), $global is --> I'm in the lexical version
From lexical(): I'm the global version
In localized(), $global is --> I'm in the localized version
From localized: I'm in the localized version
At end: I'm the global version

Hold that thought for a moment because I’ll use it again after I introduce typeglobs.

Getting the Package Version

No matter which part of my program I am in or which package I am in, I can always get to the package variables as long as I preface the variable name with the full package name. Going back to my lexical(), I can see the package version of the variable even when that name is masked by a lexical variable of the same name. I just have to add the full package name to it, $main::global:

sub lexical
        {
        my $global = "I'm in the lexical version";
        print "In lexical(), \$global is --> $global\n";
        print "The package version is still --> $main::global\n";
        show_me('From lexical()');
        }

The output shows that I have access to both:

In lexical(), $global is --> I'm in the lexical version
The package version is still --> I'm the global version

That’s not the only thing I can do, however. If, for some odd reason, I have a package variable with the same name as a lexical variable that’s currently in scope, I can use our (introduced in Perl 5.6) to tell Perl to use the package variable for the rest of the scope:

sub lexical
        {
        my $global = "I'm in the lexical version";
        our $global;
        print "In lexical with our, \$global is --> $global\n";
        show_me('In lexical()');
        }

Now the output shows that I don’t ever get to see the lexical version of the variable:

In lexical with our, $global is --> I'm the global version

It seems pretty silly to use our that way since it masks the lexical version for the rest of the subroutine. If I only need the package version for part of the subroutine, I can create a scope just for it so I can use it for that part and let the lexical version take the rest:

sub lexical
        {
        my $global = "I'm in the lexical version";

                {
                our $global;
                print "In the naked block, our \$global is --> $global\n";
                }

        print "In lexical, my \$global is --> $global\n";
        print "The package version is still --> $main::global\n";
        show_me('In lexical()');
        }

Now the output shows all of the possible ways I can use $global:

In the naked block, our $global is --> I'm the global version
In lexical, my $global is --> I'm in the lexical version
The package version is still --> I'm the global version

The Symbol Table

Each package has a special hash-like data structure called the symbol table, which comprises all of the typeglobs for that package. It’s not a real Perl hash, but it acts like it in some ways, and its name is the package name with two colons on the end.

Even so, I can look in it with the keys operator. Want to see all of the symbol names defined in the main package? I simply print all the keys for this special hash:

#!/usr/bin/perl

foreach my $entry ( keys %main:: )
        {
        print "$entry\n";
        }

I won’t show the output here because it’s rather long, but when I look at it, I have to remember that those are the variable names without the sigils. When I see the identifier _, I have to remember that it has references to the variables $_, @_, and so on. Here are some special variable names that Perl programmers will recognize once they put a sigil in front of them:

/
"
ARGV
INC
ENV
$
-
0
@

If I look in another package, I don’t see anything because I haven’t defined any variables yet:

#!/usr/bin/perl

foreach my $entry ( keys %Foo:: )
        {
        print "$entry\n";
        }

If I define some variables in package Foo, I’ll then be able to see some output:

#!/usr/bin/perl

package Foo;

@n      = 1 .. 5;
$string = "Hello Perl!\n";
%dict   = ( 1 => 'one' );

sub add { $_[0] + $_[1] }

foreach my $entry ( keys %Foo:: )
        {
        print "$entry\n";
        }

The output shows a list of the identifier names without any sigils attached. The symbol table stores the identifier names:

n
add
string
dict

These are just the names, not the variables I defined, and from this output I can’t tell which variables I’ve defined. To do that, I can use the name of the variable in a symbolic reference, which I’ll cover in Chapter 9:

#!/usr/bin/perl

foreach my $entry ( keys %main:: )
        {
        print "-" x 30, "Name: $entry\n";

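        # defined() on an array or hash is a fatal error since Perl 5.22,
        # so for those I check whether they have any elements instead
        print "\tscalar is defined\n"   if defined ${$entry};
        print "\tarray  has elements\n" if @{$entry};
        print "\thash   has elements\n" if %{$entry};
        print "\tsub    is defined\n"   if defined &{$entry};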
        }

I can use the other hash operators on these hashes, too. I can delete all of the variables with the same name. In the next program, I define the variables $n and $m then assign values to them. I call show_foo to list the variable names in the Foo package, which I use because it doesn’t have all of the special symbols that the main package does:

#!/usr/bin/perl
# show_foo.pl

package Foo;

$n = 10;
$m = 20;

show_foo( "After assignment" );

delete $Foo::{'n'};
delete $Foo::{'m'};

show_foo( "After delete" );

sub show_foo
        {
        print "-" x 10, $_[0], "-" x 10, "\n";

        print "\$n is $n\n\$m is $m\n";

        foreach my $name ( keys %Foo:: )
                {
                print "$name\n";
                }
        }

The output shows me that the symbol table for Foo:: has entries for the names n and m, as well as for show_foo. Those are all of the variable names I defined: two scalars and one subroutine. After I use delete, the entries for n and m are gone, although show_foo still prints their values because the code Perl already compiled holds on to the original variables:

----------After assignment----------
$n is 10
$m is 20
show_foo
n
m
----------After delete----------
$n is 10
$m is 20
show_foo
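
The difference between code that Perl has already compiled and a lookup by name at runtime can be sketched as follows. This is a minimal illustration that reuses the Foo package and $n from the program above; the lexical variable names ($still_there, $compiled, $by_name, $name) are my own and only serve the demonstration:

```perl
#!/usr/bin/perl
# A sketch: compiled code versus a runtime lookup after a delete
use strict;
use warnings;

package Foo;

our $n = 10;

delete $Foo::{'n'};    # remove the symbol table entry

# The entry is gone from the symbol table ...
my $still_there = exists $Foo::{'n'} ? 'yes' : 'no';

# ... but this statement was compiled before the delete ran, so it
# still reaches the original variable
my $compiled = $n;

# A lookup by name at runtime has to go through the symbol table
# again. I keep the name in a variable so Perl can't resolve it at
# compile time.
my $name    = 'Foo::n';
my $by_name = do { no strict 'refs'; ${$name} };

print "Entry exists: $still_there\n";
print "Compiled access: $compiled\n";
print "Runtime lookup: ", defined $by_name ? $by_name : 'undef', "\n";
```

The compiled access should still report 10, while the runtime lookup finds nothing in the symbol table and gets a fresh, undefined variable.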

Typeglobs

By default, Perl variables are global variables, meaning that I can access them from anywhere in the program as long as I know their names. Perl keeps track of them in the symbol table, which is available to the entire program. Each package has a list of defined identifiers just like I showed in the previous section. Each identifier has a pointer (although not in the C sense) to a slot for each variable type. There are also two bonus slots for the variables NAME and PACKAGE, which I’ll use in a moment. The following shows the relationship between the package, identifier, and type of variable:

Package    Identifier           Type    Variable

                       +------> SCALAR - $bar
                       |
                       +------> ARRAY  - @bar
                       |
                       +------> HASH   - %bar
                       |
Foo:: -----> bar  -----+------> CODE   - &bar
                       |
                       +------> IO     - file and dir handle
                       |
                       +------> GLOB   - *bar
                       |
                       +------> FORMAT - format names
                       |
                       +------> NAME
                       |
                       +------> PACKAGE

There are seven variable types. The three common ones are the SCALAR, ARRAY, and HASH, but Perl also has CODE for subroutines (Chapter 9 covers subroutines as data), IO for file and directory handles, FORMAT for format names, and GLOB for the whole thing. Once I have the glob I can get a reference to a particular variable of that name by accessing the right entry. To access the scalar portion of the *bar typeglob, I access that part almost like a hash access. Typeglobs are not hashes though; I can’t use the hash operators on them and I can’t add more keys:

$foo = *bar{SCALAR};   # a reference to $bar

$baz = *bar{ARRAY};    # a reference to @bar

I can’t even use these typeglob accesses as lvalues:

*bar{SCALAR} = 5;

I’ll get a fatal error:

Can't modify glob elem in scalar assignment ...

I can assign to a typeglob as a whole, though, and Perl will figure out the right place to put the value. I’ll show that in “Aliasing,” later in this chapter.

I also get two bonus entries in the typeglob, PACKAGE and NAME, so I can always tell from which variable I got the glob. I don’t think this is terribly useful, but maybe I’ll be on a Perl Quiz Show someday:

#!/usr/bin/perl
# typeglob-name-package.pl

$foo = "Some value";
$bar = "Another value";

who_am_i( *foo );
who_am_i( *bar );

sub who_am_i
        {
        local $glob = shift;

        print "I'm from package " . *{$glob}{PACKAGE} . "\n";
        print "My name is "       . *{$glob}{NAME}    . "\n";
        }

Although this probably has limited usefulness, at least outside of any debugging, the output tells me more about the typeglobs I passed to the function:

I'm from package main
My name is foo
I'm from package main
My name is bar

I don’t know what sorts of variables these are even though I have the name. The typeglob represents all variables of that name. To check for a particular type of variable, I’d have to use the defined trick I used earlier:

my $name = *{$glob}{NAME};

print "Scalar $name is defined\n" if defined ${$name};
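
Another way to find out which kinds of variables hide behind a name is to ask the typeglob for each slot in turn. The following is only a sketch: filled_slots is a hypothetical helper I made up for this illustration, and it relies on the documented behavior that *glob{THING} returns undef for a slot that hasn't been used, except for SCALAR, which always returns a reference:

```perl
#!/usr/bin/perl
# A sketch: report which slots of a typeglob are in use.
# filled_slots is a hypothetical helper, not something Perl provides.
use strict;
use warnings;

our $foo = "Some value";
our @foo = ( 1, 2, 3 );
sub foo { 'from the sub' }

our $bar = "Another value";

sub filled_slots
        {
        my $glob = shift;
        my @slots;

        # *glob{SCALAR} always gives me a reference, so I have to
        # check the value behind it
        push @slots, 'SCALAR' if defined ${ *{$glob}{SCALAR} };

        # the other slots are undef until something uses them
        foreach my $type ( qw(ARRAY HASH CODE IO FORMAT) )
                {
                push @slots, $type if defined *{$glob}{$type};
                }

        return @slots;
        }

print "foo: @{[ filled_slots( *foo ) ]}\n";
print "bar: @{[ filled_slots( *bar ) ]}\n";
```

For *foo, this should report the SCALAR, ARRAY, and CODE slots, and for *bar only the SCALAR slot.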

Aliasing

I can alias variables by assigning one typeglob to another. In this example, all of the variables with the identifier bar become nicknames for all of the variables with the identifier foo once Perl assigns the *foo typeglob to the *bar typeglob:

#!/usr/bin/perl

$foo = "Foo scalar";
@foo = 1 .. 5;
%foo = qw(One 1 Two 2 Three 3);
sub foo { "I'm a subroutine!" }

*bar = *foo;  # typeglob assignment

print "Scalar is <$bar>, array is <@bar>\n";
print 'Sub returns <', bar(), ">\n";

$bar = 'Bar scalar';
@bar = 6 .. 10;

print "Scalar is <$foo>, array is <@foo>\n";

When I change either the variables named bar or foo, the other is changed too because they are actually the same thing with different names.

I don’t have to assign an entire typeglob. If I assign a reference to a typeglob, I only affect that part of the typeglob that the reference represents. Assigning the scalar reference \$scalar to the typeglob *foo only affects the SCALAR part of the typeglob. In the next line, when I assign a \@array to the typeglob, the array reference only affects the ARRAY part of the typeglob. Having done that, I’ve made *foo a Frankenstein’s monster of values I’ve taken from other variables:

#!/usr/bin/perl

$scalar = 'foo';
@array  = 1 .. 5;

*foo = \$scalar;
*foo = \@array;

print "Scalar foo is $foo\n";
print "Array foo is @foo\n";
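
To convince myself that a reference assignment touches only one slot, I can compare references: two names for the same variable give me the same reference. This is a small sketch under strict, reusing the variable names from the program above; the $same_scalar and $same_array names are just for the demonstration:

```perl
#!/usr/bin/perl
# A sketch: check which parts of *foo are aliased by comparing references
use strict;
use warnings;

our( $scalar, @array, $foo, @foo );

$scalar = 'foo';
@array  = 1 .. 5;

*foo = \$scalar;    # only the SCALAR slot of *foo changes

my $same_scalar = \$foo == \$scalar;    # true: one variable, two names
my $same_array  = \@foo == \@array;     # false: @foo is still its own array

$foo = 'changed through the alias';

print "Same scalar: ", ( $same_scalar ? 'yes' : 'no' ), "\n";
print "Same array: ",  ( $same_array  ? 'yes' : 'no' ), "\n";
print "\$scalar is now: $scalar\n";
```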

This feature can be quite useful when I have a long variable name but I want to use a different name for it. This is essentially what the Exporter module does when it imports symbols into my namespace. Instead of using the full package specification, I have it in my current package. Exporter takes the variables from the exporting package and assigns to the typeglob of the importing package:

package Exporter;

sub import {
        my $pkg = shift;
        my $callpkg = caller($ExportLevel);

        # ...
        *{"$callpkg\::$_"} = \&{"$pkg\::$_"} foreach @_;
        }
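
The same idea fits in a single file. This is a minimal sketch, not Exporter’s real code: My::Util and its double subroutine are hypothetical names I invented for the example, and the import method only handles subroutine names:

```perl
#!/usr/bin/perl
# A sketch of a do-it-yourself import that installs subroutines
# into the calling package through typeglob assignment
use strict;
use warnings;

package My::Util;    # hypothetical module name

sub import
        {
        my $pkg     = shift;
        my $callpkg = caller;

        no strict 'refs';    # I build the names as strings
        foreach my $name ( @_ )
                {
                # fill only the CODE slot of the caller's typeglob
                *{"$callpkg\::$name"} = \&{"$pkg\::$name"};
                }
        }

sub double { 2 * $_[0] }

package main;

# use() would go looking for a file, so I call import myself
# at compile time
BEGIN { My::Util->import( 'double' ) }

my $answer = double( 21 );
print "double(21) is $answer\n";
```

Since I assign a code reference, only the subroutine named double shows up in main; any $double or @double in My::Util stays where it is.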

Filehandle Arguments in Older Code

Before Perl 5.6 introduced filehandle references, if I had to pass a subroutine a filehandle I’d have to use a typeglob. This is the most likely use of typeglobs that you’ll see in older code. For instance, the CGI module can read its input from a filehandle I specify, rather than using STDIN:

use CGI;

open FH, $cgi_data_file or die "Could not open $cgi_data_file: $!";

CGI->new( *FH ); # can't new( FH ), need a typeglob

This also works with references to typeglobs:

CGI->new( \*FH ); # a reference to the typeglob works too

Again, this is the older way of doing things. The newer way involves a scalar that holds the filehandle reference:

use CGI;
open my $fh, '<', $cgi_data_file or die "Could not open $cgi_data_file: $!";
CGI->new( $fh );

In the old method, the filehandles were package variables so they couldn’t be lexical variables. Passing them to a subroutine, however, was a problem. What name do I use for them in the subroutine? I don’t want to use another name already in use because I’ll overwrite its value. I can’t use local with a filehandle either:

local( FH ) = shift; # won't work.

That line of code gives a compilation error:

Can't modify constant item in local ...

I have to use a typeglob instead. Perl figures out to assign the IO portion of the FH typeglob:

local( *FH ) = shift; # will work.

Once I’ve done that, I use the filehandle FH just like I would in any other situation. It doesn’t matter to me that I got it through a typeglob assignment. Since I’ve localized it, any filehandle of that name anywhere in the program uses my new value, just as in my earlier local example. Nowadays, just use filehandle references, $fh, and leave this stuff to the older code (unless I’m dealing with the special filehandles STDOUT, STDERR, and STDIN).
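
Here are the two styles side by side in a short sketch. The subroutine names first_line_old and first_line_new are hypothetical, and I read from an in-memory file (a reference to $data) so the example doesn't depend on any file on disk:

```perl
#!/usr/bin/perl
# A sketch: passing a filehandle the old way and the newer way
use strict;
use warnings;

my $data = "first line\nsecond line\n";

# old style: localize a typeglob to receive the handle
sub first_line_old
        {
        local *IN = shift;
        my $line = <IN>;
        chomp $line;
        return $line;
        }

# newer style: the handle lives in a lexical scalar
sub first_line_new
        {
        my $fh   = shift;
        my $line = <$fh>;
        chomp $line;
        return $line;
        }

open FH, '<', \$data or die "Could not open in-memory file: $!";
my $old = first_line_old( *FH );
close FH;

open my $fh, '<', \$data or die "Could not open in-memory file: $!";
my $new = first_line_new( $fh );
close $fh;

print "old style: $old\nnew style: $new\n";
```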

Naming Anonymous Subroutines

Using typeglob assignment, I can give anonymous subroutines a name. Instead of dealing with a subroutine dereference, I can deal with a named subroutine.

The File::Find module takes a callback function to select files from a list of directories:

use File::Find;

find( \&wanted, @dirs );

sub wanted { ... }

In File::Find::Closures, I have several functions that each return two closures I can use with File::Find. That way, I can run common find tasks without recreating the &wanted function I need:

package File::Find::Closures;

sub find_by_name
        {
        my %hash  = map { $_, 1 } @_;
        my @files = ();

        (
        sub { push @files, canonpath( $File::Find::name )
                if exists $hash{$_} },
        sub { wantarray ? @files : [ @files ] }
        )
        }

I use File::Find::Closures by importing the generator function I want to use, in this case find_by_name, and then use that function to create two anonymous subroutines: one for find and one to use afterward to get the results:

use File::Find;
use File::Find::Closures qw( find_by_name );

my( $wanted, $get_file_list ) = find_by_name( 'index.html' );

find( $wanted, @directories );

foreach my $file ( $get_file_list->() )
        {
        ...
        }

Perhaps I don’t want to use subroutine references, for whatever reasons. I can assign the anonymous subroutines to typeglobs. Since I’m assigning references, I only affect the subroutine entry in the typeglob. After the assignment I can then do the same thing I did with filehandles in the last section, but this time with named subroutines. After I assign the return values from find_by_name to the typeglobs *wanted and *get_file_list, I have subroutines with those names:

( *wanted, *get_file_list ) = find_by_name( 'index.html' );

find( \&wanted, @directories );

foreach my $file ( get_file_list() )
        {
        ...
        }
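
Since File::Find::Closures isn't part of the standard distribution, here is the same technique as a sketch that runs on its own. The make_counter generator and the names bump and current are hypothetical stand-ins for find_by_name and its two closures:

```perl
#!/usr/bin/perl
# A sketch: give names to anonymous subroutines by assigning
# them to typeglobs
use strict;
use warnings;

# like find_by_name, this returns two closures that share
# private data
sub make_counter
        {
        my $count = 0;

        (
        sub { $count++ },    # bump the counter
        sub { $count },      # report the current value
        )
        }

# assigning code references fills only the CODE slots
( *bump, *current ) = make_counter();

bump() for 1 .. 3;

my $total = current();
print "Counter is $total\n";
```

I call bump() and current() with parentheses because Perl doesn't know about those names until the assignment happens at runtime.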

In Chapter 9, I’ll use this trick with AUTOLOAD to define subroutines on the fly or to replace existing subroutine definitions.

Summary

The symbol table is Perl’s accounting system for package variables, and typeglobs are the way I access them. In some cases, such as passing a filehandle to a subroutine, I can’t get away from the typeglob because I can’t take a reference to a filehandle package variable. To get around some of these older limitations in Perl, programmers used typeglobs to get to the variables they needed. That doesn’t mean that typeglobs are outdated, though. Modules that perform magic, such as Exporter, use them without me even knowing about it. To do my own magic, typeglobs turn out to be quite handy.

Further Reading

Chapters 10 and 12 of Programming Perl, Third Edition, by Larry Wall, Tom Christiansen, and Jon Orwant describe symbol tables and how Perl handles them internally.

Phil Crow shows some symbol table tricks in “Symbol Table Manipulation” for Perl.com: http://www.perl.com/pub/a/2005/03/17/symtables.html.

Randal Schwartz talks about scopes in his Unix Review column for May 2003: http://www.stonehenge.com/merlyn/UnixReview/col46.html.
