Programming with Perl Modules: Chapter 1
Introduction to Perl Modules and CPAN
Perl modules are best described as batches of reusable code. Want to send an email message from your Perl program? You could write the code from scratch, or you can just use Net::SMTP. Want to give your script an elegant graphical interface? Take a look at the pTk module, which does just that. The virtues extolled for Perl programmers are laziness, impatience, and hubris. Together, these admirable characteristics have led to the creation and use of many publicly accessible Perl modules. Because of laziness, programmers would rather write modules than repeat a procedure over and over (and would rather use modules written by other people than write new code from scratch). Because of impatience, programmers write consolidated code that is flexible enough to anticipate their future needs. And because of hubris, programmers share their triumphs with the rest of the Perl community and continually tweak their modules until they're the best they can be.
Recent Perl distributions include a variety of modules that perform a number of tasks, such as parsing command-line arguments.
This chapter offers a conceptual overview of packages and modules in Perl and an introduction to the structure of the Comprehensive Perl Archive Network (CPAN). If you're interested in writing your own Perl modules, refer to Chapter 13, Contributing to CPAN, which details the process of writing Perl modules and how to register with and distribute your contributions through CPAN.
What Are Packages?
Most people consider it rude to enter someone's home without knocking on the door. Even if you're a family member or close friend, you're probably imposing on someone's privacy if you don't alert them when you arrive.
Perl provides a mechanism to separate residents and guests, known as packages. A package can act like the front door to your house: you decide who can enter, and you invite in only people you know. People who live in your house extend the same courtesy to your neighbors by knocking before entering their homes. Your residence and property might be compared to a package's namespace: when you buy a property, the mortgage is in your name--"Nathan owns this house." So, what if you live in a duplex or condominium? The same applies. Although there may be 20 units in your building, each unit has its own address and door. A package, then, is a namespace mechanism that keeps code in one package from accidentally affecting variables in another.
The effect of a package statement extends from the package declaration through the end of the enclosing block, eval, or file, or until the declaration of another package--whichever comes first. A package statement affects only dynamic variables (globals, even when they are localized with local()), not lexical variables created with my().
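As a minimal sketch of that scoping rule, the following records which package is in effect before, inside, and after a block. The package name Garden and the @seen array are made up for illustration.

```perl
use strict;
use warnings;

my @seen;                       # collects the package name in effect at each point

push @seen, __PACKAGE__;        # no package statement yet, so this is "main"

{
    package Garden;             # hypothetical package; lasts until the closing brace
    push @seen, __PACKAGE__;    # "Garden"
}

push @seen, __PACKAGE__;        # the block has ended, so we are back in "main"

print join(', ', @seen), "\n";  # prints "main, Garden, main"
```

Note that the lexical @seen stays visible inside the Garden block: the package statement changes where globals are looked up, not which lexicals are in scope.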
So What's in a Name?
As mentioned above, a package starts with a package statement; let's work with a package named BushWhack:
package BushWhack;
Let's add a subroutine called lawn_kid():
sub lawn_kid {
my $lk = shift;
print("$lk is a lawn kid.\n");
}
The code compiled in package BushWhack can access lawn_kid() without fully qualifying its name:
lawn_kid('Joe'); # or
BushWhack::lawn_kid('Joe\'s sister Sue');
Now let's add package LawnCare to the same file:
package LawnCare;
Bear in mind that it is confusing to have multiple packages in the same file. Look at this:
my $asleep = 151;
my $not_paying_attention = 20;
package DUH;
print "$not_paying_attention, $asleep\n";
package WAKEUP;
print "$not_paying_attention, $asleep\n";
Oops. Both $asleep and $not_paying_attention are visible in both pieces of code, because a package declaration only affects dynamics (globals), not lexicals (my()s). Each package could also have its own global $not_paying_attention, accessed as $DUH::not_paying_attention and $WAKEUP::not_paying_attention, respectively. But code compiled in those packages in a different scope (block, eval, file) can't get at the lexical $not_paying_attention from the scope above. And a lexical can't be qualified with a package name.
Back in package LawnCare, let's add a subroutine called awful_chemical():
sub awful_chemical {
my $ac = shift;
print("$ac is a(n) awful chemical.\n");
}
This function, however, can't be called from BushWhack in the same way. To call awful_chemical() from BushWhack, give it the package name where the subroutine lives:
LawnCare::awful_chemical('Chlorine');
Otherwise, you'll get an undefined subroutine error.
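The fragments above can be assembled into one runnable file. This is only a sketch: the eval wrapper and the $ok and $err variables are additions for illustration, used to trap the undefined subroutine error so the script can report it instead of dying.

```perl
use strict;
use warnings;

package BushWhack;

sub lawn_kid {
    my $lk = shift;
    print("$lk is a lawn kid.\n");
}

package LawnCare;

sub awful_chemical {
    my $ac = shift;
    print("$ac is a(n) awful chemical.\n");
}

package BushWhack;    # switch back so the calls below are compiled in BushWhack

lawn_kid('Joe');                        # found in the current package
LawnCare::awful_chemical('Chlorine');   # the qualified name reaches the other package

# The unqualified name is looked up in BushWhack, where it does not exist,
# so the call fails at run time. eval traps the error.
my $ok  = eval { awful_chemical('Chlorine'); 1 };
my $err = $ok ? '' : $@;
print "Unqualified call failed: $err" unless $ok;
```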
You are able to create a
Packages and Symbol Tables
A package's namespace is a symbol table. The symbol table is stored in a hash named after the package with two colons appended to it. If you name a package BushWhack, its symbol table is %BushWhack::. The symbol table of the default package, main, is %main::, or %:: for short. The keys of the hash are the package's identifiers, and the values are the corresponding typeglobs. Referring to an entry with the *name typeglob notation is efficient because the symbol table lookup is done at compile time.
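In other words, the following two statements have the same effect, but the first is resolved at compile time and the second at run time:
local *low_flyer = *BushWhack::variable; # compile time
local *low_flyer = $BushWhack::{"variable"}; # run time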
You can list all the keys of a package's symbol table, and see which variables they hold, with the example below. You may use undef() on these variables to free their memory, and they will then be reported as undefined. You shouldn't undefine anything here unless you don't plan to load these packages again: because the memory has already been filled, leaving the variables defined saves time when the packages are loaded later:[1]
foreach $symbol_name (sort keys %BushWhack::) {
    local *local_sym = $BushWhack::{$symbol_name};
    # the scalar is tested with defined(); the array and hash are tested for contents
    print "\$$symbol_name is defined\n"
        if(defined $local_sym);
    print "\@$symbol_name is defined\n"
        if(@local_sym);
    print "\%$symbol_name is defined\n"
        if(%local_sym);
}
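The typeglob aliasing described in this section can be tried out with a short script. This is a sketch under a few assumptions: the value 'mower' and the variables $compile_time and $run_time are invented for the demonstration, and $low_flyer is declared with our only so that use strict accepts the unqualified name.

```perl
use strict;
use warnings;

package BushWhack;
our $variable = 'mower';     # hypothetical package global

package main;
our $low_flyer;              # declared so strict accepts the unqualified name

my ($compile_time, $run_time);
{
    # *name form: the symbol table lookup happens at compile time
    local *low_flyer = *BushWhack::variable;
    $compile_time = $low_flyer;          # reads $BushWhack::variable through the alias
}
{
    # hash form: the entry is fetched from %BushWhack:: at run time
    local *low_flyer = $BushWhack::{'variable'};
    $run_time = $low_flyer;
}

print "$compile_time $run_time\n";
print "alias removed after the block\n" unless defined $low_flyer;
```

Because the alias is installed with local, it disappears when each block ends and $low_flyer goes back to being undefined.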
Package Constructors and Destructors
The BEGIN and END routines act as package constructors and destructors. A BEGIN subroutine is executed as soon as it has been completely defined, before the rest of the file is even parsed; it's a way for the compiler to make a call into the interpreter. Even if you have a subroutine call that appears before BEGIN, BEGIN still executes first:
package MakeRoom;
call_me_now();
sub call_me_now { print "I'm gonna be first!\n"; } # umm, no
BEGIN { print "See, told you that I'd be first.\n"; }
END { print "th-th-th-that's all folks.\n"; }
This outputs:
See, told you that I'd be first.
I'm gonna be first!
th-th-th-that's all folks.
Multiple BEGIN blocks are executed in the order they have been defined:
package Repetition;
call_me_now();
sub call_me_now {
# err, no
print "Hey, I *said* that I was going to be first!\n";
}
BEGIN { print "Yeah, I'm first!\n"; }
BEGIN { print "And I'm next!\n"; }
END { print "Well, you've got nothing to complain about - I'm ",
"last\n"; }
This outputs:
Yeah, I'm first!
And I'm next!
Hey, I *said* that I was going to be first!
Well, you've got nothing to complain about - I'm last
You can't call BEGIN yourself; it's undefined as soon as it's finished running, and any code it used is returned to Perl's memory pool.
The END subroutine does what it says. Code contained in an END subroutine is executed when the interpreter is exiting, even if the interpreter is exiting because of a die(). A program can have multiple END subroutines; they are executed in reverse order of definition, from the last END defined back to the first:[2]
END { print "Am I *really* first?\n"; }
$random_file = 'some_file.ext';
open(FOO, $random_file)
or die("can't open $random_file: $!");
close(FOO);
END { print "Am I *really* second?\n"; }
This program outputs:
can't open some_file.ext: No such file or directory at myscript line 4.
Am I *really* second?
Am I *really* first?
Modules
Unlike languages like C++ or Java, Perl doesn't use an explicit class declaration. A module may work like a class if you implement its subroutines as methods. A package can inherit methods from other packages by including the other packages' names in its @ISA array.
So, what's a module? A module is a package stored in a file with the same name; it is intended to be reused. Modules can export symbols into their caller's package, but they don't have to export anything. Class modules can also export their symbols but typically should not.
Regardless of the mechanism you use to write a module or any other goodies, such as exporting symbols or creating objects, Perl modules have a .pm extension.[3] Since you probably won't be writing your modules to be pragmas (compiler directives), you should capitalize module names. Because the filename must match the package name, a package such as Nathan::LastName is stored in a file like Nathan/LastName.pm.
In this example, we'll be discussing Some::Module, contributed by Tom Christiansen. Create a file called Some/Module.pm, and insert the following into it:
package Some::Module; # assumes Some/Module.pm
use strict;
BEGIN {
use Exporter ();
use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);
# set the version for version checking
$VERSION = 1.00;
@ISA = qw(Exporter);
@EXPORT = qw(&func1 &func2 &func3);
%EXPORT_TAGS = ( ); # eg: TAG => [ qw!name1 name2! ],
# your exported package globals go here,
# as well as any optionally exported functions
@EXPORT_OK = qw($Var1 %Hashit);
}
use vars @EXPORT_OK;
# nonexported package globals go here
use vars qw(@more $stuff);
# initialize package globals, first exported ones
$Var1 = '';
%Hashit = ();
# then the others (which are still accessible as $Some::Module::stuff)
$stuff = '';
@more = ();
# all file-scoped lexicals must be created before
# the functions below that use them.
# file-private lexicals go here
my $priv_var = '';
my %secret_hash = ();
# here's a file-private function as a closure,
# callable as &$priv_func; it cannot be prototyped.
my $priv_func = sub {
# stuff goes here.
};
# make all your functions, whether exported or not;
# remember to put something interesting in the {} stubs
sub func1 {} # no prototype
sub func2() {} # proto'd void
sub func3($$) {} # proto'd to 2 scalars
# this one isn't exported, but could be called!
sub func4(\%) {} # proto'd to 1 hash ref
END { } # module clean-up code here (global destructor)
Let's look at this example more closely.
Use Versus Require
The use statement implies a BEGIN block. The library module is loaded and its symbols are imported as soon as the use statement is compiled (even before the rest of the file). use allows modules to declare subroutines that are visible as list operators to the rest of the file. More important, it also makes the prototypes of the module's subroutines visible from that point onward. Of course, prototypes are compile-time only, so they are ignored on method calls.
In other words, you can use Perl modules in your program with:
use Module;
use Module LIST;
This is not the same as:
require Module;
require "Module.pm";
If you use require, you aren't importing anything from the module unless you explicitly import it yourself. If you choose not to use use, you must do something like the following:
BEGIN { require "Module.pm"; import Module; }
or:
BEGIN { require "Module.pm"; import Module LIST; }
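The equivalence can be checked in a single file. The sketch below makes several assumptions: the Greeter module and its greet() function are hypothetical, the module is defined inline rather than in its own Greeter.pm, and %INC is set by hand so that require treats the module as already loaded. It also writes the import call as Greeter->import, which is the arrow form of import Greeter.

```perl
use strict;
use warnings;

# A hypothetical module, defined inline so the sketch fits in one file.
BEGIN {
    package Greeter;
    use Exporter ();
    our @ISA    = ('Exporter');
    our @EXPORT = ('greet');
    sub greet { return "hello, $_[0]" }

    # Mark Greeter.pm as already loaded so require does not search @INC.
    $INC{'Greeter.pm'} = __FILE__;
}

# The long-hand equivalent of "use Greeter;"
BEGIN { require Greeter; Greeter->import; }

my $qualified   = Greeter::greet('Sue');   # works whenever the module is loaded
my $unqualified = greet('Joe');            # works only because import() ran

print "$qualified\n$unqualified\n";
```

Dropping the Greeter->import call leaves the qualified call working, while the unqualified one fails with an undefined subroutine error.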
Let's say you have a module called TestModule containing the function test_me_out(). If you choose to use require without importing anything, you must qualify the function with its package name in order to call it:
require TestModule;
$value = TestModule::test_me_out();
You can't call the function by its unqualified name after a plain require. Doing so results in a function that doesn't exist, main::test_me_out(), being called:
require TestModule;
$value = test_me_out(); # wrong!
You can use use to import the names from TestModule and then call test_me_out():
use TestModule;
$value = test_me_out();
Object-Oriented Programming
Stop me if you've heard this one before:
[Language name here] is a revolutionary object-oriented programming language...
So?
It makes your life easier when you're trying to generate a canvas filled with bouncing heads.
What does this have to do with object-oriented programming? And if anything, what does this have to do with Perl? You can use Perl modules for object-oriented programming (OOP), but this doesn't mean you'll need to write (or even rewrite) your modules with object-oriented methodology in mind. Let's put Perl modules and OOP into perspective:
An Object Is Simply a Reference
Unlike C++ or Java, Perl doesn't have a predefined syntax for constructors. Perl constructors must allocate new memory, whereas C++ constructors just initialize memory that was already allocated when they're called. In object-oriented Perl modules, a constructor is a subroutine that returns a reference to something that has been "blessed" into a class with bless(). bless() marks the thing a reference points to as belonging to a package, so the interpreter can look there for method definitions. Here's a minimal case:
package FrothyMug;
sub new { bless {} }
{} returns a reference to a new anonymous hash, an empty one with no key/value pairs. When that reference is bless()ed, the hash it refers to is told that it is now a FrothyMug, and bless() returns the reference. From then on, the referenced object knows that it has been blessed. The same constructor can be written out step by step:
sub new {
my $self = {};
bless $self;
return $self;
}
You must use the two-argument form of bless if you plan on dealing with inheritance (which you probably will do sooner or later):
sub new {
my $class = shift;
my $self = {};
bless $self, $class;
$self->initialize();
return $self;
}
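To see why the two-argument form matters, here is a minimal sketch that adds a subclass. The TallMug package, the initialize() body, and the contents field are all invented for illustration.

```perl
use strict;
use warnings;

package FrothyMug;

sub new {
    my $class = shift;
    my $self  = {};
    bless $self, $class;     # bless into whichever class was asked for
    $self->initialize();
    return $self;
}

sub initialize {
    my $self = shift;
    $self->{contents} = 'root beer';   # hypothetical instance data
}

package TallMug;             # hypothetical subclass
our @ISA = ('FrothyMug');    # inherit new() and initialize()

package main;

my $mug  = FrothyMug->new;
my $tall = TallMug->new;

print ref($mug), "\n";       # FrothyMug
print ref($tall), "\n";      # TallMug, because new() used the two-argument bless
```

With the one-argument form, bless would use the package in which new() was compiled, so TallMug->new would hand back a FrothyMug.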
Remember the package examples we showed before? The function trolling() in a package GoFish can only be called if it's been fully qualified. Other packages must call this function (if they're allowed) with GoFish::trolling(). The scenario is similar here. A package's methods treat the reference as any other reference. Outside the package, the reference should only be accessed through the package's methods.
A Class Is Simply a Package
C++ and Java use class declarations; Perl does not. You create a class by putting subroutine definitions and a package declaration into a file. Yes, it's that easy.
The interpreter uses a package's @ISA array to find inherited methods. Perl only does method inheritance, that is, interface inheritance. Access to instance data is left to the class. This isn't a problem because most classes' objects use an anonymous hash, which is very much like the grassy areas in the heartland of the United States--the anonymous hash acts like a grassy field where herds of cattle (other classes) come to graze.
A Method Is Simply a Subroutine
Perl doesn't use any special syntax for method definition; a method is a subroutine. A method's first argument will be the object or package that invokes it:
Class->meth();
$obj->meth();
$obj_or_classname->meth();
meth Class;
meth $obj;
meth $obj_or_classname;
Class and instance methods
Class methods and instance methods could be called static and instance methods, except that static is a fighting word in the Perl community. Class methods expect the name of the class as the first argument passed to the method. The constructor is an example of a class method. Class methods may simply ignore the first argument, because the package they were invoked through is irrelevant to them. You can also use a class method to look up an object by name:
sub find_my_object_by_name {
my ($class, $name) = @_;
$objtable{$name};
}
An instance method expects an object as its first argument. Typically it shifts the first argument into a self or this variable, and then uses that as an ordinary reference:
sub display_widget {
my $self = shift;
my @keys = @_ ? @_ : sort keys %$self;
foreach $key (@keys) {
print "\t$key => $self->{$key}\n";
}
}
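The two kinds of method can be combined in one small class. The sketch below is an assumption-laden illustration: the Widget class, its constructor, and the return value of display_widget() are not from the text; only find_my_object_by_name() and the body of display_widget() follow the fragments above.

```perl
use strict;
use warnings;

package Widget;              # hypothetical class

my %objtable;                # file-private registry of objects, keyed by name

sub new {
    my ($class, $name, %attrs) = @_;
    my $self = bless { name => $name, %attrs }, $class;
    $objtable{$name} = $self;
    return $self;
}

# Class method: the first argument is the class name.
sub find_my_object_by_name {
    my ($class, $name) = @_;
    return $objtable{$name};
}

# Instance method: the first argument is the object itself.
sub display_widget {
    my $self = shift;
    my @keys = @_ ? @_ : sort keys %$self;
    foreach my $key (@keys) {
        print "\t$key => $self->{$key}\n";
    }
    return scalar @keys;     # number of fields printed (added for this sketch)
}

package main;

Widget->new('knob', color => 'red');
my $found = Widget->find_my_object_by_name('knob');   # class method call
my $count = $found->display_widget;                   # instance method call
```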
Method Invocation
There are two ways to invoke a method; we'll cover both of them in this section. Let's say that we have two statements:
$object = method Class "Whatever";
method_2 $object 'Param 1', 'Param 2';
We can combine these statements into one with a BLOCK in the indirect object slot:
method_2 { method Class "Whatever" } 'Param 1', 'Param 2';
Those of you who salivate over C++ (or even at perl -e 'print "\007";') will probably like the -> notation that does the same as the above. You'll need to use parentheses if you'll be passing any arguments:
$object = Class->method("Whatever");
$fred->display('Param 1', 'Param 2');
Yes, the parentheses are important. Freedom is nice, but it's not always appropriate to let things hang out, particularly when this causes your program to act unreliably. You should probably avoid coding techniques such as:
$parrot = Bird->noisy("Shh"), 'be', 'quiet';
$parrot->shoot(times => 5), pain => 'likely';
And shamefully I must admit that:
m1 $ob->m2;
parses as:
$ob->m1->m2;
not as:
$ob->m2->m1;
The CPAN Architecture
The Comprehensive Perl Archive Network (CPAN) represents the development interests of a cross-section of the Perl community, including Perl utilities, modules, documentation, and (of course) the Perl distribution itself. CPAN was created by Jarkko Hietaniemi and Andreas Koenig.
The Perl Resource Kit contains a complete CPAN distribution, so access to the Perl modules discussed in this book is at your fingertips. See the accompanying Perl Utilities Guide for more information on how to install modules from the Perl Resource Kit CD.
How Is CPAN Organized?
CPAN materials are categorized by Perl modules, distributions, documentation, announcements, ports, scripts, and contributing authors. Each category is linked with related categories. For example, links to a graphing module written by an author appear in both the CPAN modules and author areas.
Most CPAN materials are distributed "tar-gzipped." Since CPAN provides the same offerings worldwide, the directory structure has been standardized so files can be located in the same location in the directory hierarchy at all CPAN sites. All CPAN sites use CPAN as the root directory, from which the user can select a specific Perl item. The CPAN snapshot that appears on your CD-ROM contains the same directory structure, starting with a CPAN directory. From the CPAN directory you have the following choices:
Current directory is CPAN
CPAN.html          An HTML formatted CPAN info page
ENDINGS            Describes what the ".tgz" file extensions mean
MIRRORED.BY        A list of sites mirroring CPAN
MIRRORING.FROM     A list of sites mirroring CPAN
README             A brief description of what you'll find on CPAN
README.html        An HTML formatted version of the README file
RECENT             Recent additions to the CPAN site
RECENT.DAY         Recent additions to the CPAN site (daily)
RECENT.html        An HTML formatted list of recent additions
RECENT.WEEK        Recent additions to the CPAN site (weekly)
ROADMAP            What you'll find on CPAN and where
ROADMAP.html       An HTML formatted version of ROADMAP
SITES              An exhaustive list of CPAN sites
SITES.html         An HTML formatted version of SITES
authors            A list of CPAN authors
clpa               An archive of comp.lang.perl.announce
doc                Various Perl documentation, FAQs, etc.
indices            All that is indexed.
latest.tar.gz      The latest Perl distribution sources
misc               Misc Perl stuff like Larry Wall quotes and gifs
modules            Modules for Perl version 5
other-archives     Other things yet uncategorized
ports              Various Perl ports
scripts            Various scripts appearing in Perl books
src                The Perl sources from various versions
The directory we're most concerned with is modules. It categorizes modules in three ways:
by-author          Modules organized by author's registered CPAN name
by-category        Modules categorized by subject matter (see below)
by-module          Modules categorized by namespace (i.e., MIME)
In CPAN, Perl modules are currently organized into 21 categories. Each category is linked to contributors and related modules. The modules chosen for discussion in this book fit into many of these categories:
02_Perl_Core_Modules
03_Development_Support
04_Operating_System_Interfaces
05_Networking_Devices_Inter_Process
06_Data_Type_Utilities
07_Database_Interfaces
08_User_Interfaces
09_Interfaces_to_Other_Languages
10_File_Names_Systems_Locking
11_String_Processing_Language_Text_Process
12_Option_Argument_Parameter_Processing
13_Internationalization_and_Locale
14_Authentication_Security_Encryption
15_World_Wide_Web_HTML_HTTP_CGI
16_Server_and_Daemon_Utilities
17_Archiving_and_Compression
18_Images_Pixmap_Bitmap_Manipulation
19_Mail_and_Usenet_News
20_Control_Flow_Utilities
21_File_Handle_Input_Output
22_Microsoft_Windows_Modules
23_Miscellaneous_Modules
99_Not_In_Modulelist
Once you've chosen the area from which you'd like to download a module, you should tell your ftp client to request a directory listing for the area. You'll find a list of files in the directory; tar files have a .tar.gz extension and README files have a .readme extension. Here's a sample directory listing from a CPAN site:
ANDK@                        CGI.pm-2.35.tar.gz@
CGI-Out-96.081401.readme@    CGI.pm-2.36.readme@
CGI-Out-96.081401.tar.gz@    CGI.pm-2.36.tar.gz@
CGI-Response-0.03.readme@    CGI_Imagemap-1.00.readme@
CGI-Response-0.03.tar.gz@    CGI_Imagemap-1.00.tar.gz@
CGI-modules-2.75.readme@     CGI_Lite-1.62.pm.gz@
CGI-modules-2.75.tar.gz@     DOUGM@
CGI-modules-2.76.readme@     LDS@
CGI-modules-2.76.tar.gz@     MGH@
CGI.pm-2.32.readme@          MIKEH@
CGI.pm-2.33.readme@          MUIR@
CGI.pm-2.34.readme@          SHGUN@
CGI.pm-2.35.readme@
cdrom:/.21/perl/CPAN/modules/by-module/CGI>
If your ftp client supports inline viewing of files on an ftp server, select the .readme file of the most current archive and review its contents carefully. README files often give special instructions about building the module, list other modules needed for proper functioning, and tell you if the module can't be built under certain versions of Perl.
How Do I Install the Module?
Most system administrators install popular software so that it can be executed globally. When you log in to your account, your system administrator might even announce software installations or upgrades in the login message. Perl modules can also follow this pattern. Since many Perl modules are useful to everyone, the modules are installed so they can be used globally, generally in a branch of the lib directory with the rest of the Perl libraries.
If you have root privileges or write access to the locations where Perl modules are installed on your system, you can easily follow these steps when installing most modules:
perl Makefile.PL
make
make test
make install
If you don't have write permission to global areas (e.g., if you have your UNIX account with an ISP), you'll probably have to install your modules locally. You might also install modules locally if you wish to test a module in your home directory before installing for the world at large. To install a module locally, you must pass the PREFIX argument to Perl when generating a Makefile from Makefile.PL. The PREFIX argument tells MakeMaker to use the directory following PREFIX as the base directory when installing the module. For example, to install a module in the directory /home/nvp/Perl/Modules, the PREFIX argument would look like:
perl Makefile.PL PREFIX=/home/nvp/Perl/Modules
Then you would follow the same steps as above:
make
make test
make install
You now have one more step. Since Perl generally looks in systemwide areas for modules, it won't find local modules unless you tell Perl where to find them. Otherwise, you'll receive an error message like the following:
Can't locate <ModuleName>.pm in @INC.
BEGIN failed--compilation aborted.
For example, if the module has been installed in /home/nvp/Perl/Modules, you need to tell Perl to look in that location with use lib 'path':
#!/usr/local/bin/perl -w
use lib '/home/nvp/Perl/Modules';
use ModuleName;
Where Is the Module Documented?
Many of the modules you'll be interested in are covered in this book. However, there is often also documentation provided by the module's author, written in a special format called pod. Most of the pod documentation for CPAN modules is printed in the Perl Module Reference, Volumes 1 and 2.
"Pod" stands for "plain old documentation." If you are familiar with markup languages like HTML, you won't have a difficult time understanding pod. Pod is plain text marked up with special tags inside a Perl module or script; humans can read it as it stands, without running it through a formatter. Pod tags are not interpreted when the script is executed; programmers may use pod tags as multiline comments. You'll find several examples of auto-generating pod tags in Chapter 13, Contributing to CPAN.
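As a small illustration of pod living alongside code, the sketch below embeds a documentation block and a pod-style multiline comment in a script. The greet() function and the wording of the documentation are made up for the example.

```perl
use strict;
use warnings;

=pod

=head1 NAME

greet - return a greeting (hypothetical example)

=head1 SYNOPSIS

    print greet('Nathan');

=cut

sub greet {
    my $name = shift;
    return "Hello, $name\n";
}

=begin comment

Everything from a pod directive up to the next =cut is skipped by the
interpreter, so a pod block can double as a multiline comment.

=end comment

=cut

my $greeting = greet('Nathan');
print $greeting;
```

The interpreter runs only the code outside the pod blocks, while a pod formatter sees only the NAME and SYNOPSIS sections.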
Pod files are installed into a subdirectory of the Perl lib directory, Pod, which contains the base manpages included with your Perl distribution. You can view these pages by using the
For the nonstandard modules installed on your system, you can also use the perldoc command. For example:
perldoc CGI
shows the pod documentation for the CGI.pm module.
Most modules were also distributed with manpages formatted in
How Do I Know What Modules Are Installed on My System?
Each time a module is installed globally, information gets appended to perllocal.pod. This file contains the date, the location, the linktype (dynamic versus static), and the version of the module installed, as well as information about any executables installed with the module. You can parse this file using one of the pod-conversion tools previously mentioned. You can also use the CPAN setup tool discussed in Chapter 2 of the Perl Utilities Guide.
[1] Warning: this counterintuitive behavior of defined() on aggregates may be changed, fixed, or broken in a future release of Perl.
[2] ENDs can be circumvented by signals that you have to trap on your own.