Perl in a Nutshell
By Stephen Spainhour, Ellen Siever, Nathan Patwardhan
December 1998
Pages: 668



Table of Contents

Chapter 1: Introduction to Perl
Computer languages differ not so much in what they make possible, but in what they make easy. Perl is designed to make the easy jobs easy, without making the hard jobs impossible. Perl makes it easy to manipulate numbers, text, files, directories, computers, networks, and programs. It also makes it easy to develop, modify, and debug your own programs portably, on any modern operating system.
Perl is especially popular with systems programmers and web developers, but it also appeals to a much broader audience. Originally designed for text processing, it has grown into a sophisticated, general-purpose programming language with a rich software development environment complete with debuggers, profilers, cross-referencers, compilers, interpreters, libraries, syntax-directed editors, and all the rest of the trappings of a "real" programming language.
There are many reasons for Perl's success. For starters, Perl is freely available and freely redistributable. But that's not enough to explain the Perl phenomenon, since many other freeware packages fail to thrive. Perl is not just free; it's also fun. People feel like they can be creative in Perl, because they have freedom of expression.
Perl is both a very simple language and a very rich language. It's a simple language in that the types and structures are simple to use and understand, and it borrows heavily from other languages you may already be familiar with. You don't have to know everything there is to know about Perl before you can write useful programs.
However, Perl is also a rich language, and there is much to learn about it. That's the price of making hard things possible. Although it will take some time for you to absorb all that Perl can do, somewhere down the line you will be glad that you have access to the extensive capabilities of Perl.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
What's Perl Good For?
Perl has the advantage of being easy to learn if you just want to write simple scripts, hence its appeal to the ever-impatient system administrator and the deadline-driven CGI developer. However, as you become more ambitious, Perl lets you act on those ambitions. Chapter 2 covers how to get and install Perl, and Chapter 3 through Chapter 6 cover the basics of the Perl language, its functions, and how to use the Perl debugger.
On top of the Perl language itself, however, are the Perl modules. You can think of modules as add-ons to the Perl language that allow you to streamline tasks by providing a consistent API. Perl itself is fun to use, but the modules lend Perl even more flexibility and enormous power. Furthermore, anyone can write and distribute a Perl module. Some modules are deemed important enough or popular enough to be distributed with Perl itself, but very few are actually written by the core Perl developers themselves. Chapter 7 introduces you to Perl modules, and Chapter 8 covers the standard modules that are distributed with Perl itself.
The most popular Perl module is CGI.pm, which gives a simple interface to developing CGI (Common Gateway Interface) applications in Perl. While Perl itself is indispensable for many different tasks, its text-manipulation features make it perfect for CGI development on the Web. In fact, the resurgence of Perl over the past few years must be credited to its popularity as a CGI language. Chapter 10 and Chapter 11 talk about using Perl for CGI, including mod_perl, which merges Perl into the Apache web server.
Database interconnectivity is one of the most important functions of any programming language today, and Perl is no exception. DBI is a suite of modules that provide a consistent database-independent interface for Perl. Chapter 12 covers both DBI and DBM (the more primitive but surprisingly effective database interface built directly into Perl).
The Internet doesn't start and stop at CGI. Network programming is another of Perl's strengths, with a robust sockets interface and several modules for writing clients and servers for all sorts of Internet services: not only the Web, but also email, news, FTP, etc. Chapter 13 through Chapter 17 cover the modules for developing fully functional Internet applications in Perl.
Perl Development
Software doesn't grow on trees. Perl is free because of the donated efforts of several generous persons who have devoted large chunks of their spare time to the development, maintenance, and evangelism of Perl.
Perl itself was created by Larry Wall, in an effort to produce reports for a bug-reporting system. Larry designed a new scripting language for this purpose, and then released it to the Internet, thinking that someone else might find it useful. In the spirit of freeware, other people suggested improvements and even ways to implement them, and Perl transformed from a cute scripting language into a robust programming language.
Today, Larry does little actual development himself, but he is the ringleader of a core development team known as the Perl Porters. The Porters determine which new features should be added and which pesky bugs should be fixed. To keep it from being a free-for-all, there is generally one person who is responsible for delivering the next release of Perl, with several "development releases" in the interim.
Which Platforms Support Perl?
While Perl was developed on Unix and is closely entwined with Unix culture, it also has a strong following on the Windows and Macintosh platforms. Perl gives Windows 95, Windows NT, Macintosh, and even VMS users the opportunity to take advantage of the scripting power that Unix users take for granted.
Most Unix machines will have Perl already installed, since it's one of the first things a Unix system administrator will build for a new machine (and is in fact distributed with the operating system on some versions of Unix, such as Linux and FreeBSD). For Windows NT, Windows 95, and Macintosh, there are binary distributions of Perl that you can download for free. See Chapter 2 for information on installing Perl.
Although there is some history of other platforms not being treated seriously by the Perl community, Perl is becoming increasingly friendly to non-Unix platforms. The Win32 ports of Perl are quite stable, and as of Perl 5.005, are integrated wholly with core Perl. MacPerl integration is expected with Perl 5.006.
Perl Resources
Paradoxically, the way in which Perl helps you the most has almost nothing to do with Perl itself, and everything to do with the people who use Perl. While people start using Perl because they need it, they continue using Perl because they love it.
The result is that the Perl community is one of the most helpful in the world. When Perl programmers aren't writing their own programs, they spend their time helping others write theirs. They discuss common problems and help devise solutions. They develop utilities and modules for Perl, and give them away to the world at large.
The central meeting place for Perl aficionados is Usenet. If you're not familiar with Usenet, it's a collection of special-interest groups (called newsgroups) on the Internet. For almost anyone using a modern browser, Usenet access is as simple as selecting a menu option in the browser. Perl programmers should consider subscribing to the following newsgroups:
comp.lang.perl.announce
A moderated newsgroup with announcements about new utilities or products related to Perl.
comp.lang.perl.misc
The general-purpose newsgroup devoted to non-CGI-related Perl programming questions.
comp.lang.perl.moderated
A moderated newsgroup intended to be a forum for more controlled, restrained discussions about Perl.
comp.lang.perl.modules
Chapter 2: Installing Perl
The best things in life are free. So is Perl. Although you can get a bundled Perl distribution on CD-ROM, most people download Perl from an online archive. CPAN, the Comprehensive Perl Archive Network, is the main distribution point for all things Perl. Whether you are looking for Perl itself, for a module, or for documentation about Perl, CPAN is the place to go, at http://www.perl.com/CPAN/. The ongoing development and enhancement of Perl is very much a cooperative effort, and CPAN is the place where the work of many individuals comes together.
The CPAN Architecture
CPAN represents the development interests of a cross-section of the Perl community. It contains Perl utilities, modules, documentation, and (of course) the Perl distribution itself. CPAN was created by Jarkko Hietaniemi and Andreas König.
The home system for CPAN is funet.fi, but CPAN is also mirrored on many other sites around the globe. This ensures that anyone with an Internet connection can have reliable access to CPAN's contents at any time. Since the structure of all CPAN sites is the same, a user searching for the current version of Perl can be sure that the latest.tar.gz file is the same on every site.
The easiest way to access CPAN is to utilize the CPAN multiplex service at www.perl.com. The multiplexor tries to connect you to a local, fast machine on a large bandwidth hub. To use the multiplexor, go to http://www.perl.com/CPAN/; the multiplexor will automatically route you to a site based on your domain.
If you prefer, you can choose a particular CPAN site, instead of letting the multiplexor choose one for you. To do that, go to the URL http://www.perl.com/CPAN (no trailing slash). When you omit the trailing slash, the CPAN multiplexor presents a menu of CPAN mirrors from which you select the one you want. It remembers your choice next time.
If you want to use anonymous FTP, the following machines should have the Perl source code plus a copy of the CPAN mirror list:
ftp.perl.com
ftp.cs.colorado.edu
ftp.cise.ufl.edu
ftp.funet.fi
ftp.cs.ruu.nl
The location of the top directory of the CPAN mirror differs on these machines, so look around once you get there. It's often something like /pub/perl/CPAN.
If you don't have reliable Internet access, you can also get CPAN on CD as part of O'Reilly's
How Is CPAN Organized?
CPAN materials are grouped into categories, including Perl modules, distributions, documentation, announcements, ports, scripts, and contributing authors. Each category is linked to related categories. For example, links to a graphing module written by an author appear in both the module and the author areas.
Since CPAN provides the same offerings worldwide, the directory structure has been standardized; files are located in the same place in the directory hierarchy at all CPAN sites. All CPAN sites use CPAN as the root directory, from which the user can select a specific Perl item.
From the CPAN directory you have the following choices:
CPAN.html       CPAN info page; lists what's available
                in CPAN and describes each of the modules
ENDINGS         Description of the file extensions, such as .tar, .gz, and .zip
MIRRORED.BY     A list of sites mirroring CPAN
MIRRORING.FROM  A list of sites mirrored by CPAN
README          A brief description of what you'll find on CPAN
README.html     An HTML-formatted version of the README file
RECENT          Recent additions to the CPAN site
RECENT.DAY      Recent additions to the CPAN site (daily)
RECENT.html     An HTML-formatted list of recent additions
RECENT.WEEK     Recent additions to the CPAN site (weekly)
ROADMAP         What you'll find on CPAN and where
ROADMAP.html    An HTML-formatted version of ROADMAP
SITES           An exhaustive list of CPAN sites
SITES.html      An HTML-formatted version of SITES
authors         A list of CPAN authors
clpa            An archive of comp.lang.perl.announce
doc             Various Perl documentation, FAQs, etc.
indices         All that is indexed.
latest.tar.gz   The latest Perl distribution sources
misc            Misc Perl stuff like Larry Wall quotes and gifs
modules         Modules for Perl version 5
other-archives  Other things yet uncategorized
ports           Various Perl ports
scripts         Various scripts appearing in Perl books
src             The Perl sources from various versions
Installing Perl
Most likely your system administrator is responsible for installing and upgrading Perl. But if you are the system administrator, or you want to install Perl on your own system, sooner or later you will find yourself installing a new version of Perl.
If you have been running Perl, and you are now going to install Perl 5.005, you need to be aware that it is not binary-compatible with older versions. This means that you must rebuild and reinstall any dynamically loaded extensions that you built under earlier versions.
Specific installation instructions come in the README and INSTALL files of the Perl distribution kit. If you don't already have the Perl distribution, you can download it from CPAN—the latest Unix distribution is in latest.tar.gz. The information in this section is an overview of the installation process. The gory details are in the INSTALL file, which you should look at before starting, especially if you haven't done an installation before. Note that operating systems other than Unix may have special instructions; if so, follow those instructions instead of what's in this section or in INSTALL. Look for a file named README.xxx, where xxx is your OS name.
In addition to Perl itself, the standard distribution includes a set of core modules that are automatically installed with Perl. See Section 2.4 later in this chapter for how to install modules that are not bundled with Perl; Chapter 8 describes the standard modules in some detail.
Typically, you'll get the Perl kit packed as either a tar file or as a set of shar (shell archive) scripts; in either case, the file will be in a compressed format. If you got your version of Perl directly from CPAN, it is probably in "tar-gzipped" format; tar and gzip are popular Unix data-archiving formats. In any case, once you've downloaded the distribution, you need to uncompress and unpack it. The filename indicates what kind of compression was used. A
Getting and Installing Modules
As you'll see when you look at the lists of modules and their authors on CPAN, many users have made their modules freely available. If you find an interesting problem and are thinking of writing a module to solve it, check the modules directory on CPAN first to see if there is a module there that you can use. The chances are good that there is either a module already that does what you need, or perhaps one that you can extend, rather than starting from scratch.
Before you download a module, you might also check your system to see if it's already installed. The following command searches the libraries in the @INC array and prints the names of all modules it finds:
find `perl -e 'print "@INC"'` -name '*.pm' -print
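The same search can be sketched in Perl itself, which helps on systems that lack a Unix-style find command. This is only an illustration built on the standard File::Find module; the subroutine name installed_module_files is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Return the sorted, de-duplicated paths of every *.pm file found
# beneath the directories passed in (normally the contents of @INC).
# The name installed_module_files is hypothetical.
sub installed_module_files {
    # Skip code-reference hooks and directories that do not exist.
    my @dirs = grep { !ref($_) && -d $_ } @_;
    return () unless @dirs;

    my %seen;
    find(
        {
            # With no_chdir set, $_ holds the full path of each entry.
            wanted   => sub { $seen{$File::Find::name} = 1 if /\.pm\z/ && -f $_ },
            no_chdir => 1,
        },
        @dirs,
    );
    return sort keys %seen;
}

print "$_\n" for installed_module_files(@INC);
```

Because directories in @INC can be nested inside one another, the hash is used to avoid listing the same file twice.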
If you start from the modules directory on CPAN, you'll see that the modules are categorized into three subdirectories:
by-authors       Modules by author's registered CPAN name
by-category      Modules by subject matter (see below)
by-module        Modules by namespace (e.g., MIME)

If you know what module you want, you can go directly to it by clicking on the by-module entry. If you are looking for a module in a particular category, you can find it through the by-category subdirectory. If you know the author, click on by-authors. However, if you aren't familiar with the categories and you want to find out if there is a module that performs a certain task, you might want to get the file 00modlist.long.html, also in the modules directory. That file is the "Perl 5 Modules List." It contains a list of all the modules, by category, with a brief description of the purpose of each module and a link to the author's CPAN directory for downloading.
Here is a list of the categories; there are currently 22 categories, plus one for modules that don't fit anywhere else:
02_Perl_Core_Modules
03_Development_Support
04_Operating_System_Interfaces
05_Networking_Devices_Inter_Process
06_Data_Type_Utilities
07_Database_Interfaces
08_User_Interfaces   
09_Interfaces_to_Other_Languages
10_File_Names_Systems_Locking  
11_String_Processing_Language_Text_Process
12_Option_Argument_Parameter_Processing
13_Internationalization_and_Locale     
14_Authentication_Security_Encryption  
15_World_Wide_Web_HTML_HTTP_CGI
16_Server_and_Daemon_Utilities
17_Archiving_and_Compression  
18_Images_Pixmap_Bitmap_Manipulation
19_Mail_and_Usenet_News
20_Control_Flow_Utilities
21_File_Handle_Input_Output
22_Microsoft_Windows_Modules
23_Miscellaneous_Modules
99_Not_In_Modulelist
Documentation
Perl documentation is written in a language known as pod (plain old documentation). Pod is a set of simple tags that can be processed to produce documentation in the style of Unix manpages. There are also several utility programs available that process pod text and generate output in different formats. Pod tags can be intermixed with Perl commands, or they can be saved in a separate file, which usually has a .pod extension. The pod tags and the utility programs that are included in the Perl distribution are described in Chapter 4.
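As a rough illustration of pod intermixed with Perl commands, the sketch below embeds two pod sections in a small script. The script name greet and its greeting subroutine are hypothetical; only the =head1 and =cut tags come from pod itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=head1 NAME

greet - print a greeting (hypothetical example script)

=head1 SYNOPSIS

    greet [name]

=cut

# The interpreter skips everything from the first "=head1" through
# "=cut", so execution resumes here.
sub greeting {
    my ($name) = @_;
    $name = 'world' unless defined $name && length $name;
    return "Hello, $name\n";
}

print greeting($ARGV[0]);
```

The pod utilities read the same file and look only at the tagged sections, so the documentation and the code can live side by side.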
On Unix, the standard Perl installation procedure generates manpages for the Perl documentation from their pod format, although your system administrator might also choose to install the documentation as HTML files. You can also use this procedure to generate manpages for CPAN modules when you install them. You might need to modify your MANPATH environment variable to include the path to the Perl manpages, but then you should be able to read the documentation with the man command. In addition, Perl comes with its own command, perldoc, which formats the pod documentation and displays it. perldoc is particularly useful for reading module documentation, which might not be installed as manpages; you can also use it for reading the core Perl documentation.
The ActiveState Win32 port comes with documentation in HTML format; you can find it in the /docs subdirectory of the distribution. Documentation specific to ActiveState's Perl for Win32 is installed in the /docs/Perl-Win32 subdirectory.
The native Win32 port installs the perldoc command for formatting and reading Perl documentation; it also provides an option during installation for the documentation to be formatted and saved as HTML files.
Perl comes with lots of online documentation. To make life easier, the manpages have been divided into separate sections so you don't have to wade through hundreds of pages of text to find what you are looking for. You can read them with either the
Chapter 3: The Perl Interpreter
The perl executable, normally installed in /usr/bin or /usr/local/bin on your machine, is also called the perl interpreter. Every Perl program must be passed through the Perl interpreter in order to execute. The first line in many Perl programs is something like:
#!/usr/bin/perl
For Unix systems, this #! (hash-bang or shebang) line tells the shell to look for the /usr/bin/perl program and pass the rest of the file to that program for execution. Sometimes you'll see different pathnames to the Perl executable, such as /usr/local/bin/perl. You might see perl5 instead of perl on sites that still depend on older versions of Perl. Or you'll see command-line options tacked on the end, such as the notorious -w switch, which produces warning messages. But almost all Perl programs on Unix start with some variation of this line.
If you get a mysterious "Command not found" error on a Perl program, it's often because the path to the Perl executable is wrong. When you download Perl programs off the Internet, copy them from one machine to another, or copy them out of a book (like this one!), the first thing you should do is make sure that the #! line points to the location of the Perl interpreter on your system.
So what does the Perl interpreter do? It compiles the program internally into a parse tree and then executes it immediately. Perl is commonly known as an interpreted language, but this is not strictly true. Since the interpreter actually does convert the program into byte code before executing it, it is sometimes called an interpreter/compiler, if anything at all. Although the compiled form is not stored as a file, release 5.005 of Perl includes a working version of a standalone Perl compiler.
What does all this brouhaha mean for you? When you write a Perl program, you can just give it a correct #! line at the top of the script, make it executable with chmod +x, and run it. For 95% of Perl programmers in this world, that's all you'll care about.
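A minimal script along those lines might look like the following sketch. It assumes that perl is installed as /usr/bin/perl, and the subroutine name interpreter_report is invented for the example.

```perl
#!/usr/bin/perl -w
# Adjust the path above to wherever perl lives on your system.
use strict;

# $^X holds the name used to invoke the running perl binary and $]
# holds its version number, so printing them shows which interpreter
# the #! line actually selected.
sub interpreter_report {
    return "Running under " . $^X . ", version " . $] . "\n";
}

print interpreter_report();
```

Saved under a name such as whichperl (again, just an example name), the script can be made executable with chmod +x whichperl and then run as ./whichperl.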
Command Processing
In addition to specifying a #! line, you can also specify a short script directly on the command line. Here are some of the possible ways to run Perl:
  • Issue the perl command, writing your script line by line via -e switches on the command line:
    perl -e 'print "Hello, world\n"'    #Unix
    perl -e "print \"Hello, world\n\""  #Win32
  • Issue the perl command, passing Perl the name of your script as the first parameter (after any switches):
    perl testpgm
  • On Unix systems that support the #! notation, specify the Perl command on the #! line, make your script executable, and invoke it from the shell (as described above).
  • Pass your script to Perl via standard input. For example, under Unix:
    echo "print 'Hello, world'" | perl -
    or (unless ignoreeof is set):
    % perl
    print "Hello, world\n";
    ^D
  • On Win32 systems, you can associate an extension (e.g., .plx) with a file type and double-click on the icon for a Perl script with that file type. If you are using the ActiveState version of Win32 Perl, the installation script normally prompts you to create the association.
  • On Win32 systems, if you double-click on the icon for the Perl executable, you'll find yourself in a command-prompt window, with a blinking cursor. You can enter your Perl commands, indicating the end of your input with CTRL-Z, and Perl will compile and execute your script.
Perl parses the input file from the beginning, unless you've specified the -x switch (see Section 3.2 later in this chapter). If there is a #! line, it is always examined for switches as the line is being parsed. Thus, switches behave consistently regardless of how Perl was invoked.
Command-Line Options
Perl expects any command-line options, also known as switches or flags, to come first on the command line. The next item is usually the name of the script, followed by any additional arguments (often filenames) to be passed into the script. Some of these additional arguments may be switches, but if so, they must be processed by the script, since Perl gives up parsing switches as soon as it sees either a non-switch item or the special -- switch that terminates switch processing.
A single-character switch with no argument may be combined (bundled) with the switch that follows it, if any. For example:
#!/usr/bin/perl -spi.bak
is the same as:
#!/usr/bin/perl -s -p -i.bak
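Switches that appear after the script name are left for the script to handle. One possible way to handle them, sketched here with the standard Getopt::Std module, is shown below; the -v and -n switches and the subroutine name parse_script_args are hypothetical and exist only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Parse switches that belong to the script rather than to perl.
# -v is a plain flag and -n takes a value; both are hypothetical.
sub parse_script_args {
    local @ARGV = @_;          # getopts works on @ARGV
    my %opt;
    getopts('vn:', \%opt)
        or die "usage: $0 [-v] [-n count] file ...\n";
    return (\%opt, [@ARGV]);   # what remains are ordinary arguments
}

my ($opt, $rest) = parse_script_args(@ARGV);
print "verbose\n"          if $opt->{v};
print "count: $opt->{n}\n" if defined $opt->{n};
print "argument: $_\n"     for @$rest;
```

Invoked as perl -w myscript -v -n 3 data.txt (myscript being a placeholder name), Perl itself consumes only -w; everything after the script name reaches the script through @ARGV.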
Perl recognizes the switches listed in Table 3.1.
Table 3.1: Perl Switches
Switch      Function
--
Terminates switch processing, even if the next argument starts with a minus. It has no other effect.
-0[octnum]
Specifies the record separator ($/) as an octal number. If octnum is not present, the null character is the separator. Other switches may precede or follow the octal number.
-a
Turns on autosplit mode when used with -n or -p. An implicit split into the @F array is inserted as the first command inside the implicit while loop produced by -n or -p. The default field delimiter is whitespace; a different field delimiter may be specified using -F.
-c
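To make the -a entry above more concrete, here is an approximate long-hand version of what perl -ane 'print "$F[0]\n"' does: -n supplies the loop and -a supplies the split into @F. The subroutine name first_fields and the sample data are made up, and unlike the one-liner this sketch silently skips blank lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the first whitespace-separated field of every input line.
sub first_fields {
    my ($fh) = @_;
    my @out;
    local $_;                      # keep the caller's $_ intact
    while (<$fh>) {                # the loop that -n would supply
        my @F = split(' ');        # the split that -a would supply
        push @out, $F[0] if @F;    # blank lines yield no fields
    }
    return @out;
}

# Demonstrate on an in-memory filehandle instead of real files.
my $data = "alpha beta\ngamma delta\n";
open(my $fh, '<', \$data) or die "open: $!";
print "$_\n" for first_fields($fh);   # prints "alpha", then "gamma"
```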
Environment Variables
Environment variables are used to set user preferences. Individual Perl modules or programs are always free to define their own environment variables, and there is also a set of special environment variables that are used in the CGI environment (see Chapter 9).
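A program that defines its own environment variable typically reads it from the built-in %ENV hash. The sketch below shows one way to do that with a fallback value; the variable name MYAPP_LOGDIR and the subroutine name setting are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Look up a setting in the environment, falling back to a default
# when the variable is unset or empty.
# MYAPP_LOGDIR is a hypothetical name used only for illustration.
sub setting {
    my ($name, $default) = @_;
    my $value = $ENV{$name};
    return (defined $value && length $value) ? $value : $default;
}

print "log directory: ", setting('MYAPP_LOGDIR', '/tmp'), "\n";
```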
Perl uses the following environment variables:
HOME
Used if chdir has no argument.
LOGDIR
Used if chdir has no argument and HOME is not set.
PATH
Used in executing subprocesses and in finding the script if -S is used.
PATHEXT
On Win32 systems, if you want to avoid typing the extension every time you execute a Perl script, you can set the PATHEXT environment variable so that it includes Perl scripts. For example:
> set PATHEXT=%PATHEXT%;.PLX
This setting lets you type:
> myscript
without including the file extension. Take care when setting PATHEXT permanently—it also includes executable file types like .com, .exe, .bat, and .cmd. If you inadvertently lose those extensions, you'll have difficulty invoking applications and script files.
PERL5LIB
A colon-separated list of directories in which to look for Perl library files before looking in the standard library and the current directory. If PERL5LIB is not defined, PERLLIB is used. When running taint checks, neither variable is used. The script should instead say:
The Perl Compiler
A native-code compiler for Perl is now (as of Perl 5.005) part of the standard Perl distribution. The compiler allows you to distribute Perl programs in binary form, which enables easy packaging of Perl-based programs without having to depend on the source machine having the correct version of Perl and the correct modules installed. After the initial compilation, running a compiled program should be faster to the extent that it doesn't have to be recompiled each time it's run. However, you shouldn't expect that the compiled code itself will run faster than the original Perl source or that the executable will be smaller—in reality, the executable file is likely to be significantly bigger.
This initial release of the compiler is still considered to be a beta version. It's distributed as an extension module, B, that comes with the following backends:
Bytecode
Translates a script into platform-independent Perl byte code.
C
Translates a Perl script into C code.
CC
Translates a Perl script into optimized C code.
Deparse
Regenerates Perl source code from a compiled program.
Lint
Extends the Perl -w option. Named after the Unix Lint program-checker.
Showlex
Shows lexical variables used in functions or files.
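Of these backends, Deparse is the easiest to try from inside a program. The sketch below assumes the object interface of the B::Deparse module as shipped with current Perl releases, which may differ from the beta described here; the subroutine name source_of is invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B::Deparse;

# Regenerate Perl source text from an already-compiled subroutine.
sub source_of {
    my ($code) = @_;
    return B::Deparse->new->coderef2text($code);
}

my $double = sub { my $n = shift; return $n * 2 };
print source_of($double), "\n";
```

The regenerated text is equivalent to the original subroutine body but is not guaranteed to match it character for character, and its exact layout varies between Perl versions.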
Threads
Perl 5.005 also includes the first release of a native multithreading capability, which is distributed with Perl as a set of modules. Since this is an initial release, the threads modules are considered to be beta software and aren't automatically compiled in with Perl. Therefore, the decision to use the threads feature has to be made during installation, so it can be included in the build of Perl. Or you might want to build a separate version of Perl for testing purposes.
Chapter 8 describes the individual Thread modules. For information on what threads are and how you might use them, see the article "Threads" in the Summer 1998 issue of The Perl Journal. There is also an explanation of threads in the book Programming with Perl Modules from O'Reilly's Perl Resource Kit, Win32 Edition.
Chapter 4: The Perl Language
This chapter is a quick and merciless guide to the Perl language itself. If you're trying to learn Perl from scratch, and you'd prefer to be taught rather than to have things thrown at you, then you might be better off with Learning Perl by Randal Schwartz and Tom Christiansen, or Learning Perl on Win32 Systems by Randal Schwartz, Erik Olson, and Tom Christiansen. However, if you already know some other programming languages and just want to hear the particulars of Perl, this chapter is for you. Sit tight, and forgive us for being terse: we have a lot of ground to cover.
If you want a more complete discussion of the Perl language and its idiosyncrasies (and we mean complete), see Programming Perl by Larry Wall, Tom Christiansen, and Randal Schwartz.
Program Structure
Perl is a particularly forgiving language, as far as program layout goes. There are no rules about indentation, newlines, etc. Most lines end with semicolons, but not everything has to. Most things don't have to be declared, except for a couple of things that do. Here are the bare essentials:
Whitespace
Whitespace is required only between items that would otherwise be confused as a single term. All types of whitespace—spaces, tabs, newlines, etc.—are equivalent in this context. A comment counts as whitespace. Different types of whitespace are distinguishable within quoted strings, formats, and certain line-oriented forms of quoting. For example, in a quoted string, a newline, a space, and a tab are interpreted as unique characters.
Semicolons
Every simple statement must end with a semicolon. Compound statements contain brace-delimited blocks of other statements and do not require terminating semicolons after the ending brace. A final simple statement in a block also does not require a semicolon.
Declarations
Only subroutines and report formats need to be explicitly declared. All other user-created objects are automatically created with a null or 0 value unless they are defined by some explicit operation such as assignment. The -w command-line switch will warn you about using undefined values.
You may force yourself to declare your variables by including the use strict pragma in your programs (see Chapter 8 for more information on pragmas and strict in particular). This makes it an error to use a variable that has not been explicitly declared.
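As a minimal sketch of what the pragma does, the following declares its variables with my and then deliberately misspells one inside a string eval, so that the compile-time error can be captured and printed instead of stopping the program. The variable names are hypothetical, and use warnings (the lexical counterpart of -w) assumes Perl 5.6 or later.

```perl
use strict;      # undeclared variables become compile-time errors
use warnings;    # lexical counterpart of the -w switch (Perl 5.6 and later)

my $count = 0;             # scalar declared with my
my @queue = ('a', 'b');    # array declared with my
$count += scalar @queue;   # fine: both variables were declared

# Under strict, a misspelled name is rejected when the code is compiled.
# The string eval lets this sketch capture the error instead of aborting.
my $ok    = eval q{ $cuont = 1; 1 };
my $error = $ok ? '' : $@;

print "count = $count\n";
print "strict says: $error";
```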
Data Types and Variables
Perl has three basic data types: scalars, arrays, and hashes.
Scalars are essentially simple variables. They are preceded by a dollar sign ($). A scalar is either a number, a string, or a reference. (A reference is a scalar that points to another piece of data. References are discussed later in this chapter.) If you provide a string where a number is expected or vice versa, Perl automatically converts the operand using fairly intuitive rules.
Arrays are ordered lists of scalars that you access with a numeric subscript (subscripts start at 0). They are preceded by an "at" sign (@).
Hashes are unordered sets of key/value pairs that you access using the keys as subscripts. They are preceded by a percent sign (%).
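The three data types can be sketched as follows. The variable names and values are made up for illustration, and the example assumes a Perl recent enough to provide use warnings.

```perl
use strict;
use warnings;

my $name   = "Camel";                   # scalar holding a string
my $answer = "40" + 2;                  # the string "40" is converted to a number
my $label  = $answer . "nd";            # the number 42 is converted to a string
my @days   = ('Mon', 'Tue', 'Wed');     # array: ordered list of scalars
my %ports  = (http => 80, ftp => 21);   # hash: key/value pairs

print "$name $answer $label\n";
print "first day: $days[0]\n";          # subscripts start at 0
print "http port: $ports{http}\n";      # hash element accessed by its key
```

Note that a single element of an array or hash is itself a scalar, which is why $days[0] and $ports{http} are written with a dollar sign.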
Perl stores numbers internally as either signed integers or double-precision floating-point values. Numeric literals are specified in any of the following floating-point or integer formats:
12345               # integer
-54321              # negative integer
12345.67            # floating point
6.02E23             # scientific notation
0xffff              # hexadecimal
0377                # octal
4_294_967_296       # underscore for legibility
Since Perl uses the comma as a list separator, you cannot use a comma for improving legibility of a large number. To improve legibility, Perl allows you to use an underscore character instead. The underscore only works within literal numbers specified in your program, not in strings functioning as numbers or in data read from somewhere else. Similarly, the leading 0x for hex and 0 for octal work only for literals. The automatic conversion of a string to a number does not recognize these prefixes—you must do an explicit conversion.
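A short sketch of the difference between literals and strings, using the built-in hex and oct functions for the explicit conversion. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $big     = 4_294_967_296;   # underscore is allowed in a literal
my $hex_lit = 0xffff;          # hexadecimal literal: 65535
my $oct_lit = 0377;            # octal literal: 255

# Strings are not parsed like literals, so convert them explicitly.
my $from_hex = hex("0xffff");  # 65535
my $from_oct = oct("0377");    # 255
my $any      = oct("0xff");    # oct also understands a leading 0x: 255

my $naive;
{
    no warnings 'numeric';     # silence the "isn't numeric" warning for this demo
    $naive = "0xffff" + 0;     # automatic conversion stops at the "x", giving 0
}

print "$hex_lit $from_hex $oct_lit $from_oct $any $naive\n";
```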
Strings are sequences of characters. String literals are usually delimited by either single (
Statements
A simple statement is an expression evaluated for its side effects. Every simple statement must end in a semicolon, unless it is the final statement in a block.
A sequence of statements that defines a scope is called a block. Generally, a block is delimited by braces, or { }. Compound statements are built out of expressions and blocks. A conditional expression is evaluated to determine whether a statement block will be executed. Compound statements are defined in terms of blocks, not statements, which means that braces are required.
Any block can be given a label. Labels are identifiers that follow the variable-naming rules (i.e., they begin with a letter or underscore, and can contain alphanumerics and underscores). They are placed just before the block and are followed by a colon, like SOMELABEL here:
SOMELABEL: {
  ...statements...
}
By convention, labels are all uppercase, so as not to conflict with reserved words. Labels are used with the loop-control commands next, last, and redo to alter the flow of execution in your programs.
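As an illustration, the sketch below labels an outer loop so that next and last inside the inner loop can act on it. The label OUTER and the variable names are arbitrary choices for this example.

```perl
use strict;
use warnings;

my @pairs;

OUTER: foreach my $i (1 .. 3) {
    foreach my $j (1 .. 3) {
        next OUTER if $j > $i;        # skip to the next value of $i
        last OUTER if $i * $j > 4;    # leave both loops entirely
        push @pairs, "$i$j";
    }
}

print "@pairs\n";    # prints "11 21 22 31"
```

Without the label, next and last would apply only to the innermost enclosing loop.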
The if and unless statements execute blocks of code depending on whether a condition is met. These statements take the following forms:
if (expression) {block} else {block}

unless (expression) {block} else {block}

if (expression1) {block}
elsif (expression2) {block}
  ...
elsif (lastexpression) {block}
else {block}
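These forms might be used as in the following sketch. The classify subroutine and the $note variable are hypothetical names introduced only for this illustration.

```perl
use strict;
use warnings;

# classify() is a hypothetical helper used only for this illustration.
sub classify {
    my ($n) = @_;
    if ($n < 0) {
        return "negative";
    }
    elsif ($n == 0) {
        return "zero";
    }
    else {
        return "positive"      # final statement in a block: semicolon optional
    }
}

my $note;
unless (classify(5) eq "negative") {   # runs the block when the test is false
    $note = "5 is not negative";
}
else {
    $note = "5 is negative";
}

print classify(-3), " ", classify(0), " ", classify(7), "\n";
print "$note\n";
```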

Section 4.3.1.1: while loops

The while statement repeatedly executes a block as long as its conditional expression is true. For example:
while (<INFILE>) {
    print OUTFILE $_;    # no comma after the filehandle; $_ already ends with a newline
}
This loop reads each line from the file opened with the filehandle INFILE and prints it to the OUTFILE filehandle. The loop ends when it reaches end-of-file.
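To try the loop without creating files on disk, the following sketch opens both filehandles on in-memory strings. That form of open assumes Perl 5.8 or later, and the variables $input and $output are hypothetical stand-ins for real files.

```perl
use strict;
use warnings;

my $input  = "first line\nsecond line\n";
my $output = '';

# In-memory filehandles stand in for real files (assumes Perl 5.8 or later).
open(INFILE,  '<', \$input)  or die "cannot read input: $!";
open(OUTFILE, '>', \$output) or die "cannot write output: $!";

while (<INFILE>) {
    print OUTFILE $_;    # each line read into $_ keeps its trailing newline
}

close(INFILE);
close(OUTFILE);

print $output;
```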
Special Variables
Some variables have a predefined and special meaning in Perl. They are the variables that use punctuation characters after the usual variable indicator ($, @, or %), such as $_. The explicit, long-form names shown are the variables' equivalents when you use the English module by including "use English;" at the top of your program.
The most commonly used special variable is $_, which contains the default input and pattern-searching string. For example, in the following lines:
foreach ('hickory','dickory','doc') {
	print;
}
The first time the loop is executed, "hickory" is printed. The second time around, "dickory" is printed, and the third time, "doc" is printed. That's because in each iteration of the loop, the current string is placed in $_, and is used by default by print. Here are the places where Perl will assume $_ even if you don't specify it:
  • Various unary functions, including functions like ord and int, as well as all the file tests (-f, -d) except for -t, which defaults to STDIN.
  • Various list functions like print and unlink.
  • The pattern-matching operations m//, s///, and tr/// when used without an =~ operator.
  • The default iterator variable in a foreach loop if no other variable is supplied.
  • The implicit iterator variable in the grep and map functions.
  • The default place to put an input record when a line-input operation's result is tested by itself as the sole criterion of a while test (i.e., <filehandle>). Note that outside of a while test, this will not happen.
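Several of these defaults can be seen together in a short sketch. The array names are hypothetical.

```perl
use strict;
use warnings;

my @words = ('hickory', 'dickory', 'doc');

my @with_ick = grep { /ick/ } @words;    # m// tests $_ for each element
my @lengths  = map  { length } @words;   # length defaults to $_

my @upper;
foreach (@words) {                       # no loop variable given, so $_ is used
    push @upper, uc;                     # uc defaults to $_
}

print "@with_ick\n";    # prints "hickory dickory"
print "@lengths\n";     # prints "7 7 3"
print "@upper\n";       # prints "HICKORY DICKORY DOC"
```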
Operators
Table 4.3 lists all the Perl operators from highest to lowest precedence and indicates their associativity.
Table 4.3: Perl Associativity and Operators, Listed by Precedence
Associativity    Operators
Left             Terms and list operators (leftward)
Left             -> (method call, dereference)
Nonassociative   ++ -- (autoincrement, autodecrement)
Right            ** (exponentiation)
Right            ! ~ \ and unary + and - (logical not, bit-not, reference, unary plus, unary minus)
Left             =~ !~ (matches, doesn't match)
Left             * / % x (multiply, divide, modulus, string replicate)
Left             + - . (addition, subtraction, string concatenation)
Left             << >> (left bit-shift, right bit-shift)
Nonassociative   Named unary operators and file-test operators
Nonassociative   < > <= >= lt gt le ge (less than, greater than, less than or equal to, greater than or equal to, and their string equivalents)
Nonassociative   == != <=> eq ne cmp (equal to, not equal to, signed comparison, and their string equivalents)
Left             & (bit-and)
Left             | ^ (bit-or, bit-xor)
Left             && (logical AND)
Left             || (logical OR)
Nonassociative   .. ...
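A few expressions show how the precedence and associativity rows listed above decide the grouping when no parentheses are written. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $power   = -2 ** 2;          # ** binds tighter than unary minus: -(2 ** 2) is -4
my $chain   = 2 ** 3 ** 2;      # ** is right associative: 2 ** (3 ** 2) is 512
my $product = 2 + 3 * 4;        # * binds tighter than +: 14
my $left    = 10 - 4 - 3;       # - is left associative: (10 - 4) - 3 is 3
my $repeat  = "ab" x 2 . "c";   # x binds tighter than .: "ababc"
my $shifted = 1 << 2 + 1;       # + binds tighter than <<: 1 << 3 is 8

print "$power $chain $product $left $repeat $shifted\n";
```

When in doubt, adding parentheses costs nothing and makes the intended grouping explicit.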
Regular Expressions
Regular expressions are used in several ways in Perl. They're used in conditionals to determine whether a string matches a particular pattern. They're also used to find patterns in strings and replace the match with something else.
The ordinary pattern match operator looks like /pattern/. It matches against the $_ variable by default. If the pattern is found in the string, the operator returns true ("1"); if there is no match, a false value ("") is returned.
The substitution operator looks like s/pattern/replace/. This operator searches $_ by default. If it finds the specified pattern, it is replaced with the string in replace. If pattern is not matched, nothing happens.
You may specify a variable other than $_ with the =~ binding operator (or the negated !~ binding operator, which returns true if the pattern is not matched). For example:
$text =~ /sampo/;
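The following sketch puts the match, negated match, and substitution operators side by side, both with an explicit binding and with the default $_. The strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "The sampo was forged by Ilmarinen";

my $found  = $text =~ /sampo/   ? "match"  : "no match";   # pattern is present
my $absent = $text !~ /kantele/ ? "absent" : "present";    # !~ is true when there is no match

(my $copy = $text) =~ s/sampo/mill/;   # substitute in a copy; $text is unchanged

$_ = "sampo";
my $default = /amp/ ? "match" : "no match";   # no =~, so $_ is searched
my $changed = s/xyz/abc/;                     # pattern not found: $_ untouched, false returned

print "$found, $absent\n";
print "$copy\n";
print "$default, \$_ is still $_\n";
```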
The following list defines Perl's pattern-matching operators. Some of the operators have alternative "quoting" schemes and have a set of modifiers that can be placed directly after the operators to affect the match operation in some way.
m/pattern/gimosx
Searches a string for a pattern match. Modifiers are:
Modifier    Meaning
g
Subroutines
Subroutines are declared using one of these forms:
sub name {block}
sub name (proto) {block}
Prototypes allow you to put constraints on the arguments you provide to your subroutines.
You can also create anonymous subroutines at run-time, which will be available for use through a reference:
$subref = sub {block};
The ampersand (&) is the identifier used to call subroutines. Most of the time, however, subroutines can be used in an expression just like built-in functions. To call subroutines directly:
name(args);                  # & is optional with parentheses
name args;                   # Parens optional if predeclared/imported
&name;                       # Passes current @_ to subroutine
To call subroutines indirectly (by name or by reference):
&$subref(args);                 # & is not optional on indirect call
&$subref;                       # Passes current @_ to subroutine
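The calling forms above can be exercised with a small sketch. The subroutines add and relay, and the variable names, are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

sub add { my ($x, $y) = @_; return $x + $y }

my $subref = sub { return join('-', @_) };   # anonymous subroutine

my $direct   = add(2, 3);             # ordinary call
my $with_amp = &add(2, 3);            # same call with an explicit &
my $indirect = &$subref('a', 'b');    # call through a reference
my $by_ref   = &{ \&add }(4, 5);      # reference to a named subroutine

# &add with no parentheses hands relay's own @_ straight to add.
sub relay { return &add }
my $relayed = relay(10, 20);

print "$direct $with_amp $indirect $by_ref $relayed\n";   # prints "5 5 a-b 9 30"
```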
All arguments to a subroutine are passed as a single, flat list of scalars, and return values are returned the same way. Any arrays or hashes passed in these lists will have their values interpolated into the flattened list.
Any arguments passed to a subroutine come in as the array @_.
You may use the explicit return statement to return a value and leave the subroutine at any point.
If you want to pass more than one array or hash into or out of a function and have them maintain their integrity, then you will want to pass references as arguments. The simplest way to do this is to take your named variables and put a backslash in front of them in the argument list:
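A minimal sketch of both behaviors, using hypothetical subroutines count_args and longer: the first call shows two arrays being flattened into one list, and the second passes backslashed references so each array arrives intact.

```perl
use strict;
use warnings;

# Flattened: the subroutine cannot tell where one array ends and the next begins.
sub count_args { return scalar @_ }

# By reference: each array arrives as a single scalar and stays intact.
sub longer {
    my ($first, $second) = @_;                        # two array references
    return @$first >= @$second ? $first : $second;    # compare element counts
}

my @short = (1, 2);
my @long  = (3, 4, 5);

my $flat_count = count_args(@short, @long);    # @_ holds 5 scalars
my $winner     = longer(\@short, \@long);      # backslash creates the references

print "flattened into $flat_count arguments\n";
print "longer array holds: @$winner\n";
```

Because longer returns one of the references it was given, the caller gets back the original array rather than a copy.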
References and Complex Data Structures
A Perl reference is a fundamental data type that "points" to another piece of data or code. A reference knows the location of the information and what type of data is stored there.
A reference is a scalar and can be used anywhere a scalar can be used. Any array element or hash value can contain a reference (a hash key cannot contain a reference), and this is how nested data structures are built in Perl. You can construct lists containing references to other lists, which can contain references to hashes, and so on.
You can create a reference to an existing variable or subroutine by prefixing it with a backslash:
$a = "fondue";
@alist = ("pitt", "hanks", "cage", "cruise");
%song = ("mother" => "crying", "brother" => "dying");
sub freaky_friday { s/mother/daughter/ }
# Create references
$ra = \$a;
$ralist = \@alist;
$rsong = \%song;
$rsub = \&freaky_friday; # '&' required for subroutine names
References to scalar constants are created similarly:
$pi = \3.14159;
$myname = \"Charlie";
Note that all references are prefixed by a $, even if they refer to an array or hash. Because all references are scalars, you can copy a reference to another scalar or even create a reference to another reference:
$aref = \@names;
$bref = $aref;  # both refer to @names
$cref = \$aref; # $cref is a reference to $aref
Because arrays and hashes are collections of scalars, you can create references to individual elements by prefixing their names with backslashes:
$star = \$alist[2];       # refers to third element of @alist
$action = \$song{mother}; # refers to the 'mother' value of %song
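The sketch below reuses the variables from the examples above to show that a reference records the type of its target and that changes made through a reference affect the original data. It relies on the built-in ref function and on the $$ and @$ dereferencing prefixes; the replacement values are arbitrary.

```perl
use strict;
use warnings;

my @alist = ("pitt", "hanks", "cage", "cruise");
my %song  = ("mother" => "crying", "brother" => "dying");

my $ralist = \@alist;
my $rsong  = \%song;
my $star   = \$alist[2];         # reference to a single element

my $kind_list = ref($ralist);    # "ARRAY"
my $kind_hash = ref($rsong);     # "HASH"
my $kind_elem = ref($star);      # "SCALAR"

$$star = "depp";                 # writing through the reference changes @alist

my $copy = $ralist;              # copies the reference, not the array
push @$copy, "reeves";           # so this also appends to @alist

print "$kind_list $kind_hash $kind_elem\n";
print "@alist\n";                # prints "pitt hanks depp cruise reeves"
```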

Section 4.8.1.1: Referencing anonymous data

It is also possible to take references to literal data not stored in a variable. This data is called anonymous because it is not bound to any named variable.
Filehandles