Learning Perl, 2nd Edition
By Randal L. Schwartz & Tom Christiansen
2nd Edition July 1997
1-56592-284-0, Order Number: 2840
302 pages, $34.95
Sample Chapter 1:
Introduction
1.1 History of Perl
Perl is short for "Practical Extraction and Report Language," although it has also been called a "Pathologically Eclectic Rubbish Lister." There's no point in arguing which one is more correct, because both are endorsed by Larry Wall, Perl's creator and chief architect, implementor, and maintainer. He created Perl when he was trying to produce some reports from a Usenet-news-like hierarchy of files for a bug-reporting system, and awk ran out of steam. Larry, being the lazy programmer that he is, decided to over-kill the problem with a general-purpose tool that he could use in at least one other place. The result was the first version of Perl.
After playing with this version of Perl a bit, adding stuff here and there, Larry released it to the community of Usenet readers, commonly known as "the Net." The users on this ragtag fugitive fleet of systems around the world (tens of thousands of them) gave him feedback, asking for ways to do this, that, or the other, many of which Larry had never envisioned his little Perl handling.
But as a result, Perl grew, and grew, and grew, at about the same rate as the UNIX operating system. (For you newcomers, the entire UNIX kernel used to fit in 32K! And now we're lucky if we can get it in under a few meg.) It grew in features. It grew in portability. What was once a little language now had over a thousand pages of documentation split across dozens of different manpages, a 600-page Nutshell reference book, a handful of Usenet newsgroups with 200,000 subscribers, and now this gentle introduction.
Larry is no longer the sole maintainer of Perl, but retains his executive title of chief architect. And Perl is still growing.
This book was tested with Perl version 5.0 patchlevel 4 (the most recent release as I write this). Everything here should work with 5.0 and future releases of Perl. In fact, Perl 1.0 programs work rather well with recent releases, except for a few odd changes made necessary in the name of progress.
1.2 Purpose of Perl
Perl is designed to assist the programmer with common tasks that are probably too heavy or too portability-sensitive for the shell, and yet too weird or short-lived or complicated to code in C or some other UNIX glue language.
Once you become familiar with Perl, you may find yourself spending less time trying to get shell quoting (or C declarations) right, and more time reading Usenet news and downhill snowboarding, because Perl is a great tool for leverage. Perl's powerful constructs allow you to create (with minimal fuss) some very cool one-up solutions or general tools. Also, you can drag those tools along to your next job, because Perl is highly portable and readily available, so you'll have even more time there to read Usenet news and annoy your friends at karaoke bars.
Like any language, Perl can be "write-only"; it's possible to write programs that are impossible to read. But with proper care, you can avoid this common accusation. Yes, sometimes Perl looks like line noise to the uninitiated, but to the seasoned Perl programmer, it looks like checksummed line noise with a mission in life. If you follow the guidelines of this book, your programs should be easy to read and easy to maintain, but they probably won't win any obfuscated Perl contests.
1.3 Availability
If you get "perl: not found" when you try to invoke Perl from the shell, your system administrator hasn't caught the fever yet. But even if it's not on your system, you can get it for free (or nearly so).
Perl is distributed under the GNU General Public License,[1] which says something like, "you can distribute binaries of Perl only if you make the source code available at no cost, and if you modify Perl, you have to distribute the source to your modifications as well." And that's essentially free. You can get the source to Perl for the cost of a blank tape or a few megabytes over a wire. And no one can lock Perl up and sell you just binaries for their particular idea of "supported hardware configurations."
[1] Or the slightly more liberal Artistic License, found in the distribution sources.
In fact, it's not only free, but it runs rather nicely on nearly everything that calls itself UNIX or UNIX-like and has a C compiler. This is because the package comes with an arcane configuration script called Configure that pokes and prods the system directories looking for things it requires, and adjusts the include files and defined symbols accordingly, turning to you for verification of its findings.
Besides UNIX or UNIX-like systems, people have also been addicted enough to Perl to port it to the Amiga, the Atari ST, the Macintosh family, VMS, OS/2, and even MS-DOS, Windows NT, and Windows 95, and probably even more by the time you read this. The sources for Perl (and many precompiled binaries for non-UNIX architectures) are available from the Comprehensive Perl Archive Network (the CPAN). If you are web-savvy, visit http://www.perl.com/CPAN for one of the many mirrors. If you're absolutely stumped, write bookquestions@oreilly.com and say "Where can I get Perl?!?!"
1.4 Basic Concepts
A shell script is nothing more than a sequence of shell commands stuffed into a text file. The file is then "made executable" by turning on the execute bit (via chmod +x filename) and then the name of the file is typed at a shell prompt. Bingo, one shell program. For example, a script to run the date command followed by the who command can be created and executed like this:
    % echo date >somescript
    % echo who >>somescript
    % cat somescript
    date
    who
    % chmod +x somescript
    % somescript
    [output of date followed by who]
    %

Similarly, a Perl program is a bunch of Perl statements and definitions thrown into a file. You then turn on the execute bit[2] and type the name of the file at a shell prompt. However, the file has to indicate that this is a Perl program and not a shell program, so you need an additional step.
[2] On UNIX systems, that is. For directions on how to render your scripts executable on non-UNIX systems, see the Perl FAQ or your port's release notes.
Most of the time, this step involves placing the line

    #!/usr/bin/perl

as the first line of the file. But if your Perl is stuck in some nonstandard place, or your system doesn't understand the #! line, you'll have a little more work to do. Check with your Perl installer about this. The examples in this book assume that you use this common mechanism.

Perl is mostly a free-format language like C: whitespace between tokens (elements of the program, like print or +) is optional, unless two tokens put together can be mistaken for another token, in which case whitespace of some kind is mandatory. (Whitespace consists of spaces, tabs, newlines, returns, or formfeeds.) There are a few constructs that require a certain kind of whitespace in a certain place, but they'll be pointed out when we get to them. You can assume that the kind and amount of whitespace between tokens is otherwise arbitrary.

Although nearly any Perl program can be written all on one line, typically a Perl program is indented much like a C program, with nested parts of statements indented more than the surrounding parts. You'll see plenty of examples showing a typical indentation style throughout this book.
Just like a shell script, a Perl program consists of all of the Perl statements of the file taken collectively as one big routine to execute. There's no concept of a "main" routine as in C.
Perl comments are like (modern) shell comments. Anything from an unquoted pound sign (#) to the end of the line is a comment. There are no C-like multiline comments.

Unlike most shells (but like awk and sed), the Perl interpreter completely parses and compiles the program into an internal format before executing any of it. This means that you can never get a syntax error from the program once the program has started (code that is compiled later at run time, such as a string handed to eval or a file loaded with require, is the exception), and that the whitespace and comments simply disappear and won't slow the program down. This compilation phase ensures the rapid execution of Perl operations once it is started, and it provides additional motivation for dropping C as a systems utility language merely on the grounds that C is compiled.
This compilation does take time; it's inefficient to have a voluminous Perl program that does one small quick task (out of many potential tasks) and then exits, because the run-time for the program will be dwarfed by the compile-time.
So Perl is like a compiler and an interpreter. It's a compiler because the program is completely read and parsed before the first statement is executed. It's an interpreter because there is no object code sitting around filling up disk space. In some ways, it's the best of both worlds. Admittedly, a caching of the compiled object code between invocations, or even translation into native machine code, would be nice. Actually, a working version of such a compiler already exists and is currently scheduled to be bundled into the 5.005 release. See the Perl FAQ for current status.
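You can watch this compile-first behavior from inside a running program by using a string eval, which compiles a piece of code as a unit and only then runs it. The sketch below is only an illustration. It uses features this chapter hasn't covered yet (use strict, our, my, and eval), and the flag name $ran is made up for the demonstration. Because the string contains a syntax error near its end, none of it runs, not even the assignment that comes first.

```perl
#!/usr/bin/perl -w
use strict;

our $ran = 0;    # hypothetical flag that the evaluated code would set

# eval compiles the whole string first; a syntax error anywhere in it
# means no statement in the string is executed.
my $ok = eval q{
    $main::ran = 1;    # comes first, but never runs
    my $x = 1 + ;      # syntax error
    1;
};

print "ran = $ran\n";                         # prints "ran = 0"
print "compile error: $@" unless defined $ok;
```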
1.5 A Stroll Through Perl
We begin our journey through Perl by taking a little stroll. This stroll presents a number of different features by hacking on a small application. The explanations here are extremely brief; each subject area is discussed in much greater detail later in this book. But this little stroll should give you a quick taste for the language, and you can decide if you really want to finish this book rather than read some more Usenet news or run off to the ski slopes.
1.5.1 The "Hello, World" Program
Let's look at a little program that actually does something. Here is your basic "Hello, world" program:
    #!/usr/bin/perl -w
    print ("Hello, world!\n");

The first line is the incantation that says this is a Perl program. It's also a comment for Perl; remember that a comment is anything from a pound sign to the end of that line, as in many interpreted programming languages. Unlike all other comments in the program, the one on the first line is special: Perl looks at that line for any optional arguments. In this case, the -w switch was used. This very important switch tells Perl to produce extra warning messages about potentially dangerous constructs. You should always develop your programs under -w.
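The second line is the entire executable part of this program. Here we see a print function. The built-in function print starts it off, and in this case has just one argument, a C-like text string. Within this string, the character combination \n stands for a newline character. The print statement is terminated by a semicolon (;). As in C, all simple statements in Perl are terminated by a semicolon.[3]

[3] The semicolon can be omitted when the statement is the last statement of a block or file or eval.

When you invoke this program, the kernel fires up a Perl interpreter, which parses the entire program (all two lines of it, counting the first, comment line) and then executes the compiled form. The first and only operation is the execution of the print function, which sends its arguments to the output.

Soon you'll see Perl programs where print and other functions are called sometimes with parentheses and sometimes without them. For built-in functions, the parentheses are generally optional: use them where they help clarity and leave them off where they don't.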
1.5.2 Asking Questions and Remembering the Result
Let's add a bit more sophistication. The "Hello, world" greeting is a touch cold and inflexible. Let's have the program call you by your name. To do this, we need a place to hold the name, a way to ask for the name, and a way to get a response.

One kind of place to hold values (like a name) is a scalar variable. For this program, we'll use the scalar variable $name to hold your name. We'll go into more detail in Chapter 2, Scalar Data, about what these variables can hold, and what you can do with them. For now, assume that you can hold a single number or string (sequence of characters) in a scalar variable.

The program needs to ask for the name. To do that, we need a way to prompt and a way to accept input. The previous program showed us how to prompt: use the print function. The way to accept a line from the terminal is with the <STDIN> construct, which (as we're using it here) grabs one line of input. We assign this input to the $name variable. This gives us the program:

    print "What is your name? ";
    $name = <STDIN>;

The value of $name at this point has a terminating newline (Randal comes in as Randal\n). To get rid of that, we use the chomp function, which takes a scalar variable as its sole argument and removes the trailing newline (record separator), if present, from the string value of the variable:

    chomp ($name);

Now all we need to do is say "Hello," followed by the value of the $name variable, which we can do in a shell-like fashion by embedding the variable inside the quoted string:

    print "Hello, $name!\n";

As with the shell, if we want a dollar sign rather than a scalar variable reference, we can precede the dollar sign with a backslash.
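To try chomp and string interpolation without typing at a prompt, the sketch below puts a literal string where <STDIN> would normally deliver the line. That stand-in is an assumption for testing; the real program reads from the terminal. It also declares its variables with use strict and my, which this chapter introduces later.

```perl
#!/usr/bin/perl -w
use strict;

my $name = "Randal\n";          # stand-in for what <STDIN> returns for "Randal"
my $removed = chomp($name);     # chomp returns the number of characters removed
print "chomp removed $removed character(s)\n";

chomp($name);                   # no newline left now, so this removes nothing

my $greeting = "Hello, $name!\n";   # double quotes interpolate $name
my $price    = "Only \$5 today\n";  # a backslash keeps the dollar sign literal
print $greeting, $price;
```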
Putting it all together, we get:

    #!/usr/bin/perl -w
    print "What is your name? ";
    $name = <STDIN>;
    chomp ($name);
    print "Hello, $name!\n";

1.5.3 Adding Choices
Now, let's say we have a special greeting for Randal, but want an ordinary greeting for anyone else. To do this, we need to compare the name that was entered with the string Randal, and if it's the same, do something special. Let's add a C-like if-then-else branch and a comparison to the program:

    #!/usr/bin/perl
    print "What is your name? ";
    $name = <STDIN>;
    chomp ($name);
    if ($name eq "Randal") {
        print "Hello, Randal! How good of you to be here!\n";
    } else {
        print "Hello, $name!\n"; # ordinary greeting
    }

The eq operator compares two strings. If they are equal (character-for-character, and have the same length), the result is true. (There's no comparable operator[4] for C's strings.)

[4] Well, OK, there's a standard libc subroutine, strcmp. But that's not an operator.

The if statement selects which block of statements (between matching curly braces) is executed; if the expression is true, it's the first block, otherwise it's the second block.

1.5.4 Guessing the Secret Word
Well, now that we have the name, let's have the person running the program guess a secret word. For everyone except Randal, we'll have the program repeatedly ask for guesses until the person guesses properly. First the program, and then an explanation:
    #!/usr/bin/perl -w
    $secretword = "llama"; # the secret word
    print "What is your name? ";
    $name = <STDIN>;
    chomp $name;
    if ($name eq "Randal") {
        print "Hello, Randal! How good of you to be here!\n";
    } else {
        print "Hello, $name!\n"; # ordinary greeting
        print "What is the secret word? ";
        $guess = <STDIN>;
        chomp ($guess);
        while ($guess ne $secretword) {
            print "Wrong, try again. What is the secret word? ";
            $guess = <STDIN>;
            chomp ($guess);
        }
    }

First, we define the secret word by putting it into another scalar variable, $secretword. After the greeting, the (non-Randal) person is asked (with another print) for the guess. The guess is compared with the secret word using the ne operator, which returns true if the strings are not equal (this is the logical opposite of the eq operator). The result of the comparison controls a while loop, which executes the block as long as the comparison is true.

Of course, this is not a very secure program. Anyone who is tired of guessing can simply interrupt the program and get back to the prompt, or even look at the source to find the word. But we weren't trying to write a security system, just an example for this section.
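The sketch below contrasts eq with the numeric == operator (covered in Chapter 2). It also runs the guessing loop against a fixed list of guesses instead of <STDIN>, so it can run unattended. The @guesses list, the shift function that takes items off its front, and the $tries counter are additions for this illustration and are not part of the program above.

```perl
#!/usr/bin/perl -w
use strict;

# eq compares characters; == compares numeric values.
my $same_string = ("10" eq "10.0") ? 1 : 0;   # 0: the strings differ
my $same_number = ("10" == "10.0") ? 1 : 0;   # 1: both are the number ten

my $secretword = "llama";
my @guesses = ("camel", "alpaca", "llama");   # stand-in for lines typed at the prompt

my $guess = shift @guesses;                   # first guess
my $tries = 1;
while ($guess ne $secretword) {               # loop as long as the strings differ
    print "Wrong ($guess), try again.\n";
    $guess = shift @guesses;
    $tries = $tries + 1;
}
print "Got it in $tries tries.\n";            # prints "Got it in 3 tries."
```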
1.5.5 More than One Secret Word
Let's see how we can modify this to allow more than one valid secret word. Using what we've already seen, we could compare the guess repeatedly against a series of good answers stored in separate scalar variables. However, such a list would be hard to modify or read in from a file or compute based on the day of the week.
A better solution is to store all possible answers in a data structure called a list, or (preferably) an array. Each element of the array is a separate scalar variable that can be independently set or accessed. The entire array can also be given a value in one fell swoop. We can assign a value to the entire array named @words so that it contains three possible good passwords:

    @words = ("camel","llama","alpaca");

Array variable names begin with @, so they are distinct from scalar variable names. Another way to write this so that we don't have to put all those quote marks there is with the qw() operator, like so:

    @words = qw(camel llama alpaca);

These mean exactly the same thing; the qw makes it as if we had quoted each of the three strings.

Once the array is assigned, we can access each element using a subscript reference. So $words[0] is camel, $words[1] is llama, and $words[2] is alpaca. The subscript can be an expression as well, so if we set $i to 2, then $words[$i] is alpaca. (Subscript references start with $ rather than @ because they refer to a single element of the array rather than the whole array.) Going back to our previous example:

    #!/usr/bin/perl -w
    @words = qw(camel llama alpaca);
    print "What is your name? ";
    $name = <STDIN>;
    chomp ($name);
    if ($name eq "Randal") {
        print "Hello, Randal! How good of you to be here!\n";
    } else {
        print "Hello, $name!\n"; # ordinary greeting
        print "What is the secret word? ";
        $guess = <STDIN>;
        chomp ($guess);
        $i = 0; # try this word first
        $correct = "maybe"; # is the guess correct or not?
        while ($correct eq "maybe") { # keep checking til we know
            if ($words[$i] eq $guess) { # right?
                $correct = "yes"; # yes!
            } elsif ($i < 2) { # more words to look at?
                $i = $i + 1; # look at the next word next time
            } else { # no more words, must be bad
                print "Wrong, try again. What is the secret word? ";
                $guess = <STDIN>;
                chomp ($guess);
                $i = 0; # start checking at the first word again
            }
        } # end of while not correct
    } # end of "not Randal"

You'll notice we're using the scalar variable $correct to indicate that we are either still looking for a good password or that we've found one.

This program also shows the elsif block of the if-then-else statement. This exact construct is not present in all programming languages; it's an abbreviation of the else block together with a new if condition, but without nesting inside yet another pair of curly braces. It's a very Perl-like thing to compare a set of conditions in a cascaded if-elsif-elsif-elsif-else chain. Perl doesn't really have the equivalent of C's "switch" or Pascal's "case" statement, although you can build one yourself without too much trouble. See Chapter 2 of Programming Perl or the perlsyn(1) manpage for details.

1.5.6 Giving Each Person a Different Secret Word
In the previous program, any person who comes along could guess any of the three words and be successful. If we want the secret word to be different for each person, we'll need a table that matches up people with words:
Person    Secret Word
Fred      camel
Barney    llama
Betty     alpaca
Wilma     alpaca
Notice that both Betty and Wilma have the same secret word. This is fine.
The easiest way to store such a table in Perl is with a hash. Each element of the hash holds a separate scalar value (just like the other type of array), but hash elements are referenced by a key, which can be any scalar value (any string or number, including noninteger and negative values). To create a hash called %words (notice the % rather than @) with the keys and values given in the table above, we assign a value to %words (much as we did earlier with the array):

    %words = qw(
        fred    camel
        barney  llama
        betty   alpaca
        wilma   alpaca
    );

Each pair of values in the list represents one key and its corresponding value in the hash. Note that we broke this assignment over many lines without any sort of line-continuation character, because whitespace is generally insignificant in a Perl program.
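Here is a small sketch that puts the array from the previous section next to the hash just described, so you can compare the two kinds of subscript. It declares variables with use strict and my, and it uses the exists function to test whether a key is present. The key dino is only an example of a name that isn't in the table. The paragraphs that follow explain the hash lookup syntax in detail.

```perl
#!/usr/bin/perl -w
use strict;

# An array: elements are numbered from 0, and the subscript goes in [ ].
my @words = qw(camel llama alpaca);
my $i = 2;
print "$words[0] $words[1] $words[$i]\n";      # camel llama alpaca

# A hash: the list is read as key, value, key, value, ... and the key goes in { }.
my %words = qw(
    fred    camel
    barney  llama
    betty   alpaca
    wilma   alpaca
);
my $person = "betty";
print "$person -> $words{$person}\n";          # betty -> alpaca

# A key that isn't there gives undef; exists asks whether the key is present.
my $missing = $words{"dino"};
print "dino is ", (exists $words{"dino"} ? "" : "not "), "in the table\n";
```

Because @words and %words live in separate namespaces, the same name can serve both. The subscript brackets tell Perl which one you mean.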
To find the secret word for Betty, we need to use Betty as the key in a reference to the hash %words, via some expression such as $words{"betty"}. The value of this reference is alpaca, similar to what we had before with the other array. Also as before, the key can be any expression, so setting $person to betty and evaluating $words{$person} gives alpaca as well.

Putting all this together, we get a program like this:

    #!/usr/bin/perl
    %words = qw(
        fred    camel
        barney  llama
        betty   alpaca
        wilma   alpaca
    );
    print "What is your name? ";
    $name = <STDIN>;
    chomp ($name);
    if ($name eq "Randal") {
        print "Hello, Randal! How good of you to be here!\n";
    } else {
        print "Hello, $name!\n"; # ordinary greeting
        $secretword = $words{$name}; # get the secret word
        print "What is the secret word? ";
        $guess = <STDIN>;
        chomp ($guess);
        while ($guess ne $secretword) {
            print "Wrong, try again. What is the secret word? ";
            $guess = <STDIN>;
            chomp ($guess);
        }
    }

Note the lookup of the secret word. If the name is not found, the value of $secretword will be an empty string,[5] which we can then check for if we want to define a default secret word for everyone else. Here's how that looks:

    [... rest of program deleted ...]
    $secretword = $words{$name}; # get the secret word
    if ($secretword eq "") { # oops, not found
        $secretword = "groucho"; # sure, why a duck?
    }
    print "What is the secret word? ";
    [... rest of program deleted ...]

[5] Well, OK, it's the undef value, but it looks like an empty string to the eq operator. You'd get a warning about this if you used -w on the command line, which is why we omitted it here.

1.5.7 Handling Varying Input Formats
If I enter Randal L. Schwartz or randal rather than Randal, I'm lumped in with the rest of the users, because the eq comparison is an exact equality. Let's look at one way to handle that.

Suppose I wanted to look for any string that began with Randal, rather than just a string that was equal to Randal. I could do this in sed, awk, or grep with a regular expression: a template that defines a collection of strings that match. As in sed, awk, or grep, the regular expression in Perl that matches any string that begins with Randal is ^Randal. To match this against the string in $name, we use the match operator as follows:

    if ($name =~ /^Randal/) {
        ## yes, it matches
    } else {
        ## no, it doesn't
    }

Note that the regular expression is delimited by slashes. Within the slashes, spaces and other whitespace are significant, just as they are within strings.
This almost does it, but it doesn't handle selecting randal or rejecting Randall. To accept randal, we add the ignore-case option, a small i appended after the closing slash. To reject Randall, we add a word boundary special marker (similar to vi and some versions of grep) in the form of \b in the regular expression. This ensures that the character following the l at the end of randal is not another letter, digit, or underscore. This changes the regular expression to be /^randal\b/i, which means "randal at the beginning of the string, no letter or digit following, and OK to be in either case."

When put together with the rest of the program, it looks like this:

    #!/usr/bin/perl
    %words = qw(
        fred    camel
        barney  llama
        betty   alpaca
        wilma   alpaca
    );
    print "What is your name? ";
    $name = <STDIN>;
    chomp ($name);
    if ($name =~ /^randal\b/i) {
        print "Hello, Randal! How good of you to be here!\n";
    } else {
        print "Hello, $name!\n"; # ordinary greeting
        $secretword = $words{$name}; # get the secret word
        if ($secretword eq "") { # oops, not found
            $secretword = "groucho"; # sure, why a duck?
        }
        print "What is the secret word? ";
        $guess = <STDIN>;
        chomp ($guess);
        while ($guess ne $secretword) {
            print "Wrong, try again. What is the secret word? ";
            $guess = <STDIN>;
            chomp ($guess);
        }
    }

As you can see, the program is a far cry from the simple "Hello, world", but it's still very small and workable, and does quite a bit for being so short. This is The Perl Way.

Perl provides every regular expression feature found in every standard UNIX utility (and even some nonstandard ones). Not only that, but the way Perl handles string matching is about the fastest on the planet, so you don't lose performance. (A grep-like program written in Perl often beats the vendor-supplied[6] C-coded grep for most inputs. This means that grep doesn't even do its one thing very well.)
[6] GNU egrep tends to be much faster than Perl at this.
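To see what the two patterns in this section actually accept, the sketch below runs both against a few sample names. The names, and the foreach loop and push used to collect the matches, are additions for this illustration.

```perl
#!/usr/bin/perl -w
use strict;

my @names = ("Randal", "Randal L. Schwartz", "randal", "Randall", "Fred");
my (@plain, @better);

foreach my $name (@names) {
    # must begin with exactly "Randal", in that case
    push @plain, $name if $name =~ /^Randal/;
    # either case, and no letter, digit, or underscore right after the "l"
    push @better, $name if $name =~ /^randal\b/i;
}

print "/^Randal/ matches:    @plain\n";
print "/^randal\\b/i matches: @better\n";
```

The first pattern lets Randall through and turns randal away. The second pattern fixes both problems.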
1.5.8 Making It Fair for the Rest
So, now I can enter Randal or randal or Randal L. Schwartz, but what about everyone else? Barney still has to say exactly barney (not even barney followed by a space).

To be fair to Barney, we need to grab the first word of whatever's entered, and then convert it to lowercase before we look up the name in the table. We do this with two operators: the substitute operator, which finds a regular expression and replaces it with a string, and the translate operator, to put the string in lowercase.
First, the substitute operator: we want to take the contents of $name, find the first nonword character, and zap everything from there to the end of the string. /\W.*/ is the regular expression we are looking for: the \W stands for a nonword character (something besides a letter, digit, or underscore), and .* means any characters from there to the end of the line. Now, to zap these characters away, we need to take whatever part of the string matches this regular expression and replace it with nothing:

    $name =~ s/\W.*//;

We're using the same =~ operator that we did before, but now on the right we have a substitute operator: the letter s followed by a slash-delimited regular expression and string. (The string in this example is the empty string between the second and third slashes.) This operator looks and acts very much like the substitutions of the various editors.

Now, to get whatever's left into lowercase, we translate the string using the tr operator.[7] It looks a lot like a UNIX tr command, taking a list of characters to find and a list of characters to replace them with. For our example, to put the contents of $name in lowercase, we use:

    $name =~ tr/A-Z/a-z/;

[7] This doesn't work for characters with accent marks, although the lc function would. See the perllocale(1) manpage first distributed with the 5.004 release of Perl for details.

The slashes delimit the searched-for and replacement character lists. The dash between A and Z stands for all the characters in between, so we have two lists that are each 26 characters long. When the tr operator finds a character from the string in the first list, the character is replaced with the corresponding character in the second list. So all uppercase A's become lowercase a's, and so on.[8]

[8] Experts will note that we could have also constructed something like s/(\w*).*/\L$1/ to do this all in one fell swoop, but experts probably won't be reading this section.

Putting that together with everything else results in:

    #!/usr/bin/perl
    %words = qw(
        fred    camel
        barney  llama
        betty   alpaca
        wilma   alpaca
    );
    print "What is your name? ";
    $name = <STDIN>;
    chomp ($name);
    $original_name = $name; # save for greeting
    $name =~ s/\W.*//; # get rid of everything after first word
    $name =~ tr/A-Z/a-z/; # lowercase everything
    if ($name eq "randal") { # ok to compare this way now
        print "Hello, Randal! How good of you to be here!\n";
    } else {
        print "Hello, $original_name!\n"; # ordinary greeting
        $secretword = $words{$name}; # get the secret word
        if ($secretword eq "") { # oops, not found
            $secretword = "groucho"; # sure, why a duck?
        }
        print "What is the secret word? ";
        $guess = <STDIN>;
        chomp ($guess);
        while ($guess ne $secretword) {
            print "Wrong, try again. What is the secret word? ";
            $guess = <STDIN>;
            chomp ($guess);
        }
    }

Notice how the regular expression match for Randal became a simple comparison again. After all, both Randal L. Schwartz and Randal become randal after the substitution and translation. And everyone else gets a fair ride, because Fred and Fred Flintstone both become fred; Barney Rubble and Barney, the little guy become barney, and so on.

With just a few statements, we've made the program much more user-friendly. You'll find that expressing complicated string manipulation with a few keystrokes is one of Perl's many strong points.
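You can try the two lines that normalize the name on their own. The sketch below feeds them the sample inputs mentioned above. The %result hash is a hypothetical addition for checking the output. The sketch also compares tr/A-Z/a-z/ with the built-in lc function, which lowercases a string in a single call.

```perl
#!/usr/bin/perl -w
use strict;

my %result;    # maps what was typed to the normalized lookup key
foreach my $typed ("Randal L. Schwartz", "Barney, the little guy",
                   "Fred Flintstone", "Wilma") {
    my $name = $typed;          # keep $typed intact, much as the program keeps $original_name
    $name =~ s/\W.*//;          # drop the first nonword character and all after it
    $name =~ tr/A-Z/a-z/;       # turn each uppercase ASCII letter into lowercase
    $result{$typed} = $name;
    print "'$typed' -> '$name'\n";
}

# lc does the lowercasing step as a function call
my $via_lc = lc("Barney");
print "$via_lc\n";              # barney
```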
However, hacking away at the name so that we could compare it and look it up in the table destroyed the name that was entered. So, before the program hacks on the name, it saves it in $original_name. (Like C symbols, Perl variable names consist of letters, digits, and underscores and can be of nearly unlimited length.) We can then make references to $original_name later.

Perl has many ways to monitor and mangle strings. You'll find out about most of them in Chapter 7, Regular Expressions, and Chapter 15, Other Data Transformation.
1.5.9 Making It a Bit More Modular
Now that we've added so much to the code, we have to scan through many detailed lines before we can get the overall flow of the program. What we need is to separate the high-level logic (asking for a name, looping based on entered secret words) from the details (comparing a secret word to a known good word). We might do this for clarity, or maybe because one person is writing the high-level part and another is writing (or has already written) the detailed parts.
Perl provides subroutines that have parameters and return values. A subroutine is defined once in a program, and can be used repeatedly by being invoked from within any expression.
For our small-but-rapidly-growing program, let's create a subroutine called good_word that takes a name and a guessed word, and returns true if the word is correct and false if not. The definition of such a subroutine looks like this:

    sub good_word {
        my($somename,$someguess) = @_; # name the parameters
        $somename =~ s/\W.*//; # get rid of everything after first word
        $somename =~ tr/A-Z/a-z/; # lowercase everything
        if ($somename eq "randal") { # should not need to guess
            return 1; # return value is true
        } elsif (($words{$somename} || "groucho") eq $someguess) {
            return 1; # return value is true
        } else {
            return 0; # return value is false
        }
    }

First, the definition of a subroutine consists of the reserved word sub followed by the subroutine name followed by a block of code (delimited by curly braces). This definition can go anywhere in the program file, though most people put it at the end.

The first line within this particular definition is an assignment that copies the values of the two parameters of this subroutine into two local variables named $somename and $someguess. (The my() defines the two variables as private to the enclosing block, in this case the entire subroutine, and the parameters are initially in a special local array called @_.)

The next two lines clean up the name, just like the previous version of the program.
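The sketch below isolates the two ideas in that paragraph: arguments arrive in @_, and my makes a variable private to its block. The subroutine show_params and its variable names are hypothetical and exist only for the demonstration. The sketch also uses our, a way to declare a global under use strict, which goes beyond this chapter.

```perl
#!/usr/bin/perl -w
use strict;

our $label = "global";            # a global variable, visible throughout the file

sub show_params {
    my ($first, $second) = @_;    # copy the two arguments out of @_
    my $label = "private";        # a new $label that exists only inside this block
    return "$first/$second/$label";
}

my $inside = show_params("fred", "camel");
print "$inside\n";                # fred/camel/private
print "$label\n";                 # global: the private $label is gone
# $first and $second don't exist out here; under use strict,
# mentioning them would stop the program at compile time.
```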
The if-elsif-else statement decides whether the guessed word ($someguess) is correct for the name ($somename). Randal should not make it into this subroutine, but even if it does, whatever word was guessed is OK.

A return statement can be used to make the subroutine immediately return to its caller with the supplied value. In the absence of an explicit return statement, the last expression evaluated in a subroutine is the return value. We'll see how the return value is used after we finish describing the subroutine definition.
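Here is good_word as a complete script that runs without typing anything. It assumes the %words table from earlier, declared with our so that it works under use strict. It also replaces <STDIN> with a fixed list of guesses for a hypothetical user named Barney Rubble. Both are assumptions made for this illustration. The loop uses the same while (! good_word(...)) test that the main program below relies on.

```perl
#!/usr/bin/perl -w
use strict;

our %words = qw(
    fred    camel
    barney  llama
    betty   alpaca
    wilma   alpaca
);

sub good_word {
    my ($somename, $someguess) = @_;    # name the parameters
    $somename =~ s/\W.*//;              # get rid of everything after first word
    $somename =~ tr/A-Z/a-z/;           # lowercase everything
    if ($somename eq "randal") {        # should not need to guess
        return 1;
    } elsif (($words{$somename} || "groucho") eq $someguess) {
        return 1;                       # right word, or "groucho" for unknown names
    } else {
        return 0;
    }
}

my @typed = ("camel", "alpaca", "llama");   # stand-in for lines typed at the prompt
my $guess = shift @typed;
while (! good_word("Barney Rubble", $guess)) {
    print "Wrong ($guess), try again.\n";
    $guess = shift @typed;
}
print "Accepted: $guess\n";                 # prints "Accepted: llama"
```

The || default has one consequence worth knowing: a secret word that is itself false, such as "0", would be treated as missing and replaced by groucho.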
The test for the elsif part looks a little complicated; let's break it apart:

    ($words{$somename} || "groucho") eq $someguess

The first thing inside the parentheses is our familiar hash lookup, which yields some value from %words based on a key of $somename. Between that value and the string groucho is the || (logical-or) operator, similar to the one used in C, awk, and the various shells. If the lookup yields a true value (which here means the key $somename was in the hash), the expression takes that value. If the key could not be found, the string groucho is used instead. This is a very Perl-like thing to do: specify some expression, and then provide a default value using || in case the expression turns out to be false.

In any case, whether it's a value from the hash, or the default value groucho, we compare it to whatever was guessed. If the comparison is true, we return 1, otherwise we return 0.

So, expressed as a rule, if the name is randal, or the guess matches the lookup in %words based on the name (with a default of groucho if not found), then the subroutine returns 1, otherwise it returns 0.

Now let's integrate all this with the rest of the program:

    #!/usr/bin/perl
    %words = qw(
        fred    camel
        barney  llama
        betty   alpaca
        wilma   alpaca
    );
    print "What is your name? ";
    $name = <STDIN>;
    chomp ($name);
    if ($name =~ /^randal\b/i) { # back to the other way :-)
        print "Hello, Randal! How good of you to be here!\n";
    } else {
        print "Hello, $name!\n"; # ordinary greeting
        print "What is the secret word? ";
        $guess = <STDIN>;
        chomp ($guess);
        while (! good_word($name,$guess)) {
            print "Wrong, try again. What is the secret word? ";
            $guess = <STDIN>;
            chomp ($guess);
        }
    }
    [... insert definition of good_word() here ...]

Notice that we've gone back to the regular expression to check for Randal, because now there's no need to pull apart the first name and convert it to lowercase, as far as the main program is concerned.

The big difference is the while loop containing the subroutine good_word. Here, we see an invocation of the subroutine, passing it two parameters, $name and $guess. Within the subroutine, the value of $somename is set from the first parameter, in this case $name. Likewise, $someguess is set from the second parameter, $guess.

The value returned by the subroutine (either 1 or 0, recalling the definition given earlier) is logically inverted with the prefix ! (logical not) operator. This operator returns true if the expression following is false, and returns false if the expression following is true. The result of this negation controls the while loop. You can read this as "while it's not a good word...". Many well-written Perl programs read very much like English, provided you take a few liberties with either Perl or English. (But you certainly won't win a Pulitzer that way.)

Note that the subroutine assumes that the value of the %words hash is set by the main program.

Such a cavalier approach to global variables doesn't scale very well, of course. Generally speaking, variables not created with my are global to the whole program, while those my creates last only until the block in which they were declared exits. Don't worry: Perl does in fact support a rich variety of other kinds of variables, including those private to a file (or package), as well as variables private to a function that retain their values between invocations, which is what we could really use here. However, at this stage in your Perl education, explaining these would only complicate your life. When you're ready for it, check out what Programming Perl has to say about scoping, subroutines, modules, and objects, or see the online documentation in the perlsub(1), perlmod(1), perlobj(1), and perltoot(1) manpages.

1.5.10 Moving the Secret Word List into a Separate File
Suppose we wanted to share the secret word list among three programs. If we store the word list as we have done already, we will need to change all three programs when Betty decides that her secret word should be swine rather than alpaca. This can get to be a hassle, especially if Betty changes her mind often.

So, let's put the word list into a file and then read the file to get the word list into the program. To do this, we need to create an I/O channel called a filehandle. Your Perl program automatically gets three filehandles called STDIN, STDOUT, and STDERR, corresponding to the three standard I/O channels in most programming environments. We've already been using the STDIN handle to read data from the person running the program. Now, it's just a matter of getting another handle attached to a file of our own choice.

Here's a small chunk of code to do that:
sub init_words {
    open (WORDSLIST, "wordslist");
    while ($name = <WORDSLIST>) {
        chomp ($name);
        $word = <WORDSLIST>;
        chomp ($word);
        $words{$name} = $word;
    }
    close (WORDSLIST);
}

We're putting it into a subroutine so that we can keep the main part of the program uncluttered. This also means that at a later time (hint: a few revisions down in this stroll), we can change where the word list is stored, or even the format of the list.
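If you'd like to try this loop before creating a wordslist file, here's a minimal sketch that reads the same alternating name/word lines from a string instead of a file. It assumes Perl 5.8 or later, which can attach a filehandle to an in-memory scalar (a feature newer than the release this book was tested with), and it uses a lexical filehandle ($fh) in place of WORDSLIST. The sample data mirrors the word list shown next; the defined test and the die checks anticipate refinements discussed later in this section.

```perl
use strict;
use warnings;

# Sample data in the word-list format: a name line, then that name's secret word.
my $wordslist = "fred\ncamel\nbarney\nllama\nbetty\nalpaca\nwilma\nalpaca\n";

my %words;
# Opening a reference to a scalar gives an in-memory filehandle (Perl 5.8+).
open(my $fh, '<', \$wordslist) or die "can't open in-memory word list: $!";
while (defined(my $name = <$fh>)) {
    chomp $name;
    my $word = <$fh>;
    last unless defined $word;    # a name with no word after it ends the list
    chomp $word;
    $words{$name} = $word;
}
close($fh) or die "couldn't close in-memory word list: $!";

print "$_ => $words{$_}\n" for sort keys %words;
```

Replacing \$wordslist with the filename "wordslist" turns this back into a reader for a real file.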
The arbitrarily chosen format of the word list is one item per line, with names and words, alternating. So, for our current database, we'd have something like this:

fred
camel
barney
llama
betty
alpaca
wilma
alpaca

The open function initializes a filehandle named WORDSLIST by associating it with a file named wordslist in the current directory. Note that the filehandle doesn't have a funny character in front of it as the three variable types do. Also, filehandles are generally uppercase (although they aren't required to be) for reasons detailed later.

The while loop reads lines from the wordslist file (via the WORDSLIST filehandle) one line at a time. Each line is stored into the $name variable. At the end of the file, the value returned by the <WORDSLIST> operation is the empty string,[9] which looks false to the while loop, and terminates it. That's how we get out at the end.

[9] Well, technically it's undef, but close enough for this discussion.

If you were running with -w, you would have to check that the return value read in was actually defined. The empty string returned by the <WORDSLIST> operation isn't merely empty: it's undef again. The defined function is how you test for undef when this matters. When reading lines from a file, you'd do the test this way:

while ( defined ($name = <WORDSLIST>) ) {

But if you were being that careful, you'd probably also have checked to make sure that open returned a true value. You know, that's probably not a bad idea either. The built-in die function is frequently used to exit the program with an error message in case something goes wrong. We'll see an example of it in the next revision of the program.

On the other hand, the normal case is that we've read a line (including the newline) into $name. First, off comes the newline using the chomp function. Then, we have to read the next line to get the secret word, holding that in the $word variable. It, too, gets the newline hacked off.

The final line of the while loop puts $word into %words with a key of $name, so that the rest of the program can access it later.

Once the file has been read, the filehandle can be recycled with the close function. (Filehandles are automatically closed anyway when the program exits, but we're trying to be tidy. If we were really tidy, we'd even check for a true return value from close in case the disk partition the file was on went south, its network filesystem became unreachable, or some other catastrophe occurred. Yes, these things really do happen. Murphy will always be with us.)

This subroutine definition can go after or before the other one. And we invoke the subroutine instead of setting %words in the beginning of the program, so one way to wrap up all of this might look like:

#!/usr/bin/perl
init_words();
print "What is your name? ";
$name = <STDIN>;
chomp $name;
if ($name =~ /^randal\b/i) { # back to the other way :-)
    print "Hello, Randal! How good of you to be here!\n";
} else {
    print "Hello, $name!\n"; # ordinary greeting
    print "What is the secret word? ";
    $guess = <STDIN>;
    chomp ($guess);
    while (! good_word($name,$guess)) {
        print "Wrong, try again. What is the secret word? ";
        $guess = <STDIN>;
        chomp ($guess);
    }
}

## subroutines from here down

sub init_words {
    open (WORDSLIST, "wordslist") || die "can't open wordlist: $!";
    while ( defined ($name = <WORDSLIST>)) {
        chomp ($name);
        $word = <WORDSLIST>;
        chomp $word;
        $words{$name} = $word;
    }
    close (WORDSLIST) || die "couldn't close wordlist: $!";
}

sub good_word {
    my($somename,$someguess) = @_; # name the parameters
    $somename =~ s/\W.*//; # delete everything after
                           # first word
    $somename =~ tr/A-Z/a-z/; # lowercase everything
    if ($somename eq "randal") { # should not need to guess
        return 1; # return value is true
    } elsif (($words{$somename} || "groucho") eq $someguess) {
        return 1; # return value is true
    } else {
        return 0; # return value is false
    }
}

Now it's starting to look like a full-grown program. Notice the first executable line is an invocation of init_words(). The return value is not used in a further calculation, which is good because we didn't return anything remarkable. In this case, it's guaranteed to be a true value (the value 1, in particular), because if the close had failed, the die would have printed a message to STDERR and exited the program. The die function is fully explained in Chapter 10, Filehandles and File Tests, but because it's essential to check the return values of anything that might fail, we'll get into the habit of using it right from the start. The $! variable (also explained in Chapter 10) contains the system error message explaining why the system call failed.

The open function is also used to open files for output, or open programs as files (demonstrated shortly). The full scoop on open comes much later in this book, however, in Chapter 10.

1.5.11 Ensuring a Modest Amount of Security
"That secret word list has got to change at least once a week!" cries the Chief Director of Secret Word Lists. Well, we can't force the list to be different, but we can at least issue a warning if the secret word list has not been modified in more than a week.
The best place to do this is in the init_words() subroutine; we're already looking at the file there. The Perl operator -M returns the age in days since a file or filehandle has last been modified, so we just need to see whether this is seven or more for the WORDSLIST filehandle:

sub init_words {
    open (WORDSLIST, "wordslist") || die "can't open wordlist: $!";
    if (-M WORDSLIST >= 7.0) { # comply with bureaucratic policy
        die "Sorry, the wordslist is older than seven days.";
    }
    while ($name = <WORDSLIST>) {
        chomp ($name);
        $word = <WORDSLIST>;
        chomp ($word);
        $words{$name} = $word;
    }
    close (WORDSLIST) || die "couldn't close wordlist: $!";
}

The value of -M WORDSLIST is compared to seven, and if it's seven or more, bingo, we've violated policy.

The rest of the program remains unchanged, so in the interest of saving a few trees, I won't repeat it here.
Besides getting the age of a file, we can also find out its owner, size, access time, and everything else that the system maintains about a file. More on that in Chapter 10.
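Here's a minimal sketch of the age test, plus a first peek at stat, run against a scratch file so that no real word list is touched. The standard File::Temp module and the utime call that backdates the file by ten days are assumptions for the demo, not part of the secret-word program. Note that -M measures age relative to when the program started (kept in $^T), not the instant the test runs.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file holding one name/word pair (removed when the program exits).
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "fred\ncamel\n";
close($fh) or die "couldn't close $path: $!";

# Backdate the access and modification times to ten days before program start.
my $ten_days_ago = $^T - 10 * 24 * 60 * 60;
utime($ten_days_ago, $ten_days_ago, $path) or die "can't utime $path: $!";

my $age = -M $path;    # age in days, measured from program start ($^T)
printf "age: %.1f days\n", $age;
print "Sorry, this word list is older than seven days.\n" if $age >= 7.0;

# stat returns a 13-element list; element 4 is the owner's uid,
# element 7 the size in bytes, and element 9 the modification time.
my ($uid, $size, $mtime) = (stat $path)[4, 7, 9];
print "owner uid $uid, $size bytes, modified at $mtime\n";
```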
1.5.12 Warning Someone When Things Go Astray
Let's see how much we can bog down the system by sending a piece of email each time someone guesses their secret word incorrectly. We need to modify only the good_word() subroutine (thanks to modularity) because we have all the information right there.

The mail will be sent to you if you type your own mail address where the code says "YOUR_ADDRESS_HERE." Here's what we have to do: just before we return 0 from the subroutine, we create a filehandle that is actually a process (mail), like so:
sub good_word {
    my($somename,$someguess) = @_; # name the parameters
    $somename =~ s/\W.*//; # get rid of stuff after
                           # first word
    $somename =~ tr/A-Z/a-z/; # lowercase everything
    if ($somename eq "randal") { # should not need to guess
        return 1; # return value is true
    } elsif (($words{$somename}||"groucho") eq $someguess) {
        return 1; # return value is true
    } else {
        open MAIL,"|mail YOUR_ADDRESS_HERE";
        print MAIL "bad news: $somename guessed $someguess\n";
        close MAIL;
        return 0; # return value is false
    }
}

The first new statement here is open, which has a pipe symbol (|) at the beginning of its second argument. This is a special indication that we are opening a command rather than a file. Because the pipe is at the beginning of the command, we are opening a command so that we can write to it. (If you put the pipe at the end rather than the beginning, you can read the output of a command instead.)

The next statement, a print, shows that a filehandle between the print keyword and the values to be printed selects that filehandle for output, rather than STDOUT.[10] This means that the message will end up as the input to the mail command.

[10] Well, technically, the currently selected filehandle. That's covered much later, though.
Finally, we close the filehandle, which starts mail sending its data merrily on its way.
To be proper, we could have sent the correct response as well as the error response, but then someone reading over my shoulder (or lurking in the mail system) while I'm reading my mail might get too much useful information.
Perl can also open filehandles, invoke commands with precise control over argument lists, or even fork off a copy of the current program, and execute two (or more) copies in parallel. Backquotes (like the shell's backquotes) give an easy way to grab the output of a command as data. All of this gets described in Chapter 14, Process Management, so keep reading.
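Here's a minimal sketch of both directions of a piped open on a UNIX-like system. To keep it runnable without a working mail setup, it swaps the mail command for sort (sending its output to a temporary file made with File::Temp) and uses echo for the reading direction; those commands, the temporary file, and the sample names are assumptions for the demo. Closing a piped filehandle waits for the command to finish, which is why the file is complete by the time it's read back.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file to catch the command's output (removed at exit).
my (undef, $outfile) = tempfile(UNLINK => 1);

# Pipe at the beginning: we write to the command's standard input.
open(my $to_sort, "| sort > $outfile") or die "can't start sort: $!";
print $to_sort "wilma\n", "fred\n", "barney\n";
close($to_sort) or die "sort failed (status $?)";   # waits for sort to finish

open(my $in, '<', $outfile) or die "can't read $outfile: $!";
chomp(my @sorted = <$in>);
close($in) or die "couldn't close $outfile: $!";
print "sorted: @sorted\n";

# Pipe at the end: we read the command's standard output.
open(my $from_echo, "echo hello from a pipe |") or die "can't start echo: $!";
my $line = <$from_echo>;
close($from_echo) or die "echo failed (status $?)";
print "read: $line";
```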
1.5.13 Many Secret Word Files in the Current Directory
Let's change the definition of the secret word filename slightly. Instead of just the file named wordslist, let's look for anything in the current directory that ends in .secret. To the shell, we say echo *.secret to get a brief listing of all of these names. As you'll see in a moment, Perl uses a similar wildcard-name syntax.
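To see the wildcard syntax on its own, here's a minimal sketch that makes a scratch directory holding a few files (the names are made up for the demo, and File::Temp's tempdir is an assumption, not part of the program), changes into it, and asks glob for *.secret: first in list context, then one name at a time in scalar context, the way the loop in init_words uses it.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);      # scratch directory, removed at exit
chdir $dir or die "can't chdir to $dir: $!";

# Create a few empty files; only two of them end in .secret.
for my $file (qw(fred.secret barney.secret wordslist notes.txt)) {
    open(my $fh, '>', $file) or die "can't create $file: $!";
    close($fh) or die "couldn't close $file: $!";
}

# List context: every match at once, in sorted order.
my @all = glob("*.secret");
print "list context:   @all\n";

# Scalar context: one name per call, then undef when the names run out.
my @one_at_a_time;
while (defined(my $filename = glob("*.secret"))) {
    push @one_at_a_time, $filename;
}
print "scalar context: @one_at_a_time\n";

chdir $start or die "can't chdir back to $start: $!";
```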
Pulling out the init_words() definition again:

sub init_words {
    while ( defined($filename = glob("*.secret")) ) {
        open (WORDSLIST, $filename) || die "can't open wordlist: $!";
        if (-M WORDSLIST < 7.0) {
            while ($name = <WORDSLIST>) {
                chomp $name;
                $word = <WORDSLIST>;
                chomp $word;
                $words{$name} = $word;
            }
        }
        close (WORDSLIST) || die "couldn't close wordlist: $!";
    }
}

First, we've wrapped a new while loop around the bulk of the routine from the previous version. The new thing here is the glob function. This is called a filename glob, for historical reasons. It works much like <STDIN>, in that each time it is accessed, it returns the next value: successive filenames that match the shell pattern, in this case *.secret. When there are no additional filenames to be returned, the filename glob returns an empty string.[11]

[11] Yeah, yeah, undef again.

So if the current directory contains fred.secret and barney.secret, then $filename is barney.secret on the first pass through the while loop (the names come out in alphabetically sorted order). On the second pass, $filename is fred.secret. And there is no third pass because the glob returns an empty string the third time it is called, perceived by the while loop to be false, causing an exit from the subroutine.

Within the while loop, we open the file and verify that it is recent enough (less than seven days since the last modification). For the recent-enough files, we scan through as before.

Note that if there are no files that match *.secret and are less than seven days old, the subroutine will exit without having set any secret words into the %words hash. That means that everyone will have to use the word groucho. Oh well. (For real code, I would have added some check on the number of entries in %words before returning, and die'd if it weren't good. See the keys function when we get to hashes in Chapter 5, Hashes.)

1.5.14 Listing the Secret Words
Well, the Chief Director of Secret Word Lists wants a report of all the secret words currently in use and how old they are. If we set aside the secret word program for a moment, we'll have time to write a reporting program for the Director.
First, let's get all of the secret words, by stealing some code from the init_words() subroutine:

while ( defined($filename = glob("*.secret")) ) {
    open (WORDSLIST, $filename) || die "can't open wordlist: $!";
    if (-M WORDSLIST < 7.0) {
        while ($name = <WORDSLIST>) {
            chomp ($name);
            $word = <WORDSLIST>;
            chomp ($word);
            ### new stuff will go here
        }
    }
    close (WORDSLIST) || die "couldn't close wordlist: $!";
}

At the point marked "new stuff will go here," we know three things: the name of the file (in $filename), someone's name (in $name), and that person's secret word (in $word). Here's a place to use Perl's report generating tools. We define a format somewhere in the program (usually near the end, like a subroutine):

format STDOUT =
@<<<<<<<<<<<<<<< @<<<<<<<<< @<<<<<<<<<<<
$filename, $name, $word
.

The format definition begins with format STDOUT =, and ends with a single period. The two lines between are the format itself. The first line of this format is a field definition line that specifies the number, length, and type of the fields. For this format, we have three fields. The line following a field definition line is always a field value line. The value line gives a list of expressions that will be evaluated when this format is used, and the results of those expressions will be plugged into the fields defined in the previous line.

We invoke this format with the write function, like so:

#!/usr/bin/perl
while ( defined($filename = glob("*.secret")) ) {
    open (WORDSLIST, $filename) || die "can't open wordlist: $!";
    if (-M WORDSLIST < 7.0) {
        while ($name = <WORDSLIST>) {
            chomp ($name);
            $word = <WORDSLIST>;
            chomp ($word);
            write; # invoke format STDOUT to STDOUT
        }
    }
    close (WORDSLIST) || die "couldn't close wordlist: $!";
}

format STDOUT =
@<<<<<<<<<<<<<<< @<<<<<<<<< @<<<<<<<<<<<
$filename, $name, $word
.

When the format is invoked, Perl evaluates the field expressions and generates a line that it sends to the STDOUT filehandle. Because write is invoked once each time through the loop, we'll get a series of lines with text in columns, one line for each secret word entry.

Hmm. We haven't labeled the columns. That's easy enough. We just need to add a top-of-page format, like so:
format STDOUT_TOP =
Page @<<
$%

Filename         Name       Word
================ ========== ============
.

This format is named STDOUT_TOP, and will be used initially at the first invocation of the STDOUT format, and again every time 60 lines of output to STDOUT have been generated. The column headings here line up with the columns from the STDOUT format, so everything comes out tidy.

The first line of this format shows some constant text (Page) along with a three-character field definition. The following line is a field value line, here with one expression. This expression is the $% variable,[12] which holds the number of pages printed, a very useful value in top-of-page formats.

[12] More mnemonic aliases for these predefined scalar variables are available via the English module.
The third line of the format is blank. Because this line does not contain any fields, the line following it is not a field value line. This blank line is copied directly to the output, creating a blank line between the page number and the column headers below.
The last two lines of the format also contain no fields, so they are copied as is directly to the output. So this format generates four lines, one of which has a part that changes from page to page.
Just tack this definition onto the previous program to get it to work. Perl notices the top-of-page format automatically.
Perl also has fields that are centered or right-justified, and supports a filled paragraph area as well. More on this when we get to formats in Chapter 11, Formats.
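Here's a minimal sketch that puts both kinds of format to work on two made-up entries. So that the result can be inspected, it writes to a REPORT filehandle opened on an in-memory string (Perl 5.8 or later) instead of STDOUT; the handle name, the sample data, and the in-memory target are assumptions for the demo. Just as STDOUT_TOP belongs to STDOUT, Perl picks up REPORT_TOP automatically as the top-of-page format for REPORT. The sketch selects REPORT while writing, so that a bare write and the $% page counter both refer to that handle.

```perl
use strict;
use warnings;

our ($filename, $name, $word);   # package variables the formats can see

my $report = "";
open(REPORT, '>', \$report) or die "can't open in-memory report: $!";

my $old = select(REPORT);        # write and $% now refer to REPORT
for my $entry (["fred.secret", "fred", "camel"],
               ["barney.secret", "barney", "llama"]) {
    ($filename, $name, $word) = @$entry;
    write;                       # REPORT_TOP on the first call, then REPORT
}
select($old);                    # back to the previously selected handle
close(REPORT) or die "couldn't close in-memory report: $!";
print $report;

format REPORT_TOP =
Page @<<
$%

Filename         Name       Word
================ ========== ============
.

format REPORT =
@<<<<<<<<<<<<<<< @<<<<<<<<< @<<<<<<<<<<<
$filename, $name, $word
.
```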
1.5.15 Making Those Old Word Lists More Noticeable
As we are scanning through the *.secret files in the current directory, we may find files that are too old. So far, we are simply skipping over those files. Let's go one step more: we'll rename them to *.secret.old so that a directory listing will quickly show us which files are too old, simply by name.

Here's how the init_words() subroutine looks with this modification:

sub init_words {
    while ( defined($filename = glob("*.secret")) ) {
        open (WORDSLIST, $filename) || die "can't open wordlist: $!";
        if (-M WORDSLIST < 7.0) {
            while ($name = <WORDSLIST>) {
                chomp ($name);
                $word = <WORDSLIST>;
                chomp ($word);
                $words{$name} = $word;
            }
        } else { # rename the file so it gets noticed
            rename ($filename,"$filename.old") ||
                die "can't rename $filename to $filename.old: $!";
        }
        close (WORDSLIST) || die "couldn't close wordlist: $!";
    }
}

Notice the new else part of the file age check. If the file is older than seven days, it gets renamed with the rename function. This function takes two parameters, renaming the file named by the first parameter to the name given in the second parameter.

Perl has a complete range of file manipulation operators; anything you can do to a file from a C program, you can also do from Perl.
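Here's a minimal sketch of the age check and rename on their own, in a scratch directory holding two made-up word lists, one of which is backdated ten days with utime so that it counts as stale. File::Temp, Cwd, the filenames, and the backdating are assumptions for the demo. Unlike init_words, this sketch checks each file's age by name rather than through an open filehandle.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);      # scratch directory, removed at exit
chdir $dir or die "can't chdir to $dir: $!";

# Two sample word lists in the name/word format.
my %lists = ("fred.secret" => "fred\ncamel\n", "barney.secret" => "barney\nllama\n");
for my $file (sort keys %lists) {
    open(my $fh, '>', $file) or die "can't create $file: $!";
    print $fh $lists{$file};
    close($fh) or die "couldn't close $file: $!";
}

# Make barney.secret look ten days old (relative to program start).
my $old_time = $^T - 10 * 24 * 60 * 60;
utime($old_time, $old_time, "barney.secret") or die "can't utime barney.secret: $!";

while (defined(my $filename = glob("*.secret"))) {
    if (-M $filename >= 7.0) {        # stale: rename it so it gets noticed
        rename($filename, "$filename.old")
            or die "can't rename $filename to $filename.old: $!";
        print "renamed $filename to $filename.old\n";
    }
}

my @now = glob("*.secret*");
print "directory now holds: @now\n";
chdir $start or die "can't chdir back to $start: $!";
```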
1.5.16 Maintaining a Last-Good-Guess Database
Let's keep track of when the most recent correct guess has been made for each user. One data structure that might seem to work at first glance is a hash. For example, the statement

$last_good{$name} = time;

assigns the current time in internal format (a count of seconds since the system's epoch: some large integer above 800 million, incrementing one number per second) to an element of %last_good that has the name for a key. Over time, this would seem to give us a database indicating the most recent time the secret word was guessed properly for each of the users who had invoked the program.

But the hash doesn't have an existence between invocations of the program. Each time the program is invoked, a new hash is formed. So at most, we create a one-element hash and then immediately forget it when the program exits.
The dbmopen function[13] maps a hash out into a disk file (actually a pair of disk files) known as a DBM. It's used like this:

dbmopen (%last_good,"lastdb",0666) || die "can't dbmopen lastdb: $!";
$last_good{$name} = time;
dbmclose (%last_good) || die "can't dbmclose lastdb: $!";

The first statement performs the mapping, using the disk filenames of lastdb.dir and lastdb.pag (these names are the normal names for a DBM called lastdb). The file permissions used for these two files if the files must be created (as they will be the first time through) are 0666.[14] This mode means that anyone can read or write the files. If you're on a UNIX system, file permission bits are described in the chmod(2) manpage. On non-UNIX systems, chmod() may or may not work the same way. For example, under MS-DOS, files have no permissions, whereas under Windows NT, they do. See your port's release notes about this if you're unsure.

[13] Or using the more low-level tie function on a specific database, as detailed in Chapters 5 and 7 of Programming Perl, or in the perltie(1) and AnyDBM_File(3) manpages.

[14] The actual permissions of the files will be 0666 with any bits set in your process's current umask turned off (that is, 0666 & ~umask).
The second statement shows that we use this mapped hash just like a normal hash. However, creating or updating an element of the hash automatically updates the disk files that form the DBM. And, when the hash is later accessed, the values within the hash come directly from the disk image. This gives the hash a life beyond the current invocation of the program - a persistence of its own.
The third statement disconnects the hash from the DBM, much like a file close operation.

Although the inserted statements maintain the database just fine (and even create it the first time), we don't have any way of examining the information yet. To do that, we can create a separate little program that looks something like this:
#!/usr/bin/perl -w
dbmopen (%last_good,"lastdb",0666) || die "can't dbmopen lastdb: $!";
foreach $name (sort keys (%last_good)) {
    $when = $last_good{$name};
    $hours = (time() - $when) / 3600; # compute hours ago
    write;
}

format STDOUT =
User @<<<<<<<<<<<: last correct guess was @<<< hours ago.
$name, $hours
.

We've got a few new operations here: a foreach loop, sorting a list, and getting the keys of a hash.

First, the keys function takes a hash name as an argument and returns a list of all the keys of that hash in some unspecified order. For the %words hash defined earlier, the result is something like fred, barney, betty, wilma, in some unspecified order. For the %last_good hash, the result will be a list of all users who have guessed their own secret word successfully.

The sort function sorts the list alphabetically (just as if you passed a text file through the sort command). This makes sure that the list processed by the foreach statement is always in alphabetical order.

Finally, the Perl foreach statement is a lot like the C-shell foreach statement. It takes a list of values and assigns each one in turn to a scalar variable (here, $name), executing the body of the loop (a block) once for each value. So, for five names in the %last_good hash, we get five passes through the loop, with $name being a different value each time.

The body of the foreach loop loads up a couple of variables used within the STDOUT format and invokes the format. Note that we figure out the age of the entry by subtracting the stored system time (in the hash) from the current time (as returned by time) and then dividing that by 3600 (to convert seconds to hours).

Perl also provides easy ways to create and maintain text-oriented databases (like the password file) and fixed-length-record databases (like the "last login" database maintained by the login program). These are described in Chapter 17, User Database Manipulation.
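Here's a minimal sketch that ties the DBM pieces together in a scratch directory: it records a few made-up "last good guess" times, closes the DBM, reopens it as a later run of the program would, and prints a report sorted by name using the same hours computation (printf stands in for the format). The names and timestamps are invented, and a fixed $now replaces repeated calls to time so the arithmetic is predictable. Which DBM library dbmopen ends up using, and therefore which disk files appear, depends on how your Perl was built.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $db  = tempdir(CLEANUP => 1) . "/lastdb";   # scratch location for the DBM
my $now = time;

# First "run": store some made-up times (seconds since the epoch).
my %last_good;
dbmopen(%last_good, $db, 0666) or die "can't dbmopen $db: $!";
$last_good{wilma}  = $now - 2 * 3600;     # two hours ago
$last_good{fred}   = $now - 30 * 60;      # half an hour ago
$last_good{barney} = $now - 26 * 3600;    # 26 hours ago
dbmclose(%last_good) or die "can't dbmclose $db: $!";
%last_good = ();    # now an ordinary hash again; empty it to prove nothing lingers

# Second "run": reopen the same DBM and report from what's on disk.
my %report;
dbmopen(%report, $db, 0666) or die "can't dbmopen $db: $!";
my @names = sort keys %report;
foreach my $name (@names) {
    my $hours = ($now - $report{$name}) / 3600;   # compute hours ago
    printf "User %-12s: last correct guess was %.1f hours ago.\n", $name, $hours;
}
my %hours_for = map { $_ => ($now - $report{$_}) / 3600 } @names;
dbmclose(%report) or die "can't dbmclose $db: $!";
```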
1.5.17 The Final Programs
Here are the programs from this stroll in their final form so you can play with them.
First, the "say hello" program:
#!/usr/bin/perl
init_words();
print "What is your name? ";
$name = <STDIN>;
chomp($name);
if ($name =~ /^randal\b/i) { # back to the other way :-)
    print "Hello, Randal! How good of you to be here!\n";
} else {
    print "Hello, $name!\n"; # ordinary greeting
    print "What is the secret word? ";
    $guess = <STDIN>;
    chomp $guess;
    while (! good_word($name,$guess)) {
        print "Wrong, try again. What is the secret word? ";
        $guess = <STDIN>;
        chomp $guess;
    }
}
dbmopen (%last_good,"lastdb",0666);
$last_good{$name} = time;
dbmclose (%last_good);

sub init_words {
    while ($filename = <*.secret>) {
        open (WORDSLIST, $filename) || die "can't open $filename: $!";
        if (-M WORDSLIST < 7.0) {
            while ($name = <WORDSLIST>) {
                chomp ($name);
                $word = <WORDSLIST>;
                chomp ($word);
                $words{$name} = $word;
            }
        } else { # rename the file so it gets noticed
            rename ($filename,"$filename.old") ||
                die "can't rename $filename: $!";
        }
        close WORDSLIST;
    }
}

sub good_word {
    my($somename,$someguess) = @_; # name the parameters
    $somename =~ s/\W.*//; # delete everything after first word
    $somename =~ tr/A-Z/a-z/; # lowercase everything
    if ($somename eq "randal") { # should not need to guess
        return 1; # return value is true
    } elsif (($words{$somename} || "groucho") eq $someguess) {
        return 1; # return value is true
    } else {
        open (MAIL, "|mail YOUR_ADDRESS_HERE");
        print MAIL "bad news: $somename guessed $someguess\n";
        close MAIL;
        return 0; # return value is false
    }
}

Next, we have the secret word lister:
#!/usr/bin/perl
while ($filename = <*.secret>) {
    open (WORDSLIST, $filename) || die "can't open $filename: $!";
    if (-M WORDSLIST < 7.0) {
        while ($name = <WORDSLIST>) {
            chomp ($name);
            $word = <WORDSLIST>;
            chomp ($word);
            write; # invoke format STDOUT to STDOUT
        }
    }
    close (WORDSLIST);
}

format STDOUT =
@<<<<<<<<<<<<<<< @<<<<<<<<< @<<<<<<<<<<<
$filename, $name, $word
.

format STDOUT_TOP =
Page @<<
$%

Filename         Name       Word
================ ========== ============
.

And finally, the last-time-a-word-was-used display program:
#!/usr/bin/perl
dbmopen (%last_good,"lastdb",0666);
foreach $name (sort keys %last_good) {
    $when = $last_good{$name};
    $hours = ( time - $when) / 3600; # compute hours ago
    write;
}

format STDOUT =
User @<<<<<<<<<<<: last correct guess was @<<< hours ago.
$name, $hours
.

Together with the secret word lists (files named something.secret in the current directory) and the database lastdb.dir and lastdb.pag, you'll have all you need.

1.6 Exercise
Most chapters end with some exercises, for which answers are found in Appendix A. For this stroll, the answers have already been given above.
Type in the example programs, and get them to work. (You'll need to create the secret-word lists as well.) Consult your local Perl guru if you need assistance.