Advanced Perl Programming, Second Edition

By Simon Cozens


Table of Contents

Chapter 1: Advanced Techniques
Once you have read the Camel Book (Programming Perl), or any other good Perl tutorial, you know almost all of the language. There are no secret keywords, no other magic sigils that turn on Perl's advanced mode and reveal hidden features. In one sense, this book is not going to tell you anything new about the Perl language.
What can I tell you, then? I used to be a student of music. Music is very simple. There are 12 possible notes in the scale of Western music, although some of the most wonderful melodies in the world only use, at most, eight of them. There are around four different durations of a note used in common melodies. There isn't a massive musical vocabulary to choose from. And music has been around a good deal longer than Perl. I used to wonder whether or not all the possible decent melodies would soon be figured out. Sometimes I listen to the Top 10 and think I was probably right back then.
But of course it's a bit more complicated than that. New music is still being produced. Knowing all the notes does not tell you the best way to put them together. I've said that there are no secret switches to turn on advanced features in Perl, and this means that everyone starts on a level playing field, in just the same way that Johann Sebastian Bach and a little kid playing with a xylophone have precisely the same raw materials to work with. The key to producing advanced Perl—or advanced music—depends on two things: knowledge of techniques and experience of what works and what doesn't.
The aim of this book is to give you some of each of these things. Of course, no book can impart experience. Experience is something that must be, well, experienced. However, a book like this can show you some existing solutions from experienced Perl programmers and how to use them to solve the problems you may be facing.
On the other hand, a book can certainly teach techniques, and in this chapter we're going to look at the three major classes of advanced programming techniques in Perl. First, we'll look at introspection: programs looking at programs, figuring out how they work, and changing them. For Perl this involves manipulating the symbol table—especially at runtime, playing with the behavior of built-in functions and using
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Introspection
First, though, introspection. These introspection techniques appear time and time again in advanced modules throughout the book. As such, they can be regarded as the most fundamental of the advanced techniques—everything else will build on these ideas.
Globs are one of the most misunderstood parts of the Perl language, but at the same time, one of the most fundamental. This is a shame, because a glob is a relatively simple concept.
When you access any global variable in Perl—that is, any variable that has not been declared with my—the perl interpreter looks up the variable name in the symbol table. For now, we'll consider the symbol table to be a mapping between a variable's name and some storage for its value, as in Figure 1-1.
Note that we say that the symbol table maps to storage for the value. Introductory programming texts should tell you that a variable is essentially a box in which you can get and set a value. Once we've looked up $a, we know where the box is, and we can get and set the values directly. In Perl terms, the symbol table maps to a reference to $a.
Figure 1-1: Consulting the symbol table, take 1
You may have noticed that a symbol table is something that maps names to storage, which sounds a lot like a Perl hash. If so, you're ahead of the game, since the Perl symbol table is indeed implemented using an ordinary Perl hash. You may also have noticed, however, that there are several things called a in Perl, including $a, @a, %a, &a, the filehandle a, and the directory handle a.
This is where the glob comes in. The symbol table maps a name like a to a glob, which is a structure holding references to all the variables called a, as in Figure 1-2.
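The relationship between a name, its glob, and the individual variables can be poked at directly. The following is a minimal sketch, assuming a set of package variables all called pet (a name invented here for illustration); it looks the name up in the main symbol table and then pulls a reference out of each slot of the glob.

```perl
use strict;
use warnings;

# Hypothetical package variables that all share the name "pet".
our $pet = "a scalar";
our @pet = (1, 2, 3);
our %pet = (colour => "black");
sub pet { return "a subroutine" }

# The symbol table for package main is the hash %main::,
# so the name is simply a key in that hash.
my $in_table = exists $main::{pet};

# Each slot of the glob *pet holds a reference to one of the variables.
my $scalar_ref = *pet{SCALAR};
my $array_ref  = *pet{ARRAY};
my $hash_ref   = *pet{HASH};
my $code_ref   = *pet{CODE};

print "in symbol table: ", ($in_table ? "yes" : "no"), "\n";
print "scalar slot: $$scalar_ref\n";
print "array slot:  @$array_ref\n";
print "hash slot:   $hash_ref->{colour}\n";
print "code slot:   ", $code_ref->(), "\n";
```

The references are the very same storage the named variables use, so assigning through $$scalar_ref changes $pet as well.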
Additional content appearing in this section has been removed.
Messing with the Class Model
Perl's style of object orientation is often maligned, but its sheer simplicity allows the advanced Perl programmer to extend Perl's behavior in interesting, and sometimes startling, ways. Because all the details of Perl's OO model happen at runtime and in the open (using an ordinary package variable, @ISA, to handle inheritance, for instance, or using the symbol tables for method dispatch), we can fiddle with almost every aspect of it.
In this section we'll see some techniques specific to playing with the class model, but we will also examine how to apply the techniques we already know to distort Perl's sense of OO.
In almost all class-based OO languages, all objects derive from a common class, sometimes called Object. Perl doesn't quite have the same concept, but there is a single hard-wired class called UNIVERSAL, which acts as a last-resort class for method lookups. By default, UNIVERSAL provides three methods: isa, can, and VERSION.
We saw isa briefly in the last section; it consults a class or object's @ISA array and determines whether or not it derives from a given class:
    package Coffee;
    our @ISA = qw(Beverage::Hot);

    sub new { return bless { temp => 80 }, shift }

    package Tea;
    use base 'Beverage::Hot';

    package Latte;
    use base 'Coffee';

    package main;
    my $mug = Latte->new;

    Tea->isa("Beverage::Hot"); # true (1)
    Tea->isa("Coffee"); # false (empty string)

    if ($mug->isa("Beverage::Hot")) {
        warn 'Contents May Be Hot';
    }
isa is a handy method you can use in modules to check that you've been handed the right sort of object. However, since not everything in Perl is an object, you may find that just testing a scalar with isa is not enough to ensure that your code doesn't blow up: if you say $thing->isa(...) on an unblessed reference, Perl will die.
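One common way to guard against that failure is to check that the value is a blessed reference before calling isa on it. The sketch below uses blessed from the core Scalar::Util module; the helper name is_hot_beverage and the tiny class definitions are assumptions made up for this example.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

package Beverage::Hot;
sub new { my $class = shift; return bless { temp => 80 }, $class }

package Coffee;
our @ISA = ('Beverage::Hot');

package main;

# Hypothetical helper: true only for objects derived from Beverage::Hot.
# blessed() returns undef for plain strings and unblessed references,
# so isa is never called on something that would make Perl die.
sub is_hot_beverage {
    my ($thing) = @_;
    return blessed($thing) && $thing->isa('Beverage::Hot') ? 1 : 0;
}

my $mug   = Coffee->new;
my $plain = { temp => 20 };    # an unblessed hash reference

print "mug:   ", is_hot_beverage($mug),   "\n";
print "plain: ", is_hot_beverage($plain), "\n";

# Calling isa directly on the unblessed reference dies; eval catches it.
my $survived = eval { $plain->isa('Beverage::Hot'); 1 };
print "direct call died\n" unless $survived;
```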
Additional content appearing in this section has been removed.
Unexpected Code
The final set of advanced techniques in this chapter covers anything where Perl code runs at a time that might not be obvious: tying, for instance, runs code when a variable is accessed or assigned to; overloading runs code when various operations are called on a value; and time shifting allows us to run code out of order or delayed until the end of scope.
Some of the most striking effects in Perl can be obtained by arranging for code to be run at unexpected moments, but this must be tempered with care. The whole point of unexpected code is that it's unexpected, and that breaks the well-known Principle of Least Surprise: programming Perl should not be surprising.
On the other hand, these are powerful techniques. Let's take a look at how to make the best use of them.
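To make the idea of code running on a plain variable access concrete, here is a minimal sketch of a tied scalar. The Counter class is hypothetical, not from the book; it only relies on the standard TIESCALAR, FETCH, and STORE hooks that tie calls.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class for illustration

# Called by tie(); the extra argument is the starting value.
sub TIESCALAR {
    my ($class, $start) = @_;
    return bless { next => $start }, $class;
}

# Called every time the tied variable is read.
sub FETCH {
    my ($self) = @_;
    return $self->{next}++;
}

# Called every time the tied variable is assigned to.
sub STORE {
    my ($self, $value) = @_;
    $self->{next} = $value;
}

package main;

tie my $ticket, 'Counter', 1;

my $first  = $ticket;    # runs FETCH
my $second = $ticket;    # runs FETCH again, so the value has moved on
$ticket    = 10;         # runs STORE
my $third  = $ticket;

print "$first $second $third\n";
```

Nothing at the point of use hints that $ticket is special, which is exactly why the technique needs the care described above.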
Overloading, in a Perl context, is a way of making an object look like it isn't an object. More specifically, it's a way of making an object respond to methods when used in an operation or other context that doesn't look like a method call.
The problem with such overloading is that it can quickly get wildly out of hand. C++ overloads the left bit-shift operator, <<, on filehandles to mean print:
    cout << "Hello world";
since it looks like the string is heading into the stream. Ruby, on the other hand, overloads the same operator on arrays to mean push. If we make flagrant use of overloading in Perl, we end up having to look at least twice at code like:
    $object *= $value;
We look once to see it as a multiplication, once to realize it's actually a method call, and once more to work out what class $object is in at this point and hence what method has been called.
That said, overloading works fine for classes that more or less represent the sort of things you're overloading: numbers, strings, and so on. Now, how do we do it?
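As a rough sketch of the mechanism, the core overload pragma maps operators to subroutines in the class. The Money class below is invented for this example; only the multiplication and stringification operators are overloaded, and Perl derives *= from * on its own.

```perl
use strict;
use warnings;

package Money;    # hypothetical number-like class

use overload
    '*'  => \&multiply,
    '""' => \&as_string;

sub new {
    my ($class, $amount) = @_;
    return bless { amount => $amount }, $class;
}

# Handler for '*': receives the object, the other operand, and a
# "swapped" flag that is ignored here because multiplication commutes.
sub multiply {
    my ($self, $other) = @_;
    my $factor = ref($other) ? $other->{amount} : $other;
    return Money->new($self->{amount} * $factor);
}

# Handler for stringification, used when the object is interpolated.
sub as_string {
    my ($self) = @_;
    return sprintf "%.2f", $self->{amount};
}

package main;

my $object = Money->new(10);
my $value  = 3;

$object *= $value;    # looks like arithmetic, actually calls Money::multiply
print "$object\n";
```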

Section 1.3.1.1: Simple operator overloading

Additional content appearing in this section has been removed.
Conclusion
We've now looked at many of the advanced techniques used in pure Perl modules, most of them involving how to manipulate the way Perl operates. We've divided those roughly into sections on messing with the symbol table, messing with the class model, and making code run where code might not be expected.
In a sense, everything else in this book will be built on the techniques that we've seen here. However, Perl is a pragmatic language, and instead of looking in the abstract at techniques that might be useful, we're going to see how these tricks are already being used in real-life code—in CPAN modules—and how they can make your programming life easier.
Additional content appearing in this section has been removed.
Chapter 2: Parsing Techniques
One thing Perl is particularly good at is throwing data around. There are two types of data in the world: regular, structured data and everything else. The good news is that regular data—colon delimited, tab delimited, and fixed-width files—is really easy to parse with Perl. We won't deal with that here. The bad news is that regular, structured data is the minority.
If the data isn't regular, then we need more advanced techniques to parse it. There are two major types of parser for this kind of less predictable data. The first is a bottom-up parser. Let's say we have an HTML page. We can split the data up into meaningful chunks or tokens—tags and the data between tags, for instance—and then reconstruct what each token means. See Figure 2-1. This approach is called bottom-up parsing because it starts with the data and works toward a parse.
Figure 2-1: Bottom-up parsing of HTML
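The first step of the bottom-up approach, splitting the input into tokens, can be sketched with a single regular expression. This is a deliberately naive tokenizer written for illustration: it assumes simple markup and does not cope with comments, scripts, or a > inside an attribute value.

```perl
use strict;
use warnings;

my $html = '<p>Hello <b>world</b></p>';

# Walk the string from left to right. Each step consumes either one
# tag (anything between < and >) or one run of text between tags.
my @tokens;
while ($html =~ /\G(?:(<[^>]*>)|([^<]+))/gc) {
    push @tokens, defined($1) ? [ tag => $1 ] : [ text => $2 ];
}

printf "%-4s %s\n", @$_ for @tokens;
```

A real bottom-up parser would then work out what each token means in context, for instance by pairing each start tag with its end tag.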
The other major type of parser is a top-down parser. This starts with some ideas of what an HTML file ought to look like: it has an <html> tag at the start and an </html> at the end, with some stuff in the middle. The parser can find that pattern in the document and then look to see what the stuff in the middle is likely to be. See Figure 2-2. This is called a top-down parse because it starts with all the possible parses and works down until it matches the actual contents of the document.
Figure 2-2: Top-down parsing of HTML
Additional content appearing in this section has been removed.
Parse::RecDescent Grammars
Damian Conway's Parse::RecDescent module is the most widely used parser generator for Perl. While most traditional parser generators, such as yacc, produce bottom-up parsers, Parse::RecDescent creates top-down parsers. Indeed, as its name implies, it produces a recursive descent parser. One of the benefits of top-down parsing is that you don't usually have to split the data into tokens before parsing, which makes it easier and more intuitive to use.
I'm a compulsive player of the Japanese game of Go. We generally use a file format called Smart Game Format (http://www.red-bean.com/sgf/) for exchanging information about Go games. Here's an example of an SGF file:
    (;GM[1]FF[4]CA[UTF-8]AP[CGoban:2]ST[2]
    RU[Japanese]SZ[19]HA[5]KM[5.50]TM[  ]
    PW[Simon Cozens]PB[Keiko Aihara]AB[dd][pd][jj][dp][pp]
    ;W[df];B[fd];W[cn]
       (;B[dl])
       (;B[fp]CR[fp]C[This is the usual response.])
       (;B[co]CR[co]C[This way is stronger still.]
        ;W[dn];B[fp])
    )
This little game consists of three moves, followed by three different variations for what happens next, as shown in Figure 2-3. The file describes a tree structure of variations, with parenthesised sections being variations and subvariations.
Figure 2-3: Tree of moves
Each variation contains several nodes separated by semicolons, and each node has several parameters. This sort of description of the format is ideal for constructing a top-down parser.
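That description maps naturally onto one subroutine per rule. The sketch below is a hand-written recursive descent checker in plain Perl, not Parse::RecDescent itself; the rule names (variation, node, property) and the simplified grammar are assumptions made for this example and cover only the structure described above, not the full SGF specification.

```perl
use strict;
use warnings;

# Try to match $pattern at the current position, skipping leading
# whitespace. The /c flag leaves pos() untouched when the match fails.
sub match {
    my ($text, $pattern) = @_;
    return $$text =~ /\G\s*$pattern/gc ? 1 : 0;
}

# property := identifier value+        e.g. AB[dd][pd]
sub property {
    my ($text) = @_;
    my $start = pos($$text);
    if (match($text, qr/[A-Z]+/)) {
        my $values = 0;
        $values++ while match($text, qr/\[(?:[^\]\\]|\\.)*\]/s);
        return 1 if $values;
    }
    pos($$text) = $start;    # backtrack: nothing was consumed
    return 0;
}

# node := ';' property*
sub node {
    my ($text) = @_;
    match($text, qr/;/) or return 0;
    1 while property($text);
    return 1;
}

# variation := '(' node+ variation* ')'
sub variation {
    my ($text) = @_;
    my $start = pos($$text);
    if (match($text, qr/\(/) && node($text)) {
        1 while node($text);
        1 while variation($text);
        return 1 if match($text, qr/\)/);
    }
    pos($$text) = $start;    # backtrack: nothing was consumed
    return 0;
}

# A file is valid if it is exactly one variation and nothing else.
sub is_valid_sgf {
    my ($input) = @_;
    pos($input) = 0;
    return variation(\$input) && match(\$input, qr/\z/) ? 1 : 0;
}

my $sample = '(;GM[1]SZ[19];W[df];B[fd](;B[dl])(;B[fp]C[This is the usual response.]))';
print is_valid_sgf($sample) ? "valid\n" : "invalid\n";
```

Notice that no separate tokenizing pass is needed: each rule looks for what it expects at the current position, which is the top-down style in miniature.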
The first thing we'll do is create something that merely works out whether some text is a valid SGF file by checking whether it parses. Let's look at the structure carefully again from the top and, as we go, translate it into a grammar suitable for Parse::RecDescent.
Let's call the whole thing a
Additional content appearing in this section has been removed.
Parse::Yapp
If you're more familiar with tools like yacc, you may prefer to use François Désarménien's Parse::Yapp module. This is more or less a straight port of yacc to Perl.
For instance, let's use Parse::Yapp to implement the calculator in Chapter 3 of lex & yacc (O'Reilly). This is a very simple calculator with a symbol table, so you can say things like this:
            
    a = 25
    b = 30
    a + b
    55
Here's their grammar:
    %{
    double vbltable[26];
    %}

    %union {
        double dval;
        int vblno;
    }

    %token <vblno> NAME
    %token <dval> NUMBER
    %left '-' '+'
    %left '*' '/'
    %nonassoc UMINUS

    %type <dval> expression
    %%
    statement_list:    statement '\n'
        |    statement_list statement '\n'
        ;

    statement:    NAME '=' expression    { vbltable[$1] = $3; }
        |    expression        { printf("= %g\n", $1); }
        ;

    expression:    expression '+' expression { $$ = $1 + $3; }
        |    expression '-' expression { $$ = $1 - $3; }
        |    expression '*' expression { $$ = $1 * $3; }
        |    expression '/' expression
                    {    if($3 == 0.0)
                            yyerror("divide by zero");
                        else
                            $$ = $1 / $3;
                    }
        |    '-' expression %prec UMINUS    { $$ = -$2; }
        |    '(' expression ')'    { $$ = $2; }
        |    NUMBER
        |    NAME            { $$ = vbltable[$1]; }
        ;
    %%
Converting the grammar is very straightforward; the only serious change we need to consider is how to implement the symbol table. We know that Perl's internal symbol tables are just hashes, so that's good enough for us. The other changes are just cosmetic, and we end up with a
Additional content appearing in this section has been removed.
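The idea of swapping the C array for a Perl hash can be tried out without any parser generator. What follows is not the Parse::Yapp version of the grammar; it is a small hand-written evaluator in plain Perl, offered as a sketch under the assumption that variable names are single lowercase letters and that unset variables read as zero, as they would in the C original.

```perl
use strict;
use warnings;

my %vbltable;    # symbol table: variable name => value

# expression := term (('+' | '-') term)*
sub expression {
    my ($src) = @_;
    my $value = term($src);
    while ($$src =~ /\G\s*([-+])/gc) {
        my $op  = $1;
        my $rhs = term($src);
        $value = $op eq '+' ? $value + $rhs : $value - $rhs;
    }
    return $value;
}

# term := factor (('*' | '/') factor)*
sub term {
    my ($src) = @_;
    my $value = factor($src);
    while ($$src =~ /\G\s*([*\/])/gc) {
        my $op  = $1;
        my $rhs = factor($src);
        if ($op eq '*') { $value *= $rhs; next }
        die "divide by zero\n" if $rhs == 0;
        $value /= $rhs;
    }
    return $value;
}

# factor := '-' factor | '(' expression ')' | NUMBER | NAME
sub factor {
    my ($src) = @_;
    if ($$src =~ /\G\s*-/gc) {
        my $value = factor($src);
        return -$value;
    }
    if ($$src =~ /\G\s*\(/gc) {
        my $value = expression($src);
        $$src =~ /\G\s*\)/gc or die "expected ')'\n";
        return $value;
    }
    return $1 if $$src =~ /\G\s*(\d+(?:\.\d+)?)/gc;
    return $vbltable{$1} // 0 if $$src =~ /\G\s*([a-z])/gc;
    die "syntax error\n";
}

# statement := NAME '=' expression | expression
# Returns the value of a bare expression, or nothing after an assignment.
sub statement {
    my ($line) = @_;
    pos($line) = 0;
    my $name;
    $name = $1 if $line =~ /\G\s*([a-z])\s*=/gc;
    my $value = expression(\$line);
    $line =~ /\G\s*\z/gc or die "syntax error\n";
    return $value unless defined $name;
    $vbltable{$name} = $value;
    return;
}

statement("a = 25");
statement("b = 30");
print "= ", statement("a + b"), "\n";
```

Operator precedence falls out of the layering of the three subroutines, which is the job that the %left and %nonassoc declarations do in the yacc grammar.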
Other Parsing Techniques
Of course, we don't want to always be writing our own parsers for most of the data we come across, as there's a good chance someone else has come across that sort of data before. The best examples are HTML and XML: there's a vast amount of code out there that deals with these file formats, and most of the hard work has been put into CPAN modules. We'll look at a few of these modules in this section.
I'll start by saying something that is anathema to a lot of advanced Perl programmers: in certain circumstances, it is acceptable to use regular expressions to extract the data you want from HTML. I've written a bunch of screen-scraping programs to automate access to various web sites and applications, and because I knew the pages were machine-generated and unlikely to change, I had no qualms about using regular expressions to get what I wanted.
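For a page whose markup is known and stable, that kind of extraction can be as small as the sketch below. The sample page and its URLs are made up for illustration, and the pattern assumes href is the first attribute and is double-quoted, which is exactly the sort of assumption that only holds for machine-generated pages.

```perl
use strict;
use warnings;

# Hypothetical machine-generated page.
my $page = <<'HTML';
<ul>
  <li><a href="/news/1.html">First story</a></li>
  <li><a href="/news/2.html">Second story</a></li>
</ul>
HTML

# Collect the URL and link text of every <a href="..."> element.
my @links;
while ($page =~ m{<a\s+href="([^"]*)"[^>]*>(.*?)</a>}gis) {
    push @links, { url => $1, text => $2 };
}

print "$_->{url}\t$_->{text}\n" for @links;
```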
In general, though, you should do things properly. The way to parse HTML properly is to use the HTML::Parser module.
HTML::Parser is incredibly flexible. It supports several methods of operation: you can use OO inheritance, you can use callbacks, you can determine what data gets sent to callbacks and when the callbacks are called, and so on. We'll only look here at the simplest way of using it: by subclassing the module.
Let's begin by examining a way to dump out the URL and link text for every hyperlink in a document. Because we're inheriting from HTML::Parser, we need to say something like this:
    package DumpLinks;
    use strict;
    use base 'HTML::Parser';
Next, we specify what happens when we see a start tag: if it's not an <a> tag, then we ignore it. If it is, we make a note of its href attribute and remember that we're currently in an <a> tag.
    sub start {
       my ($self, $tag, $attr) = @_;
       return unless $tag eq "a";
       $self->{_this_url} = $attr->{href};
       $self->{_in_link} = 1;
    }
Additional content appearing in this section has been removed.
Conclusion
In this chapter, we've seen many of the techniques used for parsing structured data with Perl. Whether it's a case of creating your own parsers with Parse::RecDescent or Parse::Yapp, or choosing a ready-made parsing module, Perl is perfect for throwing around data and converting it into a different format.
Additional content appearing in this section has been removed.
Chapter 3: Templating Tools
A recent thread on comp.lang.perl.moderated enumerated the Perl rites of passage—the perfectly good wheels that every journeyman Perl programmer reinvents. These were found to be a templating system, a database abstraction layer, an HTML parser, a processor for command-line arguments, and a time/date handling module.
See if you recognize yourself in the following story: you need to produce a form letter of some description. You've got a certain amount of fixed content, and a certain amount that changes. So you set up a template a little like this:
    my $template = q{
        Dear $name,

        We have received your request for a quote for $product, and have
        calculated that it can be delivered to you by $date at a cost of
        approximately $cost.

        Thank you for your interest,

        Acme Integrated Foocorp.
    };
Then you struggle with some disgusting regular expression along the lines of s/(\$\w+)/$1/eeg, and eventually you get something that more or less does the job.
As with all projects, the specifications change two days after it goes live, so you suddenly need to extend your simple template to handle looping over arrays, conditionals, and eventually executing Perl code in the middle of the template itself. Before you realize what's happened, you've created your own templating language.
Don't worry if that's you. Nearly everyone's done it at least once. That's why there's a wide selection of modules on CPAN for templating text and HTML output, ranging from being only slightly more complex than s/(\$\w+)/$1/eeg to complete independent templating languages.
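For comparison, here is a slightly tamer take on the same wheel: a sketch that looks each placeholder up in a hash rather than evaluating it as Perl code. The hash contents and the shortened letter are invented for the example, and unknown placeholders are left in place so that mistakes stay visible.

```perl
use strict;
use warnings;

my $template = 'Dear $name, your $product will arrive by $date '
             . 'at a cost of approximately $cost. Ref: $unknown';

# Hypothetical values to fill in.
my %vars = (
    name    => 'Ms. Jones',
    product => 'left-handed widget',
    date    => 'Friday',
    cost    => '$25',
);

# Replace each $word with its value from %vars; anything not in the
# hash is put back unchanged.
(my $letter = $template)
    =~ s/\$(\w+)/exists($vars{$1}) ? $vars{$1} : "\$$1"/ge;

print "$letter\n";
```

Because the replacement is a hash lookup and not a double eval, text in the template cannot run arbitrary code, at the price of losing loops, conditionals, and everything else the specification will ask for two days after launch.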
Before we start looking at these modules, though, let's consider the built-in solution—the humble Perl format.
Additional content appearing in this section has been removed.
Formats and Text::Autoformat
Formats have been in Perl since version 1.0. They're not used very much these days, but for a lot of what people want from text formatting, they're precisely the right thing.
Perl formats allow you to draw up a picture of the data you want to output, and then paint the data into the format. For instance, in a recent application, I needed to display a set of IDs, dates, email addresses, and email subjects with one line per mail. If we assume that the line is fixed at 80 columns, we may need to truncate some of those fields and pad others to wider than their natural width. In pure Perl, there are basically three ways to get this sort of formatted output. There's sprintf (or printf) and substr:
    for (@mails) {
        printf "%5i %10s %40s %21s\n",
            $_->id,
            substr($_->received,0,10),
            substr($_->from_address,-40,40),
            substr($_->subject,0,21);
    }
Then there's pack, which everyone forgets about (and which doesn't give as much control over truncation):
    for (@mails) {
        # Whitespace in a pack template is ignored, so print the newline separately.
        print pack("A5 A10 A40 A21",
          $_->id, $_->received, $_->from_address, $_->subject), "\n";
    }
And then there's the format:
    format STDOUT =
    @<<<< @<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<
    $_->id $_->received $_->from_address                       $_->subject
    .

    for (@mails) {
         write;
    }
Personally, I think this is much neater and more intuitive than the other two solutions—and has the bonus that it takes the formatting away from the main loop, making the code less cluttered.
Formats are associated with a particular filehandle; as you can see from the example, we've determined that this format should apply to anything we write on standard output. The picture language of formats is pretty simple: fields begin with @ or ^ and are followed by <, |, or > characters specifying left, center, and right justification, respectively. After each line of fields comes a line of expressions that fill those fields, one expression for each field. If we like, we could change the format to multiple lines of fields and expressions:
Additional content appearing in this section has been removed.
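The picture language described in this section can also be exercised without a filehandle, using the built-in formline function and the $^A accumulator (the swrite idiom shown in the perlform documentation). The mail records below are hypothetical hashes standing in for the mail objects used earlier, and the field widths are chosen only for the example.

```perl
use strict;
use warnings;

# Fill one picture line with values and return the result as a string.
sub swrite {
    my ($picture, @values) = @_;
    $^A = "";                        # clear the format accumulator
    formline($picture, @values);
    return $^A;
}

# Hypothetical mail records.
my @mails = (
    { id => 7,  received => '2005-06-01 10:15', subject => 'Lunch?' },
    { id => 42, received => '2005-06-02 09:00', subject => 'Re: Lunch?' },
);

# Right-justified id, left-justified (and truncated) date,
# right-justified subject.
my $picture = '@>>>> @<<<<<<<<< @>>>>>>>>>>>' . "\n";

print swrite($picture, @$_{qw(id received subject)}) for @mails;
```

The picture is single-quoted so that Perl does not try to interpolate anything beginning with @.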
Text::Template
Mark-Jason Dominus' Text::Template has established itself as the de facto standard templating system for plain text. Its templating language is very simple indeed—anything between { and } is evaluated by Perl; everything else is left alone.
It is an object-oriented module—you create a template object from a file, filehandle, or string, and then you fill it in:
    use Text::Template;
    my $template = Text::Template->new(TYPE => "FILE",
                                       SOURCE => "email.tmpl");

    my $output = $template->fill_in();
So, let's say we've got the following template:
    Dear {$who},
        Thank you for the {$modulename} Perl module, which has saved me
    {$hours} hours of work this year. This would have left me free to play
    { int($hours*2.4) } games of go, which I would have greatly appreciated
    had I not spent the time goofing off on IRC instead.

    Love,
    Simon
We set up our template object and our variables, and then we process the template:
    use Text::Template;
    my $template = Text::Template->new(TYPE => "FILE",
                                       SOURCE => "email.tmpl");

    $who = "Mark";
    $modulename = "Text::Template";
    $hours = 15;
    print $template->fill_in();
And the output would look like:
    Dear Mark,
        Thank you for the Text::Template Perl module, which has saved me
    15 hours of work this year. This would have left me free to play
    36 games of go, which I would have greatly appreciated
    had I not spent the time goofing off on IRC instead.

    Love,
    Simon
Notice that the fill-in variables—$who, $modulename, and so on—are not my variables. When you think about it, this ought to be obvious—the my variables are not in Text::Template's scope, and therefore it wouldn't be able to see them. This is a bit unpleasant: Text::Template has access to your package variables, and you have to do a bit more work if you want to avoid giving
Additional content appearing in this section has been removed.
HTML::Template
HTML formatting is slightly different from plaintext formatting—there are essentially two main schools of thought. The first, used by HTML::Template, is similar to the method we saw in Text::Template; the template is stored somewhere, and a Perl program grabs it and fills it in. The other school of thought is represented by HTML::Mason, which we'll look at next; this is inside-out—instead of running a Perl program that prints out a load of HTML, you create an HTML file that contains embedded snippets of Perl and run that.
To compare these approaches, we're going to build the same application in HTML::Template, HTML::Mason, and Template Toolkit: an aggregator of RSS feeds that grabs headlines from various web sites and pushes them onto a single page. (It is similar to Amphetadesk, http://www.disobey.com/amphetadesk/, and O'Reilly's Meerkat, http://www.oreillynet.com/meerkat/.) RSS (variously expanded as RDF Site Summary, Rich Site Summary, or Really Simple Syndication) is an XML-based format for providing details of individual items on a site; it's generally used for providing a feed of stories from news sites.
First, though, we'll take a brief look at how HTML::Template does its stuff, how to get values into it, and how to get HTML out.
As with Text::Template, templates are specified in separate files. HTML::Template's templates are ordinary HTML files, but with a few special tags. The most important of these is <TMPL_VAR>, which is replaced by the contents of a Perl variable. For instance, here's a very simple page:
    <html>
       <head><title>Product details for <TMPL_VAR NAME=PRODUCT></title></head>
       <body>
          <h1> <TMPL_VAR NAME=PRODUCT> </h1>
          <div class="desc">
               <TMPL_VAR NAME=DESCRIPTION>
          </div>
          <p class="price">Price: $<TMPL_VAR NAME=PRICE></p>
          <hr />
          <p>Price correct as at <TMPL_VAR NAME=DATE></p>
       </body>
    </html>
When filled in with the appropriate details, this should output something like:
Additional content appearing in this section has been removed.
HTML::Mason
One of the big drawbacks of HTML::Template is that it forces us, to some degree, to mix program logic and presentation, something that we sought to avoid by using templates. For instance, that last template got a little difficult to follow, with variable and HTML tags crowding up the template and obscuring what was actually going on. What we would prefer, then, is a system that allows us to further abstract out the individual elements of what we expect our templates to do, and this is where HTML::Mason comes in.
As we've mentioned, HTML::Mason is an inside-out templating system. As well as templating, it could also be described as a component abstraction system for building HTML web pages out of smaller, reusable pieces of logic. Here's a brief overview of how to use it, before we go on to implement the same RSS aggregator application.
In Mason, everything is a component. Here's a simple example of using components. Suppose we have three files: test.html in Example 3-1, Header in Example 3-2, and Footer in Example 3-3.
Example 3-1. test.html
<& /Header &>
<p>
  Hello World
</p>
<& /Footer &>
Example 3-2. Header
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
      <title>Some Web Application</title>
      <link rel=stylesheet type="text/css" href="nt.css">
  </head>

<body>
Example 3-3. Footer
    <hr>
    <div class="footer">
      <address>
         <a href="mailto:webmaster@yourcompany.com">webmaster@yourcompany.com</a>

      </address>
    </div>
  </body>
</html>
HTML::Mason builds up the page by including the components specified inside <& and &> tags. When creating test.html, Mason first includes the Header component found at the document root, then the rest of the HTML, then the Footer component.
Components may call other components. So far, we've done nothing outside the scope of server-side includes.
Additional content appearing in this section has been removed.
Template Toolkit
While the solutions we've seen so far have been primarily for Perl programmers (embedding Perl code in some other medium), Andy Wardley's Template Toolkit (http://www.template-toolkit.org/) is slightly different. It uses its own templating language to express components, loops, method calls, data structure elements, and more; it's therefore useful for teaching to designers who have no knowledge of the Perl side of your application but who need to work on the presentation. As the documentation puts it, you should think of the Template Toolkit language as a set of layout directives for displaying data, not calculating it.
Like Mason, it seamlessly handles compiling, caching, and delivering your templates. However, unlike Mason, it's designed to provide general-purpose display and formatting capabilities in a very extensible way. As an example, you can use Template Toolkit to dynamically serve up PDF documents containing graphs based on data from a database—and all this using nothing other than the standard plugins and filters and all within the Template Toolkit mini language.
But before we look at the clever stuff, let's look at the very simple uses of Template Toolkit. In the simplest cases, it behaves a lot like Text::Template. We take a template object, feed it some values, and give it a template to process:
    use Template;
    my $template = Template->new();
    my $variables = {
        who        => "Andy Wardley",
        modulename => "Template Toolkit",
        hours      => 30,
        games      => int(30*2.4)
    };
    $template->process("thankyou.txt", $variables);
This time, our template looks like the following:
    Dear [% who %],
        Thank you for the [% modulename %] Perl module, which has saved me
    [% hours %] hours of work this year. This would have left me free to play
    [% games %] games of go, which I would have greatly appreciated
    had I not spent the time goofing off on IRC instead.

    Love,
    Simon
Lo and behold, the templated text appears on standard output. Notice, however, that our variables inside the
Additional content appearing in this section has been removed.
AxKit
Although we include it in our list of templating systems, AxKit (http://www.axkit.org) is a slightly different kettle of fish from the modules we've seen so far; this is no mere templating system, it's a fully fledged XML application server for Apache. The most common use of AxKit is to transform XML to HTML on-the-fly for delivery over the web.
However, thanks to XSP (Extensible Server Pages), developed by the Apache Cocoon project, AxKit can be used as an extraordinarily extensible templating system. The basic idea behind XSP is that certain XML tags trigger the execution of given Perl routines. At a very basic level, you can use tags to delimit raw Perl code:
    <p>
    Good
    <xsp:logic>
    if ((localtime)[2] >= 12) {

        <i>Afternoon</i>
    }
    else {
        <i>Morning</i>
    }
    </xsp:logic>
    </p>
Notice that AxKit is quite happy for you to intersperse XML marked-up data with your Perl code. Because AxKit parses the XML, it knows that <i>Afternoon</i> is data, not Perl code, and treats it appropriately. This also means that if you have an XML guru handy, he can find a way of validating your HTML-with-embedded-XSP. In fact, since AxKit parses everything as XML, your HTML must be well-formed and valid or you won't get anything out of AxKit at all.
However, AxKit does not stop at this basic level; XSP allows you to create tag libraries with frontend Perl code. For instance, the AxKit::XSP::ESQL taglib provides a wrapper around the DBI libraries. These tag libraries define their own XML namespaces and place tags inside them. So your XML would use a namespace declaration to import the tag library:
    <xsp:page
         language="perl"
         xmlns:xsp="http://apache.org/xsp/core/v1"
         xmlns:esql="http://apache.org/xsp/SQL/v2"
    >
and this would allow you to use <esql:...> tags in your page:
      <esql:connection>
      <esql:driver>Pg</esql:driver>
      <esql:dburl>dbname=rss</esql:dburl>
      <esql:username>www</esql:username>
      <esql:password></esql:password>
      <esql:execute-query>
        <esql:query>
          select description, url, title from feeds
        </esql:query>
        <esql:results>
          <ul>
          <esql:row-results>
             <li>
              <a>
               <xsp:attribute name="href">
                   <esql:get-string column="url"/>
               </xsp:attribute>
               <esql:get-string column="title"/>
              </a> - <esql:get-string column="description"/>
             </li>
          </esql:row-results>
          </ul>
        </esql:results>

        <esql:no-results> <p> Couldn't get any results! </p> </esql:no-results>
      </esql:execute-query>
      </esql:connection>
Conclusion
In this chapter, we've looked at a few of the available templating tools that are commonly used in Perl; from simple formats—sprintf, and the like—on through Text::Template and HTML::Template, and then up to the more sophisticated solutions of HTML::Mason and Template Toolkit.
But we've missed out on one quite important question: which one should you use? As usual, the answer depends partly on what you need and partly on your tastes.
First, consider the distinction between Perl-based systems like Text::Template and Text::Autoformat, and inside-out modules like HTML::Mason. If the main purpose of your program is to provide some templated output, as in the case of a web-based application, then you probably want to gravitate toward the HTML::Mason and Template Toolkit end of the spectrum.
You also need to consider who's going to be writing the templates and whether you want to expose them to Perl code. Template Toolkit, AxKit, and HTML::Template all tend to keep the templater away from Perl, whereas HTML::Mason forces the templater to get down and dirty with it.
Second, there's the element of personal taste. I'm not a great fan of HTML::Template, preferring the way Mason does things; I find AxKit very powerful but at times very frustrating because of its insistence on clean XML; and I'm beginning to like Template Toolkit the more I use it, but prefer Mason basically because I'm more used to it.
Your tastes may differ. It's just as well that, as with so many things in Perl, there's more than one way to do it.
Chapter 4: Objects, Databases, and Applications
Perl programming is all about getting some data into our program, munging it around in various ways, and then spitting it back out again. So far we've looked at some interesting ways to do the munging and some great ways to represent the data, but our understanding of storing and loading data hasn't reached the same kind of level.
In this chapter, we're going to look at four major techniques for storing and retrieving complex data, and finally at application frameworks—technologies that pull together the whole process of retrieving, modifying, and acting on data, particularly for web applications, so that all the programmer needs to deal with is the business logic specific to the application.
For each technique, there are many CPAN modules that implement it in many different ways. We only have the space to examine one module in each section to demonstrate its approach; this is not necessarily an endorsement of the module in question as the best available. After all, there's more than one way to do it.
Beyond Flat Files
The word database might conjure up thoughts of the DBI and big expensive servers running expensive software packages, but a database is really just anything you can get data into and back out of.
Just a step up from the comma-separated text file is the humble DBM database. This exists as a C library in several incarnations, the best known being the Sleepycat Berkeley DB, available from http://www.sleepycat.com/download.html, and the GNU libgdbm, from http://www.gnu.org/order/ftp.html. When Perl is compiled and installed, it supplies Perl libraries to interface with the C libraries that it finds and with the SDBM library, which is shipped along with Perl. I prefer to use the Berkeley DB, with its Perl interface DB_File.
DBMs store scalar data in key-value pairs. You can think of them as the on-disk representation of a hash, and, indeed, the Perl interfaces to them are through a tied hash:
    use DB_File;
    tie %persistent, "DB_File", "languages.db" or die $!;
    $persistent{"Thank you"} = "arigatou";

    # ... sometime later ...

    use DB_File;
    tie %persistent, "DB_File", "languages.db" or die $!;
    print $persistent{"Thank you"};   # prints "arigatou"
DBMs, however, have a serious limitation—since they only store key-value pairs of scalar data, they cannot store more complex Perl data structures, such as references, objects, and the like. The other problem with key-value structures like DBMs is that they're very bad at expressing relationships between data. For this, we need a relational database such as Oracle or MySQL. We'll return to this subject later in the chapter to see a way of dealing with the limitations.
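To see the scalar-only limitation in action, here is a minimal sketch. It uses SDBM_File rather than DB_File, purely because SDBM ships with Perl and so needs no external library; the temporary directory and the "Thanks" key are assumptions made for illustration.

```perl
use strict;
use warnings;
use Fcntl;                        # supplies O_RDWR and O_CREAT
use SDBM_File;                    # the DBM implementation bundled with Perl
use File::Temp qw(tempdir);

# Assumption: a throwaway directory stands in for wherever the data would live.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/languages";

tie my %persistent, "SDBM_File", $file, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $file: $!";
$persistent{"Thank you"} = "arigatou";                 # a plain scalar
$persistent{"Thanks"}    = [ "arigatou", "doumo" ];    # a reference
untie %persistent;

# Reopen the file, as a later run of the program would.
tie my %reloaded, "SDBM_File", $file, O_RDWR, 0666
    or die "Cannot tie $file: $!";
my $scalar    = $reloaded{"Thank you"};
my $flattened = $reloaded{"Thanks"};
untie %reloaded;

print "Scalar survived: $scalar\n";
print "Reference became the string: $flattened\n";
print "Is it still a reference? ", (ref $flattened ? "yes" : "no"), "\n";
```

The array reference is stringified on the way in, so all that comes back is text of the form ARRAY(0x...); the list it pointed to is gone.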
Object Serialization
Now we want to move on from the relatively simple key-value mechanism of DBMs to the matter of saving and restoring more complex Perl data structures, chiefly objects. These data structures are interesting and more difficult than scalars, because they come in many shapes and sizes: an object may be a blessed hash—or it might be a blessed array—which could itself contain any number and any depth of nesting of hashes, including other objects, arrays, scalars, or even code references.
While we could reassemble all our data structures from their original sources every time a program is run, the more complex our structures become, the more efficient it is to be able to store and restore them wholesale. Serialization is the process of representing complex data structures in a binary or text format that can faithfully reconstruct the data structure later. In this section we're going to look at the various techniques that have been developed to do this, again with reference to their implementation in CPAN modules.
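As a rough illustration of that round trip, the following sketch uses the core Storable module, which is only one of several possible serializers. The My::CD class name and its data are hypothetical placeholders.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical stand-in object: a blessed hash holding a nested array.
my $cd = bless {
    title => "Example Album",
    songs => [ "First Track", "Second Track" ],
}, "My::CD";

my $frozen = freeze($cd);     # a binary string that could be written to a file
my $copy   = thaw($frozen);   # a new, independent structure

print "Class restored: ", ref($copy), "\n";
print "Songs restored: ", join(", ", @{ $copy->{songs} }), "\n";
print "Same object? ", ($copy == $cd ? "yes" : "no"), "\n";
```

The thawed copy has the same class and contents as the original but lives at a different address, which is what we would expect of something reconstructed from a string.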
To compare the different techniques here and in the rest of the chapter, we're going to use the same set of examples: some Perl classes whose objects we want to be somehow persistent. The schema and classes are taken from the example application used by Class::DBI: a database of CDs in a collection, with information about the tracks, artists, bands, singers, and so on.
We'll create our classes using the Class::Accessor::Assert module, which not only creates constructors and accessors for the data slots we want, but also ensures that relationships are handled by constraining the type of data that goes in the slots. So, for instance, the CD class would look like this:
    package CD;
    use base "Class::Accessor::Assert";
    __PACKAGE__->mk_accessors(qw(
       artist=CD::Artist title publishdate=Time::Piece songs=ARRAY
    ));
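To give a feel for what "constraining the type of data that goes in the slots" means, here is a hand-rolled sketch in core Perl. It is not how Class::Accessor::Assert is implemented; the My::CD and My::Artist classes and the checking rule are assumptions made for illustration.

```perl
use strict;
use warnings;

package My::Artist;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package My::CD;
use Scalar::Util qw(blessed);

# Slot name => required class or reference type; undef means unconstrained.
my %slot_type = (artist => "My::Artist", title => undef, songs => "ARRAY");

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->$_($args{$_}) for keys %args;   # route everything through the checks
    return $self;
}

# Generate one checking accessor per slot.
for my $slot (keys %slot_type) {
    my $type = $slot_type{$slot};
    no strict 'refs';
    *{"My::CD::$slot"} = sub {
        my $self = shift;
        return $self->{$slot} unless @_;
        my $value = shift;
        if (defined $type) {
            my $ok = blessed($value) ? $value->isa($type)
                                     : ref($value) eq $type;
            die "$slot must be a $type\n" unless $ok;
        }
        return $self->{$slot} = $value;
    };
}

package main;

my $cd = My::CD->new(
    artist => My::Artist->new(name => "Example Artist"),
    title  => "Example Album",
    songs  => [],
);
print "Title: ", $cd->title, "\n";

# Putting a plain string where an artist object belongs is rejected.
my $accepted = eval { $cd->artist("Just a string"); 1 };
print "Bad artist accepted? ", ($accepted ? "yes" : "no"), "\n";
```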
Object Databases
While the methods we've seen in the previous section work very well for storing and retrieving individual objects, there are times when we want to deal with a massive collection of data with the same degree of efficiency. For instance, our CD collection may run to thousands of objects, while a simple query application—for example, to determine which artist recorded a particular track—would only use one or two of them. In this case, we don't want to load up the whole object store into memory before we run the query.
In fact, what we could really do with is the kind of fast, efficient indexing and querying that is the hallmark of traditional relational databases such as Oracle or MySQL, but which deals with objects in the same way as Pixie. We want an object database.
There are not many object databases on CPAN, and with good reason: writing object databases is incredibly difficult.
First, you need to worry about how to pick apart individual objects and store them separately, so that you don't end up with the pruning problem.
Second, you have to work out a decent way to index and query objects. Indexing and querying database rows in general is pretty easy, but objects? This is currently one of the areas that holds Pixie back from being an object database.
Allied with that, you need to work out how you're going to map the properties of your object to storage in a sensible way to allow such indexing; serialization-based solutions don't care about what's inside an object, they just write the whole thing into a string.
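The cost of treating objects as opaque strings can be sketched in a few lines. This is a toy built on the core Storable module, not a description of how Pixie or any object database works; the store layout and the sub names are hypothetical.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical store: object ID => serialized CD (plain hashes for brevity).
my %store = (
    1 => freeze({ artist => "Artist One", songs => [ "Alpha", "Beta" ] }),
    2 => freeze({ artist => "Artist Two", songs => [ "Gamma" ] }),
);

# Without an index, answering a query means thawing every object in turn.
sub artist_by_scan {
    my ($track) = @_;
    for my $id (sort keys %store) {
        my $cd = thaw($store{$id});
        return $cd->{artist} if grep { $_ eq $track } @{ $cd->{songs} };
    }
    return;
}

# Mapping one property (the song title) to its own storage gives a direct
# lookup. The index is built once here; a real system would have to keep it
# up to date on every write.
my %id_for_track;
for my $id (keys %store) {
    my $cd = thaw($store{$id});
    $id_for_track{$_} = $id for @{ $cd->{songs} };
}

sub artist_by_index {
    my ($track) = @_;
    my $id = $id_for_track{$track} or return;
    return thaw($store{$id})->{artist};   # only one object is loaded
}

print artist_by_scan("Gamma"), "\n";
print artist_by_index("Gamma"), "\n";
```

Both calls give the same answer, but the second touches a single stored object. Keeping that index correct as objects change is exactly the part that makes the problem hard.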
Fortunately, you don't really have to worry about these things; you can just use some of the existing solutions.
Jean-Louis Leroy's Tangram is a mature and flexible but complex solution to mapping Perl objects onto database rows. Tangram is very explicit in terms of what the user must do to make it work. Except when it comes to filters, which we'll look at in a moment, Tangram is very short on DWIM.
Database Abstraction
Tangram has given us a way to store and retrieve objects in a database. The other side of the coin is the situation of having an existing database and wanting to get a view of it in terms of Perl objects. This is a very subtle distinction, but an important one. In the case of Tangram (and indeed, Pixie), we didn't really care what the database schema was, because the database was just an incidental way for Tangram to store its stuff. It could create whatever tables and columns it wanted; what we really care about is what the objects look like. In the current case, though, we already have the database; we have a defined schema, and we want the database abstraction tool to work around that and tell us what the objects should look like.
There are several good reasons why you might want to do this. For many people, database abstraction is attractive purely because it avoids having to deal with SQL or the relatively tedious process of interacting with the DBI; but there's a more fundamental reason.
When we fetch some data from the database, in the ordinary DBI model, it then becomes divorced from its original database context. It is no longer live data. We have a hash reference or array reference of data—when we change elements in that reference, nothing changes in the database at all. We need a separate step to put our changes back. This isn't the paradigm we're used to programming in. We want our data to do something, and data that do something are usually called objects—we want to treat our database rows as objects, with data accessors, instantiation and deletion methods, and so on. We want to map between relational databases and objects, and this is called, naturally, object relational mapping.
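The difference between detached data and live objects can be shown without a real database. In the sketch below an in-memory hash stands in for a table, and the My::ArtistRow class is a hypothetical toy, not the interface of any particular mapping module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database table: primary key => row.
my %artist_table = (1 => { name => "Old Name" });

# DBI-style fetch: the caller gets a detached copy of the row.
sub fetch_row { my ($id) = @_; return { %{ $artist_table{$id} } } }

my $row = fetch_row(1);
$row->{name} = "Changed In Copy";
print "After editing the copy: $artist_table{1}{name}\n";

# Object-style access: the accessor writes through to the store.
package My::ArtistRow;
sub retrieve { my ($class, $id) = @_; return bless { id => $id }, $class }
sub name {
    my $self = shift;
    $artist_table{ $self->{id} }{name} = shift if @_;   # plays the part of UPDATE
    return $artist_table{ $self->{id} }{name};          # plays the part of SELECT
}

package main;
my $artist = My::ArtistRow->retrieve(1);
$artist->name("New Name");
print "After using the accessor: $artist_table{1}{name}\n";
```

Editing the fetched copy leaves the table untouched, while the accessor changes the stored row directly. That write-through behavior is the property we want from our row objects.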
Practical Uses in Web Applications
One of the more popular ways of creating web-based applications these days is called the MVC Pattern—it's a design pattern where you have three components: a model of your data, a view that displays it, and a controller that routes requests and actions between the other two. It's a design pattern that first appeared in graphical applications in the Smalltalk programming language, but has translated reasonably well over to the Web. The key point of MVC is that, if you do it properly, your data model, your view, and your controller can be completely independent components, and you only need to worry about what goes on at the edges.
Now, the kind of templating system we looked at in the previous chapter looks very much like a view class: it abstracts out a way of presenting data. Similarly, the ways of treating database rows as objects look very much like model classes. Almost for free, using CPAN modules, we've got two of the three parts we need for a web application. The upshot is that, if you follow the MVC strategy, you have a very cheap way of writing web applications in which you delegate presentation to a templating library, you delegate data representation to an ORM library, and all you need to care about is what the darned thing actually does.
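The division of labor can be sketched in miniature. Everything here is a hypothetical toy: a hash plays the part of the ORM-backed model, string interpolation plays the part of the templating library, and the /cd/ID request format is invented for the example.

```perl
use strict;
use warnings;

# Model: owns the data. Hypothetical in-memory stand-in for an ORM class.
package My::Model;
my %cds = (1 => { title => "Example Album", artist => "Example Artist" });
sub retrieve { my ($class, $id) = @_; return $cds{$id} }

# View: turns data into output. A stand-in for a templating library.
package My::View;
sub render {
    my ($class, $cd) = @_;
    return "<h1>$cd->{title}</h1><p>by $cd->{artist}</p>";
}

# Controller: maps a request onto a model lookup and a view.
package My::Controller;
sub handle {
    my ($class, $path) = @_;
    my ($id) = $path =~ m{^/cd/(\d+)$} or return "404 Not Found";
    my $cd = My::Model->retrieve($id)  or return "404 Not Found";
    return My::View->render($cd);
}

package main;
print My::Controller->handle("/cd/1"), "\n";
print My::Controller->handle("/cd/99"), "\n";
```

Only the controller knows about both of the other components, so the model and the view could each be swapped for a real library without the other noticing.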
While this strategy can be applied to pretty much any of the tools we've talked about in the past two chapters, I want to look particularly at using Class::DBI and Template Toolkit; partly for the sake of example, partly because I personally think they fit together extremely well, and partly for another reason that will become apparent shortly.
The magic coupling of CDBI and TT, as they're affectionately known, was first popularized around 2001 by Tony Bowden, who'd just taken over maintaining Class::DBI. The idea spread through the mailing lists and Perl-mongers groups until, in 2003, Kake Pugh wrote a perl.com article (http://www.perl.com/lpt/a/2003/07/15/nocode.html
Conclusion
Storing and retrieving data is the backbone of programming, so it shouldn't be much of a surprise that there are so many techniques available to make it easier. We've looked at ways of storing keyed data using DBMs, extended that with serialization of objects to create a way to store objects in DBMs, then used Pixie to organize our object store. This brought us on to looking at Tangram as a more flexible and powerful object database. Next, we turned the problem over and tried to make databases look like objects, using Class::DBI. Finally, we showed how this view of databases works in concert with the templating techniques we looked at in Chapter 3 to create application frameworks like Maypole, allowing you to write large web applications with very little code.
Chapter 5: Natural Language Tools