Before I can do anything to improve my programs, I have to decide what I’m going to fix, and before I spend the time to do that, I want to figure out what I should focus on. How do I get the most improvement for the least amount of fiddling? What should I work on first? Through the process of profiling, by which I record and summarize what a program is doing, I can make those decisions. Luckily, Perl already offers several tools to do this.
I want to compute a factorial. It’s the old saw of performance discussions, and I’ll get to something more interesting in a moment. When I Googled for “factorial subroutines,” almost every implementation (aside from those in assembly language) was a recursive algorithm, meaning that a subroutine figures out part of the problem, calls itself with a subproblem, and keeps doing that until there are no more subproblems, eventually working its way back up to the original call. Here’s how I’d write that in Perl:
#!/usr/bin/perl
# factorial-recurse.pl

sub factorial {
    return unless int( $_[0] ) == $_[0];
    return 1 if $_[0] == 1;
    return $_[0] * factorial( $_[0] - 1 );
}

print factorial( $ARGV[0] ), "\n";
Now I want to figure out how to improve this toy program. It’s
already pretty fast because Perl can’t really count that high. With
anything over 170, my program on my machine returns Inf
(more on that in a moment). Despite that,
I’ll profile it anyway. I use the Devel::SmallProf
module to get a quick
summary. I invoke it with the -d
switch, which already assumes the Devel
portion of the name (see Chapter 4):
% perl -d:SmallProf factorial.pl 170
The Devel::SmallProf
module leaves behind a
human-readable text file named smallprof.out
. In its columnar output, it shows
each line of the program, how many times I executed that line, and the
real and CPU times for each line:
================ SmallProf version 1.15 =========================
                   Profile of factorial.pl               Page 1
=================================================================
    count wall tm  cpu time line
        0 0.000000 0.000000     1:#!/usr/bin/perl
        0 0.000000 0.000000     2:
      170 0.000000 0.000000     3:sub factorial {
      170 0.001451 0.000000     4:    return unless int( $_[0] ) == $_[0];
      170 0.004367 0.000000     5:    return 1 if $_[0] == 1;
      169 0.004371 0.000000     6:    return $_[0] * factorial( $_[0] - 1 );
        0 0.000000 0.000000     7:    }
        0 0.000000 0.000000     8:
        1 0.000009 0.000000     9:print factorial( $ARGV[0] ), "\n";
To compute the factorial of 170, I had to call the subroutine 170
times. Each time (save for one!) I called that subroutine, I had to
execute the lines in the subroutine. I had to check that the argument was
an integer each time, I had to check if the argument was 1
each time, and in almost every case, I had to
call the subroutine again. That’s a lot of work. By profiling my program,
I can see what is taking up all the time, and then concentrate on
improving those areas.
The best way to fix these problems is to come up with a better way
to get the answer. Better algorithms get you better performance than
almost any other method. Instead of using a recursive solution, I changed
it to an iterative one. I can easily get the range of integers using the range operator (in languages without one, a C-style for loop can stand in):
#!/usr/bin/perl
# factorial-iterate.pl

sub factorial {
    return unless int( $_[0] ) == $_[0];
    my $f = 1;
    foreach ( 2 .. $_[0] ) { $f *= $_ };
    $f;
}

print factorial( $ARGV[0] ), "\n";
When I profile this program, I see that I did not have to do as much work because I didn’t have as much code to run: I only had to check the argument once, I didn’t have to check whether the argument was 1, and I didn’t have to make repeated calls to a subroutine:
================ SmallProf version 1.15 =========================
                   Profile of factorial2.pl              Page 1
=================================================================
    count wall tm  cpu time line
        0 0.000000 0.000000     1:#!/usr/bin/perl
        0 0.000000 0.000000     2:
        1 0.000000 0.000000     3:sub factorial {
        1 0.000021 0.000000     4:    return unless int( $_[0] ) == $_[0];
        1 0.000000 0.000000     5:    my $f = 1;
      170 0.001632 0.000000     6:    foreach ( 2 .. $_[0] ) { $f *= $_ };
        1 0.002697 0.000000     7:    $f;
        0 0.000000 0.000000     8:    }
        0 0.000000 0.000000     9:
        1 0.000006 0.000000    10:print factorial( $ARGV[0] ), "\n";
Earlier I said that my program topped out at 170. I can get past
that limit by telling Perl to use the bignum
pragma:
#!/usr/bin/perl
# factorial-recurse-bignum.pl

use bignum;

sub factorial {
    return unless int( $_[0] ) == $_[0];
    return 1 if $_[0] == 1;
    return $_[0] * factorial( $_[0] - 1 );
}

print factorial( $ARGV[0] ), "\n";
Now I can see some real performance differences by comparing the factorials of really big numbers. As I was finishing this book, I switched to a MacBook Pro, and its dual-core architecture had no trouble with either approach. Only with really large numbers did the recursive approach slow down noticeably.
That’s not the whole story, though. I’ve shown a really simple
program that calculates a single number. In a real program I would most
likely use the factorial
routine many,
many times with several different values. When I profile the application,
I’ll see the number of times I run the lines of the subroutine throughout
the entire process.
Either approach can benefit from caching its results. Here’s a program that repeatedly prompts me for a number. It computes the factorial and caches the results along the way, trading memory for speed. The first time I ask it to compute the factorial for 10,000, it takes several seconds. After that, when I ask it for the factorial for any number less than 10,000, it’s just a very fast lookup:
#!/usr/bin/perl
# factorial-iterate-bignum-memo.pl

use bignum;

{
    my @Memo = (1);

    sub factorial {
        my $number = shift;

        return unless int( $number ) == $number;

        return $Memo[$number] if $Memo[$number];

        foreach ( @Memo .. $number ) {
            $Memo[$_] = $Memo[$_ - 1] * $_;
        }

        $Memo[ $number ];
    }
}

{
    print "Enter a number> ";
    chomp( my $number = <STDIN> );
    exit unless defined $number;
    print factorial( $number ), "\n";
    redo;
}
I can do the same with the recursive solution, although
the Memoize
module does the extra work for
me:
#!/usr/bin/perl
# factorial-recurse-bignum-memo.pl

use bignum;
use Memoize;

memoize( factorial );

sub factorial {
    return unless int( $_[0] ) == $_[0];
    return 1 if $_[0] == 1;
    return $_[0] * factorial( $_[0] - 1 );
}

{
    print "Enter a number> ";
    chomp( my $number = <STDIN> );
    exit unless defined $number;
    print factorial( $number ), "\n";
    redo;
}
While profiling, I must remember that some things in isolation don’t tell me the whole story. The profile can help me make decisions, but I’m the one who has to do the thinking, not the computer.
Profiling means counting, and to count something, I need to make the statements do something so I can count them. I might, for instance, use some of the features from Chapter 4 to add accounting code to my subroutines. That’s much too tedious, though. Instead of trying to account for things in individual subroutines, I try to make everything flow through a single control subroutine. This is probably too much for a small program, but in a large system, the extra computing pays off in saved developer time when I work on optimizing the program.
The most common place I do this is in database code. In the database case, I want to track which queries I make, usually so I can get an idea of which queries take a long time or which ones I run most frequently. From that, I can figure out what I should optimize.
Here’s an example of a nexus for all queries that allows me to
profile my database code. I’ve simplified this example, but this is close
to some actual code I’ve used, minus some stuff that doesn’t apply to
profiling. I have a package-scoped lexical variable %Queries
that will hold my profile data. The
simple_query
method is essentially a
wrapper around prepare
and execute
with some accounting overhead:
package My::Database;

my %Queries;

sub simple_query {
    my( $self, @args ) = @_;

    my $sql_statement = shift @args;

    $Queries{$sql_statement}++;   # <--- Profiling hook

    my $sth = $self->dbh->prepare( $sql_statement );
    unless( ref $sth ) { warn $@; return }

    my $rc = $sth->execute( @args );

    wantarray ? ( $sth, $rc ) : $rc;
}
In the rest of my database code, I have functions that use simple_query
instead of using the
DBI
interface directly. My get_postage_rates_by_country
grabs the amount of
postage I need to send mail overseas. It passes the SQL statement and a
bind parameter to simple_query
. As
before, this is real code, although I’ve cut out some bits to only show
the relevant parts:
sub get_postage_rates_by_country {
    my( $self, $country ) = @_;

    my( $sth ) = $self->simple_query( <<"SQL", $country );
    SELECT PostageRates.ounces, PostageRates.rate, PostageServices.name
    FROM PostageRates, Countries, PostageServices
    WHERE
        Countries.pk = ?
        AND Countries.usps_zone = PostageRates.usps_zone
        AND PostageRates.service = PostageServices.pk
    ORDER BY PostageRates.ounces
SQL

    return $sth->fetchall_arrayref;
}
As my program does its work, the queries flow through simple_query
, which counts and records what
happens. To get the profile data, I use an END
block to create the
report. The particular format depends on what I collected during the run.
In this example I just counted statements, but I could use that %Queries
hash to store anything I wanted,
including the bind parameters, the function that called simple_query
, and so on:
END {
    foreach my $statement ( sort { $Queries{$b} <=> $Queries{$a} } keys %Queries ) {
        printf "%5d %s\n\n", $Queries{$statement}, $statement;
    }
}
I might find, for instance, that in a long report I repeatedly fetch the postage data for each country, even though it’s not going to change. When I realize I’m doing this after looking at the profile data, I can optimize my code to cache some of the data in memory rather than asking the database for the same answer again.
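For example, I might wrap the earlier get_postage_rates_by_country in a small cache inside My::Database. This is only a sketch (the %postage_cache hash and the _cached wrapper name are mine, not part of the real code), but it shows the trade of a little memory for far fewer queries:

{
    my %postage_cache;   # country => cached fetchall_arrayref result

    sub get_postage_rates_by_country_cached {
        my( $self, $country ) = @_;

        # only ask the database the first time I see a country
        $postage_cache{$country} = $self->get_postage_rates_by_country( $country )
            unless exists $postage_cache{$country};

        return $postage_cache{$country};
    }
}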
I’ve actually been coding my Perl database stuff through this sort of central query subroutine for quite a while, and I recently found out that Tim Bunce added these features directly to DBI. He did the same sort of thing by making everything flow through a central function, which was easy because DBI already does that for queries. The DBI::Profile module does much of the same work as my hand-rolled accounting, but for anything that goes through Perl’s database interface module, DBI.
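I don’t even have to wait for the command line to turn it on; I can enable DBI::Profile from inside the program by setting the Profile attribute on the handle. Here’s a minimal sketch; I expect DBI to dump the report to STDERR on its own when the handle goes away at the end of the program, just as it does with the DBI_PROFILE environment variable I use below:

use DBI;
use DBI::Profile;

my $dbh = DBI->connect( "DBI:CSV:f_dir=." );

# same effect as DBI_PROFILE='!Statement', but only for this handle
$dbh->{Profile} = '!Statement';

# ...prepare and execute queries as usual...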
Database interactions are often the biggest performance drain on my
programs, and that’s a place I usually start to look for improvements.
Instead of calling subroutines unnecessarily, as in my last example, I
might be making unnecessary database queries.
Here’s a short program that takes quite a bit of time because it
makes almost 2,000 database queries. I want to build a table of number
names, so given a digit I can get the name (e.g., 9 has the name “Nine”),
or go from the name to the digit. I should probably use a Lingua::* module, but I don’t want to start off with something that smart. In this example, I
use the DBD::CSV
module to use a comma-separated value file as my database store. I
create a table to hold the pairs, and then start to populate the table. I
bootstrap the data by getting the first 19 names into the table, then
looking up the names I already have to create further names:
#!/usr/bin/perl
# dbi-number-inserter.pl
use strict;

use DBI;

my $dbh = DBI->connect( "DBI:CSV:f_dir=." );

$dbh->do( "DROP TABLE names" );
$dbh->do( "CREATE TABLE names ( id INTEGER, name CHAR(64) )" );

my $sth = $dbh->prepare( "INSERT INTO names VALUES ( ?, ? )" );

my $id = 1;
foreach my $name (
    qw(One Two Three Four Five Six Seven Eight Nine Ten),
    qw(Eleven Twelve Thirteen Fourteen Fifteen Sixteen Seventeen Eighteen Nineteen)
    ) {
    $sth->execute( $id++, $name );
}

foreach my $name ( qw( Twenty Thirty Forty Fifty Sixty Seventy Eighty Ninety ) ) {
    $sth->execute( $id++, $name );

    foreach my $ones_digit ( 1 .. 9 ) {
        my( $ones_name ) = map { lc }
            $dbh->selectrow_array( "SELECT name FROM names WHERE id = $ones_digit" );
        $sth->execute( $id++, "$name $ones_name" );
    }
}

foreach my $digit ( 1 .. 9 ) {
    my( $hundreds ) = $dbh->selectrow_array( "SELECT name FROM names WHERE id = $digit" );
    $sth->execute( $id++, "$hundreds hundred" );

    foreach my $tens_digit ( 1 .. 99 ) {
        my( $tens_name ) = map { lc }
            $dbh->selectrow_array( "SELECT name FROM names WHERE id = $tens_digit" );
        $sth->execute( $id++, "$hundreds hundred $tens_name" );
    }
}
I run this from the command line, and it takes almost two minutes on
my Powerbook G4. That’s okay; I need a nice, slow example. Now I want to
profile this program to see where I can improve it, pretending I was just
handed it without knowing how it works. I set the DBI_PROFILE environment variable to turn on database profiling.[23] To get a report ordered by statements, I set DBI_PROFILE='!Statement'. The sort key has an exclamation point, !, prepended to it.
At the end of the run, I get a long report. Here are the first several
lines:
$ env DBI_PROFILE='!Statement' perl dbi-profile.pl
DBI::Profile: 109.671362s 99.70% (1986 calls) dbi-profile.pl @ 2006-10-10 02:18:40
'' =>
    0.000784s / 10 = 0.000078s avg (first 0.000023s, min 0.000001s, max 0.000618s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
    0.004258s
'DROP TABLE names' =>
    0.008017s
'INSERT INTO names VALUES ( ?, ? )' =>
    3.229462s / 1002 = 0.003223s avg (first 0.001767s, min 0.000037s, max 0.108636s)
'SELECT name FROM names WHERE id = 1' =>
    1.204614s / 18 = 0.066923s avg (first 0.012831s, min 0.010301s, max 0.274951s)
'SELECT name FROM names WHERE id = 10' =>
    1.118565s / 9 = 0.124285s avg (first 0.027711s, min 0.027711s, max 0.341782s)
'SELECT name FROM names WHERE id = 11' =>
    1.136748s / 9 = 0.126305s avg (first 0.032328s, min 0.032328s, max 0.378916s)
The top line gives me the wallclock time and the total number of
DBI
method calls; that’s the number of method calls to
DBI
, not the number of queries. After that, I get a report for each query. Just because the first few entries look like they’re sorted by total time or number of queries, don’t forget to look at the rest of the report; it’s actually sorted in lexical (alphabetical) order of the query text.
For each query, DBI::Profile
reports the total
wallclock time and the number of method calls for that statement. It
doesn’t report the CPU time because it isn’t very interesting; the
database server might be another machine, and even if it is local, it’s
often a separate process. It gives an average time for that query, and
then the times for the first call, the call that took the least amount of
time, and the call that took the most. This isn’t as simple as timing a
program. The database server might perform differently given the same
input because it might be doing something else, the data size might be different, or for any number of other reasons.
From the full report, I see that most calls took about the same amount of time since they are all running pretty quickly, so I can’t make a big speedup by optimizing a query so it performs better on the database. No indexing or rearrangement of joins will likely help here.
What I really need to reduce is the number of queries so I interact
with the database less. I can’t get away from the INSERT
s since I still have to make each row, but
I don’t need to make all of those select statements. I should cache the
result so I don’t fetch the same data twice (or even at all):
#!/usr/bin/perl
# dbi-number-inserter-cached.pl
use strict;

use DBI;

my $dbh = DBI->connect( "DBI:CSV:f_dir=." );

$dbh->do( "DROP TABLE names" );
$dbh->do( "CREATE TABLE names ( id INTEGER, name CHAR(64) )" );

my $insert = $dbh->prepare( "INSERT INTO names VALUES ( ?, ? )" );

my @array = (
    qw( Zero ),
    qw(One Two Three Four Five Six Seven Eight Nine Ten),
    qw(Eleven Twelve Thirteen Fourteen Fifteen Sixteen Seventeen Eighteen Nineteen)
    );

my $id = 0;
foreach my $name ( @array ) {
    $insert->execute( $id++, $name );
}

foreach my $name ( qw( Twenty Thirty Forty Fifty Sixty Seventy Eighty Ninety ) ) {
    $array[ $id ] = $name;
    $insert->execute( $id++, $name );

    foreach my $ones_digit ( 1 .. 9 ) {
        my $full_name = $array[ $id ] = "$name $array[$ones_digit]";
        $insert->execute( $id++, $full_name );
    }
}

foreach my $digit ( 1 .. 9 ) {
    my( $hundreds ) = $array[ $digit ];
    my $name = $array[$id] = "$hundreds hundred";
    $insert->execute( $id++, $name );

    foreach my $tens_digit ( 1 .. 99 ) {
        my( $tens_name ) = lc $array[ $tens_digit ];
        $array[$id] = "$hundreds hundred $tens_name";
        $insert->execute( $id++, "$name $tens_name" );
    }
}
In my first pass at improvement, I don’t have any select statements at all because I cache the results. That cuts out most of the runtime in this program. The times for each program are remarkably different. Remember, however, that I’ve made a trade-off between speed and memory. The second program is faster, but it takes up more memory:
$ time perl dbi-profile.pl
real    1m48.676s
user    1m21.136s
sys     0m1.698s

$ time perl dbi-profile2.pl
real    0m2.638s
user    0m1.736s
sys     0m0.307s
Here’s the entire profile report for my new program, which now runs
in two percent of the original runtime. Most of the calls are INSERT
s:
$ env DBI_PROFILE='!Statement' perl dbi-profile2.pl
DBI::Profile: 2.118577s 105.93% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:31:10
'' =>
    0.000757s / 10 = 0.000076s avg (first 0.000021s, min 0.000001s, max 0.000584s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
    0.004216s
'DROP TABLE names' =>
    0.006906s
'INSERT INTO names VALUES ( ?, ? )' =>
    2.106698s / 1003 = 0.002100s avg (first 0.001713s, min 0.000037s, max 0.005587s)
By looking at the profile, I was able to target part of the program for improvement. It didn’t tell me how to improve it, but at least I know where I should spend my time.
The runtime report isn’t the only one I can get. With DBI_PROFILE='!MethodName'
,
DBI
orders the report according to the name of the
DBI
function. It’s in ASCII-betical order with the
uppercase letters sorting before the lowercase ones (and I’ve redacted
part of these reports since they show all of the
methods, including the ones I didn’t even know I was using):
$ env DBI_PROFILE='!MethodName' perl dbi-profile2.pl
DBI::Profile: 2.168271s 72.28% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:37:16
'DESTROY' =>
    0.000141s / 2 = 0.000070s avg (first 0.000040s, min 0.000040s, max 0.000101s)
'FETCH' =>
    0.000001s
'STORE' =>
    0.000067s / 5 = 0.000013s avg (first 0.000022s, min 0.000006s, max 0.000022s)
'do' =>
    0.010498s / 2 = 0.005249s avg (first 0.006602s, min 0.003896s, max 0.006602s)
'execute' =>
    2.155318s / 1000 = 0.002155s avg (first 0.002481s, min 0.001777s, max 0.007023s)
'prepare' =>
    0.001570s
I can even combine the two since DBI::Profile
can deal with multiple sort keys if I join them with a colon. With
DBI_PROFILE='!Statement:!MethodName'
,
DBI
gives me a double-layer report. Under each SQL statement, it breaks the time down by the particular method it used. I might, for instance, want to compare the time my database query spends in the DBI guts versus actually fetching the data:
$ env DBI_PROFILE='!Statement:!MethodName' perl dbi-profile2.pl
DBI::Profile: 2.123325s 106.17% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:38:22
'' =>
    'FETCH' =>
        0.000001s
    'STORE' =>
        0.000069s / 5 = 0.000014s avg (first 0.000024s, min 0.000005s, max 0.000024s)
    'connect' =>
        0.000644s
    'default_user' =>
        0.000030s
    'disconnect' =>
        0.000050s
    'disconnect_all' =>
        0.000024s
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
    'do' =>
        0.004616s
'DROP TABLE names' =>
    'do' =>
        0.007191s
'INSERT INTO names VALUES ( ?, ? )' =>
    'DESTROY' =>
        0.000149s / 2 = 0.000075s avg (first 0.000050s, min 0.000050s, max 0.000099s)
    'execute' =>
        2.108945s / 1000 = 0.002109s avg (first 0.002713s, min 0.001796s, max 0.005454s)
    'prepare' =>
        0.001606s
I can flip that last report around by using DBI_PROFILE='!MethodName:!Statement'
. The
first layer lists the DBI
method and then breaks it
down by SQL statements after that:
$ env DBI_PROFILE='!MethodName:!Statement' perl dbi-profile2.pl
DBI::Profile: 2.431843s 81.06% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:40:40
'DESTROY' =>
    'INSERT INTO names VALUES ( ?, ? )' =>
        0.000142s / 2 = 0.000071s avg (first 0.000039s, min 0.000039s, max 0.000103s)
'FETCH' =>
    '' =>
        0.000001s
'STORE' =>
    '' =>
        0.000065s / 5 = 0.000013s avg (first 0.000022s, min 0.000005s, max 0.000022s)
'connect' =>
    '' =>
        0.000685s
'default_user' =>
    '' =>
        0.000024s
'disconnect' =>
    '' =>
        0.000050s
'disconnect_all' =>
    '' =>
        0.000023s
'do' =>
    'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
        0.004287s
    'DROP TABLE names' =>
        0.006389s
'execute' =>
    'INSERT INTO names VALUES ( ?, ? )' =>
        2.418587s / 1000 = 0.002419s avg (first 0.002549s, min 0.001819s, max 0.013104s)
'prepare' =>
    'INSERT INTO names VALUES ( ?, ? )' =>
        0.001589s
Sam Tregar’s DBI::ProfileDumper
module does the
same thing as DBI::Profile
, but it saves its result
in a file instead of dumping it to standard output.[24] By default, this file is named dbi.prof, but I can use any name I like. For anything but a small
application, I’ll probably have to do quite a bit of custom slicing and
dicing to extract the information I need.
First, I tell DBI which profiling class it should use by including
it in the DBI_PROFILE
value. I
join the class name to the profiling sort keys with a /
:
$ env DBI_PROFILE='!Statement'/DBI::ProfileDumper ./program.pl
Once that command completes, dbi.prof has all of the profiling data. If I
want to change the filename, I just add that to DBI_PROFILE
by appending it after the class
name:
$ env DBI_PROFILE='!Statement'/DBI::ProfileDumper/File:dbi.prof ./program.pl
Once I have the data, I can analyze them with dbiprof
, which has several options to select
the data to display, sort it in the way that I want (even on multiple
keys), and many other things:
$ dbiprof --number all --sort longest
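For instance, if I want to see the statements I ran most often rather than the ones that took the longest, I should be able to sort on the count field instead (check the dbiprof documentation for the exact field names it accepts):

$ dbiprof --number 10 --sort count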
I started with a pretty bad program that made many unnecessary
calls to the database and did quite a bit of work. I can make that
program more Perly, though, by using Perl’s list operators smartly.
Instead of all that index counting, I use push
to put things onto an array. The code is
much tighter and shorter, and it does the same thing. Instead of
inserting items as I go along, I move all the database stuff to the end
(I have secret plans for that later), but for now the program runs in
about the same time as the previous example:
#!/usr/bin/perl
# dbi-number-inserter-end.pl
use strict;

use DBI;

my @array = (
    qw( Zero ),
    qw(One Two Three Four Five Six Seven Eight Nine Ten),
    qw(Eleven Twelve Thirteen Fourteen Fifteen Sixteen Seventeen Eighteen Nineteen)
    );

foreach my $name ( qw( Twenty Thirty Forty Fifty Sixty Seventy Eighty Ninety ) ) {
    push @array, $name;
    push @array, map { "$name $array[$_]" } 1 .. 9
}

foreach my $digit ( 1 .. 9 ) {
    push @array, "$array[$digit] hundred";
    push @array, map { "$array[$digit] hundred $array[$_]" } 1 .. 99;
}

my $dbh = DBI->connect( "DBI:CSV:f_dir=." );

$dbh->do( "DROP TABLE names" );
$dbh->do( "CREATE TABLE names ( id INTEGER, name CHAR(64) )" );

my $insert = $dbh->prepare( "INSERT INTO names VALUES ( ?, ? )" );

foreach my $index ( 0 .. $#array ) {
    $insert->execute( $index, $array[$index] );
}
Instead of using a CSV file, though, I now want to use a more sophisticated database engine since I think I might be able to get better performance writing all of this stuff to disk. I have the tools to find out, so why not? I’ll use SQLite, another lightweight database that DBI can talk to. I don’t have to change too much in my program since DBI hides all of that for me. I only change the DBI connection string:
# dbi-number-inserter-sqlite.pl
my $dbh = DBI->connect( "DBI:SQLite:dbname=names.sqlite.db" );
When I run my program again, it’s abysmally slow. It takes a lot longer to insert all of these rows into an SQLite database store:
$ time perl dbi-profile-sqlite.pl
real    5m7.038s
user    0m0.572s
sys     0m8.220s
That’s awful! When I profile the program, I see that the INSERT
statements take
100 times longer than my previous programs. Outrageous!
% DBI_PROFILE='!Statement' perl dbi-profile-sqlite.pl
DBI::Profile: 266.582027s 99.63% (1015 calls) dbi-profile-sqlite.pl @ 2006-03-22 17:19:51
'' =>
    0.004483s / 10 = 0.000448s avg (first 0.000007s, min 0.000003s, max 0.004329s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
    0.413145s
'DROP TABLE names' =>
    0.294514s
'INSERT INTO names VALUES ( ?, ? )' =>
    265.869885s / 1003 = 0.265075s avg (first 0.000306s, min 0.000016s, max 0.771342s)
But this is a well-known issue with SQLite and some other databases: by default they commit each statement and wait for the data to make it to the physical disk before moving on. Instead of committing every row individually, I can do all of the inserts inside a single transaction. I don’t have to actually write to the database for every INSERT; I do it all at once when I COMMIT:
# dbi-profile-sqlite-transaction.pl
$dbh->do( "BEGIN TRANSACTION" );

foreach my $index ( 0 .. $#array ) {
    $insert->execute( $index, $array[$index] );
}

$dbh->do( "COMMIT" );
Now the profile looks much different. Looking at the results, I can see that I’ve improved the insertion time by orders of magnitude, and now it’s faster, by all measures, than any of the previous programs that did the same thing:
% DBI_PROFILE='!Statement' perl dbi-profile-sqlite2.pl
DBI::Profile: 1.334367s 54.19% (1016 calls) dbi-profile-sqlite2.pl @ 2006-03-22 17:25:44
'' =>
    0.004087s / 9 = 0.000454s avg (first 0.000007s, min 0.000003s, max 0.003950s)
'BEGIN TRANSACTION' =>
    0.000257s
'COMMIT' =>
    0.255082s / 2 = 0.127541s avg (first 0.254737s, min 0.000345s, max 0.254737s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
    0.271928s
'DROP TABLE names' =>
    0.715443s
'INSERT INTO names VALUES ( ?, ? )' =>
    0.087570s / 1002 = 0.000087s avg (first 0.000317s, min 0.000004s, max 0.003396s)
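Instead of sending BEGIN TRANSACTION and COMMIT as literal SQL, I could also let DBI manage the transaction with its own methods, which do the same thing through the driver. Here’s a sketch of the same loop, assuming the $dbh and $insert handles from before:

$dbh->begin_work;   # turn off AutoCommit until the next commit or rollback

foreach my $index ( 0 .. $#array ) {
    $insert->execute( $index, $array[$index] );
}

$dbh->commit;

Either way, the point is the same: one commit for the whole batch instead of one for every row.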
I started this chapter with Devel::SmallProf
only
because I get to the results faster. Devel::DProf
does
much of the same thing, but stores the results in its own format so it can
do several things with them later, such as make pretty code graphs. I call
it in the same way by using the -d
switch.
I have a program that reads the Use.Perl[25] journal entries through its SOAP interface. I run it with the
-d
switch and tell it to use
Devel::DProf
as the debugging module:
% perl -d:DProf journals
Once I’ve run the program, I have a new file named tmon.out, although I can change that with the PERL_DPROF_OUT_FILE_NAME environment variable. This file isn’t human-readable, so I need to use dprofpp to turn it into something that I can use. The wallclock time is 53 seconds, although I spent less than a second in the CPU. I could improve parts of the program, but the network latency and download times will still dominate the time:
$ dprofpp
LWP::Protocol::collect has 17 unstacked calls in outer
Compress::Zlib::__ANON__ has 5 unstacked calls in outer
...snipped several more lines like these...
HTTP::Message::__ANON__ has 8 unstacked calls in outer
Total Elapsed Time = 53.08383 Seconds
  User+System Time = 0.943839 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 8.37   0.079  0.000     84   0.0009 0.0000  utf8::SWASHNEW
 6.25   0.059  0.146      5   0.0118 0.0292  main::BEGIN
 5.83   0.055  0.073     24   0.0023 0.0030  Text::Reform::form
 5.09   0.048  0.067      2   0.0242 0.0334  HTTP::Cookies::Netscape::load
 4.24   0.040  0.040     10   0.0040 0.0040  LWP::UserAgent::BEGIN
 4.24   0.040  0.049      9   0.0044 0.0054  Text::Autoformat::BEGIN
 3.71   0.035  0.035    697   0.0001 0.0001  HTTP::Headers::_header
 3.18   0.030  0.030      8   0.0037 0.0037  DynaLoader::dl_load_file
 3.18   0.030  0.079      2   0.0149 0.0393  Text::Template::GEN0::BEGIN
 3.18   0.030  0.068     17   0.0017 0.0040  LWP::Protocol::implementor
 2.65   0.025  0.045    221   0.0001 0.0002  SOAP::SOM::_traverse_tree
 2.54   0.024  0.024    892   0.0000 0.0000  HTTP::Cookies::set_cookie
 2.44   0.023  0.023   1060   0.0000 0.0000  Text::Reform::_debug
 2.44   0.023  0.061    293   0.0001 0.0002  SOAP::SOM::_traverse
 2.12   0.020  0.030      5   0.0040 0.0059  HTTP::Cookies::BEGIN
If I don’t want to sort on execution time, dprofpp offers other options. With -l (lowercase L), I get the output in order of the most frequently called subroutines:
$ dprofpp -l
LWP::Protocol::collect has 17 unstacked calls in outer
Compress::Zlib::__ANON__ has 5 unstacked calls in outer
... snip ...
HTTP::Message::__ANON__ has 8 unstacked calls in outer
Total Elapsed Time = 53.08383 Seconds
  User+System Time = 0.943839 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 1.06   0.010  0.010   1525   0.0000 0.0000  UNIVERSAL::isa
 0.21   0.002  0.002   1184   0.0000 0.0000  LWP::Debug::debug
    -       - -0.009   1156        -      -  SOAP::Data::new
 2.44   0.023  0.023   1060   0.0000 0.0000  Text::Reform::_debug
 2.54   0.024  0.024    892   0.0000 0.0000  HTTP::Cookies::set_cookie
    -       - -0.005    753        -      -  SOAP::Serializer::new
    -       - -0.014    734        -      -  SOAP::Serializer::__ANON__
 3.71   0.035  0.035    697   0.0001 0.0001  HTTP::Headers::_header
    -       -  0.000    527        - 0.0000  HTTP::Message::__ANON__
 0.74   0.007  0.007    439   0.0000 0.0000  UNIVERSAL::can
 0.64   0.006  0.004    425   0.0000 0.0000  HTTP::Headers::__ANON__
    -       - -0.002    382        -      -  SOAP::Utils::o_lattr
    -       - -0.002    369        -      -  SOAP::Trace::__ANON__
    -       - -0.002    323        -      -  HTTP::Message::_elem
 0.64   0.006  0.024    323   0.0000 0.0001  HTTP::Headers::push_header
With dprofpp’s -T switch, I can see the chain of subroutine calls. My journals program made about 25,000 subroutine calls in all (it didn’t really seem that complicated when I wrote it), and here’s a short section of the output where I see the SOAP module doing its work:
$ dprofpp -T
... snip ...
SOAP::Serializer::encode_object
SOAP::Serializer::multiref_object
SOAP::Serializer::gen_id
SOAP::Serializer::__ANON__
SOAP::Serializer::new
UNIVERSAL::isa
UNIVERSAL::isa
SOAP::Serializer::encode_scalar
SOAP::XMLSchema::Serializer::xmlschemaclass
SOAP::Serializer::maptypetouri
SOAP::Serializer::encode_object
SOAP::Serializer::multiref_object
SOAP::Serializer::gen_id
SOAP::Serializer::__ANON__
SOAP::Serializer::new
... snip ...
I should also note, however, that Devel::DProf sometimes runs into problems and produces segmentation faults. In that case, I can use Devel::Profiler, which is a pure Perl replacement for Devel::DProf, although it’s slower.
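If I remember its interface correctly, I load Devel::Profiler with -M instead of -d, since it isn’t built on the debugger hook, and it leaves behind a tmon.out that dprofpp can read just as if Devel::DProf had produced it:

% perl -MDevel::Profiler journals
% dprofpp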
The Devel::SmallProf
module from the first examples isn’t really all that complicated.
When I look under the hood, I don’t see that much code. It’s very easy, in
fact, to write my own profiler.[26]
I’m going to create a profiler to simply count the number of times
Perl sees a certain line of code during the runtime. The
Devel::SmallProf
module already does that for me, but
once I know how to program the basics myself, I can adapt it to just
about anything that I want to do.
When I run my program with the -d
switch, for each statement Perl calls the special subroutine
&DB::DB
(the default Perl
debugger is just a program and works in much the same way). That’s the
subroutine named DB
in the package
DB
. I can do whatever I like in that
subroutine, so I’m going to accumulate a count of the number of times I’ve seen each line of code:
package Devel::LineCounter;

package DB;

use strict;
use warnings;

my @counter = ();

sub DB {
    my( $file, $line ) = ( caller )[1,2];

    return unless $file eq $0;   # only count lines in the profiled program itself

    $counter[$line]++;
}
To get profiling output without changing my original program, I
add an END
block to my LineCounter
module. That will execute
somewhere near the very end of the program, although it might execute
before other END
blocks:
END {
    print "\nLine summary for $0\n\n";

    open FILE, $0 or die "Could not open $0\n$!";

    my $count = 0;   # current line number in the source file
    while( <FILE> ) {
        printf "%6d %s", $counter[++$count] || 0, $_;
    }
}
I store my new module in the right place (i.e., Devel/LineCounter.pm somewhere in Perl’s module search path), and then invoke it with the -d switch:
% perl -d:LineCounter factorial.pl
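Once the counting version works, adapting it is mostly a matter of changing what I do inside &DB::DB. Here’s a sketch of a timing variant; the Devel::LineTimer name and the details are my own, and it only roughly approximates what Devel::SmallProf does. It uses Time::HiRes and charges the time between statements to the previously executed line:

package Devel::LineTimer;

package DB;

use strict;
use warnings;
use Time::HiRes qw(time);

my @times = ();
my( $last_line, $last_time ) = ( 0, time() );

sub DB {
    my( $file, $line ) = ( caller )[1,2];

    return unless $file eq $0;

    # charge the elapsed time to the line that just finished running
    my $now = time();
    $times[$last_line] = ( $times[$last_line] || 0 ) + ( $now - $last_time );
    ( $last_line, $last_time ) = ( $line, $now );
}

END {
    printf "%10.6f  line %d\n", $times[$_] || 0, $_ for 1 .. $#times;
}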
The Devel::Cover
module profiles my test suites
to tell me how much of the code base they actually test. It counts the
number of times the test suite runs a line of code, as well as keeping
track of which code branches I follow. Ideally, my test suite should touch every line of code and exercise every possible set of conditions.
Devel::Cover
comes with the cover
command, which reports the coverage statistics for the code. To use it, I first clear anything it might have done before (although I don’t have to: I might want to add the current run’s data to what I’ve already collected, or to other coverage databases for other parts of the project):
$ cover -delete
Once I’ve cleared the coverage database, I run my program after
loading Devel::Cover
. The conventional make test
invocation uses
Test::Harness
to run each test program, so I tell
Test::Harness
to load Devel::Cover
by setting HARNESS_PERL_SWITCHES, which adds its value to the perl command line for each test:
$ HARNESS_PERL_SWITCHES=-MDevel::Cover make test
If I’m using Module::Build
instead of ExtUtils::MakeMaker
, I don’t
have to do so much work:
$ ./Build testcover
Just as with the other Devel::
modules,
Devel::Cover
watches as the code runs and uses a lot
of black magic to decide what’s going on. It stores all of this
information in a directory named cover_db
.
Finally, the cover command turns all of that data into something that I can understand, writing a summary to STDOUT and creating a detailed report in coverage.html. The HTML file links to several other HTML files, allowing me to drill down into the program. Here’s a run that analyzes my HTTP::Size module:
$ cover
Reading database from /Users/brian/Dev/HTTP/Size/cover_db
---------------------------- ------ ------ ------ ------ ------ ------ ------
File                           stmt branch   cond    sub    pod   time  total
---------------------------- ------ ------ ------ ------ ------ ------ ------
blib/lib/HTTP/Size.pm          95.5   70.0   69.2  100.0  100.0  100.0   88.4
Total                          95.5   70.0   69.2  100.0  100.0  100.0   88.4
---------------------------- ------ ------ ------ ------ ------ ------ ------
The summary shows me that in my test suite I’ve executed 95.5 percent of the statements in the module. Along with that, I’ve only tested 70 percent of the possible execution paths and 69.2 percent of the possible combinations of conditions. That’s just the summary, though. The HTML report in coverage.html (Figure 5-1) tells me much more.
The HTML report has columns for each of the types of coverage that it measures, and it colors the table cells green to tell me that I have 100 percent coverage for that measurement on that line of code, and red to tell me that I have more testing work to do (Figure 5-2).
Before I decide how to improve my Perl program, I need to profile it to determine which sections need the most work. Perl profilers are just specialized debuggers, and if I don’t like what’s already out there, I can make my own profiler.
The perldebguts documentation describes creating a custom debugger. I write more about those in my articles for The Perl Journal, “Creating a Perl Debugger” (http://www.ddj.com/184404522) and “Profiling in Perl” (http://www.ddj.com/184404580).
“The Perl Profiler” is Chapter 20 of Programming Perl, Third Edition, by Larry Wall, Tom Christiansen, and Jon Orwant. Anyone on the road to Perl mastery should already have this book.
Perl.com has two interesting articles on profiling: “Profiling Perl” by Simon Cozens (http://www.perl.com/lpt/a/850) and “Debugging and Profiling mod_perl Applications” by Frank Wiles (http://www.perl.com/pub/a/2006/02/09/debug_mod_perl.html).
Randal L. Schwartz writes about profiling in “Speeding up Your Perl Programs” for Unix Review (http://www.stonehenge.com/merlyn/UnixReview/col49.html) and “Profiling in Template Toolkit via Overriding” for Linux Magazine (http://www.stonehenge.com/merlyn/LinuxMag/col75.html).
[23] Alternately, I can set $dbh->{Profile}
from within my
program.
[24] Sam also wrote DBI::ProfileDumper::Apache
for use under mod_perl.
[25] Use.Perl is run by Chris Nandor (http://use.perl.org).
[26] Some of this same discussion appeared in my online Dr. Dobb’s article, “Creating a Perl Debugger” (http://www.ddj.com/documents/s=1498/ddj0103bpl/).