Perl has a default documentation format called Plain Old Documentation, or Pod for short. I can use it directly in my programs, and even between segments of code. Other programs can easily pick out the Pod and translate it into more familiar formats, such as HTML, text, or even PDF. I’ll discuss some of the most used features of Pod, how to test your Pod, and how to create your own Pod translator.
Sean Burke, the same person responsible for most of what I’ll cover in this chapter, completely specified the Pod format in the perlpodspec documentation page. This is the gory-details version of the specification and how to parse it, which we’ll do in this chapter. The stuff we showed you in Learning Perl and Intermediate Perl is just the basics covered in the higher-level perlpod documentation page.
Pod directives start with an equals sign, =, at the beginning of a line, at any point where Perl is expecting a new statement (so not in the middle of a statement). When Perl is trying to parse a new statement but sees that =, it switches to parsing Pod. Perl continues to parse the Pod until it reaches the =cut directive or the end of the file:
#!/usr/bin/perl

=head1 First level heading

Here's a line of code that won't execute:

    print "How'd you see this!?\n";

=over 4

=item First item

=item Second item

=back

=cut

print "This line executes\n";
Inside the text of the Pod, interior sequences specify nonstructural markup that should be displayed as particular typefaces or special characters. Each of these starts with a letter, which specifies the type of sequence, and has its content in angle brackets. For instance, in Pod I use E<lt> to specify a literal <. If I want italic text (if the formatter supports that), I use I<>:
=head1 Alberto SimE<otilde>es helped review I<Mastering Perl>.

In HTML, I would write E<lt>iE<gt>Mastering PerlE<lt>/iE<gt> to
get italics.

=cut
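To see how a formatter treats these sequences, here is a minimal sketch that runs a small Pod string through Pod::Simple::Text, the plain-text formatter that ships with Pod::Simple. The sample Pod and the variable names are my own inventions for illustration; how the formatter marks up the I<> text is up to the formatter, so I only rely on the E<lt> and E<gt> escapes turning into literal angle brackets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Pod::Simple::Text;

# Sample Pod invented for this sketch. E<lt> and E<gt> become literal
# angle brackets; I<> marks text that the formatter may emphasize.
my $pod = <<'POD';
=head1 Escapes

Write E<lt>iE<gt>PerlE<lt>/iE<gt> in HTML, or I<Perl> in Pod.

=cut
POD

my $parser = Pod::Simple::Text->new;
$parser->output_string( \my $text );      # collect the output in $text
$parser->parse_string_document( $pod );

print $text;
```

The output should contain the literal string <i>Perl</i>, since the escapes are resolved before the formatter emits any text.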
I have two ways to turn Pod into some other format: a ready-made translator or write my own. I might even do both at once by modifying something that already exists. If I need to add something extra to the basic Pod format, I’ll have to create something to parse it.
Fortunately, Sean Burke has already done most of the work by creating Pod::Simple, which can parse normal Pod as well as my personal extensions to it, as long as I follow the basic ideas and extend Pod::Simple with a subclass.
Perl comes with several Pod translators already. You’ve probably used one without even knowing it; the perldoc command is really a tool to extract the Pod from a document and format it for you. Typically it formats it for your terminal settings, perhaps using color or other character features:
$ perldoc Some::Module
That’s not all that perldoc can do, though. Since it’s formatting its output for the terminal window, when I redirect the output to a file it doesn’t look right. The headings, for one thing, come out weird:
$ perldoc CGI > cgi.txt
$ more cgi.txt
CGI(3)       User Contributed Perl Documentation      CGI(3)

NNAAMMEE
       CGI - Simple Common Gateway Interface Class
Using the -t switch, I can tell perldoc to output plaintext instead of formatting it for the screen:
% perldoc -t CGI > cgi.txt
% more cgi.txt
NAME
    CGI - Simple Common Gateway Interface Class
Stepping back even further, perldoc can decide not to format anything. The -m switch simply outputs the source file (which can be handy if I want to see the source but don’t want to find the file myself); perldoc searches through @INC looking for it. perldoc can do all of this because it’s really just an interface to other Pod translators. The program itself is simple because it’s just a wrapper around Pod::Perldoc, which I can see by using perldoc to look at its own source:
$ perldoc -m perldoc
#!/usr/bin/perl
    eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}'
        if 0;

# This "perldoc" file was generated by "perldoc.PL"

require 5;
BEGIN { $^W = 1 if $ENV{'PERLDOCDEBUG'} }
use Pod::Perldoc;
exit( Pod::Perldoc->run() );
The Pod::Perldoc module is just code to parse the command-line options and dispatch to the right subclass, such as Pod::Perldoc::ToText. What else is there? To find the directory for these translators, I use the -l switch:
$ perldoc -l Pod::Perldoc::ToText
/usr/local/lib/perl5/5.8.4/Pod/Perldoc/ToText.pm
$ ls /usr/local/lib/perl5/5.8.4/Pod/Perldoc
BaseTo.pm       ToChecker.pm    ToNroff.pm      ToRtf.pm        ToTk.pm
GetOptsOO.pm    ToMan.pm        ToPod.pm        ToText.pm       ToXml.pm
Want all that as a Perl one-liner?
$ perldoc -l Pod::Perldoc::ToText | perl -MFile::Basename=dirname \
    -e 'print dirname( <> )' | xargs ls
I could make that a bit shorter on my Unix machines since they have a dirname utility already (but it’s not a Perl program):
$ perldoc -l Pod::Perldoc::ToText | xargs dirname | xargs ls
If you don’t have a dirname utility, here’s a quick Perl program that does the same thing, and it looks quite similar to the dirname program in the Perl Power Tools.[50] It’s something I use often when moving around the Perl library directories:
#!/usr/bin/perl
use File::Basename qw(dirname);

print dirname( $ARGV[0] );
Just from that, I can see that I can translate Pod to nroff (that’s the stuff going to my terminal), text, RTF, XML, and a bunch of other formats. In a moment I’ll create another one.
perldoc doesn’t have switches to go to all of those formats, but its -o switch can specify a format. Here I want it in XML format, so I use -oxml and add the -T switch, which just tells perldoc to dump everything to standard output. I could have also used -d to send it to a file:
$ perldoc -T -oxml CGI
I don’t have to stick to those formatters, though. I can make my own. I load a formatting module with the -M switch; here, for instance, I pull in Pod::Perldoc::ToRtf:
$ perldoc -MPod::Perldoc::ToRtf CGI
Now I have everything in place to create my own Pod formatter. For this example, I want a table of contents from the Pod input. I can discard everything else, but I want the text from the =head directives, and I want the text to be indented in outline style. I’ll follow the naming sequence of the existing translators and name mine Pod::Perldoc::ToToc. I’ve even put it on CPAN. I actually used this module to help me write this book.
The start of my own translator is really simple. I look at one of the other translators and do what they do until I need to do something differently. This turns out to be really easy because most of the hard work happens somewhere else:
package Pod::Perldoc::ToToc;
use strict;

use base qw(Pod::Perldoc::BaseTo);

use subs qw();
use vars qw( $VERSION );   # declare $VERSION so it compiles under strict

use Pod::TOC;

$VERSION = '0.10_01';

sub is_pageable        { 1 }
sub write_with_binmode { 0 }
sub output_extension   { 'toc' }

sub parse_from_file {
    my( $self, $file, $output_fh ) = @_; # Pod::Perldoc object

    my $parser = Pod::TOC->new();

    $parser->output_fh( $output_fh );   # send the output where perldoc wants it

    $parser->parse_file( $file );
}

1;
For my translator I inherit from Pod::Perldoc::BaseTo. This handles almost everything that is important. It connects what I do in parse_from_file to perldoc’s user interface. When perldoc tries to load my module, it checks for parse_from_file because it will try to call it once it finds the file it will parse. If I don’t have that subroutine, perldoc will move on to the next formatter in its list. That -M switch I used earlier doesn’t tell perldoc which formatter to use; it just adds it to the front of the list of formatters that perldoc will try to use.
In parse_from_file, the first argument is a Pod::Perldoc object. I don’t use that for anything. Instead I create a new parser object from my Pod::TOC module, which I’ll show in the next section. That module inherits from Pod::Simple, and most of its interface comes directly from Pod::Simple.
The second argument is the filename I’m parsing, and the third argument is the filehandle that should get my output. After I create the parser, I set the output destination with $parser->output_fh(). The Pod::Perldoc::BaseTo module expects output on that filehandle and will be looking for it. I shouldn’t simply print to STDOUT, which would bypass the Pod::Perldoc output mechanism and cause the module to complain that I didn’t send it any output. Again, I get the benefit of all of the inner workings of the Pod::Perldoc infrastructure. If the user wanted to save the output in a file, that’s where $output_fh points. Once I have that set up, I call $parser->parse_file(), and all the magic happens.
I didn’t have to actually parse the Pod in my TOC creator because I use Pod::Simple behind the scenes. It gives me a simple interface that allows me to do things when certain events occur. All of the other details about breaking apart the Pod and determining what those pieces represent happen somewhere else, where I don’t have to deal with them. Here’s the complete source for my Pod::TOC module to extract the table of contents from a Pod file:
package Pod::TOC;
use strict;

use base qw( Pod::Simple );
use vars qw( $VERSION );   # declare $VERSION so it compiles under strict

$VERSION = '0.10_01';

sub _handle_element {
    my( $self, $element, $args ) = @_;

    # caller(1) is the wrapper: _handle_element_start or _handle_element_end
    my $caller_sub = ( caller(1) )[3];
    return unless $caller_sub =~ s/.*_(start|end)$/${1}_$element/;

    my $sub = $self->can( $caller_sub );

    $sub->( $self, $args ) if $sub;
}

sub _handle_element_start {
    my $self = shift;
    $self->_handle_element( @_ );
}

sub _handle_element_end {
    my $self = shift;
    $self->_handle_element( @_ );
}

sub _handle_text {
    my( $self, $text ) = @_;

    return unless $self->_get_flag;   # only print inside a heading

    print { $self->output_fh }
        "\t" x ( $self->_get_flag - 1 ), $text, "\n";
}

{ # scope to hide lexicals that only these subs need
my @Head_levels = 0 .. 4;

my %flags = map { ( "head$_", $_ ) } @Head_levels;

foreach my $directive ( keys %flags ) {
    no strict 'refs';
    foreach my $prepend ( qw( start end ) ) {
        my $name = "${prepend}_$directive";
        *{$name} = sub { $_[0]->_set_flag( $name ) };
    }
}

sub _is_valid_tag { exists $flags{ $_[1] } }
sub _get_tag      {        $flags{ $_[1] } }
}

{
my $Flag;

sub _get_flag { $Flag }

sub _set_flag {
    my( $self, $caller ) = @_;

    my $on  = $caller =~ m/^start_/ ? 1 : 0;
    my $off = $caller =~ m/^end_/   ? 1 : 0;

    unless( $on or $off ) { return };

    my( $tag ) = $caller =~ m/_(.*)/g;

    return unless $self->_is_valid_tag( $tag );

    $Flag = do {
           if( $on  ) { $self->_get_tag( $tag ) } # set the flag if we're on
        elsif( $off ) { undef }                   # clear if we're off
    };
}
}

1;
The Pod::TOC module inherits from Pod::Simple. Most of the action happens when Pod::Simple parses the module. I don’t have a parse_file subroutine that I need for Pod::Perldoc::ToToc because Pod::Simple already has it, and I don’t need it to do anything different.
What I need to change, however, is what Pod::Simple will do when it runs into the various bits of Pod. Allison Randal wrote Pod::Simple::Subclassing to show the various ways to subclass the module, and I’m only going to use the easiest one. When Pod::Simple runs into a Pod element, it calls a subroutine named _handle_element_start with the name of the element, and when it finishes processing that element, it calls _handle_element_end in the same way. When it encounters text within an element, it calls _handle_text. Behind the scenes, Pod::Simple figures out how to join all the text so I can handle it as logical units (e.g., a whole paragraph) instead of layout units (e.g., a single line with possibly more lines to come later).
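A quick way to watch those three callbacks fire is a throwaway subclass that only records what Pod::Simple tells it. This is a minimal sketch: the package name My::EventLogger, the my_events hash key, and the events method are hypothetical names of mine, not part of Pod::Simple.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package My::EventLogger;    # hypothetical name for this sketch
    use base qw(Pod::Simple);

    # Record each event in the order Pod::Simple reports it.
    # $_[0] is the parser object; $_[1] is the element name or the text.
    sub _handle_element_start { push @{ $_[0]{my_events} }, "start $_[1]" }
    sub _handle_element_end   { push @{ $_[0]{my_events} }, "end $_[1]"   }
    sub _handle_text          { push @{ $_[0]{my_events} }, "text $_[1]"  }

    sub events { @{ $_[0]{my_events} || [] } }
}

my $parser = My::EventLogger->new;
$parser->parse_string_document( "=head1 NAME\n\nSome text\n" );

my @events = $parser->events;
print "$_\n" for @events;
```

For the heading, the list contains "start head1", then "text NAME", then "end head1"; the start and end events carry only the element name, and the text arrives separately in between.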
My _handle_element_start and _handle_element_end are just wrappers around _handle_element. I’ll figure out which one it is by looking at caller. In _handle_element, I take the calling subroutine stored in $caller_sub and pick out either start or end. I put that together with the element name, which is in $element. I end up with things such as start_head1 and end_head3 in $caller_sub. I need to show a little more code to see how I handle those subroutines.
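The caller trick works the same way outside of Pod::Simple, so here is a stripped-down sketch of just the dispatch. The Demo::Dispatch package, its constructor, and the seen list are hypothetical scaffolding I made up so the example runs on its own; only the body of _handle_element mirrors the technique described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Demo::Dispatch;    # hypothetical package for this sketch

    sub new { bless { seen => [] }, shift }

    sub _handle_element {
        my( $self, $element ) = @_;

        # caller(1) describes the wrapper that called me, so element 3 is
        # something like Demo::Dispatch::_handle_element_start
        my $caller_sub = ( caller(1) )[3];
        return unless $caller_sub =~ s/.*_(start|end)$/${1}_$element/;

        # Only dispatch if a method such as start_head1 actually exists
        my $sub = $self->can( $caller_sub );
        $sub->( $self ) if $sub;
    }

    sub _handle_element_start { my $self = shift; $self->_handle_element( @_ ) }
    sub _handle_element_end   { my $self = shift; $self->_handle_element( @_ ) }

    sub start_head1 { push @{ $_[0]{seen} }, 'start_head1' }
    sub end_head1   { push @{ $_[0]{seen} }, 'end_head1'   }
}

my $demo = Demo::Dispatch->new;
$demo->_handle_element_start( 'head1' );
$demo->_handle_element_start( 'Para' );    # no start_Para method, so ignored
$demo->_handle_element_end( 'head1' );

print join( ' ', @{ $demo->{seen} } ), "\n";   # prints "start_head1 end_head1"
```

Elements without a matching method fall through silently, which is why the real module only needs to define methods for the headings it cares about.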
When I get the start or end event, I don’t get the text inside that element, so I have to remember what I’m processing so _handle_text knows what to do. Every time Pod::Simple runs into text, no matter if it’s a =headN directive, a paragraph in the body, or something in an item list, it calls _handle_text. For my table of contents, I only want to output text when it’s from a =head directive. That’s why I have a bit of indirection in _handle_text.
In the foreach loop, I go through the different levels of the =head directive.[51] Inside the outer foreach loop, I want to make two subroutines for every one of those levels: start_head0, end_head0, start_head1, end_head1, and so on. I use a symbolic reference (see Chapter 8) to create the subroutine names dynamically, and assign an anonymous subroutine to the typeglob for that name (see Chapter 9).
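Here is that subroutine-generating idiom on its own, as a minimal sketch. The My::Generated package and the strings the generated subroutines return are hypothetical; the point is only the assignment of a closure to a typeglob whose name I build at runtime.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package My::Generated;    # hypothetical package for this sketch

    foreach my $level ( 0 .. 4 ) {
        foreach my $prefix ( qw( start end ) ) {
            my $name = "${prefix}_head$level";

            no strict 'refs';    # allow the symbolic reference below

            # Each closure remembers its own copy of $name
            *{ __PACKAGE__ . "::$name" } = sub { return "called $name" };
        }
    }
}

print My::Generated->can( 'start_head2' )->(), "\n";   # prints "called start_head2"
print My::Generated::end_head4(), "\n";                # prints "called end_head4"
```

Because $name is a lexical declared inside the inner loop, every generated subroutine closes over a different value, which is what lets one anonymous sub body serve all ten names.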
Each of those subroutines is simply going to set a flag. When a start_headN subroutine runs, it turns on the flag, and when the end_headN subroutine runs, it turns off the same flag. That all happens in _set_flag, which sets $Flag.
My _handle_text routine looks at $Flag, through _get_flag, to decide what to do. If it’s a true value, it outputs the text, and if it’s false, it doesn’t. This is what I can use to turn off output for all of the text that doesn’t belong to a heading. Additionally, I’ll use $Flag to determine the indentation level of my table of contents by putting the =head level in it.
So, in order of execution: when I run into =head1, Pod::Simple calls _handle_element_start. From that, I immediately dispatch to _handle_element, which figures out that it’s the start, and knows it just encountered a =head1. From that, _handle_element figures out it needs to call start_head1, which I dynamically created. start_head1 calls _set_flag( 'start_head1' ), which figures out based on the argument to turn on $Flag. Next, Pod::Simple runs into a bit of text, so it calls _handle_text, which checks _get_flag and gets a true value. It keeps going and prints to the output filehandle. After that, Pod::Simple is done with =head1, so it calls _handle_element_end, which dispatches to _handle_element, which then calls end_head1. When end_head1 runs, it calls _set_flag, which turns off $Flag. This sequence happens every time Pod::Simple encounters =head directives.
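To try the whole flow without installing anything from CPAN, the following is a compressed variant of the same idea, not the Pod::TOC code itself. It is a sketch under a few assumptions of mine: the package name My::MiniTOC and the _my_toc_level hash key are hypothetical, and I keep the heading level in the parser object instead of a file-scoped lexical and skip the caller-based dispatch entirely.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package My::MiniTOC;    # hypothetical name; a simplified stand-in
    use base qw(Pod::Simple);

    # Turn the flag on at the start of a heading, remembering its level
    sub _handle_element_start {
        my( $self, $element ) = @_;
        $self->{_my_toc_level} = $1 if $element =~ /\Ahead(\d)\z/;
    }

    # Turn the flag off again when the heading ends
    sub _handle_element_end {
        my( $self, $element ) = @_;
        delete $self->{_my_toc_level} if $element =~ /\Ahead\d\z/;
    }

    # Print text only while the flag is on, indented by heading level
    sub _handle_text {
        my( $self, $text ) = @_;
        return unless $self->{_my_toc_level};
        print { $self->output_fh }
            "\t" x ( $self->{_my_toc_level} - 1 ), $text, "\n";
    }
}

my $pod = <<'POD';
=head1 NAME

Body text that the table of contents skips

=head2 Details

=head1 AUTHOR

=cut
POD

my $parser = My::MiniTOC->new;
$parser->output_string( \my $toc );    # capture what goes to output_fh
$parser->parse_string_document( $pod );

print $toc;
```

The captured string holds NAME and AUTHOR at the left margin and Details indented by one tab, with the body paragraph left out.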
I wrote this book using the Pod format, but one that O’Reilly Media has extended to meet its publishing needs. For instance, O’Reilly added an N<> sequence for footnotes.[52] Pod::Simple can still handle those, but it needs to know what to do when it finds them.
Allison Randal created Pod::PseudoPod as an extension of Pod::Simple. It handles those extra things O’Reilly added and serves as a much longer example of a subclass. I subclassed her module to create Pod::PseudoPod::MyHTML, which I used to create the HTML for the Mastering Perl web site. You can get that source from there, too.[53]
Andy Lester wrote the Apache::Pod module (based on Apache::Perldoc by Rich Bowen) so he could serve the Perl documentation from his Apache web server and read it with his favorite browser. I certainly like this more than paging to a terminal, and I get the benefits of everything the browser gives me, including display styling, search, and links to the modules or URLs the documentation references.
Sean Burke’s Pod::Webserver makes its own web server to translate Pod for the Web. It uses Pod::Simple to do its work and should run anywhere that Perl will run. If I don’t want to install Apache, I can still have my documentation server.
Once I’ve written my Pod, I can check it to ensure that I’ve done everything correctly. When other people read my documentation, they shouldn’t get any warnings about formatting, and a Pod error shouldn’t keep them from reading it because the parser gets confused. What good is the documentation if the user can’t even read it?
Pod::Checker is another sort of Pod translator, although instead of spitting out the Pod text in another format, it watches the Pod and text go by. When it finds something suspicious, it emits warnings. Perl already comes with podchecker, a ready-to-use program similar to perl -c, but for Pod. The program is really just a program version of Pod::Checker, which is just another subclass of Pod::Parser:
% podchecker Module.pm
The podchecker program is good for manual use, and I guess that somebody might want to use it in a shell script, but I can also check errors directly through Pod::Simple. While parsing the input, Pod::Simple keeps track of the errors it encounters. I can look at these errors later:
*** WARNING: preceding non-item paragraph(s) at line 47 in file test.pod
*** WARNING: No argument for =item at line 153 in file test.pod
*** WARNING: previous =item has no contents at line 255 in file test.pod
*** ERROR: =over on line 23 without closing =back (at head2) at line 255 in file test.pod
*** ERROR: empty =head2 at line 283 in file test.pod
Module.pm has 2 pod syntax errors.
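Checking through Pod::Simple directly might look like the following minimal sketch. It parses two Pod strings that I invented for the example, throws away the formatted output, and then asks the parser whether it recorded any problems with its any_errata_seen method. The %pod_for and %has_errata hashes and their labels are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Pod::Simple;

# Two tiny documents invented for this sketch; the second one never
# closes its =over with a =back.
my %pod_for = (
    good => "=head1 NAME\n\nFine\n\n=cut\n",
    bad  => "=head1 NAME\n\n=over 4\n\n=item One\n\nNo closing back\n\n=cut\n",
);

my %has_errata;

foreach my $label ( sort keys %pod_for ) {
    my $parser = Pod::Simple->new;
    $parser->output_string( \my $ignored );    # I only want the errors
    $parser->parse_string_document( $pod_for{$label} );

    # True if the parser complained about anything while parsing
    $has_errata{$label} = $parser->any_errata_seen ? 1 : 0;

    print "$label: ",
        ( $has_errata{$label} ? 'problems found' : 'no problems' ), "\n";
}
```

This only answers yes or no for each document. A test module can build on the same check to fail a test whenever a file in the distribution has Pod problems.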
A long time ago, I wanted to do this automatically for all of my modules, so I created Test::Pod. It’s been almost completely redone by Andy Lester, who now maintains the module. I can drop a t/pod.t file into my test directory:
use Test::More;
eval "use Test::Pod 1.00";
plan skip_all => "Test::Pod 1.00 required for testing POD" if $@;
all_pod_files_ok();
After I’ve checked the format of my documentation, I also want to ensure that I’ve actually documented everything. The Pod::Coverage module finds all of the functions in a package and tries to match those to the Pod it finds. After skipping any special function names and excluding the function names that start with an underscore, the Perl convention for indicating private methods, it complains about anything left undocumented.
The easiest invocation is directly from the command line. For instance, I use the -M switch to load the CGI module. I also use the -M switch to load Pod::Coverage, but I tack on the =CGI to tell it which package to check. Finally, since I don’t really want to run any program, I use -e 1 to give perl a dummy program:
% perl -MCGI -MPod::Coverage=CGI -e 1
The output gives the CGI module a rating, then lists all of the functions for which it didn’t see any documentation:
CGI has a Pod::Coverage rating of 0.04
The following are uncovered: add_parameter, all_parameters, binmode, can, cgi_error, compile, element_id, element_tab, end_form, endform, expand_tags, init, initialize_globals, new, param, parse_params, print, put, r, save_request, self_or_CGI, self_or_default, to_filehandle, upload_hook
I can write my own program, which I’ll call podcoverage, to go through all of the packages I specify on the command line. The rating comes from the coverage method, which either returns a number between 0 and 1, or undef if it couldn’t rate the module:
#!/usr/bin/perl
use strict;
use warnings;

use Pod::Coverage;

foreach my $package ( @ARGV ) {
    my $checker = Pod::Coverage->new( package => $package );

    my $rating = $checker->coverage;

    # Check for undef first so I never compare an undefined rating
    if( not defined $rating ) {
        print "$package can't be rated: ",
            $checker->why_unrated, "\n";
    }
    elsif( $rating == 1 ) {
        print "$package gets a perfect score!\n\n";
    }
    else {
        print "$package gets a rating of ", $rating, "\n",
            "Uncovered functions:\n\t",
            join( "\n\t", sort $checker->uncovered ),
            "\n\n";
    }
}
When I use this to test Module::NotThere and HTML::Parser, my program tells me that it can’t rate the first because it can’t find any Pod, and it finds a couple of undocumented functions in HTML::Parser:
$ podcoverage Module::NotThere HTML::Parser
Module::NotThere can't be rated: couldn't find pod
HTML::Parser gets a rating of 0.925925925925926
Uncovered functions:
        init
        netscape_buggy_comment
My podcoverage program really isn’t all that useful, though. It might help me find hidden functions in modules, but I don’t really want to depend on those since they might disappear in later versions. I can use podcoverage to check my own modules to ensure I’ve explained all of my functions, but that would be tedious.
Fortunately, Andy Lester automated the process with Test::Pod::Coverage, which is based on Pod::Coverage. By creating a test file that I drop into the t directory of my module distribution, I automatically test the Pod coverage each time I run make test. I lift this snippet right out of the documentation. It first tests for the presence of Test::Pod::Coverage before it tries anything, making the whole thing optional for the user who doesn’t have that module installed, just like the Test::Pod module:
use Test::More;
eval "use Test::Pod::Coverage 1.00";
plan skip_all => "Test::Pod::Coverage 1.00 required for testing POD coverage" if $@;
all_pod_coverage_ok();
I mentioned earlier that I could hide functions from these Pod checks. Perl doesn’t have a way to distinguish between public functions that I should document and other people should use, and private functions that I don’t intend users to see. The Pod coverage tests just see functions.
That’s not the whole story, though. Inside Pod::Coverage is the wisdom of which functions it should ignore. For instance, all of the special Tie:: functions (see Chapter 17) are really private functions. By convention, all functions starting with an underscore (e.g., _init) are private functions for internal use only, so Pod::Coverage ignores them. If I want to create private functions, I put an underscore in front of their names.
I can’t always hide functions, though. Consider my earlier Pod::Perldoc::ToToc subclass. I had to override the parse_from_file method so it would call my own parser. I don’t really want to document that function because it does the same thing as the method in the parent class but with a different formatting module. Most of the time the user doesn’t call it directly, and its behavior is already covered by the documentation for parse_from_file in the Pod::Simple superclass. I can tell Pod::Coverage to ignore certain names or names that match a regular expression:
my $checker = Pod::Coverage->new(
    package      => $package,
    private      => [ qr/^_/ ],
    also_private => [ qw(init import DESTROY AUTOLOAD) ],
    trustme      => [ qr/^get_/ ],
    );
The private key takes a list of regular expressions. It’s intended for the truly private functions. also_private is just a list of strings for the same thing so I don’t have to write a regular expression when I already know the names. The trustme key is a bit different. I use it to tell Pod::Coverage that even though I apparently didn’t document those public functions, I’m not going to. In my example, I used the regular expression qr/^get_/. Perhaps I documented a series of functions in a single shot instead of giving them all individual entries. Those might even be something that AUTOLOAD creates. The Test::Pod::Coverage module uses the same interface to ignore functions.
Pod is the standard Perl documentation format, and I can easily translate it to other formats with the tools that come with Perl. When that’s not enough, I can write my own Pod translator to go to a new format or provide new features for an existing format. When I use Pod to document my software, I also have several tools to check its format and ensure I’ve documented everything.
The perlpod documentation outlines the basic Pod format, and the perlpodspec documentation gets into the gory implementation details.
Allison Randal’s Pod::Simple::Subclassing demonstrates other ways to subclass Pod::Simple.
Pod::Webserver shows up as Hack #3 in Perl Hacks by chromatic, Damian Conway, and Curtis “Ovid” Poe (O’Reilly).
I wrote about subclassing Pod::Simple to output HTML in “Playing with Pod” for The Perl Journal, December 2005: http://www.ddj.com/dept/lightlang/184416231.
I wrote about Test::Pod in “Better Documentation Through Testing” for The Perl Journal, November 2002.
[50] You can find Perl Power Tools here: http://sourceforge.net/projects/ppt/.
[51] I’m using the values 0 to 4 because PseudoPod, the format O’Reilly uses and that I used to write this book, adds =head0 to the Pod format.
[52] You may have noticed that we liked footnotes in Learning Perl and Intermediate Perl.
[53] Mastering Perl web site: http://www.pair.com/comdog/mastering_perl/.