Many people are discovering that the scripting language Perl is the most useful language for getting many computing tasks done. Many of them fail to discover the vast amount of documentation that comes with the language. In the third of his articles about “probably the best set of free documentation for any software currently available”, Dave Cross explains how to write documentation for your own Perl code.
This article first appeared in the July 1999 issue of the online Perl magazine PerlMonth.
Over the last couple of months we’ve been looking at the documentation that comes with every distribution of Perl. This month I want to move on a bit and look at writing documentation for Perl code that you write.
As I’ve mentioned in passing in previous columns, Perl documentation is usually written using Plain Old Documentation (or POD). POD is a simple markup language that aims to make it as simple as possible for programmers to add documentation to their programs and modules. As you’d expect, the standard Perl documentation set contains a file that discusses POD in some detail. If you want to jump straight into that, try typing
perldoc perlpod
at your command line. (Windows users should look for perlpod in the list of Core Perl Docs in the ActivePerl documentation index page.) There is also a section on how POD interacts with standard Perl syntax in the perlsyn POD file.
Why Should I Use POD?
This question can be interpreted in two different ways. If it means ‘Why should I document my code?’ then an answer is beyond the scope of this column. Please go and read a standard text on software engineering! If, on the other hand, it means ‘Why should I use POD instead of <insert name of favourite word processor here>?’ then there are a number of good reasons.
Firstly, as I said above, POD tries to make it as easy as possible for programmers to document their code. One of the ways that it does this is to allow you to embed your documentation within your code. You can intermix Perl code and POD in the same file. You can do this because the Perl interpreter understands POD and therefore knows which bits of the file to ignore. We’ll discuss POD syntax in detail soon, but basically, if the Perl interpreter sees a line starting with an equals sign (‘=’) it will ignore that line and all following lines until it finds one that begins with ‘=cut’, at which point it will start processing the code again.
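To make that concrete, here is a minimal sketch of a script that mixes POD and code. The script and its greeting and farewell functions are invented purely for illustration; the only point is that perl skips everything from each POD command down to the following =cut line and compiles the rest.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=head1 NAME

greet - a hypothetical script used only to illustrate embedded POD

=head1 FUNCTIONS

=head2 greeting($name)

Returns a greeting for C<$name>.

=cut

# perl resumes compiling here, straight after the =cut line.
sub greeting {
    my ($name) = @_;
    return "Hello, $name!";
}

=head2 farewell($name)

Returns a farewell for C<$name>.

=cut

sub farewell {
    my ($name) = @_;
    return "Goodbye, $name!";
}

print greeting('world'), "\n";
print farewell('world'), "\n";
```

Keeping each function's documentation right next to its code like this is a matter of taste; collecting all the POD at the end of the file works just as well.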
The second good reason for using POD rather than any other method is the number of tools that exist for processing POD and turning it into all sorts of other formats. As a standard part of the Perl distribution, you get pod2text, pod2html and pod2man, which turn your POD into plain text, HTML and Unix man pages respectively. At the other end of the spectrum you can get filters that will turn your POD into industrial strength XML DTDs like DocBook. In fact some of the most respected Perl books were written in POD. The Perl Cookbook is one example. This column is also written in POD and then passed through pod2html before being sent to the editor.
What does POD look like?
POD was intended to be as simple as possible to use. You can learn the entire syntax in about fifteen minutes. There are three kinds of paragraph in a POD file. Paragraphs must be separated by a completely blank line.
- Text Paragraphs
These will make up the majority of most POD files. This is a plain text paragraph that a POD formatter is allowed to reformat to its own specifications – possibly changing line lengths or justification.
- Verbatim Paragraphs
These are text paragraphs that need to be formatted precisely to the author’s specifications. A POD formatter is not allowed to change any of the formatting. A verbatim paragraph is created by putting at least one space at the start of each line.
- Command Paragraphs
A command paragraph instructs a POD formatter to handle subsequent text in a certain specialised way. Each POD command starts with an equals sign which is followed immediately by a command word and optionally some text. Here is a complete list of POD commands.
- =pod
Indicates the start of a sequence of POD in a file that goes through to the next =cut line. Formatters will start to process text at this point. The Perl interpreter will ignore text until the =cut line. Most formatters will also take any other recognised POD command as the start of a POD sequence.
- =cut
Indicates the end of a sequence of POD in a file. Formatters will ignore text until the next POD command. The Perl interpreter will start to process text again.
- =head1 header text
- =head2 header text
Introduces a first or second level header. The text following the tag is formatted appropriately for the requested kind of header.
- =over n
Introduces a list where n is the depth of the indent from the surrounding text. List items are denoted with the =item tag. The list is closed with a =back tag.
- =item text
Introduces a list item where text is used to denote the type of list item. Often an asterisk is used for an unordered list and a number is used for an ordered list. An =item tag should only appear between a pair of =over/=back tags.
- =back
Closes the innermost list and moves the margin back the number of characters given on the =over tag that started the list.
- =for format
- =begin format
- =end format
These tags can be used to set aside a piece of text as only being used by one particular formatter. For example if you have some markup that should only be seen if the file is being processed by an HTML formatter, then you can put either
=for html <!-- some sort of clever HTML here -->
or
=begin html

<!-- some sort of clever HTML here -->

=end html
in your file. The difference between =for and =begin/=end is that =for will only apply to the next paragraph, whereas =begin will affect all paragraphs until the matching =end is found.
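Putting several of these commands together, a small POD section might look something like the sketch below. The options function and its option names are hypothetical and exist only so that the documentation has something to describe.

```perl
use strict;
use warnings;

=pod

=head1 OPTIONS

The hypothetical C<options> function accepts these settings:

=over 4

=item verbose

Print progress messages.

=item dry_run

Report what would be done without doing it.

=back

=for html <p><em>Only an HTML formatter will use this paragraph.</em></p>

=begin text

Only a plain text formatter will use this paragraph,
and any others that appear before the matching end command.

=end text

=cut

# Hypothetical defaults, matching the items documented above.
my %defaults = (verbose => 0, dry_run => 0);

# Merge the caller's settings over the defaults.
sub options {
    my (%given) = @_;
    return { %defaults, %given };
}

my $opts = options(verbose => 1);
print "verbose=$opts->{verbose} dry_run=$opts->{dry_run}\n";
```

Note the blank line between every command and the text around it; without those blank lines a formatter may not recognise the commands.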
And that’s all the command paragraphs there are. The only other complication to get to grips with is the list of interior sequences that can be used within any paragraph. Here is the complete list of these sequences.
- I<text> italic text
- B<text> bold text
- S<text> text contains non-breaking spaces
- C<code> literal code
- L<name> a link (cross reference) to name
- L<name> man page
- L<name/ident> item in man page
- L<name/"sec"> section in another man page
- L<"sec"> section in this man page (the quotes are optional)
- L</"sec"> as above
- F<file> filename
- X<index> index entry
- Z<> zero-width character
- E<escape> named character
- E<lt> literal <
- E<gt> literal >
- E<n> character number n (probably an ASCII number)
- E<html> a non-numeric HTML entity, e.g. E<copy>
And that’s really all you need to know about the syntax of POD. Simple isn’t it?
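If you would like to see what a formatter makes of these sequences, one way is to feed a small piece of POD to the Pod::Text module that comes with Perl. This is only a sketch: the sample paragraph and the file name in it are made up, and the exact rendering (how italics or code are marked, for instance) depends on the version of Pod::Text you have installed.

```perl
use strict;
use warnings;
use Pod::Text;

# A tiny POD document held in a string. The text is invented for this example.
my $pod = <<'END_POD';
=pod

This paragraph uses B<bold> and I<italic> text, names the file
F<notes.txt>, quotes the code C<print "hi">, and shows that a literal
E<lt> or E<gt> has to be written as an escape.

=cut
END_POD

# Render the POD as plain text into $rendered instead of a file.
my $formatter = Pod::Text->new(width => 72);
my $rendered  = '';
$formatter->output_string(\$rendered);
$formatter->parse_string_document($pod);

print $rendered;
```

Running the same text through pod2html instead should show the sequences turned into the corresponding HTML markup.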
Automated POD with h2xs
One of the utilities that comes with Perl is something called h2xs. It was originally devised as a way to simplify the creation of Perl modules that are interfaces to libraries written in C (it takes C header files and creates XS files which form the glue layer between the Perl module and the C library, hence the name h2xs). It has since become a very useful way to create any kind of Perl module, as one of the things it does is generate a stub module file. This is easier to explain with an example. Suppose you wanted to write a module called MyModule.pm (possibly not the best name if you wanted to share it with anyone else but, hey, it’s only an example). Change to a directory where you want to work on the new module and at the command line type
h2xs -X -n MyModule
I’ll quickly explain those command line options. -X prevents the command from generating any of the clever XS stuff, effectively saying ‘This will be a plain Perl module’, and -n allows you to give a name to your module.
This command will create a new subdirectory called MyModule and in it you will find a number of files. I don’t have room here to discuss what all of these files are, but probably the most important one is called MyModule.pm. If you open up this file in your favourite editor, you’ll see that it is a fully functional (albeit not very useful) module. For the purposes of this column the most important section is at the end where you’ll see some POD.
There are examples of all three types of POD paragraph in this short section. You will see a number of command paragraphs. These all start with ‘=’. Most of them are =head1 tags, but the final command is a =cut. Most of the paragraphs are plain text paragraphs that will be reformatted by whichever formatter you use to process the file, but the one after the SYNOPSIS line is a verbatim paragraph as the lines all start with a space.
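The exact wording of the stub depends on which version of h2xs you have, so the following is a hand-written sketch of a module with the same general layout rather than a copy of h2xs output. The hello function, the version number and all of the descriptive text are assumptions made up for the example.

```perl
package MyModule;

use strict;
use warnings;

our $VERSION = '0.01';

# Hypothetical function, present only so the POD has something to document.
sub hello {
    return 'Hello from MyModule';
}

1;

=head1 NAME

MyModule - example module used to experiment with POD

=head1 SYNOPSIS

  use MyModule;
  print MyModule::hello(), "\n";

=head1 DESCRIPTION

This is an ordinary text paragraph, so a formatter is free to
rewrap it. The two indented lines under SYNOPSIS form a verbatim
paragraph and will be left exactly as they are.

=head1 AUTHOR

Your name here

=cut
```

Here the =head1 lines and the closing =cut are the command paragraphs, the indented SYNOPSIS lines are the verbatim paragraph, and everything else is plain text.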
There are many advantages to using h2xs to create your modules, not least because it automatically does as much as it can to create the documentation. The makefile that is created includes commands to run pod2man on your module to create Unix man pages. This means that when your module is complete and you install it (or other people install it after downloading it from CPAN) you can get the documentation at any time by typing
at your command line and your module will look every bit as professional as any other module you have installed. The downside is, of course, that the stub documentation that is put in the module file is pretty uncomplimentary about authors that don’t change the text, so you’d better make sure that you do edit it.
Why not use the MyModule.pm file to experiment with POD and the standard POD formatters? Change the text a bit and try formatting the file using pod2html or pod2text. Remember, if you need any more help on using these utilities, help is always available from perldoc.
perldoc pod2html
perldoc pod2text
are the commands you’ll need.
Documenting your code is obviously a good thing. You won’t necessarily be the next person to edit your module and you should leave as many clues as possible for the people who will be using or editing your code. With POD we are lucky enough to have a system that integrates your documentation very closely with your code and makes it as easy as possible to extract that documentation in a number of formats.