Earlier this week, the Perl magazine site, perl.com, published an article about writing web applications using CGI.pm. That seemed like a bizarre choice to me, but I’ve decided to use it as an excuse to write an article explaining why I think that’s a really bad idea.
It’s important to start by getting some definitions straight – as, often, I see people conflating two or three of these concepts and it always confuses the discussion.
- The Common Gateway Interface (CGI) is a protocol which defines one way that you can write applications that create dynamic web pages. CGI defines the interface between a web server and a computer program which generates a dynamic page.
- A CGI program is a computer program that is written in a manner that conforms to the CGI specification. The program optionally reads input from its environment and then prints to STDOUT a stream of data representing a dynamic web page. Such programs can be (and have been!) written in pretty much any programming language.
- CGI.pm is a CPAN module which makes it easier to write CGI programs in Perl. The module was included in the Perl core distribution from Perl 5.004 (in 1997) until it was removed from Perl 5.22 (in 2015).
A Brief Introduction to CGI.pm
CGI.pm basically contained two sets of functions. One for input and one for output. There was a set for reading data that was passed into the program (the most commonly used one of these was param()) and a set for producing output to send to the browser. Most of these were functions which created HTML elements like <h1> or <p>. By about 2002, most people seemed to have worked out that these HTML creation functions were a bad idea and had switched to using a templating engine instead. One output function that remained useful was header() which gave the programmer an easy way to create the various headers required in an HTTP response – most commonly the “Content-type” header.
For at least the last ten years that I was using CGI.pm, my programs included the line:
use CGI qw(param header);
as it was only the param() and header() functions that I used.
I should also point out that there are two different “modes” that you can use the module in. There’s an object-oriented mode (create an object with CGI->new and interact with it through methods) and a function-based mode (just call functions that are exported by the module). As I never needed more than one CGI object in a program, I always just used the function-based interface.
Why Shouldn’t We Use CGI.pm Today?
If you’re using CGI.pm in the way I mentioned above (using it as a wrapper around the CGI protocol and ignoring the HTML generation functions), then it’s not actually a terrible way to write simple web applications. There are two problems with it:
- CGI programs are slow. They start up a Perl process for each request to the CGI URL. This is, of course, a problem with the CGI protocol itself, not the CGI.pm module. This might not be much of a problem if you have a low-traffic application that you want to put on the web.
- CGI.pm gives you no help building more complicated features in a web application. For example, there’s no built-in support for request routing. If your application needs to control a number of URLs, then you either end up with a separate CGI program for each URL or you shoe-horn them all into the same program and set up some far-too-clever mod_rewrite magic. And everyone reinvents the same wheels.
Basically, there are better ways to write web applications in Perl these days. It was removed from the Perl code distribution in 2015 because people didn’t want to encourage people to use an outdated technology.
What are these better methods? Well, anything based on an improved gateway interface specification called the Perl Server Gateway Interface (PSGI). That could be a web framework like Dancer2, Catalyst or Web::Simple or you could even just use raw PSGI (by using the toolkit in the Plack distribution).
Often when I suggest this to people, they think that the PSGI approach is going to be far more complex than just whipping up a quick CGI program. And it’s easy to see why they might think that. All too often, an introduction to PSGI starts by building a relatively powerful (and, therefore, complicated) web application using Catalyst. And while Catalyst is a fine web framework, it’s not the simplest way to write a basic web application.
But it doesn’t need to be that way. You can write PSGI programs in “raw PGSI” without reaching for a framework. Sure, you’ll still have the problems listed in my point two above, but when you want to address that, you can start looking at the various web frameworks. Even so, you’ll have three big benefits from moving to PSGI.
The Benefits of PSGI
As I see it, there are three huge benefits that you get from PSGI.
The standard PSGI toolkit is called Plack. You’ll need to install that. That will give you adapters enabling you to use PSGI programs in pretty much any web deployment environment. It also includes a large number of plugins and extensions (often called “middleware”) for PSGI. All of this software can be added to your application really simply. And any bits of your program that you don’t have to write is always a big advantage.
Testing and Debugging
How do you test your CGI program? Probably, you use something like Selenium (or, perhaps, just LWP) to fire requests at the server and see what results you get back.
And how about debugging any problems that your testing finds? All too often, the debugging that I see is warn() statements written to the web server error log. Actually, when answering questions on StackOverflow, often the poster has no idea where to find the error log and we need to resort to something like use CGI::Carp 'fatalsToBrowser', which isn’t exactly elegant.
A PSGI application is just a subroutine. So it’s trivial for testing tools to call the subroutine with the correct parameters. This makes testing PSGI programs really easy (and all of the tools to do this are part of the Plack distribution I mentioned above). Similarly, there are tools debugging a PSGI program far easier than the equivalent CGI program.
This, to me, is the big one. I talked earlier about the performance problems that the CGI environment leads to. You have a CGI program that is only used by a few people on your internal network. And that’s fine. The second or so it takes to respond to each request isn’t a problem. But it proves useful and before you know it, many more people start to use it. And then someone suggests publishing it to external users too. The one-second responses stretch to five or ten seconds, or even longer and you start getting complaints about the system. You know you should move it to a persistent environment like FastCGI or mod_perl, but that would require large-scale changes to the code and how are you ever going to find the time for that?
With a PSGI application, things are different. You can start by deploying your PSGI code in a CGI environment if you like (although, to be honest, it seems that very few people do that). Then when you need to make it faster, you can move it to FastCGI or mod_perl. Or you can run it as a standalone web service and configure your web proxy to redirect requests to it. Usually, you’ll be able to use exactly the same code in all of these environments.
I know why people still write CGI programs. And I know why people still write them using CGI.pm – it’s what people know. It’s seen as the easy option. It’s what twenty-five years of web tutorials are telling them to do.
But in 2018 (and, to be honest, for most of the last ten years) that’s simply not the best approach to take. There are more powerful and more flexible options available.
Please don’t write code using CGI.pm. And please don’t write tutorials encouraging people to do that.
Also published on Medium.