Earlier this week, the Perl magazine site, perl.com, published an article about writing web applications using CGI.pm. That seemed like a bizarre choice to me, but I’ve decided to use it as an excuse to write an article explaining why I think that’s a really bad idea.
It’s important to start by getting some definitions straight – as, often, I see people conflating two or three of these concepts and it always confuses the discussion.
- The Common Gateway Interface (CGI) is a protocol which defines one way that you can write applications that create dynamic web pages. CGI defines the interface between a web server and a computer program which generates a dynamic page.
- A CGI program is a computer program that is written in a manner that conforms to the CGI specification. The program optionally reads input from its environment and then prints to STDOUT a stream of data representing a dynamic web page. Such programs can be (and have been!) written in pretty much any programming language.
- CGI.pm is a CPAN module which makes it easier to write CGI programs in Perl. The module was included in the Perl core distribution from Perl 5.004 (in 1997) until it was removed from Perl 5.22 (in 2015).
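To make the second definition concrete, here is a minimal sketch of a CGI program that uses no modules at all. It assumes the web server supplies the query string in the QUERY_STRING environment variable; the helper names url_decode and cgi_response are my own inventions for illustration, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one URL-encoded component ("+" means space, %XX is a byte).
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Build the full response (headers, blank line, body) from a CGI-style
# environment. Taking the environment as an argument keeps this testable.
sub cgi_response {
    my ($env) = @_;

    my %param;
    for my $pair (split /[&;]/, $env->{QUERY_STRING} // '') {
        next unless length $pair;
        my ($key, $value) = split /=/, $pair, 2;
        $param{ url_decode($key) } = url_decode($value // '');
    }

    my $name = $param{name} // 'World';
    return "Content-Type: text/plain; charset=utf-8\r\n"
         . "\r\n"
         . "Hello, $name!\n";
}

# The web server passes the request in %ENV and reads the reply from STDOUT.
print cgi_response(\%ENV);
```

The response is plain text, so the name can be echoed back without HTML escaping. A program that returned HTML would need to escape it.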
A Brief Introduction to CGI.pm
CGI.pm basically contained two sets of functions: one for input and one for output. There was a set for reading data that was passed into the program (the most commonly used one of these was param()) and a set for producing output to send to the browser. Most of these were functions which created HTML elements like <h1> or <p>. By about 2002, most people seemed to have worked out that these HTML creation functions were a bad idea and had switched to using a templating engine instead. One output function that remained useful was header() which gave the programmer an easy way to create the various headers required in an HTTP response – most commonly the “Content-type” header.
For at least the last ten years that I was using CGI.pm, my programs included the line:
    use CGI qw(param header);
as it was only the param() and header() functions that I used.
I should also point out that there are two different “modes” that you can use the module in. There’s an object-oriented mode (create an object with CGI->new and interact with it through methods) and a function-based mode (just call functions that are exported by the module). As I never needed more than one CGI object in a program, I always just used the function-based interface.
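As a rough illustration of the two modes, the sketch below writes the same greeting both ways. It assumes CGI.pm has been installed from CPAN (it is no longer in core), and the names greeting_functional and greeting_oo are made up for this example.

```perl
use strict;
use warnings;
use CGI qw(param header);   # function-based mode: import just these two

# Function-based mode: param() and header() work on a hidden default object
# that CGI.pm builds from the request environment on first use.
sub greeting_functional {
    my $name = param('name') // 'World';
    return header('text/plain') . "Hello, $name!\n";
}

# Object-oriented mode: the same two calls, as methods on an explicit object.
sub greeting_oo {
    my ($q) = @_;
    my $name = $q->param('name') // 'World';
    return $q->header('text/plain') . "Hello, $name!\n";
}

print greeting_oo(CGI->new);
```

In both versions header() produces the Content-Type line and the blank line that ends the headers, so the body can follow immediately.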
Why Shouldn’t We Use CGI.pm Today?
If you’re using CGI.pm in the way I mentioned above (using it as a wrapper around the CGI protocol and ignoring the HTML generation functions), then it’s not actually a terrible way to write simple web applications. But there are two problems with it:
- CGI programs are slow. They start up a Perl process for each request to the CGI URL. This is, of course, a problem with the CGI protocol itself, not the CGI.pm module. This might not be much of a problem if you have a low-traffic application that you want to put on the web.
- CGI.pm gives you no help building more complicated features in a web application. For example, there’s no built-in support for request routing. If your application needs to control a number of URLs, then you either end up with a separate CGI program for each URL or you shoe-horn them all into the same program and set up some far-too-clever mod_rewrite magic. And everyone reinvents the same wheels.
Basically, there are better ways to write web applications in Perl these days. CGI.pm was removed from the Perl core distribution in 2015 because people didn’t want to encourage the use of an outdated technology.
What are these better methods? Well, anything based on an improved gateway interface specification called the Perl Server Gateway Interface (PSGI). That could be a web framework like Dancer2, Catalyst or Web::Simple or you could even just use raw PSGI (by using the toolkit in the Plack distribution).
Often when I suggest this to people, they think that the PSGI approach is going to be far more complex than just whipping up a quick CGI program. And it’s easy to see why they might think that. All too often, an introduction to PSGI starts by building a relatively powerful (and, therefore, complicated) web application using Catalyst. And while Catalyst is a fine web framework, it’s not the simplest way to write a basic web application.
But it doesn’t need to be that way. You can write PSGI programs in “raw PSGI” without reaching for a framework. Sure, you’ll still have the problems listed in my point two above, but when you want to address that, you can start looking at the various web frameworks. Even so, you’ll have three big benefits from moving to PSGI.
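To show how small “raw PSGI” can be, here is a minimal sketch of a complete application. The file name app.psgi and the greeting text are arbitrary choices for the example; the shape of the return value (status, header list, body list) is what the PSGI specification asks for.

```perl
# app.psgi
use strict;
use warnings;

# A PSGI application is a code reference. It receives the request as a hash
# reference and returns [ status, [ headers ], [ body chunks ] ].
my $app = sub {
    my ($env) = @_;

    my $path = $env->{PATH_INFO} || '/';

    return [
        200,
        [ 'Content-Type' => 'text/plain; charset=utf-8' ],
        [ "Hello from raw PSGI. You asked for $path\n" ],
    ];
};

$app;   # a .psgi file must return the application as its last value
```

With Plack installed, running plackup app.psgi should serve this on a local development server (port 5000 by default).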
The Benefits of PSGI
As I see it, there are three huge benefits that you get from PSGI.
Software Ecosystem
The standard PSGI toolkit is called Plack. You’ll need to install that. That will give you adapters enabling you to use PSGI programs in pretty much any web deployment environment. It also includes a large number of plugins and extensions (often called “middleware”) for PSGI. All of this software can be added to your application really simply. And any bits of your program that you don’t have to write are always a big advantage.
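The idea behind middleware is simple enough to sketch by hand: it is a function that takes a PSGI application and returns a new one wrapped around it. The add_runtime_header function below is a hypothetical, hand-written example (not Plack code) that only handles the simple array-reference style of response.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my $app = sub {
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ "Hello\n" ] ];
};

# Hypothetical hand-written middleware: wrap an app so that every response
# gains an X-Runtime header recording how long the inner app took.
sub add_runtime_header {
    my ($inner) = @_;
    return sub {
        my ($env) = @_;
        my $start = time;
        my $res   = $inner->($env);
        push @{ $res->[1] }, 'X-Runtime' => sprintf('%.6f', time - $start);
        return $res;
    };
}

# The wrapped app is still just a PSGI app, so wrappers can be stacked.
my $wrapped = add_runtime_header($app);
```

In practice you would rarely write this yourself. Plack ships a Plack::Middleware::Runtime that does a similar job, and Plack::Builder lets you switch middleware on with an enable line inside a builder block.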
Testing and Debugging
How do you test your CGI program? Probably, you use something like Selenium (or, perhaps, just LWP) to fire requests at the server and see what results you get back.
And how about debugging any problems that your testing finds? All too often, the debugging that I see is warn() statements written to the web server error log. Actually, when answering questions on StackOverflow, often the poster has no idea where to find the error log and we need to resort to something like use CGI::Carp 'fatalsToBrowser', which isn’t exactly elegant.
A PSGI application is just a subroutine. So it’s trivial for testing tools to call the subroutine with the correct parameters. This makes testing PSGI programs really easy (and all of the tools to do this are part of the Plack distribution I mentioned above). Similarly, there are tools that make debugging a PSGI program far easier than the equivalent CGI program.
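Because the application is just a subroutine, even a test with no web server and no Plack is possible. The sketch below calls a small example app directly with hand-built environments containing only the keys the app reads. In real tests, Plack::Test will build complete environments from proper request objects for you.

```perl
use strict;
use warnings;
use Test::More;

my $app = sub {
    my ($env) = @_;
    return [ 404, [ 'Content-Type' => 'text/plain' ], [ "Not found\n" ] ]
        unless $env->{PATH_INFO} eq '/';
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ "Hello\n" ] ];
};

# Hand-built request environments: only the keys this app looks at.
my $ok      = $app->({ REQUEST_METHOD => 'GET', PATH_INFO => '/' });
my $missing = $app->({ REQUEST_METHOD => 'GET', PATH_INFO => '/nope' });

is $ok->[0], 200, 'root path returns 200';
is join('', @{ $ok->[2] }), "Hello\n", 'root path returns the greeting';
is $missing->[0], 404, 'unknown path returns 404';

done_testing;
```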
Deployment Flexibility
This, to me, is the big one. I talked earlier about the performance problems that the CGI environment leads to. You have a CGI program that is only used by a few people on your internal network. And that’s fine. The second or so it takes to respond to each request isn’t a problem. But it proves useful and before you know it, many more people start to use it. And then someone suggests publishing it to external users too. The one-second responses stretch to five or ten seconds, or even longer and you start getting complaints about the system. You know you should move it to a persistent environment like FastCGI or mod_perl, but that would require large-scale changes to the code and how are you ever going to find the time for that?
With a PSGI application, things are different. You can start by deploying your PSGI code in a CGI environment if you like (although, to be honest, it seems that very few people do that). Then when you need to make it faster, you can move it to FastCGI or mod_perl. Or you can run it as a standalone web service and configure your web proxy to redirect requests to it. Usually, you’ll be able to use exactly the same code in all of these environments.
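To see why the application code doesn’t need to change, it helps to look at what an adapter does. The render_as_cgi function below is a deliberately simplified, hypothetical stand-in for a CGI adapter: it builds an environment, calls the unchanged app and prints the result in CGI format. The real adapters in Plack (such as Plack::Handler::CGI and Plack::Handler::FCGI) handle far more cases than this.

```perl
use strict;
use warnings;

# The application itself knows nothing about how it is deployed.
my $app = sub {
    my ($env) = @_;
    return [ 200, [ 'Content-Type' => 'text/plain' ],
             [ "Served for $env->{REQUEST_METHOD} $env->{PATH_INFO}\n" ] ];
};

# Hypothetical, simplified CGI adapter: run a PSGI app against a CGI-style
# environment and return the text a CGI program would print.
sub render_as_cgi {
    my ($psgi_app, $cgi_env) = @_;

    my %env = (
        REQUEST_METHOD    => 'GET',   # defaults for keys the server omitted
        PATH_INFO         => '/',
        %$cgi_env,
        'psgi.url_scheme' => 'http',
    );
    my ($status, $headers, $body) = @{ $psgi_app->(\%env) };

    my $out = "Status: $status\r\n";
    my @pairs = @$headers;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        $out .= "$name: $value\r\n";
    }
    return $out . "\r\n" . join '', @$body;
}

print render_as_cgi($app, \%ENV);
```

Swapping this adapter for a FastCGI or standalone-server one changes how requests arrive, but the $app subroutine stays exactly as it is.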
In Conclusion
I know why people still write CGI programs. And I know why people still write them using CGI.pm – it’s what people know. It’s seen as the easy option. It’s what twenty-five years of web tutorials are telling them to do.
But in 2018 (and, to be honest, for most of the last ten years) that’s simply not the best approach to take. There are more powerful and more flexible options available.
Please don’t write code using CGI.pm. And please don’t write tutorials encouraging people to do that.
There still is a lot of legacy CGI out there: I run into it often on internal networks. I didn’t see Dave as extolling the benefits of superannuated technology.
You’re right, of course. There’s an awful lot of legacy code out there. My intention in writing this article was to prevent that pile getting too much larger.
Hi Dave, I think you should remove that image. In the first place, as your own article and comments point out, it’s not an open-and-shut case and a giant red X symbolizing “do not enter” is off the mark. Secondly, while removal of CGI from core and placing warning messages in the distro and throughout the perlosphere may have been a necessary part of steering new developers to modern tools, we should all be careful not to disparage the legacy of something that was so useful for so long, and again, a giant red X is more on the disparaging side than the respecting side. And finally, your screenshot includes a photo of the current maintainer of the module and it’s wildly inappropriate to use his likeness in a picture doctored with a giant red X, which has now been distributed all over the internet (I came here because of seeing it elsewhere).
I think your article is spot on as far as the advice it imparts. And I certainly understand and share your frustration that new Perl devs are so often led into painting themselves into a corner with CGI. But all we can do is *patiently* continue to point people to the better alternatives and the warnings in the module, etc. I bet you could come up with an image that showed the path forward in a positive way rather than one that looks like graffiti sprayed on public property!
I use cgi and Perl on some websites because they are much faster and cleaner. All these new fancy websites take way too long to load when in areas with a bad internet connection.
And nothing I’ve written should be taken as implying that you shouldn’t go on using CGI programs if you want to.
However… I think you’re in danger of confusing a few things.
The speed at which your web site loads is determined by two things – the time the web server takes to compose the response and the sheer amount of data that the server sends back.
As a CGI program starts up the Perl compiler each time it responds to a request, that will be the slowest way for a web server to build a response. It’s obviously going to be quicker using mod_perl, FastCGI or a standalone server process, as in those cases the process that is running your Perl code is persistent and doesn’t have to be restarted for each request.
The other factor (the size of the data) is completely in the hands of the programmer. There’s no reason why a Catalyst app should return more data than the same code written using CGI.pm.
So, all other things being equal, I expect that the CGI solution will be the slowest option, not the fastest as you suggest.