Taming QMail

I run my own email server. It uses QMail. I realise there are at least two problems there – all of the cool kids have been using Gmail for their email since approximately forever, and who the hell uses QMail anyway?

Like most of these situations, it’s that way for historical reasons. And, of course, I’d love to change the current situation but it works well enough most of the time that the occasional pain it gives me isn’t worth the pain of changing the set-up.

So there I am. One of about three people in the world who are still running an email server using QMail. And that occasionally gives me problems.

One problem is the management of the mail queues. The tools that QMail gives you for checking the contents of the queues and deleting any unnecessary messages are rather primitive. It seems that everyone uses third-party tools. A couple of years ago, I had a WordPress installation that was compromised and used to send thousands of email messages to random people. The outgoing queue was full of spam and the incoming queue was full of bounce messages. I stopped QMail and started looking for a way to cleanly delete all the bad messages while retaining the tiny number of legitimate ones.

I discovered a program called qmHandle which did exactly what I wanted. It enabled me to remove messages from both queues that matched various criteria and before very long at all, I was back in business (having also cleaned up the WordPress installation and tightened its security).

The qmHandle program was written in Perl. And I always had it in the back of my mind that at some point I’d revisit the program and give something back by fixing it or improving it in some way. A few months ago I found time to do that.

I started by looking at the code. And realised that it had pretty much been written to be as hard to maintain as possible. Ok, that’s probably not true, but it certainly wasn’t written to be particularly easy to understand. What the original author, Michele Beltrame, created was really useful, but it seemed to me that he had made it far harder for himself than he needed to.

So I had found my project. I wanted to hack on qmHandle. But before I could do that, I needed to rewrite it so that it was easier to work on. That became something that I’d hack on in quiet moments over a few weeks. The new version is on Github. I started by importing the original version, so it’s interesting to read the commit history to trace the changes that I made. I think there were three main areas where I improved things.

  1. Splitting most of the logic out into a module. I say “most”, but it’s actually all. The command-line program is now pleasingly simple, little more than a thin wrapper around the module (there’s a sketch of what it looks like just after this list).
  2. Improving (by which I mainly mean simplifying) the logic and the syntax. I moved a few variable declarations around (so their scope was smaller) and renamed some so their meaning was more obvious. Oh, and I added a couple of useful CPAN modules – Term::ANSIColor and Getopt::Std.
  3. Using Moose. Switching to an OO approach was a big win in general and Moose made this far easier than it would otherwise have been. At some point in the future, I might consider moving from Moose to Moo, for performance reasons.
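
To give you an idea of point 1, the command-line program is now not much more than this. Treat it as a sketch rather than the exact released code; the module name is illustrative and all of the argument parsing happens inside the module.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# All of the real work now lives in the module; the program just
# loads it and hands over control. (Module name illustrative.)
use QMail::QueueHandler;

QMail::QueueHandler->new->run;
```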

For a few weeks now, I’ve been using the revised version on my email server and it seems to be working pretty much the same as the original version. So it’s time to set it loose on the wider world. This afternoon, I released it to CPAN. I’ve said above that the number of people using QMail to handle their email is tiny. But if you’re in that group and you want a more powerful way to manage your mail queues, then the new version of qmHandle might well be useful to you.

There are a few things that I know I need to do.

  1. More tests. The main point of moving most of the code into a module was to make it easier to test. Now it’s time to prove that. The current test suite is tiny. I need to improve that.
  2. Configuration. Currently, the configuration is all hard-coded. And different systems might well need different configuration (for example, the queues might be stored in a different directory). There needs to be a simple way to configure that; there’s a sketch of one possible approach just after this list.
  3. Bug fixes and improvements. This was, after all, why I started doing this. I don’t know what those might be, but I’m sure there are ways to improve the program.
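
On the configuration point, I’m thinking of something along these lines: an attribute with a sensible default that individual systems can override. This is just a sketch of the idea, not code from the released version.

```perl
package QMail::QueueHandler;    # module name illustrative

use Moose;

# Where the qmail queue lives. /var/qmail/queue is the usual location,
# but a system with a different layout could pass another path to the
# constructor (or, eventually, set it in a config file).
has queue_dir => (
    is      => 'ro',
    isa     => 'Str',
    default => '/var/qmail/queue',
);

# The local and remote queues are subdirectories of the queue root.
sub local_queue  { shift->queue_dir . '/local'  }
sub remote_queue { shift->queue_dir . '/remote' }

1;
```

A system with its queue somewhere unusual would then just construct the object with QMail::QueueHandler->new(queue_dir => '/somewhere/else/queue').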

I hope at least someone finds this useful.

Code Archaeology

Long-time readers will have seen some older posts where I criticised Perl code that I’ve found in various places on the web. I thought it was about time that I admitted to some of the dodgier corners of my programming career.

You may know that one of my hobbies is genealogy. You might also know that there’s a CPAN module for dealing with GEDCOM files and a mailing list for the discussion of the intersection of Perl and genealogy. The list is usually very quiet, but it woke up briefly a few days ago when Ron Savage asked for help reconstructing some old genealogy software of his that had gone missing from his web site. Once he recovered the missing files, I noticed that in the comments he credited a forgotten program of mine for giving him some ideas. This comment included a link to my web site which (embarrassingly) was now a 404. I don’t like to leave broken links on the web, so I swiftly put a holding page in place on my site and went off to find the missing directory.

It turns out that the directory had been used to distribute a number of my early ventures into open source software. The Wayback Machine had many of them but not everything. And then I remembered that I had full back-ups of some earlier versions of my web site squirrelled away somewhere and it only took an hour or so to track them down. So that I don’t mislay them again, I’ve put them all on Github – in an appropriately named repository.

I think that most of this code dates from around 2000-2003. There’s evidence that a lot of it was stored in CVS or Subversion at some time. But the original repositories are long gone.

So, what do we have there? And just how bad is it?

There’s a really old formmail program. And it immediately becomes apparent that when I wrote it, not only did I not know as much Perl as I thought, but I was pretty sketchy on the basics of internet security as well. I can’t remember if I ever put it live, but I really hope not.

Then there’s the “ms” suite of programs. My freelancing company is called Magnum Solutions and it amused me when I realised that people could potentially assume that this code came from Microsoft. I don’t think anyone ever did. Here, you’ll find the beginnings of what later became the nms project – but the nms versions are far more secure.

There’s the original slavorg bot from the #london.pm IRC channel. The channel still has a similar bot, but the code has (thankfully) been improved a lot since this version.

Then there’s something just called spam. I think I was trying to get some stats on how much spam I was getting.

There are a couple of programs that date from my days wrangling Sybase in the City of London. There’s a replacement for Sybase’s own “isql” command line program. My version is called sqpl. I can’t remember what I didn’t like about isql, or how successful my replacement was. What’s interesting about this program is that there are two versions. One uses DBI to connect to the database, but the other uses Sybase’s own proprietary “CTlib” connection library. Proof, I guess, that I was talking to databases back when DBI was too new and shiny to be trusted in production.
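
I no longer have a Sybase server to test against, but the DBI version would have connected along these lines (the server name, login details and query handling here are placeholders, not the original code):

```perl
#!/usr/bin/perl

use strict;
use warnings;
use DBI;

# Placeholders: in the real program these came from the command line.
my ($server, $user, $password) = @ARGV;
my $sql = do { local $/; <STDIN> };

# DBD::Sybase connection string.
my $dbh = DBI->connect("dbi:Sybase:server=$server", $user, $password,
                       { RaiseError => 1, PrintError => 0 });

my $sth = $dbh->prepare($sql);
$sth->execute;
while (my @row = $sth->fetchrow_array) {
    print join("\t", map { defined $_ ? $_ : 'NULL' } @row), "\n";
}

$dbh->disconnect;
```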

The other Sybase-related program is called sybserv. As I recall, Sybase uses a configuration file to define the connection details of the various servers that any given client can connect to. But the format of that file was rather opaque (I seem to remember the IP address being stored as a packed integer in some cases). This program parses the file and presents the data in a far more readable format. I remember using it a lot. I believe it’s the only Perl program I’ve ever written that uses formats.
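
If you’ve never seen a Perl format, it looks something like this. This is a cut-down sketch of the kind of report sybserv produced, with invented data rather than a parsed interfaces file:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Formats see package variables, hence 'our' rather than 'my'.
our ($server, $host, $port);

format STDOUT_TOP =
Server               Host                           Port
-------------------- ------------------------------ -----
.

format STDOUT =
@<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @>>>>
$server,             $host,                         $port
.

# In the real program this data came from parsing the interfaces file.
my @servers = (
    [ 'SYBASE1', 'db1.example.com', 5000 ],
    [ 'SYBASE2', 'db2.example.com', 5001 ],
);

for (@servers) {
    ($server, $host, $port) = @$_;
    write;
}
```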

Then there’s toc. That reads an HTML document, looking for any headers. It then builds a table of contents based on those headers and inserts it into the document. I think it’ll still work.
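
I haven’t checked how close this is to what the old code actually does, but the idea is easy enough to sketch with HTML::TreeBuilder:

```perl
#!/usr/bin/perl

use strict;
use warnings;
use HTML::TreeBuilder;

# Parse the document and find every <h1> .. <h6>.
my $tree = HTML::TreeBuilder->new_from_file(shift @ARGV);

my @toc;
for my $h ($tree->look_down(_tag => qr/^h[1-6]$/)) {
    my $level = substr $h->tag, 1;
    my $text  = $h->as_text;

    # Give each header an anchor that the contents can link to.
    (my $anchor = lc $text) =~ s/\W+/-/g;
    $h->attr(id => $anchor);

    push @toc, qq{<li class="toc$level"><a href="#$anchor">$text</a></li>};
}

# The real program splices the list back into the document; printing
# it before the (now anchored) document is enough to show the idea.
print "<ul>\n", join("\n", @toc), "\n</ul>\n";
print $tree->as_HTML;
```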

The final program is webged. This is the one that Ron got inspiration from. It parses a GEDCOM file and turns it into a web site. It works in two modes: you can either pre-generate a whole site (that’s the sane way to use it) or use it as a CGI program that produces each page on the fly as it is requested. I remember that parsing the GEDCOM file was unusably slow, so I implemented an incredibly naive caching mechanism where I stored a Data::Dumper version of the GEDCOM object and just “eval”ed that. I was incredibly proud of myself at the time.
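
For what it’s worth, the caching trick was no cleverer than this sketch. The parse_gedcom() routine here is a stand-in for the real (and painfully slow) parsing code:

```perl
use strict;
use warnings;
use Data::Dumper;

my $ged_file = 'family.ged';
my $cache    = "$ged_file.cache";

my $data;
if (-e $cache and -M $cache < -M $ged_file) {
    # The cache is newer than the GEDCOM file, so just pull the dumped
    # data structure straight back in. Crude, but far quicker than
    # re-parsing the GEDCOM file for every request.
    $data = do "./$cache";
}
else {
    $data = parse_gedcom($ged_file);

    local $Data::Dumper::Purity = 1;
    open my $fh, '>', $cache or die "Can't write $cache: $!";
    print {$fh} Dumper($data);
    close $fh;
}

# Stand-in for the real parsing code, which was the slow bit.
sub parse_gedcom {
    my ($file) = @_;
    # ... real parsing went here; it just needs to return a data structure.
    return { individuals => [], families => [], source => $file };
}
```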

The code in most of these programs is terrible. Or, at least, it’s very much a product of its time. I can forgive the lack of “use warnings” (Perl 5.6 wasn’t widely used back when this code was written) as they all have “-w” instead. But it’s the use of ampersands on most of the subroutine calls that makes me cringe the most.
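
If you’ve not seen that style before, the difference is just this:

```perl
use strict;
use warnings;

sub shout { my ($msg) = @_; print uc($msg), "\n" }

# Perl 4 style, as in most of this old code:
&shout('hello');

# How I'd write the same call today (the &-less form also respects
# prototypes, which the ampersand form deliberately bypasses):
shout('hello');
```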

But please have fun looking at the code and pointing out all of the idiocies. Just don’t put any of the CGI programs on a server that is anywhere near the internet.

And feel free to share any of your early code.

Driving a Business with Perl

I’ve been a freelance programmer for over twenty years. One really important part of the job is getting paid for the work I do. Back in 1995, when I started out, there wasn’t the range of accounting software that you get now and (if I recall correctly) the little that was available was all pretty expensive stuff.

At some point I thought to myself “I don’t need to buy one of these expensive systems, I’ll write something myself”. So I sat down and sketched out a database schema and wrote a few Perl programs to insert data about the work I had done and generate invoices from that data.

I don’t remember much about the early versions. I do remember coming to the conclusion that the easiest way to generate PDFs of the invoices was LaTeX, and then wasting a lot of time trying to bend LaTeX to my will. I got something that looked vaguely OK eventually, but it was always incredibly painful if I ever needed to edit it in any way. These days, I use wkhtmltopdf and my life is far easier. I understand HTML and CSS in a way that I will never understand LaTeX.
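
The wkhtmltopdf step is nothing clever. It boils down to something like this; the template name and invoice data are invented for the example, and the real values come out of the database:

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Template;

# Invented example data.
my $invoice = {
    number => 'INV0042',
    client => 'Example Client Ltd',
    lines  => [ { desc => 'Consultancy', days => 5, rate => 500 } ],
};

# Render the invoice as HTML using the Template Toolkit...
my $tt = Template->new(INCLUDE_PATH => 'templates');
$tt->process('invoice.tt', { invoice => $invoice }, 'invoice.html')
    or die $tt->error;

# ... then let wkhtmltopdf turn the HTML (and its CSS) into a PDF.
system('wkhtmltopdf', 'invoice.html', 'invoice.pdf') == 0
    or die "wkhtmltopdf failed: $?\n";
```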

Why am I telling you this, twenty years after I started using this code? Well, during this last week, I finally decided it was time to put the code on Github. There were two reasons for this. Firstly, I thought that it might be useful for other people. And secondly, I’m ashamed to admit that this is the first time that the code has ever been put under any kind of version control (and, yes, this is an embarrassing case of “do as I say, not as I do”). I have no excuses. The software I used to drive my business was in a few files on a single hard drive. Files that I was hacking away at with gay abandon when I thought they needed changing. I am a terrible role model.

Quite apart from all the obvious reasons, I’m sad that it wasn’t in version control as it would have been interesting to trace the evolution of the software over the last twenty years. For example, the database access started as raw DBI, spent a brief time using Class::DBI and at some point all got moved to DBIx::Class. It’s likely that I wasn’t using the Template Toolkit when I started – but I can’t remember what I was using in its place.

Anyway, the code is there now. I don’t give any guarantees for its quality, but it does the job for me. Let me know if you find any of it interesting or useful (or, even, laughable).

p.s. An interesting side effect of putting it under (public) version control – since I uploaded it to Github I have been constantly tweaking it. The potential embarrassment of having my code available for anyone to see means that I’ve made more improvements to it in the last week than I have in the previous five years. I’m even considering replacing all the command line programs with a Dancer app.

p.p.s. I actually use FreeAgent for all my accounting these days. It’s wonderful and I highly recommend it. But I still use my own system to generate invoices.

How Well Can You Read Documentation?

(I was going to call this post “How well do you understand context?” but I think this title is more accurate).

I just saw someone recommending this code:

Looks sensible enough, doesn’t it? But it isn’t. What’s the hidden inefficiency?

Testing Syntax Highlighting

Right. I think I might have got this cracked now. Here’s some Perl code.

That’s pretty cool, isn’t it? I wonder what it’ll look like in the web feed.

I’ll try to feed my fixes back to the author of the plugin.