Code Archaeology

Long-time readers will have seen some older posts where I criticised Perl code that I’ve found in various places on the web. I thought it was about time that I admitted to some of the dodgier corners of my programming career.

You may know that one of my hobbies is genealogy. You might also know that there’s a CPAN module for dealing with GEDCOM files and a mailing list for the discussion of the intersection of Perl and genealogy. The list is usually very quiet, but it woke up briefly a few days ago when Ron Savage asked for help reconstructing some old genealogy software of his that had gone missing from his web site. Once he recovered the missing files, I noticed that in the comments he credited a forgotten program of mine for giving him some ideas. This comment included a link to my web site which (embarrassingly) was now a 404. I don’t link to leave broken links on the web, so I swiftly put a holding page in place on my site and went off to find the missing directory.

It turns out that the directory had been used to distribute a number of my early ventures into open source software. The Wayback Machine had many of them but not everything. And then I remembered that I had full back-ups of some earlier versions of my web site squirrelled away somewhere and it only took an hour or so to track them down. So that I don’t mislay them again, I’ve put them all on Github – in an appropriately named repository.

I think that most of this code dates from around 2000-2003. There’s evidence that a lot of it was stored in CVS or Subversion at some time. But the original repositories are long gone.

So, what do we have there? And just how bad is it?

There’s a really old formmail program. And it immediately becomes apparent that when I wrote, not only did I not know as much Perl as I thought, but I was pretty sketchy on the basics of internet security as well. I can’t remember if I ever put it live but I really hope not.

Then there’s the “ms” suite of programs. My freelancing company is called Magnum Solutions and it amused me when I realised that people could potentially assume that this code came from Microsoft. I don’t think anyone ever did. Here, you’ll find the beginnings of what later became the nms project – but the nms versions are far more secure.

There’s the original slavorg bot from the #london.pm IRC channel. The channel still has a similar bot, but the code has (thankfully) been improved a lot since this version.

Then there’s something just called spam. I think I was trying to get some stats on how much spam I was getting.

There are a couple of programs that date from my days wrangling Sybase in the City of London. There’s a replacement for Sybase’s own “isql” command line program. My version is called sqpl. I can’t remember what I didn’t like about isql, or how successful my replacement was. What’s interesting about this program is that there are two versions. One uses DBI to connect to the database, but the other uses Sybase’s own proprietary “CTlib” connection library. Proof, I guess that I was talking to databases back when DBI was too new and shiny to be trusted in production.

The other Sybase-related program is called sybserv. As I recall, Sybase uses a configuration file to define the connection details of the various servers that any given client can connect to. But the format of that file was rather opaque (I seem to remember the IP address being stored as a packed integer in some cases). This program parses this file and presents the data in a far more readable format. I remember using it a lot. I believe it’s the only Perl program I’ve ever written that uses formats.

Then there’s toc. That reads an HTML document, looking for any headers. It then builds a table of contents based on those headers and inserts it into the document. I think it’ll still work.

The final program is webged. This is the one that Ron got inspiration from. It parses a GEDCOM file and turns it into a web site. It works in two modes, you can either pre-generate a whole site (that’s the sane way to use it) or you can use it as a CGI program where it produces each page on the fly as it is requested. I remember that parsing the GEDCOM file was unusably slow, so I implemented an incredibly naive caching mechanism where I stored a Data::Dumper version of the GEDCOM object and just “eval”ed that. I was incredibly proud of myself at the time.

The code in most of these programs is terrible. Or, at least, it’s very much a product of its time. I can forgive the lack of “use warnings” (Perl 5.6 wasn’t widely used back when this code was written) as they all have “-w” instead. But it’s the use of ampersands on most of the subroutine calls that makes me cringe the most.

But please have fun looking at the code and pointing out all of the idiocies. Just don’t put any of the CGI programs on a server that is anywhere near the internet.

And feel free to share any of your early code.

Driving a Business with Perl

I’ve been a freelance programmer for over twenty years. One really important part of the job is getting paid for the work I do. Back in 1995 when I started out there wasn’t all of the accounting software available that you get now and (if I recall correctly) the little that was available was all pretty expensive stuff.

At some point I thought to myself “I don’t need to buy one of these expensive systems, I’ll write something myself”. So I sat down and sketched out a database schema and wrote a few Perl programs to insert data about the work I had done and generate invoices from that data.

I don’t remember much about the early versions. I do remember coming to the conclusion that the easiest way to generate PDFs of the invoices was using LaTex and then wasting a lot of time trying to bend LaTeX to my will. I got something that looked vaguely ok eventually, but it was always incredibly painful if I ever needed to edit it in any way. These days, I use wkhtmltopdf and my life is far easier. I understand HTML and CSS in a way that I will never understand LaTeX.

Why am I telling you this, twenty years after I started using this code? Well, during this last week, I finally decided it was time to put the code on Github. There were two reasons for this. Firstly, I thought that it might be useful for other people. And secondly, I’m ashamed to admit that this is the first time that the code has ever been put under any kind of version control (and, yes, this is an embarrassing case of “do as I say, not as I do“). I have no excuses. The software I used to drive my business was in a few files on a single hard drive. Files that I was hacking away at with gay abandon when I thought they needed changing. I am a terrible role model.

Other than all the obvious reasons, I’m sad that it wasn’t in version control as it would have been interesting to trace the evolution of the software over the last twenty years. For example, the database access started as raw DBI, spent a brief time using Class::DBI and at some point all got moved to DBIx::Class. It’s likely that I wasn’t using the Template Toolkit when I started – but I can’t remember what I was using in its place.

Anyway, the code is there now. I don’t give any guarantees for its quality, but it does the job for me. Let me know if you find any of it interesting or useful (or, even, laughable).

p.s. An interesting side effect of putting it under (public) version control – since I uploaded it to Github I have been constantly tweaking it. The potential embarrassment of having my code available for anyone to see means that I’ve made more improvements to it in the last week that I have in the previous five years. I’m even considering replacing all the command line programs with a Dancer app.

p.p.s. I actually use FreeAgent for all my accounting these days. It’s wonderful and I highly recommend it. But I still use my own system to generate invoices.

How Well Can You Read Documentation?

(I was going to call this post “How well do you understand context?” but I think this title is more accurate).

I just saw someone recommending this code:

Looks sensible enough, doesn’t it? But it isn’t. What’s the hidden inefficiency?

Testing Syntax Highlighting

Right. I think I might have got this cracked now. Here’s some Perl code.

That’s pretty cool, isn’t it. I wonder what it’ll look like in the web feed.

I’ll try to feed my fixes back to the author of the plugin.