Code Archaeology

Long-time readers will have seen some older posts where I criticised Perl code that I’ve found in various places on the web. I thought it was about time that I admitted to some of the dodgier corners of my programming career.

You may know that one of my hobbies is genealogy. You might also know that there’s a CPAN module for dealing with GEDCOM files and a mailing list for the discussion of the intersection of Perl and genealogy. The list is usually very quiet, but it woke up briefly a few days ago when Ron Savage asked for help reconstructing some old genealogy software of his that had gone missing from his web site. Once he recovered the missing files, I noticed that in the comments he credited a forgotten program of mine for giving him some ideas. This comment included a link to my web site which (embarrassingly) was now a 404. I don’t link to leave broken links on the web, so I swiftly put a holding page in place on my site and went off to find the missing directory.

It turns out that the directory had been used to distribute a number of my early ventures into open source software. The Wayback Machine had many of them but not everything. And then I remembered that I had full back-ups of some earlier versions of my web site squirrelled away somewhere and it only took an hour or so to track them down. So that I don’t mislay them again, I’ve put them all on Github – in an appropriately named repository.

I think that most of this code dates from around 2000-2003. There’s evidence that a lot of it was stored in CVS or Subversion at some time. But the original repositories are long gone.

So, what do we have there? And just how bad is it?

There’s a really old formmail program. And it immediately becomes apparent that when I wrote, not only did I not know as much Perl as I thought, but I was pretty sketchy on the basics of internet security as well. I can’t remember if I ever put it live but I really hope not.

Then there’s the “ms” suite of programs. My freelancing company is called Magnum Solutions and it amused me when I realised that people could potentially assume that this code came from Microsoft. I don’t think anyone ever did. Here, you’ll find the beginnings of what later became the nms project – but the nms versions are far more secure.

There’s the original slavorg bot from the #london.pm IRC channel. The channel still has a similar bot, but the code has (thankfully) been improved a lot since this version.

Then there’s something just called spam. I think I was trying to get some stats on how much spam I was getting.

There are a couple of programs that date from my days wrangling Sybase in the City of London. There’s a replacement for Sybase’s own “isql” command line program. My version is called sqpl. I can’t remember what I didn’t like about isql, or how successful my replacement was. What’s interesting about this program is that there are two versions. One uses DBI to connect to the database, but the other uses Sybase’s own proprietary “CTlib” connection library. Proof, I guess that I was talking to databases back when DBI was too new and shiny to be trusted in production.

The other Sybase-related program is called sybserv. As I recall, Sybase uses a configuration file to define the connection details of the various servers that any given client can connect to. But the format of that file was rather opaque (I seem to remember the IP address being stored as a packed integer in some cases). This program parses this file and presents the data in a far more readable format. I remember using it a lot. I believe it’s the only Perl program I’ve ever written that uses formats.

Then there’s toc. That reads an HTML document, looking for any headers. It then builds a table of contents based on those headers and inserts it into the document. I think it’ll still work.

The final program is webged. This is the one that Ron got inspiration from. It parses a GEDCOM file and turns it into a web site. It works in two modes, you can either pre-generate a whole site (that’s the sane way to use it) or you can use it as a CGI program where it produces each page on the fly as it is requested. I remember that parsing the GEDCOM file was unusably slow, so I implemented an incredibly naive caching mechanism where I stored a Data::Dumper version of the GEDCOM object and just “eval”ed that. I was incredibly proud of myself at the time.

The code in most of these programs is terrible. Or, at least, it’s very much a product of its time. I can forgive the lack of “use warnings” (Perl 5.6 wasn’t widely used back when this code was written) as they all have “-w” instead. But it’s the use of ampersands on most of the subroutine calls that makes me cringe the most.

But please have fun looking at the code and pointing out all of the idiocies. Just don’t put any of the CGI programs on a server that is anywhere near the internet.

And feel free to share any of your early code.


Also published on Medium.

One thought on “Code Archaeology

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.