Category Archives: Programming

Programming Like It’s 1999

This article was published yesterday. It shows a way to extract data about a film from IMDB and put it into a local database. Actually, it doesn’t even do that. It produces SQL that you can then run to insert the data.

It’s all rather nasty stuff and indicative of the fact that most people using Perl are still using idioms that would have made us shudder ten years ago.

There were a few things that I didn’t like about the code: the use of curl to grab data from the web site, the indirect object syntax when creating the XML::Simple object and, in particular, the huge amount of repetitive code used to create the SQL statements.

So I did what anyone would do in my situation. I rewrote it. Here’s my version.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Simple;
use LWP::Simple;

@ARGV or die "Please provide movie title in quotes\n";

my $movie = shift;
$movie =~ s/\s/+/g;

my $movieData = get "http://www.imdbapi.com/?r=XML&t=$movie"
  or die "Couldn't fetch data for '$movie'\n";
my $data = XMLin( $movieData );

my @fields = qw[released rating director genre writer runtime plot id
                title votes poster year rated actors];

my %film = %{$data->{movie}};

foreach (@fields) {
  $film{$_} //= '';          # missing fields become empty strings
  $film{$_} =~ s/'/\\'/g;    # escape single quotes
}

my $tstamp = time();

# The timestamp needs a column of its own; 'tstamp' is an assumed name
my $sql = 'INSERT INTO movie_collection (';
$sql .= join ', ', @fields, 'tstamp';
$sql .= ') VALUES (';
$sql .= join ', ', map { qq['$film{$_}'] } @fields;
$sql .= ", '$tstamp');\n";

print $sql;

I haven’t actually changed that much. I’ve tidied up a bit: switched to using LWP::Simple, removed some unnecessary escaping, things like that. But I have made two pretty big changes. Firstly, I’ve got rid of all of the nasty variables containing data about a film. A film is a single object and therefore should be stored in a single variable. And, happily enough, the $data you get back from XMLin contains a hash that does the trick perfectly.

The second change I made was to rejig the way that the SQL is created. By using an array that contains the names of all of the columns in the table, I can generate the SQL programmatically without all of that repetitive code. I’ve even made the SQL a little safer by explicitly listing the columns that we are inserting data into (this has the side effect of no longer needing to insert a NULL into the id column).

Of course, this would just be a first step. The whole idea of generating SQL to run against the database is ugly. You’d really want to use DBIx::Class (or, at the very least, DBI) to insert the data directly into the database. And why mess around with raw XML when you can use something like IMDB::Film to do it?

At that point in my thought process I had an epiphany. You don’t need the database at all. The IMDB data changes all the time. Why take a local copy? Why not just use the web service directly with IMDB::Film (or perhaps WebService::IMDB – I haven’t used either of them so I have no strong opinions on this)?

In general, I think that the original code was too complicated, which made it hard to maintain. My version is better (although I am, of course, biased), but it can be made even better by using more from CPAN.

CPAN is Perl’s killer app. If you’re not using CPAN then you’re not using half the power of Perl.

What do you think? How would you write this program?

Update: A few people have mentioned the fact that I’m directly interpolating random data into my SQL statements – which is generally seen as a bad thing as it opens the door to SQL injection attacks. In my defence, I’d like to make a couple of points.

Firstly, the data I’m using isn’t just any old data. It’s data that is returned from the IMDB API. So it would be hard to use this for a malicious attack on the system (at least until Hollywood makes a film about the life of Bobby Tables).

Secondly, I am cleaning the data before using it. I’m escaping any single quotes in the input data. I think that removes the possibility of attack. I could be wrong, though. If that’s the case, please let me know what I’m missing.

But, in general, I agree that this approach is dangerous. This is one of the major advantages of using DBI. By using bound parameters in your SQL statements you can remove the possibility of SQL injection attacks.
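As a rough sketch of what bound parameters look like, assuming the same @fields list as the script above plus a timestamp column (given the hypothetical name tstamp here), the SQL can be built with one ? placeholder per column and the values passed separately. The helper name build_insert is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: build an INSERT statement with one ? placeholder
# per column, so that no data is ever interpolated into the SQL itself.
sub build_insert {
    my ($table, @cols) = @_;
    return sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table, join(', ', @cols), join(', ', ('?') x @cols);
}

my @fields = qw[released rating director genre writer runtime plot id
                title votes poster year rated actors];
my @cols   = (@fields, 'tstamp');    # 'tstamp' is an assumed column name

my $sql = build_insert('movie_collection', @cols);
print "$sql\n";

# With a connected DBI handle in $dbh, the values would then be bound
# separately, with no quote escaping needed:
#   $dbh->do($sql, undef, @film{@fields}, time);
```

Because the data never becomes part of the SQL text, there is nothing for a stray quote (or backslash) in a film title to break out of.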

Update 2: You can, of course, rely on Zefram to point out the issues here. His comment is well worth reading.

Other people (on IRC) raised the potential of other Unicode characters that databases treat as quote characters but that aren’t covered by my substitution.
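To make the problem concrete, here is one case that the single-quote substitution misses, assuming a database such as MySQL that (by default) treats a backslash in a string literal as an escape character. The title is made up for the example.

```perl
use strict;
use warnings;

# The substitution from the script above: put a backslash before each '
sub quote_escape {
    my ($str) = @_;
    $str =~ s/'/\\'/g;
    return $str;
}

# A made-up title that ends in a single backslash. It contains no quotes,
# so the substitution leaves it unchanged.
my $title   = 'Some Title\\';
my $literal = "'" . quote_escape($title) . "'";

# The literal now ends with \' which a backslash-escaping database reads
# as an escaped quote rather than the end of the string.
print "$literal\n";    # prints 'Some Title\'
```

In the generated INSERT, the quote that should close this value is swallowed, so the next value’s opening quote ends the string instead and that value’s contents end up outside any string literal. Escaping backslashes too would close this particular hole on such a database, but it still relies on knowing every character the database treats specially; bound parameters avoid the question entirely.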

Update 3: Here’s a local copy of the original code.

Perl Search Engine

Often on sites like StackOverflow you’ll see questions that people could have answered for themselves if they had just searched the right web sites (usually perldoc or CPAN). But instead, they just went straight for Google and ended up with some dodgy, out of date information that just left them confused.

In order to get round that, I’ve created a Google Custom Search Engine which searches known Perl web sites. You can try it out here.

If you want to use this search engine on your site, the code is below.

<div id="cse" style="width: 100%;">Loading</div>
 <script src="//www.google.co.uk/jsapi" type="text/javascript"></script>
 <script type="text/javascript">
 google.load('search', '1', {language : 'en'});
 google.setOnLoadCallback(function() {
 var customSearchControl = new google.search.CustomSearchControl('008350714774536055976:a2zesuxuecs');
 customSearchControl.setResultSetSize(google.search.Search.FILTERED_CSE_RESULTSET);
 customSearchControl.draw('cse');
 }, true);
 </script>
 <link rel="stylesheet" href="//www.google.com/cse/style/look/default.css" type="text/css" />

Context

This generated a lot of discussion in a training course that I ran this week so I thought it was worth sharing more widely.

I think you can say that you understand the concept of context in Perl if you know what these four statements will print and (more importantly) can explain why they don’t all produce the same thing.

print reverse 1 .. 5;
print scalar reverse 1 .. 5;
print reverse 12345;
print scalar reverse 12345;
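If you want to check your answers, this sketch captures what each statement would print. print gives its arguments list context and (assuming $, and $\ are unset) prints them with nothing in between, so joining with an empty string reproduces its output; scalar() forces scalar context.

```perl
use strict;
use warnings;

# Each entry reproduces one of the four print statements above.
my @results = (
    join('', reverse 1 .. 5),    # print reverse 1 .. 5;
    scalar(reverse 1 .. 5),      # print scalar reverse 1 .. 5;
    join('', reverse 12345),     # print reverse 12345;
    scalar(reverse 12345),       # print scalar reverse 12345;
);

print "$_\n" for @results;
```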

Too Easy or Too Hard

We hear a lot of people complaining that programming in Perl is too difficult, but I think that a lot of these problems stem from people making the opposite assumption – that writing Perl is easier than it actually is. Let me share a couple of examples. I’ve lightly disguised the companies in question – so if you think you recognise a place where we’ve worked together you’re probably wrong.

In the first, I’m working for a global organisation. Most of their bespoke software is written in Java, but (like pretty much every company) there are a few Perl programs running vital parts of the infrastructure. This company have also outsourced a lot of their maintenance work to another company in India. This Indian company has dozens of people who are dedicated to working on my client’s systems. But they are Java programmers and every once in a while they need to do some work on a Perl program.

This company has an internal IRC network and there’s a Perl channel which I keep half an eye on whilst getting on with my work (which is maintaining a huge Perl program). A few times a day one of these Java programmers turns up on the Perl channel explaining a problem they have to solve in a Perl program and asking for help. In pretty much every case, the problem boils down to the Java programmer having major misconceptions about how Perl works. Perl programmers on the channel try their hardest to help, but it’s often a frustrating experience. Usually the correct answer is “go away and read Learning Perl and Intermediate Perl, then you will understand how the code works”. But these programmers don’t have the time to do that. They need this task finished in an hour. Often it’s a task that could be easily achieved in an hour – but only after you’ve done the groundwork of understanding how Perl works.

The problem this company had was that Perl wasn’t seen as being as important as it actually was. People in the Indian company saw it as a “scripting language” that any of their (no doubt highly qualified) Java programmers would be able to use without any preparation or training. That’s clearly not the case.

But, I hear you say, that’s not fair. Perl and Java are really different languages. I’m glad you pointed that out as it’s a nice link to my second example.

In this example, I’m working for a dotcom company in London. Like so many dotcom companies they have a Perl codebase that was thrown together five years ago in a couple of months by people who didn’t really know what they were doing. This company has the added problem that they are finding it hard to recruit people with Perl experience. So they start recruiting people with experience of languages like Python and Ruby in the belief that these languages are so similar that the programming skills are completely interchangeable.

And, of course, Perl, Ruby and Python are all a lot closer to each other than they are to languages like Java. But there are differences. Important differences that aren’t just syntactic. I can read Python and Ruby just fine. But if I got a job where I had to write a lot of either of those languages, I wouldn’t assume that I could just pick it up by osmosis; I’d get a book and read about the language. If I was going permanent at the company, I might even suggest that they send me on a training course.

Conversations with these non-Perl programmers often took a predictable course. They’d start complaining that the code was hard to read. This was hard to counter as a lot of the code was horrible. But if you pointed out some of the newer and better code, they wouldn’t see the improvement. They would insist that somehow Ruby or Python code was inherently easier to read than Perl code. I’d try to point out that Perl programmers find Perl code as easy to read as Python (or Ruby) programmers find Python (or Ruby) code.

So there seems to be this perception that Perl should be as easy to read as Random Programmer’s favourite language. And I don’t understand why that is. Just because I’ve been programming for almost thirty years, I don’t expect to be able to program in any language I happen to look at – well, certainly not well enough to be paid for doing it.

I’m not sure what we can do to counter this misconception. I think it probably stems from the late 90s when everyone was writing Perl. And if everyone is doing something, then it must be really easy. Of course, most people were writing really horrible Perl because Perl isn’t as easy as they thought it was.

Not sure it’s possible to sum this up in a simple marketing slogan. “Perl is Easy (but not as easy as you think)”.

Crufty Old Perl

It’s eighteen months since I wrote “Why Corporates Hate Perl” and it’s worth pointing out that the company I discussed in that article which was dropping Perl in favour of PHP and Java is still employing as many good Perl programmers as it can find.

I talked in that article about some rather unsubtle social engineering that had been going on at that company. Management had started to talk about Perl as a “legacy language” in an attempt to persuade the business that they really don’t want their applications written in Perl. That doesn’t seem to have been as successful as I feared it would be.

But there’s another kind of social engineering that I’ve seen going on. And this is happening at the developer level. I’ve lost count of the number of times I’ve been sitting in meetings with developers, discussing ways to improve the quality of crufty old Perl code when someone throws in the (more than half-serious) suggestion that we should just rewrite the whole thing using Rails or Django.

There seems to be this idea that switching to a new framework in a new language will act as some kind of magic bullet and that suddenly all of our problems will be solved. There are so many ways that this perception is flawed. Here are my two favourites.

  1. The current system is old and crufty not because it’s written in Perl, but because it was written by people who didn’t understand the business requirements fully, didn’t know their tools particularly well or were under pressure to deliver on a timescale that didn’t give them time to design the system properly. Often it’s a combination of all three. Given the time to rewrite the system from scratch, of course it will be better. But it will be better because the business is better understood and tools and techniques have been improved – not because it’s no longer written in Perl.
  2. Frameworks in other languages are not easier to use or more powerful than frameworks in Perl. Anything that you can do with Rails or Django you can do just as easily with Catalyst. It’s using a framework that’s the big win here, not the particular framework that you use. Sure, if you’re a Ruby house then using a Ruby framework will probably match your existing developers’ skills more closely but if your current system is written in Perl then, hopefully, you have a team of people with Perl skills and that’s going to be the language you’ll want to look at.

I’m tired of Perl being seen as a second-class citizen in the dynamic languages world. I freely admit that there’s a lot of bad Perl code out there (I’ll even admit to writing some of it) but it’s not bad Perl code because it’s Perl, it’s bad Perl code because it’s bad code.

This is what the Perl marketing initiative is for. We need people to know that Perl is not the same language that they stopped using ten years ago. Modern Perl can compete with any of the features of any other dynamic language.

By all means, try to get the time to rewrite your crufty old systems. But rewrite them in Modern Perl. You’ll enjoy it and you’ll be really productive.

p.s. I should point out that I’m not talking about any specific client here. This is based on conversations that I’ve had at various companies over the last couple of years and also discussions with many developers in many pubs.

A Subway Metaphor

Many years ago I read a science fiction story which has always stayed with me – although I’m buggered if I can remember the title or the author.

It was set in the not too distant future. Another new line was about to be opened on the New York City Subway [In the comments Rozallin points out that it was, in fact, probably Boston and not NYC]. Mathematicians were warning that the line shouldn’t be opened as every new line increased the complexity of the network and they had calculated that opening this particular line would push the network over some limit and would make it theoretically unsolvable. They worried that the safety of passengers couldn’t be guaranteed if they were travelling on an unsolvable graph.

Of course, the mathematicians were ignored. And, of course, trains started going missing soon after the line was opened. And that’s where our hero (a kind of topological Indiana Jones, if I recall correctly) came in.

For some reason, that’s the image that sometimes pops into my head when I’m working on a large application that has had random pieces of code added to it by various people over long periods of time. I worry that the system will eventually become so complex that any train (or, more likely, CPU) that is set running on it will vanish into the depths of complexity, never to be seen again.

I had that feeling particularly strongly this afternoon. This system really needs to be brought under control.

Building Web Sites with Perl

Over on my other blog last night I wrote a piece about how building simple web sites has never been easier. I talked about how it’s really simple to use something like WordPress or Drupal to build a web site that will suit the needs of many organisations – charities, schools, organisations like that.

You’ll have noticed that both Drupal and WordPress are written in PHP. If I was going to include another item on the list, it would probably be Joomla – which is also written in PHP. The first Perl-based system on my list would be Movable Type (or perhaps Melody, the community-driven fork of MT).

I use MT to build blogs (this site is built with MT). I also used it to build my company web site. So why isn’t it in my top three suggestions? Well, for two reasons. Firstly, I don’t think that it’s quite as easy to use for non-technical people as the other systems on my list. And secondly, last year I tried to use MT to build something more complex than a single-blog site and it all went horribly wrong. With some help from the people at Six Apart those problems are getting sorted out and hopefully the project will be launched soon, but I’m currently wary of recommending MT to end users wanting to build sites.

Of course MT gets better all the time. The MT5 betas look really nice and I’m really hopeful that Melody will be a great end-user CMS. But currently I’d still recommend Drupal and/or WordPress.

End users don’t care at all what technologies their web sites are built in. As long as the site looks good and works well, why should it matter to them whether the site is written in PHP, Perl or anything else? But from the point of view of language advocacy, I’d like to be able to recommend something that’s written in Perl.

So what can we do? Well, firstly, you can tell me if I’m missing anything. Is there some other Perl-based simple web site builder that has completely passed me by? What systems would you recommend (or use yourselves) if, for example, a local school asked for help building a simple site?

And if there isn’t something that I’ve missed? Should a group of us sign up for the Melody project in order to ensure that it becomes a worthy alternative to Drupal? Is there some other project that we can co-opt to this purpose?

Or do we just not care? Is it ok that we’re in danger of losing the low-end web CMS market to PHP systems?

Perl Twitter Feed

Last August, when I was writing my talk Proud to Use Perl for YAPC::Europe, I wanted to get a feel for what real people were actually saying about Perl. It’s all very well claiming that people say Perl is dead, but I wanted to get some real quotations to use in the talk. I came up with the idea of using Twitter. I set up a Twitter search feed for tweets containing the word “perl” and monitored that for a couple of days. I quickly got all of the quotations that I needed.

But I found the feed fascinating, so I continued to read it. Sometimes the Perl community can be a little insular, so it was interesting to read what other people were saying about Perl. I still read the feed today.

Over the year, the feed definitely feels like it’s getting bigger. I mean, there are more mentions of Perl. I don’t have any concrete figures because I read the feed at random times of the day and sometimes don’t touch it for a couple of days. It’s tempting to think that more talk about Perl is due to things like the Ironman initiative, but we shouldn’t jump to that conclusion. Firstly, more talk about Perl could just mean more people saying that Perl is dead (I don’t think this is the case), and secondly, more talk about Perl could just be indicative of more talk on Twitter in general. Certainly the number of users on Twitter is still growing quickly, so that could probably explain the growth in Perl talk.

But over the last week or so, I’ve gradually realised that a lot of the increase in tweets mentioning Perl is due to the increase in spam (or, at least, spam-like) tweets mentioning Perl. I see a huge number of posts from accounts like @e_host which do nothing but advertise web hosting companies. I suppose we should take it as a positive sign that they think Perl is a feature worth mentioning in these adverts. There’s also been an increase in tweets that are reposts from hire-a-freelancer sites. For example, this morning I saw dozens of copies of this “Need Perl Expert” post.

I’m seriously considering dropping the Perl Twitter feed from Google Reader. It’s just becoming such a slog to go through it. I estimate that about a third of it is currently interesting – and that signal-to-noise ratio is only going to fall.

I do think, however, that it would be useful and interesting (and pretty easy) to set up an application which monitors the feed and records the data. If we just counted the number of posts, that would be interesting. We could even consider pushing the text through some kind of analysis to pull broad types of information from it (“is this a positive or negative mention of Perl?”). The sooner we start, the more data we’ll have to play with.
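A minimal sketch of the counting part, assuming the tweets have already been fetched and parsed into hashrefs with date, author and text keys. The records and the list of ignored accounts are made up for illustration (e_host is the account mentioned above), and the “positive or negative” analysis is not attempted here.

```perl
use strict;
use warnings;

# Made-up records standing in for parsed search-feed entries.
my @tweets = (
    { date => 'Mon', author => 'alice',  text => 'Moose makes Perl OO pleasant' },
    { date => 'Mon', author => 'e_host', text => 'Cheap hosting with PHP and Perl!' },
    { date => 'Tue', author => 'bob',    text => 'Is perl dead? No.' },
    { date => 'Tue', author => 'carol',  text => 'Ruby vs Perl for scripting' },
);

# Accounts whose posts are treated as spam and skipped (hypothetical list).
my %ignore = map { $_ => 1 } qw(e_host);

# Count the remaining posts that mention Perl, per day.
my %per_day;
for my $tweet (@tweets) {
    next if $ignore{ $tweet->{author} };
    next unless $tweet->{text} =~ /\bperl\b/i;
    $per_day{ $tweet->{date} }++;
}

printf "%s: %d\n", $_, $per_day{$_} for sort keys %per_day;
```

Saving these counts (or the raw records) somewhere persistent each day is what would build up the historical data; the spam filter is deliberately crude and would need a better rule than a fixed list of accounts.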

I think I’ll set something up tomorrow.

What is Wrong With this Picture?

I’ve just found a number of subroutines defined this way in the code that I’m working on.

sub do_something () {
  my $parameter = shift;
  ...
}

I discovered the problem because I started getting “too many arguments” errors. I knew what the problem was (the empty prototype), but it took a couple of minutes of head-scratching before I realised why it had been working before my changes.

Then I realised.

I have a dislike of “unnecessary” &s on subroutine calls. So almost without realising, I had removed them from the calls to these functions.
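A small sketch of the behaviour, using made-up subroutine names: with an empty prototype, a call without & that passes an argument fails at compile time, while a call with & skips the prototype check entirely, which is how the original code had appeared to work.

```perl
use strict;
use warnings;

# Same shape as the code above: an empty prototype, but it reads an argument.
sub with_proto () {
    my $parameter = shift;
    return "got $parameter";
}

# Calling with & ignores the prototype, so the argument gets through.
my $via_ampersand = &with_proto('film');

# Without &, the prototype is enforced when the call is compiled. The string
# eval compiles the call at run time so that the error can be caught.
my $ok    = eval q{ with_proto('film'); 1 };
my $error = $ok ? '' : $@;

# The fix: drop the prototype and both calling styles behave the same.
sub without_proto {
    my $parameter = shift;
    return "got $parameter";
}

print "$via_ampersand\n";
print "Error: $error" if $error;
print without_proto('film'), "\n";
```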

A few lessons have been learned.

Lesson 1 (for the original author of this code): Prototypes will trip you up. Do not use them.

Lesson 2 (for me): Ampersands aren’t always as pointless as they appear. They may be masking bugs in the code.

CPAN Web Feeds

I’m still thinking about adding stuff to this blog. I’d like to add some web feeds to the sidebar. In particular, I’d like a feed of my CPAN uploads. I don’t expect it to be a particularly busy feed (although having such blatant evidence of my laziness might galvanise me into being a bit more productive) but I was surprised to see that such a feed doesn’t seem to exist.

CPAN has a “latest uploads” feed, and there are a couple more documented in the FAQ, but none of those do what I want.

It seems a simple enough idea. I can pull down the latest uploads feed and filter it on my name. But before I do that I thought it was worth invoking the power of the lazyweb and asking if anyone else has already scratched that itch.
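The filtering step could be as small as this sketch. It assumes the feed has already been fetched (for instance with LWP::Simple’s get, as in the IMDB script) and parsed into hashrefs with title and author keys; that item structure, the sample entries and the AUTHORID placeholder are all assumptions rather than the real feed’s format.

```perl
use strict;
use warnings;

# Return only the items uploaded by the given CPAN author. The comparison
# is case-insensitive in case the feed doesn't use upper-case PAUSE IDs.
sub uploads_by {
    my ($author, @items) = @_;
    return grep { defined $_->{author} && lc $_->{author} eq lc $author } @items;
}

# Made-up items standing in for parsed feed entries.
my @items = (
    { title => 'Some-Module-1.00',    author => 'AUTHORID' },
    { title => 'Other-Module-0.02',   author => 'SOMEONE' },
    { title => 'Another-Module-2.10', author => 'authorid' },
    { title => 'No-Author-0.01' },
);

print "$_->{title}\n" for uploads_by('AUTHORID', @items);
```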