Category Archives: Programming

Unicode and Perl

Over the last couple of days I’ve been involved in a couple of discussions where it is clear that other people don’t understand how Perl deals with Unicode. The documentation is clear and detailed (there’s even a good tutorial) but for some reason people still persist in misunderstanding it.
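The model the documentation describes comes down to this: decode bytes into characters as data comes in, work with characters inside the program, and encode characters back into bytes on the way out. Here is a minimal sketch of that model using the core Encode module. It is background only, not the answer to the quiz, and the sample string is an arbitrary choice.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "café" as UTF-8 bytes, as it might arrive from a file or STDIN.
my $bytes = "caf\xC3\xA9";

# Input boundary: bytes -> characters.
my $chars = decode('UTF-8', $bytes);
printf "bytes: %d, characters: %d\n", length $bytes, length $chars;   # bytes: 5, characters: 4

# Inside the program we work with characters ("é" upper-cases to "É").
my $shout = uc $chars;

# Output boundary: characters -> bytes, so STDOUT receives UTF-8.
print encode('UTF-8', $shout), "\n";   # shows CAFÉ on a UTF-8 terminal
```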

Here’s a quick quiz. Can you explain (in detail) what is going on in each of these four command-line programs? And for bonus points, which one should we be emulating in our code?

In all cases, assume that my locale is set to en_US.UTF-8.

I’ll post explanations in a few days’ time.

Update: Coincidentally, Miyagawa posted something very similar on his blog.

Just Build Something

The Political Web

About a month ago, JT Smith suggested that we should all stop talking about Perl and just build something. And, purely coincidentally, over the last few weeks I’ve resurrected a project that I’ve been poking at for about five years and finally turned it into something that I’m happy to show the world.

The Political Web is a site which aggregates all of the information I can find on the web about individual British MPs. I say “all of the information”, but that’s obviously a bit of a work in progress. But I think that what I already have is useful and interesting – well, for people who are interested in British politics. I have plans to bring in more information in the future.

Although I’ve been working on the site for five years, I pretty much rebuilt it from scratch when I recently returned to it. Actually getting something useful up and running took about four hours. That’s because I was building it using Perl and, more specifically, Dancer.

Pebble and Perl

I’ve been wearing a Pebble watch for a couple of months now. I really like it but, to be honest, it’s the potential that has me most excited. The number of apps currently available is a bit disappointing and the API is taking its time appearing.

But even when the API is published, I wonder if I’ll have the time to learn all the technologies needed to write Pebble-aware apps. All I want is some way to send a notification that pops up on my watch. Surely there must be an easy way to achieve that. Preferably one that I can use from Perl.

Of course, it turns out that there is. The secret is an app called Pushover. Pushover is a web service that sends notifications to your Android (or iOS if you’re that way inclined) device. There’s an app that you install on your device (it’s not free – I think it cost me £3.30) and you need to sign up for a free account. Then you can send notifications to your device either from their web site or using their API. The API is a simple HTTP request-based system. There’s an example on the Pushover web site that uses LWP::UserAgent, but you can make it even simpler using WebService::Pushover.
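Because the Pushover API is just an HTTP POST, here is a rough sketch that calls it with the core HTTP::Tiny module. This swaps in a different technique from both the LWP::UserAgent example and WebService::Pushover. The endpoint URL and the token/user/message/title field names reflect my understanding of Pushover's public API, so check their documentation before relying on them. The build_pushover_form and send_pushover helpers and the PUSHOVER_* environment variable names are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Assumed endpoint for Pushover's message API (check Pushover's docs).
my $PUSHOVER_URL = 'https://api.pushover.net/1/messages.json';

# Hypothetical helper: build the form fields for one notification.
sub build_pushover_form {
    my (%args) = @_;
    for my $required (qw(token user message)) {
        die "Missing '$required'\n" unless defined $args{$required};
    }
    return {
        token   => $args{token},
        user    => $args{user},
        message => $args{message},
        (defined $args{title} ? (title => $args{title}) : ()),
    };
}

# Hypothetical helper: POST the form. HTTPS support in HTTP::Tiny
# needs IO::Socket::SSL and Net::SSLeay to be installed.
sub send_pushover {
    my (%args) = @_;
    my $res = HTTP::Tiny->new->post_form($PUSHOVER_URL, build_pushover_form(%args));
    die "Pushover request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Only send when the (hypothetical) credentials are in the environment.
if ($ENV{PUSHOVER_TOKEN} && $ENV{PUSHOVER_USER}) {
    send_pushover(
        token   => $ENV{PUSHOVER_TOKEN},
        user    => $ENV{PUSHOVER_USER},
        title   => 'Perl',
        message => 'Hello from Perl',
    );
}
```

Keeping the form-building separate from the HTTP call makes the part that can go wrong without a network easy to test on its own.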

That’s all well and good. We can now send arbitrary notifications to our phone. How do we get from there to the watch?

The standard Pebble Android app is currently a little disappointing. In particular, it only supports pushing notifications from a tiny number of apps from the phone to the watch. But there’s another alternative. There’s an app called Pebble Notifier which will forward notifications from any app on the phone to the watch. When you install Pebble Notifier you can choose which apps you want to forward notifications for.

So, in summary, sign up for a Pushover account and install Pushover and Pebble Notifier on your phone. Then install WebService::Pushover on your computer. Then you can write code like this:

And that will send a message to your Pebble.

Now all I need is something useful to do with it.

Update: I’ve just noticed that there’s also an IFTTT channel for Pushover. It has nothing to do with Perl, obviously, but would be an easy way to send Pebble notifications when certain events happen.

What New(ish) Perl Features Do You Use?

Over on LinkedIn, someone asked me “What core PERL[sic] features do you use regularly that are new since 95?” It’s hard to be sure as the perldelta files only seem to go back to 1997 (for example, when were qw(...), q(...) and qq(...) added?), but here’s a quick list off the top of my head.

  • my was, of course, added in 5.0. But 5.004 added the ability to use it in control expressions – while (my $foo = <>) – and in foreach loops – foreach my $foo (@foos)
  • use VERSION
  • Regex extensions – (?<=RE) and similar. Oh, and qr/.../
  • Data::Dumper (added in 5.005)
  • Unicode support – first added in 5.6.0 and improved in every release since
  • our
  • Three-argument open
  • Omission of intermediate arrows in data structure lookups – $foo[$x][$y] instead of $foo[$x]->[$y]
  • use warnings
  • Memoize
  • Test::More and Test::Simple
  • say
  • defined-or
  • use base (or, more recently, use parent)
  • yada-yada operator
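Several of those features fit into one short script. This is only an illustration with made-up data. It assumes Perl 5.12 or later (for use v5.12 and the yada-yada operator) and reads from an in-memory string so that it runs without an input file.

```perl
use v5.12;      # use VERSION: also turns on strict and the 5.12 features (say, etc.)
use warnings;   # lexical warnings

our $VERSION = '0.01';   # our: declare a package variable

# qr// with a lookbehind: capture digits that follow "price=".
my $price_re = qr/(?<=price=)(\d+)/;

# Three-argument open, here on an in-memory string rather than a file.
my $text = "apple price=3\nbanana\ncherry price=5\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @rows;
while (my $line = <$fh>) {        # my in a control expression
    chomp $line;
    my ($name)  = split ' ', $line;
    my ($price) = $line =~ $price_re;   # undef when there is no price
    push @rows, [ $name, $price ];
}
close $fh;

foreach my $row (@rows) {         # my in a foreach loop
    # defined-or: fall back to a default when no price was found.
    say "$row->[0]: ", $row->[1] // 'unknown';
}

# No intermediate arrow needed: $rows[1][0] means $rows[1]->[0].
say "Second item: $rows[1][0]";

sub not_written_yet { ... }       # yada-yada: dies with "Unimplemented" if called
```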

Have I missed anything obvious? What new Perl features do you use most?

Texinfo 5.0 in Perl

There was a story on Slashdot on Sunday saying that the new version of Texinfo had replaced the old C implementation of makeinfo with one written in Perl.

I thought it would be interesting to look at the Perl they’ve written. This is, after all, a reasonably large example of Perl code that will be getting a bit of attention in the open source[1] world. If you want to look too, you can download the tarball or browse the CVS repository. In both cases, the Perl code is in the directory called ‘tp’.

The first thing to note is that this is obviously code which has been written by programmers who know their craft. This has not been written by script kiddies. There are, however, some rather bizarre touches which imply that the authors don’t know Perl as well as they might hope.

  • The code is nicely partitioned into modules. And many of the modules are really classes. But some of the modules (the ones in the init directory) have no package statement. So they are more like Perl4-style libraries than what we’d recognise as modules.
  • Many (in fact it might be all) of the subroutines in the modules have prototypes. In many cases, that doesn’t do any harm (although Perl prototypes don’t really address the issues that most people assume they address). But many of these subroutines are called as methods. And prototypes have no effect on method calls at all.
  • There is rather more use of package variables (instead of lexical variables) than I’d be comfortable using in my code.
  • I can see no use of CPAN modules. Perhaps there are no CPAN modules that help with this code. But I’d find that surprising in a project of this size.
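The point about prototypes and method calls is easy to demonstrate. In this small sketch (the Greeter class is hypothetical), a ($$) prototype is silently ignored when the sub is called as a method, but it is enforced at compile time when the same sub is called as a plain function.

```perl
use strict;
use warnings;

package Greeter;

sub new { my ($class) = @_; return bless {}, $class }

# The ($$) prototype claims "exactly two scalar arguments".
sub greet ($$) {
    my ($self, @names) = @_;
    return scalar @names;   # how many names actually arrived
}

package main;

my $greeter = Greeter->new;

# Method call: the prototype is ignored, so three names get through.
my $count = $greeter->greet('Alice', 'Bob', 'Carol');
print "Method call accepted $count names\n";   # Method call accepted 3 names

# Function call: the prototype is checked when this code is compiled,
# so the string eval fails with "Too many arguments for Greeter::greet".
my $compiled = eval q{ Greeter::greet($greeter, 'Alice', 'Bob'); 1 };
my $error    = $@;
print "Function call ", ($compiled ? "compiled\n" : "was rejected: $error");
```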

Then I started to wonder which version of Perl they were targeting. So I searched for “use 5.xxx” statements. And found quite a range. Many of the files insist on 5.006 and most of the rest want 5.00405 – which I consider a scarily old version of Perl to try and support. There was one file that wanted 5.007_003. Then in one file (Texinfo::Parser) I found this:

Reading the original Slashdot story, it claimed that one of the improvements in this new version was its Unicode support. And suggesting that you need Perl 5.006 for decent Unicode shows some major misunderstanding. Perl 5.006 was probably the point at which the Perl 5 Porters started to take Unicode seriously. It took until 5.12 or 5.14 before they got it right. Trying to support Unicode properly on anything earlier is almost certainly doomed to failure.
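One concrete example of what changed: the unicode_strings feature, added in 5.12 and essentially complete by 5.14, makes string operations follow Unicode rules even on strings that Perl happens to store as single bytes. Without it, uc and friends apply byte semantics to such strings, which perlunicode calls “The Unicode Bug”. A minimal sketch, assuming no use locale is in effect:

```perl
use v5.12;     # enables strict and the 5.12 feature bundle, which includes unicode_strings
use warnings;

my $e_acute = chr 0xE9;   # "é", stored internally as a single byte

my $with_feature = uc $e_acute;    # Unicode rules: becomes chr 0xC9 ("É")

my $without_feature = do {
    no feature 'unicode_strings';  # pre-5.12 behaviour, for this block only
    uc $e_acute;                   # byte rules: only ASCII letters change case
};

printf "with unicode_strings:    U+%04X\n", ord $with_feature;      # U+00C9
printf "without unicode_strings: U+%04X\n", ord $without_feature;   # U+00E9
```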

It’s great that another heavily-used project has started to use Perl. But it’s a shame that people might mistake this 5.6-era code for state-of-the-art Perl. I can’t help wondering if this is a symptom of the “Perl 5 can’t have a new version number” problem that I’ve been reading about recently.

I’ll get in touch with the Texinfo team and make some suggestions to them, of course.

Update: I emailed the Texinfo mailing list with a link to this blog post. I got a reply from Patrice Dumas, who wrote most of this Perl code. His reply is available in the mailing list archives.

[1] Sorry, it’s a GNU project so obviously I mean “free”, not “open source”.

Give Me MetaCPAN

Ever since MetaCPAN launched I’ve been getting increasingly irritated with people who still use links to search.cpan.org. Isn’t it obvious that MetaCPAN is better? Why do people still insist on sharing links to the older site?

Of course they do it for various reasons. Perhaps they aren’t as in touch with the modern Perl world as I am. Perhaps they are wary about changing to use the new shiny toys because they know that a newer shinier one will be along soon. Perhaps I’m reading a web page from five years ago and they can be forgiven for not linking to a site that didn’t exist at the time.

Eventually I realised that there was no point in getting annoyed. I had a computer. Surely I could do something that would fix this problem.

My first idea was to write a GreaseMonkey script. Then I realised that would be hard. MetaCPAN and CPAN have slightly different URL schemes. I’d need to recognise a CPAN URL and convert it to the equivalent MetaCPAN URL. Not an impossible task at all, but not something I could knock up in an hour or so. Especially not in Javascript.

So I asked on the #metacpan IRC channel. Surely I couldn’t be the first person to have this problem. And someone there introduced me to mcpan.org. There’s no point in clicking on that link. You’ll just end up on MetaCPAN. Because that’s what mcpan does. It’s a URL rewriting service. You give it a CPAN URL (with the cpan.org changed to mcpan.org) and it redirects you to the equivalent page on MetaCPAN.

That’s the hard bit of the problem. The bit I didn’t want to write. And someone else has already written it. But they seem to have kept it very quiet. This deserves more publicity. I wish I could remember who wrote it. If you know, please leave a comment.

So we’re now most of the way there. Now if I click on a CPAN link, I can just edit the location bar to add an ‘m’ and I’ll be redirected to the right place. But we can do better than that. I’d like to be automatically redirected. That’s when I discovered the Redirector extension for Firefox. Once it’s installed you can configure it to redirect certain URLs to other ones. I have it configured so that http://search.cpan.org* redirects to http://mcpan.org$1. See how you can use wildcards to match URLs and then use whatever they matched in the replacement URL. It’s a lot like regexes in Perl.
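The Redirector rule really is close to a Perl substitution: the * wildcard plays the part of (.*), and $1 carries whatever it matched into the new URL. Here is the same rewrite as a Perl regex, purely as an illustration. rewrite_cpan_url is a hypothetical name, and it only swaps the host, leaving the path mapping to the redirect service.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the Redirector rule
#   http://search.cpan.org*  ->  http://mcpan.org$1
sub rewrite_cpan_url {
    my ($url) = @_;
    (my $new = $url) =~ s{^http://search\.cpan\.org(.*)$}{http://mcpan.org$1};
    return $new;   # URLs that don't match come back unchanged
}

for my $url (
    'http://search.cpan.org/perldoc?DBIx::Class',
    'https://metacpan.org/pod/Dancer',
) {
    print rewrite_cpan_url($url), "\n";
}
```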

And we’re done. Now whenever I click on an old-style CPAN link, I’m automatically redirected to mcpan. And that, in turn, redirects me to MetaCPAN. And, best of all, I didn’t have to write any code. It was just a case of putting together tools that already existed.

I did all this a few months ago. I meant to write this blog post at the time, but I forgot. I was reminded this morning when Chisel mentioned a GreaseMonkey script he had written to do all of this. See, Chisel isn’t afraid of parsing URLs in Javascript like I was. He just went ahead and did it.

But having alternative solutions to the problem is good, right?

Update: I’ve just been talking about this on the #metacpan IRC channel and it seems that I was rather misunderstanding what was going on here. Here are the details.

Firstly, it was dpetrov who told me about mcpan.org. It’s his domain. I asked him for the code, and he pointed out that there is no code. mcpan.org just redirects everything to a domain called sco.metacpan.org which is where the magic happens. He just got tired of editing search.cpan.org to sco.metacpan.org so he registered another, simpler, domain.

So the actual cleverness happens over on sco.metacpan.org. And that’s really just a list of rewrite rules (the code is on github). I say “just”, but I still wouldn’t want to write them myself.

All this means that mcpan.org is only a convenient tool for when you’re manually editing your location bar. When you’re using the Redirector extension for Firefox you can cut out the middleman and redirect straight to sco.metacpan.org. So I’ve updated my Redirector rule appropriately.

Why Corporates Hate Perl

This is a reprint of an old blog post.

A few years ago I was writing blog posts (semi-)regularly for O’Reilly. This is the one that probably got the most feedback. I’m reprinting it now because a) it’s pretty hard to find on the O’Reilly site and b) it’s relevant to a couple of conversations that I’ve had over the last few days.

Last week I was in Copenhagen for YAPC::Europe. One of the announcements at the conference was the location of next year’s conference which will be in Lisbon. The theme of next year’s conference will be “Corporate Perl”. And that (along with a couple of conversations last night) got me thinking about a talk that I’ll submit to next year’s conference which might well be entitled “Why Corporates Hate Perl”.

It’s not true, of course. There are still a large number of large companies who love Perl. I could probably work through to my retirement enhancing and extending systems that are written in Perl at many of the big banks in the City of London. There are, however, also many companies who are moving away from Perl for a number of reasons. Here’s one of the reasons that will be included in my talk.

I was talking to people from one such company last night. The Powers That Be at this company have announced that Perl is no longer their language of choice for web systems and that over time (probably a lot of time) systems will be rewritten in a combination of Java and PHP. Management have started to refer to Perl-based systems as “legacy” and to generally disparage it. This attitude has seeped through to non-technical business users who have started to worry if developers mention a system that is written in Perl. Business users, of course, don’t want nasty old, broken Perl code. They want the shiny new technologies.

And so, in a matter of months, the technical managers at this company have created a business environment where Perl is seen as the cause of most of the problems with the current systems. It’s an impressive piece of social engineering.

It’s also, of course, completely unfair. I don’t deny at all that this company (like many others) has a large amount of badly written and hard-to-maintain Perl code. But I maintain that this isn’t directly due to the code being written in Perl. It’s because the Perl code has developed piecemeal over the last ten or so years in an environment where there was no design authority which encouraged developers to think beyond getting their immediate task done. Many of these systems date back to this company’s first steps onto the internet and were made by separate departments who had no interaction with each other. It’s not really a surprise that the systems don’t interact well and a lot of the code is hard to maintain.

There are, on the other hand, a number of newer systems which are also written in Perl which follow current best practices in Perl development and are far easier to maintain and enhance – as easy, I would contend, as anything written in the new approved languages.

It’s certainly true that this company has a large number of systems that need to be rewritten over the next few years. But throwing away all of the company’s accumulated Perl expertise and moving to new languages seems to be a step too far. Management are blaming Perl for the problems when really they should be blaming the management and design procedures that were in place (or, more likely, weren’t in place) when the code was originally written.

Many organisations are in the same situation, with large amounts of unwieldy Perl code. Ten or twelve years ago everyone was writing web systems in Perl and we were all making mistakes. We all have to deal with those mistakes but we’ve, hopefully, learned from them and can rewrite our systems to take account of everything that we’ve learned in the last ten years.

It’s too late for the company I’ve been talking about in this article. The anti-Perl social engineering has probably insinuated itself too deeply into the culture. It’s unlikely that Perl’s reputation can be rescued.

But if you have similar problems in your own company, then please try to ensure that blame is apportioned correctly and that you don’t use Perl as a scapegoat.

A couple of updates to the post. I did propose the talk to the next YAPC, but the proposal wasn’t accepted. And the company I talk about in the article is still employing a lot of Perl programmers – four years after this post was written.

DBIC vs DBI

Three times in the last few months I’ve had the “DBIC or raw DBI” discussion. People have told me that they don’t use DBIC because raw DBI is better. And each time, the person promoting DBI in the discussion has used an argument that boils down to “DBIC is probably useful for people who don’t know SQL very well”.

I find that argument really puzzling. Not least because I like to think that I know more than a little about SQL. SQL is a skill that has run through my career for longer than Perl. I’ve been using SQL since I left college in 1988. I only started using Perl in about 1996. And yet, although I still consider myself a bit of an expert in SQL, I use DBIC for pretty much all of the database work I do these days – and have done for about five years.

I use DBIC not because I don’t understand SQL. I use it because it makes my life easier. I use it because it frees up some of the time I used to spend dealing with the minutiae of database communication so that I can spend it working on other, more interesting, parts of my project.

When I’m running training courses that introduce DBIC I have a slide that is entitled “SQL is Boring”. It’s a joke of course but, like all the best jokes, it gets a laugh because there’s more than a little truth in it. There are, of course, many interesting SQL problems. I’ve spent many an enjoyable (if slightly frustrating at the time) hour trying to coax the right data out of a complex query with correlated subqueries, outer joins and aggregate functions. But that’s the exception rather than the rule.

The vast majority of the SQL I write for the applications I work on is incredibly boring. It’s boring because it’s all so similar. You get the data to present a list of objects to the user. The user selects the object they’re interested in, so you select all the data about that object. You might select some data about related objects. The user changes some of that data, so you update that row in the database. On a good day, you might delete an object from the database. Or insert a new one. Most of the SQL you need is like that. It’s boring.

We have computers to do the boring work for us. So let the computer generate all that boring SQL. Free up your time to work on the gnarly and interesting problems.
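To make “let the computer generate the SQL” concrete, here is a toy generator for the kind of single-table SELECT described above. It is hypothetical and far cruder than what DBIC really does, but it shows the idea: you describe the query as data and get back SQL with placeholders plus bind values ready for DBI. The table and column names are made up.

```perl
use strict;
use warnings;

# Hypothetical toy generator: a SELECT from a table name, a list of
# columns and a hash of equality conditions. It trusts its table and
# column names; only the values go through placeholders.
sub build_select {
    my ($table, $columns, $where) = @_;
    my @cols = @$columns ? @$columns : ('*');
    my @keys = sort keys %$where;            # sorted so the SQL is predictable
    my $sql  = sprintf 'SELECT %s FROM %s', join(', ', @cols), $table;
    $sql    .= ' WHERE ' . join(' AND ', map { "$_ = ?" } @keys) if @keys;
    return ($sql, map { $where->{$_} } @keys);
}

my ($sql, @bind) = build_select('mp', ['name', 'party'],
                                { constituency => 'Holborn', active => 1 });
print "$sql\n";    # SELECT name, party FROM mp WHERE active = ? AND constituency = ?
print "@bind\n";   # 1 Holborn
```

With DBI you would then hand both parts over, for example as $dbh->selectall_arrayref($sql, undef, @bind).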

But that’s not the only advantage of using DBIC (or some other ORM). Think about the data that you get back from the database. The data you get back from a DBI call is an array. Or perhaps a hash. Or maybe a multi-dimensional data structure if you’re using one of DBI’s more complex fetch() methods. But it’s still a dumb variable. From DBIC, I get an object. An intelligent variable. A variable that knows how to react to various messages. A variable that will save any changes back to the database automatically without me having to worry about where it came from and making sure that I’m writing it back to the right place.
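To show what an “intelligent variable” means, here is a toy row class. It is hypothetical and deliberately tiny, not DBIx::Class’s API or implementation: it remembers which table and primary key it came from, tracks which columns you change, and can build the UPDATE needed to write just those changes back. A real ORM would run that statement for you; this sketch only returns the SQL and bind values.

```perl
use strict;
use warnings;

# Hypothetical toy row class (NOT DBIx::Class): a row that remembers
# where it came from and which columns have changed.
package ToyRow;

sub new {
    my ($class, %args) = @_;
    return bless {
        table => $args{table},
        pk    => $args{pk},              # name of the primary key column
        data  => { %{ $args{data} } },   # copy of the column values
        dirty => {},                     # columns changed since loading
    }, $class;
}

sub get { my ($self, $col) = @_; return $self->{data}{$col} }

sub set {
    my ($self, $col, $value) = @_;
    $self->{data}{$col}  = $value;
    $self->{dirty}{$col} = 1;            # remember that this column needs saving
    return $self;
}

# Build the UPDATE for only the changed columns, plus its bind values.
sub update_sql {
    my ($self) = @_;
    my @cols = sort keys %{ $self->{dirty} };
    return unless @cols;                 # nothing to save
    my $sql = sprintf 'UPDATE %s SET %s WHERE %s = ?',
        $self->{table}, join(', ', map { "$_ = ?" } @cols), $self->{pk};
    return ($sql, (map { $self->{data}{$_} } @cols), $self->{data}{ $self->{pk} });
}

package main;

# With raw DBI this would just be a plain hash from fetchrow_hashref.
my $row = ToyRow->new(
    table => 'mp',
    pk    => 'id',
    data  => { id => 42, name => 'A. Member', party => 'Independent' },
);
$row->set(party => 'Green');

my ($sql, @bind) = $row->update_sql;
print "$sql\n";    # UPDATE mp SET party = ? WHERE id = ?
print "@bind\n";   # Green 42
```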

I’m not saying for a second that there’s no place for DBI any more. For a start, DBI underlies DBIC so it’s still a vital part of our toolkit. And of course I still use it for quick one-off scripts. But when those scripts are still hanging round being used and maintained three months later (as they always are) I’ll rewrite them to use DBIC.

If you want to write large applications that are going to be easy to maintain and extend, then you should really be using DBIC (or something similar). I don’t care how well you know SQL. DBIC will make your life easier.

I’ve just mentioned a couple of reasons why I think that DBIC makes my life easier. I’m sure I’ve missed important stuff. What do you think? Why do you use DBIC instead of DBI?

Learning from Bad Code

I’ve written before about Linux Format’s habit of sharing badly written Perl code. I thought things were improving, but in the new edition (November 2012, issue 163) they’re back to their old tricks.

This time it’s a tutorial called “Starfield: Learn new languages”. In this tutorial Mike Saunders writes similar starfield simulation code in C, Python and Perl. Mike’s loyalties are made perfectly clear when these three sections are entitled “Low Level”, “High Level” and “Unusual Level” respectively, so I wasn’t expecting much. But here’s what I got.

Let’s be clear here. This code works exactly as Mike intended it to. But if you’re writing sample code for other people to learn from, I think you should be setting the bar a little higher than “works for me”. I think you should be aiming to show people good quality code that can be easily maintained. And this code falls well short of that aim.

Let’s look at some of the problems in more detail:

  • No use strict or use warnings. Let’s be generous and assume the editor removed them to save space.
  • Undeclared variables. Of course, if the code contained use strict then the variables would all have had to be declared. Not declaring them means that we’re using package variables rather than lexical variables. In code this simple it doesn’t make a difference. But it’s a bad habit to get into.
  • Indirect object notation. When creating the Curses object the tutorial uses the syntax $screen = new Curses. Again, not a problem in this program, but a really bad habit to be encouraging. In the article’s defence, the documentation for the Curses module only includes this flawed syntax.
  • Split data structures. The author of the article says “instead of having an array of stars containing coordinates and speeds for each one (i.e. an array of arrays), to make things simpler we’ve just set up three arrays.” I read this to mean “I’ve never been able to work out how to make arrays of arrays work in Perl so I’ve taken the easy way out.” This is, of course, a terrible idea. Linked data items should be stored in the same data structure.
  • C-style for loops. The mark of a C programmer who never really got to grips with Perl. The C-style for loop is rarely used in Perl code. The foreach loop almost always leads to more readable code.
  • Magic numbers. The size of the screen and the maximum speed of the stars appear as numbers in the code. Even if you’re not a Perl programmer, surely you would know that it’s good practice to move those into variables or constants.
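To illustrate the data-structure, constant and loop points without pulling in Curses, here is a minimal sketch of just the star bookkeeping. It is neither the article’s program nor a complete replacement: the screen size, star count, speeds and the move-left-and-respawn rule are all assumed values. Each star is a single hash, so its coordinates and speed can never drift out of step the way three parallel arrays can.

```perl
use strict;
use warnings;

# Named constants instead of magic numbers (values are assumptions).
use constant {
    WIDTH     => 80,
    HEIGHT    => 24,
    NUM_STARS => 50,
    MAX_SPEED => 4,
};

# One hash per star instead of three parallel arrays.
# An optional x position is used when respawning at the edge.
sub new_star {
    my ($x) = @_;
    return {
        x     => $x // int(rand(WIDTH)),
        y     => int(rand(HEIGHT)),
        speed => 1 + int(rand(MAX_SPEED)),
    };
}

# Move every star left by its own speed; respawn at the right edge
# once it has gone off the left of the screen.
sub move_stars {
    my ($stars) = @_;
    foreach my $star (@$stars) {         # foreach rather than a C-style loop
        $star->{x} -= $star->{speed};
        %$star = %{ new_star(WIDTH - 1) } if $star->{x} < 0;
    }
    return;
}

my @stars = map { new_star() } 1 .. NUM_STARS;
move_stars(\@stars) for 1 .. 10;
```

If you do add Curses, create the screen object with Curses->new rather than the indirect object syntax new Curses; both call the same constructor.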

With all that in mind, here’s my version of the program.

What do you think? Is it more readable? Easier to maintain? Are there any problems or improvements that I’ve missed?

A Cautionary Tale

I can never remember exactly how Time::Piece works. But that’s ok because I have documentation.

$ perldoc Time::Piece
No documentation found for "Time::Piece".

Huh?

$ perl -v
This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-thread-multi
...

$ corelist Time::Piece
Time::Piece was first released with perl v5.9.5

$ perl -MTime::Piece -E'say $Time::Piece::VERSION'
Can't locate Time/Piece.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .).
BEGIN failed--compilation aborted.

So Time::Piece has been in the Perl core since 5.9.5. I’m running Perl 5.14.2 but I don’t have Time::Piece installed.
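The difference between “in the core” and “installed on this machine” is easy to check from Perl itself. This sketch uses Module::CoreList (the module behind the corelist command) when it is available; bear in mind that on a trimmed-down system Perl it might be missing too. The module_status helper is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: is $module installed here, and when did it
# first ship with perl (according to Module::CoreList, if present)?
sub module_status {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    my $installed = eval { require $file; 1 } ? 1 : 0;

    my $first_release;
    if (eval { require Module::CoreList; 1 }) {
        $first_release = Module::CoreList->first_release($module);
    }
    return ($installed, $first_release);
}

for my $module (@ARGV ? @ARGV : ('Time::Piece')) {
    my ($installed, $first) = module_status($module);
    printf "%s: %s; %s\n", $module,
        $installed ? 'installed' : 'NOT installed',
        defined $first ? "in core since perl $first"
                       : 'not in core (or Module::CoreList unavailable)';
}
```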

After ten minutes or so of head-scratching it came to me.

$ sudo yum install perl-core
Loaded plugins: langpacks, local, presto, refresh-packagekit
[ stuff snipped ]
---> Package perl-Time-Piece.x86_64 0:1.20.1-212.fc17 will be installed
[ more stuff snipped]

I’m running Fedora. The Fedora packagers have decided that they don’t need to install the whole standard Perl distribution as part of their standard installation. I don’t have a problem with that. I do have a problem with their naming conventions.

The minimal Perl installation that they include by default is in an RPM called “perl”. The full RPM that includes everything that a Perl developer would expect to see is called “perl-core”. Surely it’s obvious that those names are the wrong way round?

Isn’t there some way that the Perl 5 Porters can object to this renaming of Perl?

I know I should be installing my own Perl with perlbrew. But I generally find that the system Perl works for everything that I need. There’s just this one thing that is guaranteed to trip me up every time I work on a new Fedora installation.

This is a public service blog post. Perhaps someone will come across it and be saved a couple of hours of confusion.