Categories
Conferences

Plans for Riga

The European Perl Conference this year is going to be held in Riga in August. That might seem a long way away, but it’s never too early to start thinking about these things. For example the conference web site went live earlier this week, enabling users to register for the conference and buy their tickets.

And people who are planning to speak at the conference or run workshops alongside the conference need to get their act together early so that people who are planning to attend the conference know what is going to be happened. In particular, it’s a good idea to get the workshops organised and announced early so that people booking flights and hotels for the conference know that there are workshops taking place in the days before (and, sometimes, after) the conference and can book travel for the right dates.

So I’ve been thinking about what I want to do in Riga this summer and I think I have a plan.

I’m planning to re-run the “Modern Perl Web Development with Dancer” workshop that I ran in Cluj in 2016. It was easily the most successful YAPC/PerlCon workshops I’ve ever run with a full class of twenty people working through the day to build a simple web application using a number of modern web  development tools. This will be an updated version of the course as things have moved on a bit in the three years since I last ran the workshop.

Nothing is set in stone yet. I’ve submitted a proposal to the organisers and I hope we can get details tied down and tickets on sale as soon as possible. I’ll report on progress as I here what’s going on.

I’ve also come up with a talk that I’ve proposed for the main conference. It’s called “Measuring the Quality of your Perl Code” and it will be a look at was to measure the “quality” of your Perl code – on the basis that only once you start to measure something, can you start to make improvements. Again, it’s currently just a proposal. It hasn’t been accepted (but I’m taking the fact that it’s currently displayed on the front page of the site as a good sign!)

I should come up with a lightning talk too. Currently, I have no idea what that might be.

How about you? Are you planning to come to Riga in August? Will you be giving a talk?

Categories
Programming

A Subtle Bug

Earlier this week, I saw this code being recommended on Stack Overflow. The code contains a nasty, but rather subtle bug. The version I saw has been fixed now, but I thought there were some interesting lessons to learn by looking at the problems in some detail.

Let’s start by working out what the bug is. Here’s the code:

On first glance, it seems fine. It uses the common “open or die” idiom. It uses the modern approach of using a lexical filehandle. It even uses the three-argument version of “open()”. Code like has appeared in huge numbers of Perl programs for years. What can possibly be the problem?

I’ll give you a couple of minutes to have a closer look and work out what you think the problem is.

[ … time passes … ]

So what do you think? Do you see what the problem is?

The problem is that there is no error checking.

“What do you mean, Dave?” I hear you say. “There’s error checking there – I can see it plainly.” Some of you might even be wondering if I’m going senile.

And, yes, it certainly looks like it checks for errors. But the error checking doesn’t work. Let me prove that to you. We can check it with a simple command line program.

You would expect to see the “die” message there. But it doesn’t appear. Ok, perhaps I’m lying. Perhaps I really do have a file called “not.there”. Let’s try another, slightly different, version of the code.

And there we see the error message. That file really doesn’t exist.

So what went wrong with the first version? Of course, a good way to start working that out is to compare the two versions and look at the differences between them. The difference here is that when I put parentheses around the parameters to “open()” it started working. And when you fix things by adding parentheses it’s a pretty sure bet that the problem comes down to precedence.

The order of precedence for Perl operators is listed in perldoc perlop. If you look at that list you’ll see that the “or” operator we used (“||”) is at position 16 on the list. But what other operators are we using in our code? The answer is lurking down at position 21 on the list. When we call a Perl built-in function without using parentheses around the parameters, it’s known as a list operator. And list operators have rather low precedence.

All of which means that our original code is actually parsed as if we had written it like this:

Notice the parentheses that have appeared around $file and (crucially) the whole “or die” clause. That means that the bracketed expression is evaluated and passed to “open()” as its third argument. And when Perl evaluates that expression, it does that clever “Boolean short-circuiting” thing. An expression of “A || B” evaluates A first and if that is true, it returns it. Only if A is false will it go on to evaluate B and return that. In our case, the filename will always be true (well, unless you have a file called “0”) so the second half of the expression (the “or die…” bit) is never evaluated and, effectively, ignored.

Which is why I said, back at the start, that this code has no error checking at all – that’s literally true as the error checking has no effect at all.

So how do we fix it? Well, we’ve already seen one approach – you can explicitly add parentheses around the arguments to “open()”. But Perl programmers don’t like to use unnecessary punctuation and I’m sure I’ve seen this written without parentheses, so how does that work?

If you take another look at the table of operator precedence and look down below the list operators, you’ll see another “or” operator (the one that’s actually the word “or”, rather than punctuation). It’s right at the bottom of the list – at position 24. And that means we can use that version without needing the parentheses around the parameters to “open()”.

And that’s the version that you’ll see in most codebases. But, as we’ve seen, it’s vitally important to use the correct version of the “or” operator.

The worst thing about this bug is that it appears at the worst time. If your file exists and you can open it successfully, then everything works fine. Things only go wrong when… well, when things go wrong. If you can’t open your file for some reason, you won’t know about it. Which is bad.

So it’s important to test that your code works correctly when things go wrong. And that’s why we have modules like Test::Exception. You could write a test program like this:

And it would fail every time. But if you switched to the other “or” operator, it will work.

There’s one other approach you can take. You can use autodie in your code and just forget about adding “or die” to any of your calls to “open()”.

This is an easy bug to introduce into your code and a hard one to track down. Who’s confident that it doesn’t appear in any of their code?

Categories
Conferences

London Perl Workshop 2018

Last Saturday was the London Perl Workshop and (as has become traditional), I’m going to tell you how much fun it was so that you feel jealous that you missed it and make more of an effort to come along next year.

This year was slightly different for me. For various reasons, I didn’t have the time to propose and prepare any talks for the workshop so (for the first time ever) I decided I’d just go to the workshop and not give any talks. It very nearly worked too.

I arrived at about 9am, checked in at the registration desk and collected my free t-shirt. Then I went upstairs to the main lecture theatre to see Julian giving the organisers’ welcome talk. Julian is a new member of the organising committee, having only moved to London in the last year. But he’s certainly thrown himself into the community.

Following the welcome talk, I stayed in the main room to hear Léon Brocard explaining what’s new in the world of HTTP. It seems the HTTP3 is just around the corner and while it’s a lot more complicated than previous versions it has the potential to make the web a lot faster. I stayed in the same room to hear Martin Berends talking about Cucumber. I’ve been meaning to look at Cucumber in more detail for some years – perhaps this talk will be the prod I need.

Things were running a little late in the main room by this point, so I was a little late getting to Simon Proctor‘s 24 uses for Perl6‎. I try to get to at least one Perl 6 talk at every conference I go to. And this time, I was galvanised enough to buy a copy of Learning Perl 6 for my Kindle.

I caught up with a few friends over the coffee break and then headed back to the main room to see Saif Ahmed explaining Quick and Dirty GUI Applications (and why you should make them)‎. This was nostalgic for me. Almost twenty years ago at an OSCON in California, I remember a long, drunken night when some of us sketched out a plan to build a framework-agnostic GUI toolkit for Perl (like a DBI for GUIs). I think we gave up when we realised we would need to call it “PUI”. Anyway, it seems that Saif (who was keen to make it very clear that he’s not a professional programmer) has written such a tool.

After that I went to see my former colleague Andrew Solomon talking about ‎HOWTO: grow the Perl team. The talk was based around his experiences helping various companies training up Perl programmers using his Geek Uni site.

And then it was lunchtime. I met up with a few other former London Perl Mongers leaders and we had some very nice pizzas just over the road from the workshop venue. Over lunch, we claimed to be preparing for our afternoon panel discussion – but really we were mainly just reminiscing.

After lunch, it was back to the main room to see Peter Allan’s talk on Security Checks using perlcritic and Tom Hukins on Contrarian Perl‎. Both talks were the kind of thing that really makes you think. Peter’s talk introduced some interesting ideas about pushing perlcritic further than it is usually pushed. And Tom pointed out that in order to write the best code possible, you might need to move beyond the generally accepted “industry standards”.

After that, there was a brief visit to a different room to hear Mohammed Anwar talking about The power of mentoring‎. Mohammed is another recent newcomer to the Perl community and, like Julien, he is certainly making a difference.

I skipped the coffee break and went back to the main room to prepare for the one session that I had been roped into contributing to – ‎”I’m a Former London.PM Leader – Ask Me Anything”‎. We had gathered together as many of the former London Perl Mongers leaders and we took questions from the audience about the past, present and future of the group. I was slightly worried that it might tip over into nostalgic self-indulgence, but several people later told me that they had really enjoyed it.

Then it was time for the lightning talks (which were as varied and entertaining as always) and Julien’s “thank-you” talk. Like last year, the post-conference started in the venue before moving on to a pub. I stayed for an hour or so, chatting to friends, before making my way home.

As always, I’d like to thank all of the organisers, speakers, sponsors and attendees for making the workshop as successful as it (always!) is.

Here’s a list of those sponsors. They’re nice companies:

Hope to see you at next years workshop.

Categories
Programming

Please Don’t Use CGI.pm

Earlier this week, the Perl magazine site, perl.com, published an article about writing web applications using CGI.pm. That seemed like a bizarre choice to me, but I’ve decided to use it as an excuse to write an article explaining why I think that’s a really bad idea.

It’s important to start by getting some definitions straight – as, often, I see people conflating two or three of these concepts and it always confuses the discussion.

  • The Common Gateway Interface (CGI) is a protocol which defines one way that you can write applications that create dynamic web pages. CGI defines the interface between a web server and a computer program which generates a dynamic page.
  • A CGI program is a computer program that is written in a manner that conforms to the CGI specification. The program optionally reads input from its environment and then prints to STDOUT a stream of data representing a dynamic web page. Such programs can be (and have been!) written in pretty much any programming language.
  • CGI.pm is a CPAN module which makes it easier to write CGI programs in Perl. The module was included in the Perl core distribution from Perl 5.004 (in 1997) until it was removed from Perl 5.22 (in 2015).

A Brief Introduction to CGI.pm

CGI.pm basically contained two sets of functions. One for input and one for output. There was a set for reading data that was passed into the program (the most commonly used one of these was param()) and a set for producing output to send to the browser. Most of these were functions which created HTML elements like <h1> or <p>. By about 2002, most people seemed to have worked out that these HTML creation functions were a bad idea and had switched to using a templating engine instead. One output function that remained useful was header() which gave the programmer an easy way to create the various headers required in an HTTP response – most commonly the “Content-type” header.

For at least the last ten years that I was using CGI.pm, my programs included the line:

as it was only the param() and header() functions that I used.

I should also point out that there are two different “modes” that you can use the module in. There’s an object-oriented mode (create an object with CGI->new and interact with it through methods) and a function-based mode (just call functions that are exported by the module). As I never needed more than one CGI object in a program, I always just used the function-based interface.

Why Shouldn’t We Use CGI.pm Today?

If you’re using CGI.pm in the way I mentioned above (using it as a wrapper around the CGI protocol and ignoring the HTML generation functions), then it’s not actually a terrible way to write simple web applications. There are two problems with it:

  1. CGI programs are slow. They start up a Perl process for each request to the CGI URL. This is, of course, a problem with the CGI protocol itself, not the CGI.pm module. This might not be much of a problem if you have a low-traffic application that you want to put on the web.
  2. CGI.pm gives you no help building more complicated features in a web application. For example, there’s no built-in support for request routing. If your application needs to control a number of URLs, then you either end up with a separate CGI program for each URL or you shoe-horn them all into the same program and set up some far-too-clever mod_rewrite magic. And everyone reinvents the same wheels.

Basically, there are better ways to write web applications in Perl these days. It was removed from the Perl code distribution in 2015 because people didn’t want to encourage people to use an outdated technology.

What are these better methods? Well, anything based on an improved gateway interface specification called the Perl Server Gateway Interface (PSGI). That could be a web framework like Dancer2, Catalyst or Web::Simple or you could even just use raw PSGI (by using the toolkit in the Plack distribution).

Often when I suggest this to people, they think that the PSGI approach is going to be far more complex than just whipping up a quick CGI program. And it’s easy to see why they might think that. All too often, an introduction to PSGI starts by building a relatively powerful (and, therefore, complicated) web application using Catalyst. And while Catalyst is a fine web framework, it’s not the simplest way to write a basic web application.

But it doesn’t need to be that way. You can write PSGI programs in “raw PGSI” without reaching for a framework. Sure, you’ll still have the problems listed in my point two above, but when you want to address that, you can start looking at the various web frameworks. Even so, you’ll have three big benefits from moving to PSGI.

The Benefits of PSGI

As I see it, there are three huge benefits that you get from PSGI.

Software Ecosystem

The standard PSGI toolkit is called Plack. You’ll need to install that. That will give you adapters enabling you to use PSGI programs in pretty much any web deployment environment. It also includes a large number of plugins and extensions (often called “middleware”) for PSGI. All of this software can be added to your application really simply. And any bits of your program that you don’t have to write is always a big advantage.

Testing and Debugging

How do you test your CGI program? Probably, you use something like Selenium (or, perhaps, just LWP) to fire requests at the server and see what results you get back.

And how about debugging any problems that your testing finds? All too often, the debugging that I see is warn() statements written to the web server error log. Actually, when answering questions on StackOverflow, often the poster has no idea where to find the error log and we need to resort to something like use CGI::Carp 'fatalsToBrowser', which isn’t exactly elegant.

A PSGI application is just a subroutine. So it’s trivial for testing tools to call the subroutine with the correct parameters. This makes testing PSGI programs really easy (and all of the tools to do this are part of the Plack distribution I mentioned above). Similarly, there are tools debugging a PSGI program far easier than the equivalent CGI program.

Deployment Flexibility

This, to me, is the big one. I talked earlier about the performance problems that the CGI environment leads to. You have a CGI program that is only used by a few people on your internal network. And that’s fine. The second or so it takes to respond to each request isn’t a problem. But it proves useful and before you know it, many more people start to use it. And then someone suggests publishing it to external users too. The one-second responses stretch to five or ten seconds, or even longer and you start getting complaints about the system. You know you should move it to a persistent environment like FastCGI or mod_perl, but that would require large-scale changes to the code and how are you ever going to find the time for that?

With a PSGI application, things are different. You can start by deploying your PSGI code in a CGI environment if you like (although, to be honest, it seems that very few people do that). Then when you need to make it faster, you can move it to FastCGI or mod_perl. Or you can run it as a standalone web service and configure your web proxy to redirect requests to it. Usually, you’ll be able to use exactly the same code in all of these environments.

In Conclusion

I know why people still write CGI programs. And I know why people still write them using CGI.pm – it’s what people know. It’s seen as the easy option. It’s what twenty-five years of web tutorials are telling them to do.

But in 2018 (and, to be honest, for most of the last ten years) that’s simply not the best approach to take. There are more powerful and more flexible options available.

Please don’t write code using CGI.pm. And please don’t write tutorials encouraging people to do that.

Categories
Programming

Fixing a Bug

I fixed a bug earlier this week. Ok, actually, I introduced a bug and then spent the next few hours tracking it down and fixing it – but that doesn’t sound quite so positive, does it?

I thought it might be interesting to talk you through the bug and the fix. I should point out that this is client code, so all identifiers have been changed to protect the innocent.

We had some code that was being called in a loop while building the response to an API call. Usually, the loop would have a few dozen iterations, but in a few cases it could have something like 6,000 iterations. And in those cases we wanted to speed things up. One of the approaches I took was to eliminate all unnecessary database calls from the code in the loop. So I set DBIC_TRACE and made a request. And saw about a dozen database queries for each iteration. They had to go, so I started looking at the code.

One line I saw, looked like this:

The $widgets variable ends up storing a reference to an array containing details of three commonly used widget types. But they are always the same three widget types. So we can cache this.

This is basically the same code, but we now only run one query against the database. So that’s a big win. There was one other step. It turned out I was building the same cache in two different subroutines, so I moved the cache to a package-level lexical variable.

I made a few more similar changes and then submitted the code for review by some other developers in the team before releasing it. One of them asked why I hadn’t used the existing Widget::Cache module instead of inventing my own cache. The answer was that I hadn’t heard of it but on investigation it seemed to be just what I wanted. As the server starts, this module reads details of all of the widgets and stores them in a hash. It ends up looking like this:

The module also exports helper functions which gave access to the various parts of the cache – things like get_widget_by_code() and get_widget_by_id(). I added a new subroutine called get_common_widgets() and called it from my code like this:

I also removed the package-level lexical variable that I had previously been using to store my widget cache.

At this point, I had introduced the bug. But it wasn’t found until a few hours later when the fix was in QA. The API response that I had been working on was still working fine, but a few other unrelated pages on the site had stopped working. And the error messages all came from functions in the Widget::Cache module. They all complained that the cache variable, $widgets, was being used as a hash reference when it was actually an array reference.

It was clear that I’d broken something. But what? Can you see it?

Eventually, I resorted to checking every piece of code that touched the $widgets widgets variable. And there were two places that set a variable called $widgets. One was the initialisation code that I hadn’t touched and the other was my code where I get the list of common widgets. Remember? It looks like this:

And get_common_widgets() returns an array reference – so that would certainly explain the behaviour I’m seeing. But this variable isn’t the cache variable from Widget::Cache. It’s not even in the same file. Oh. Wait!

Remember I said that Widget::Cache exports some helper functions that give access to parts of the cache? Well, it also exports the $widget variable itself. I guess that’s in case you want to access a bit of it that isn’t exposed by a helper function. But in that situation, you should do what I did – you should add a new helper function. You shouldn’t just go rummaging in the innards of a global variable. In fact you shouldn’t have access to that global variable at all.

So, the fix was simple.

We now have a local, lexical variable and we don’t go trashing a global variable that various parts of the codebase rely on. The next step would be to remove that dangerous export and check that all code that uses it is replaced with code that uses helper functions instead. But it’s a large codebase and I’m not sure I want to open that particular can of worms.

Global variables are a terrible idea. Only use them if it’s absolutely necessary (and, even then, you’re probably wrong about it being absolutely necessary).

So now you all know how I nearly broke a client’s site this week and how the problem was only caught at the last minute in QA. Remember this when I’m next asking you to employ me.

(And, yes, I know I make the codebase sound very “procedural”. Lots of it is, I’m afraid, and we just have to do the best with what we have.)