Categories
Programming

A Subtle Bug

Earlier this week, I saw this code being recommended on Stack Overflow. The code contains a nasty, but rather subtle bug. The version I saw has been fixed now, but I thought there were some interesting lessons to learn by looking at the problems in some detail.

Let’s start by working out what the bug is. Here’s the code:
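A minimal sketch of the kind of line in question. The filename, variable names and message are assumptions rather than the original listing:

```perl
use strict;
use warnings;

my $file = 'not.there';    # hypothetical filename

# Lexical filehandle, three-argument open() and an "open or die" check.
open my $fh, '<', $file || die "Cannot open '$file': $!";
```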

At first glance, it seems fine. It uses the common “open or die” idiom. It uses the modern approach of using a lexical filehandle. It even uses the three-argument version of “open()”. Code like this has appeared in huge numbers of Perl programs for years. What can possibly be the problem?

I’ll give you a couple of minutes to have a closer look and work out what you think the problem is.

[ … time passes … ]

So what do you think? Do you see what the problem is?

The problem is that there is no error checking.

“What do you mean, Dave?” I hear you say. “There’s error checking there – I can see it plainly.” Some of you might even be wondering if I’m going senile.

And, yes, it certainly looks like it checks for errors. But the error checking doesn’t work. Let me prove that to you. We can check it with a simple command line program.

You would expect to see the “die” message there. But it doesn’t appear. Ok, perhaps I’m lying. Perhaps I really do have a file called “not.there”. Let’s try another, slightly different, version of the code.

And there we see the error message. That file really doesn’t exist.
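One way to reproduce that comparison in a single script, as a sketch. Each variant runs inside `eval` so that a `die` is caught and reported rather than ending the program. The filename `not.there` comes from the text above; the subroutine names are hypothetical.

```perl
use strict;
use warnings;

my $file = 'not.there';    # assumed not to exist

# Variant 1: no parentheses around the arguments to open().
sub without_parens {
    open my $fh, '<', $file || die "Cannot open '$file'\n";
    return 'no exception';
}

# Variant 2: identical, apart from the parentheses.
sub with_parens {
    open(my $fh, '<', $file) || die "Cannot open '$file'\n";
    return 'no exception';
}

# eval returns undef when the block dies, so // supplies the error text.
my $first  = eval { without_parens() } // "died: $@";
my $second = eval { with_parens() }    // "died: $@";

print "without parentheses: $first\n";
print "with parentheses:    $second";
```

Only the second variant should report that it died.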

So what went wrong with the first version? Of course, a good way to start working that out is to compare the two versions and look at the differences between them. The difference here is that when I put parentheses around the parameters to “open()” it started working. And when you fix things by adding parentheses it’s a pretty sure bet that the problem comes down to precedence.

The order of precedence for Perl operators is listed in perldoc perlop. If you look at that list you’ll see that the “or” operator we used (“||”) is at position 16 on the list. But what other operators are we using in our code? The answer is lurking down at position 21 on the list. When we call a Perl built-in function without using parentheses around the parameters, it’s known as a list operator. And list operators have rather low precedence.

All of which means that our original code is actually parsed as if we had written it like this:

Notice the parentheses that have appeared around $file and (crucially) the whole “or die” clause. That means that the bracketed expression is evaluated and passed to “open()” as its third argument. And when Perl evaluates that expression, it does that clever “Boolean short-circuiting” thing. An expression of “A || B” evaluates A first and if that is true, it returns it. Only if A is false will it go on to evaluate B and return that. In our case, the filename will always be true (well, unless you have a file called “0”) so the second half of the expression (the “or die…” bit) is never evaluated and, effectively, ignored.
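Spelled out with explicit parentheses, that grouping looks roughly like the first statement in the sketch below; the second part shows the short-circuit on its own. The filename is an assumption. Running `perl -MO=Deparse,-p` over the original line is one way to see Perl's own view of the grouping.

```perl
use strict;
use warnings;

my $file = 'not.there';    # hypothetical filename

# How Perl groups the unparenthesised version: the "|| die" belongs to
# the third argument, not to the open() call as a whole.
open(my $fh, '<', ($file || die "Cannot open '$file': $!"));

# The short-circuit in isolation: a true left-hand side is returned
# as-is and the right-hand side is never evaluated.
my $third_arg = ($file || die "never reached for a true filename");
print "open() was given: $third_arg\n";
```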

Which is why I said, back at the start, that this code has no error checking at all – that’s literally true as the error checking has no effect at all.

So how do we fix it? Well, we’ve already seen one approach – you can explicitly add parentheses around the arguments to “open()”. But Perl programmers don’t like to use unnecessary punctuation and I’m sure I’ve seen this written without parentheses, so how does that work?

If you take another look at the table of operator precedence and look down below the list operators, you’ll see another “or” operator (the one that’s actually the word “or”, rather than punctuation). It’s right at the bottom of the list – at position 24. And that means we can use that version without needing the parentheses around the parameters to “open()”.

And that’s the version that you’ll see in most codebases. But, as we’ve seen, it’s vitally important to use the correct version of the “or” operator.
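As a sketch, with a hypothetical filename (the `eval` is only there so that the example can report the error instead of exiting):

```perl
use strict;
use warnings;

my $file = 'not.there';    # hypothetical filename

# "or" binds more loosely than the list operator, so this parses as
# open(my $fh, '<', $file) or die ...
my $ok = eval {
    open my $fh, '<', $file or die "Cannot open '$file': $!\n";
    1;
};
my $error = $@;

print $ok ? "opened $file\n" : "caught: $error";
```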

The worst thing about this bug is that it appears at the worst time. If your file exists and you can open it successfully, then everything works fine. Things only go wrong when… well, when things go wrong. If you can’t open your file for some reason, you won’t know about it. Which is bad.

So it’s important to test that your code works correctly when things go wrong. And that’s why we have modules like Test::Exception. You could write a test program like this:

And it would fail every time. But if you switched to the other “or” operator, it would work.
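A sketch of such a test, assuming the `open` lives in a small subroutine (`open_file` is a hypothetical name) and that Test::Exception is installed from CPAN. It is shown here with the `or` form, so it passes; swapping in `||` makes it fail.

```perl
use strict;
use warnings;
use Test::More tests => 1;
use Test::Exception;

# Hypothetical routine under test. Change "or" to "||" to see the
# test below fail.
sub open_file {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open '$file': $!";
    return $fh;
}

# The filename is assumed not to exist.
dies_ok { open_file('not.there') } 'open_file dies when the file is missing';
```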

There’s one other approach you can take. You can use autodie in your code and just forget about adding “or die” to any of your calls to “open()”.
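A minimal sketch of the autodie approach, again with a hypothetical filename and with `eval` used only to show the exception:

```perl
use strict;
use warnings;
use autodie;    # open(), close() and friends now throw on failure

my $file = 'not.there';    # hypothetical filename

my $ok = eval {
    open my $fh, '<', $file;    # no "or die" needed
    1;
};
my $error = $@;    # an autodie::exception object on failure

print "autodie raised: $error\n" unless $ok;
```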

This is an easy bug to introduce into your code and a hard one to track down. Who’s confident that it doesn’t appear in any of their code?

Categories
Conferences

London Perl Workshop 2018

Last Saturday was the London Perl Workshop and (as has become traditional), I’m going to tell you how much fun it was so that you feel jealous that you missed it and make more of an effort to come along next year.

This year was slightly different for me. For various reasons, I didn’t have the time to propose and prepare any talks for the workshop so (for the first time ever) I decided I’d just go to the workshop and not give any talks. It very nearly worked too.

I arrived at about 9am, checked in at the registration desk and collected my free t-shirt. Then I went upstairs to the main lecture theatre to see Julien giving the organisers’ welcome talk. Julien is a new member of the organising committee, having only moved to London in the last year. But he’s certainly thrown himself into the community.

Following the welcome talk, I stayed in the main room to hear Léon Brocard explaining what’s new in the world of HTTP. It seems that HTTP/3 is just around the corner and, while it’s a lot more complicated than previous versions, it has the potential to make the web a lot faster. I stayed in the same room to hear Martin Berends talking about Cucumber. I’ve been meaning to look at Cucumber in more detail for some years – perhaps this talk will be the prod I need.

Things were running a little late in the main room by this point, so I was a little late getting to Simon Proctor’s 24 uses for Perl6. I try to get to at least one Perl 6 talk at every conference I go to. And this time, I was galvanised enough to buy a copy of Learning Perl 6 for my Kindle.

I caught up with a few friends over the coffee break and then headed back to the main room to see Saif Ahmed explaining Quick and Dirty GUI Applications (and why you should make them). This was nostalgic for me. Almost twenty years ago at an OSCON in California, I remember a long, drunken night when some of us sketched out a plan to build a framework-agnostic GUI toolkit for Perl (like a DBI for GUIs). I think we gave up when we realised we would need to call it “PUI”. Anyway, it seems that Saif (who was keen to make it very clear that he’s not a professional programmer) has written such a tool.

After that I went to see my former colleague Andrew Solomon talking about HOWTO: grow the Perl team. The talk was based around his experiences helping various companies train up Perl programmers using his Geek Uni site.

And then it was lunchtime. I met up with a few other former London Perl Mongers leaders and we had some very nice pizzas just over the road from the workshop venue. Over lunch, we claimed to be preparing for our afternoon panel discussion – but really we were mainly just reminiscing.

After lunch, it was back to the main room to see Peter Allan’s talk on Security Checks using perlcritic and Tom Hukins on Contrarian Perl. Both talks were the kind of thing that really makes you think. Peter’s talk introduced some interesting ideas about pushing perlcritic further than it is usually pushed. And Tom pointed out that in order to write the best code possible, you might need to move beyond the generally accepted “industry standards”.

After that, there was a brief visit to a different room to hear Mohammed Anwar talking about The power of mentoring. Mohammed is another recent newcomer to the Perl community and, like Julien, he is certainly making a difference.

I skipped the coffee break and went back to the main room to prepare for the one session that I had been roped into contributing to – “I’m a Former London.PM Leader – Ask Me Anything”. We had gathered together as many of the former London Perl Mongers leaders as we could and we took questions from the audience about the past, present and future of the group. I was slightly worried that it might tip over into nostalgic self-indulgence, but several people later told me that they had really enjoyed it.

Then it was time for the lightning talks (which were as varied and entertaining as always) and Julien’s “thank-you” talk. Like last year, the post-conference social started in the venue before moving on to a pub. I stayed for an hour or so, chatting to friends, before making my way home.

As always, I’d like to thank all of the organisers, speakers, sponsors and attendees for making the workshop as successful as it (always!) is.

Here’s a list of those sponsors. They’re nice companies:

Hope to see you at next year’s workshop.

Categories
Programming

Please Don’t Use CGI.pm

Earlier this week, the Perl magazine site, perl.com, published an article about writing web applications using CGI.pm. That seemed like a bizarre choice to me, but I’ve decided to use it as an excuse to write an article explaining why I think that’s a really bad idea.

It’s important to start by getting some definitions straight – as, often, I see people conflating two or three of these concepts and it always confuses the discussion.

  • The Common Gateway Interface (CGI) is a protocol which defines one way that you can write applications that create dynamic web pages. CGI defines the interface between a web server and a computer program which generates a dynamic page.
  • A CGI program is a computer program that is written in a manner that conforms to the CGI specification. The program optionally reads input from its environment and then prints to STDOUT a stream of data representing a dynamic web page. Such programs can be (and have been!) written in pretty much any programming language.
  • CGI.pm is a CPAN module which makes it easier to write CGI programs in Perl. The module was included in the Perl core distribution from Perl 5.004 (in 1997) until it was removed from Perl 5.22 (in 2015).

A Brief Introduction to CGI.pm

CGI.pm basically contained two sets of functions. One for input and one for output. There was a set for reading data that was passed into the program (the most commonly used one of these was param()) and a set for producing output to send to the browser. Most of these were functions which created HTML elements like <h1> or <p>. By about 2002, most people seemed to have worked out that these HTML creation functions were a bad idea and had switched to using a templating engine instead. One output function that remained useful was header() which gave the programmer an easy way to create the various headers required in an HTTP response – most commonly the “Content-type” header.

For at least the last ten years that I was using CGI.pm, my programs included the line:

as it was only the param() and header() functions that I used.

I should also point out that there are two different “modes” that you can use the module in. There’s an object-oriented mode (create an object with CGI->new and interact with it through methods) and a function-based mode (just call functions that are exported by the module). As I never needed more than one CGI object in a program, I always just used the function-based interface.
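A sketch of that style of program, assuming CGI.pm is installed from CPAN. The import list names only the two functions mentioned above, and the `name` parameter is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw[param header];    # import only the two functions used below

# Function-based mode: read one parameter, emit the header, print a page.
my $name = param('name') // 'world';    # 'name' is a hypothetical parameter

print header(-type => 'text/plain', -charset => 'utf-8');
print "Hello, $name\n";

# The object-oriented equivalent would be along these lines:
#   my $cgi  = CGI->new;
#   my $name = $cgi->param('name') // 'world';
#   print $cgi->header(-type => 'text/plain', -charset => 'utf-8');
```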

Why Shouldn’t We Use CGI.pm Today?

If you’re using CGI.pm in the way I mentioned above (using it as a wrapper around the CGI protocol and ignoring the HTML generation functions), then it’s not actually a terrible way to write simple web applications. There are two problems with it:

  1. CGI programs are slow. They start up a Perl process for each request to the CGI URL. This is, of course, a problem with the CGI protocol itself, not the CGI.pm module. This might not be much of a problem if you have a low-traffic application that you want to put on the web.
  2. CGI.pm gives you no help building more complicated features in a web application. For example, there’s no built-in support for request routing. If your application needs to control a number of URLs, then you either end up with a separate CGI program for each URL or you shoe-horn them all into the same program and set up some far-too-clever mod_rewrite magic. And everyone reinvents the same wheels.

Basically, there are better ways to write web applications in Perl these days. CGI.pm was removed from the Perl core distribution in 2015 because people didn’t want to encourage the use of an outdated technology.

What are these better methods? Well, anything based on an improved gateway interface specification called the Perl Web Server Gateway Interface (PSGI). That could be a web framework like Dancer2, Catalyst or Web::Simple or you could even just use raw PSGI (by using the toolkit in the Plack distribution).

Often when I suggest this to people, they think that the PSGI approach is going to be far more complex than just whipping up a quick CGI program. And it’s easy to see why they might think that. All too often, an introduction to PSGI starts by building a relatively powerful (and, therefore, complicated) web application using Catalyst. And while Catalyst is a fine web framework, it’s not the simplest way to write a basic web application.

But it doesn’t need to be that way. You can write PSGI programs in “raw PSGI” without reaching for a framework. Sure, you’ll still have the problems listed in my point two above, but when you want to address that, you can start looking at the various web frameworks. Even so, you’ll have three big benefits from moving to PSGI.
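To give a feel for how small a raw PSGI program can be, here is a minimal sketch. The greeting text is made up; the shape (a code reference that takes an environment hash and returns a three-element array reference) follows the PSGI specification.

```perl
use strict;
use warnings;

# A PSGI application is a code reference that takes the request
# environment and returns [ status, headers, body ].
my $app = sub {
    my ($env) = @_;
    my $path = $env->{PATH_INFO} // '/';
    return [
        200,
        [ 'Content-Type' => 'text/plain; charset=utf-8' ],
        [ "Hello from raw PSGI, you asked for $path\n" ],
    ];
};

# A .psgi file hands the code reference back as its last value.
$app;
```

Saved as something like `app.psgi`, this can be served with the `plackup` command that comes with Plack.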

The Benefits of PSGI

As I see it, there are three huge benefits that you get from PSGI.

Software Ecosystem

The standard PSGI toolkit is called Plack. You’ll need to install that. That will give you adapters enabling you to use PSGI programs in pretty much any web deployment environment. It also includes a large number of plugins and extensions (often called “middleware”) for PSGI. All of this software can be added to your application really simply. And any bits of your program that you don’t have to write are always a big advantage.
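The idea behind middleware can be sketched without Plack at all: it is a function that takes an application and returns a new one. The `add_powered_by` wrapper below is hypothetical and hand-rolled, and it only copes with simple array-reference responses; real middleware from Plack handles more cases.

```perl
use strict;
use warnings;

my $app = sub {
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ "Hello\n" ] ];
};

# Hypothetical hand-rolled middleware: wraps an app and adds a
# response header to whatever the inner app returns.
sub add_powered_by {
    my ($inner) = @_;
    return sub {
        my ($env) = @_;
        my $res = $inner->($env);
        push @{ $res->[1] }, 'X-Powered-By' => 'PSGI';
        return $res;
    };
}

my $wrapped = add_powered_by($app);
```

With Plack installed, the `builder` and `enable` functions from Plack::Builder do this wiring for you, using ready-made middleware.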

Testing and Debugging

How do you test your CGI program? Probably, you use something like Selenium (or, perhaps, just LWP) to fire requests at the server and see what results you get back.

And how about debugging any problems that your testing finds? All too often, the debugging that I see is warn() statements written to the web server error log. Actually, when answering questions on Stack Overflow, often the poster has no idea where to find the error log and we need to resort to something like use CGI::Carp 'fatalsToBrowser', which isn’t exactly elegant.

A PSGI application is just a subroutine. So it’s trivial for testing tools to call the subroutine with the correct parameters. This makes testing PSGI programs really easy (and all of the tools to do this are part of the Plack distribution I mentioned above). Similarly, there are tools that make debugging a PSGI program far easier than debugging the equivalent CGI program.
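Because the application is just a subroutine, a test can call it directly. The sketch below uses only Test::More and a hand-built, deliberately incomplete environment hash; that shortcut is an assumption made for brevity, and Plack::Test will build a proper environment from a real HTTP request.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# A small example application (hypothetical).
my $app = sub {
    my ($env) = @_;
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "Hello $env->{QUERY_STRING}\n" ],
    ];
};

# No web server involved: just call the subroutine.
my $res = $app->({
    REQUEST_METHOD => 'GET',
    PATH_INFO      => '/',
    QUERY_STRING   => 'world',
});

is $res->[0], 200, 'status is 200';
like join('', @{ $res->[2] }), qr/Hello world/, 'body greets the caller';
```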

Deployment Flexibility

This, to me, is the big one. I talked earlier about the performance problems that the CGI environment leads to. You have a CGI program that is only used by a few people on your internal network. And that’s fine. The second or so it takes to respond to each request isn’t a problem. But it proves useful and before you know it, many more people start to use it. And then someone suggests publishing it to external users too. The one-second responses stretch to five or ten seconds, or even longer and you start getting complaints about the system. You know you should move it to a persistent environment like FastCGI or mod_perl, but that would require large-scale changes to the code and how are you ever going to find the time for that?

With a PSGI application, things are different. You can start by deploying your PSGI code in a CGI environment if you like (although, to be honest, it seems that very few people do that). Then when you need to make it faster, you can move it to FastCGI or mod_perl. Or you can run it as a standalone web service and configure your web proxy to redirect requests to it. Usually, you’ll be able to use exactly the same code in all of these environments.
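As a rough sketch of that flexibility, the file below can act as a plain CGI script or be loaded by `plackup`. Using the `GATEWAY_INTERFACE` environment variable to detect a CGI environment is an assumption made for this example; Plack::Handler::CGI is part of the Plack distribution.

```perl
use strict;
use warnings;

my $app = sub {
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ "Same code everywhere\n" ] ];
};

# When a web server runs this file as a CGI program, hand the
# application to Plack's CGI adapter.
if ($ENV{GATEWAY_INTERFACE}) {
    require Plack::Handler::CGI;
    Plack::Handler::CGI->new->run($app);
}

# When plackup loads the file (standalone, or with "-s FCGI" for
# FastCGI), this final value is used as the application.
$app;
```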

In Conclusion

I know why people still write CGI programs. And I know why people still write them using CGI.pm – it’s what people know. It’s seen as the easy option. It’s what twenty-five years of web tutorials are telling them to do.

But in 2018 (and, to be honest, for most of the last ten years) that’s simply not the best approach to take. There are more powerful and more flexible options available.

Please don’t write code using CGI.pm. And please don’t write tutorials encouraging people to do that.

Categories
Programming

Fixing a Bug

I fixed a bug earlier this week. Ok, actually, I introduced a bug and then spent the next few hours tracking it down and fixing it – but that doesn’t sound quite so positive, does it?

I thought it might be interesting to talk you through the bug and the fix. I should point out that this is client code, so all identifiers have been changed to protect the innocent.

We had some code that was being called in a loop while building the response to an API call. Usually, the loop would have a few dozen iterations, but in a few cases it could have something like 6,000 iterations. And in those cases we wanted to speed things up. One of the approaches I took was to eliminate all unnecessary database calls from the code in the loop. So I set DBIC_TRACE and made a request. And saw about a dozen database queries for each iteration. They had to go, so I started looking at the code.

One line I saw looked like this:

The $widgets variable ends up storing a reference to an array containing details of three commonly used widget types. But they are always the same three widget types. So we can cache this.

This is basically the same code, but we now only run one query against the database. So that’s a big win. There was one other step. It turned out I was building the same cache in two different subroutines, so I moved the cache to a package-level lexical variable.
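The shape of that change can be sketched as follows. The database call is replaced by a hypothetical stand-in (`fetch_common_widgets_from_db`) that simply counts how often it runs, and the widget codes are made up.

```perl
use strict;
use warnings;

my $query_count = 0;

# Stand-in for the real database query (hypothetical).
sub fetch_common_widgets_from_db {
    $query_count++;
    return [ map { +{ code => $_ } } qw(foo bar baz) ];
}

# Package-level lexical: shared by every subroutine in this file.
my $common_widgets;

sub common_widgets {
    # //= only runs the query while the cache is still undefined.
    $common_widgets //= fetch_common_widgets_from_db();
    return $common_widgets;
}

common_widgets() for 1 .. 6_000;
print "queries run: $query_count\n";
```

However many iterations the loop makes, the stand-in query runs once.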

I made a few more similar changes and then submitted the code for review by some other developers in the team before releasing it. One of them asked why I hadn’t used the existing Widget::Cache module instead of inventing my own cache. The answer was that I hadn’t heard of it but on investigation it seemed to be just what I wanted. As the server starts, this module reads details of all of the widgets and stores them in a hash. It ends up looking like this:

The module also exports helper functions which gave access to the various parts of the cache – things like get_widget_by_code() and get_widget_by_id(). I added a new subroutine called get_common_widgets() and called it from my code like this:

I also removed the package-level lexical variable that I had previously been using to store my widget cache.

At this point, I had introduced the bug. But it wasn’t found until a few hours later when the fix was in QA. The API response that I had been working on was still working fine, but a few other unrelated pages on the site had stopped working. And the error messages all came from functions in the Widget::Cache module. They all complained that the cache variable, $widgets, was being used as a hash reference when it was actually an array reference.

It was clear that I’d broken something. But what? Can you see it?

Eventually, I resorted to checking every piece of code that touched the $widgets variable. And there were two places that set a variable called $widgets. One was the initialisation code that I hadn’t touched and the other was my code where I get the list of common widgets. Remember? It looks like this:

And get_common_widgets() returns an array reference – so that would certainly explain the behaviour I’m seeing. But this variable isn’t the cache variable from Widget::Cache. It’s not even in the same file. Oh. Wait!

Remember I said that Widget::Cache exports some helper functions that give access to parts of the cache? Well, it also exports the $widgets variable itself. I guess that’s in case you want to access a bit of it that isn’t exposed by a helper function. But in that situation, you should do what I did – you should add a new helper function. You shouldn’t just go rummaging in the innards of a global variable. In fact you shouldn’t have access to that global variable at all.
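The effect can be reproduced in a single file. The Widget::Cache below is a hypothetical stand-in (the real module and its data structure differ), but it exports its cache variable in the same way. Note that `use strict` raises no complaint about the missing `my`, because the variable has been imported.

```perl
use strict;
use warnings;

BEGIN {
    # Hypothetical stand-in for Widget::Cache.
    package Widget::Cache;
    use Exporter 'import';
    our @EXPORT  = qw($widgets get_widget_by_code get_common_widgets);
    our $widgets = {
        by_code => { map { ($_ => { code => $_ }) } qw(foo bar baz qux) },
    };
    sub get_widget_by_code { return $widgets->{by_code}{ $_[0] } }
    sub get_common_widgets { return [ @{ $widgets->{by_code} }{qw(foo bar baz)} ] }
    $INC{'Widget/Cache.pm'} = __FILE__;    # mark the module as loaded
}

use Widget::Cache;

# No "my" here, so this assigns to the shared, exported cache variable.
$widgets = get_common_widgets();

# Every helper that expects a hash reference is now broken.
my $found = eval { get_widget_by_code('qux') };
my $error = $@;
print "lookup failed: $error" unless $found;
```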

So, the fix was simple.

We now have a local, lexical variable and we don’t go trashing a global variable that various parts of the codebase rely on. The next step would be to remove that dangerous export and check that all code that uses it is replaced with code that uses helper functions instead. But it’s a large codebase and I’m not sure I want to open that particular can of worms.
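A sketch of the fixed version, again using a hypothetical, cut-down stand-in for Widget::Cache and a made-up `build_response` subroutine:

```perl
use strict;
use warnings;

BEGIN {
    # Hypothetical stand-in for Widget::Cache, reduced to the essentials.
    package Widget::Cache;
    use Exporter 'import';
    our @EXPORT  = qw($widgets get_common_widgets);
    our $widgets = { foo => 1, bar => 2, baz => 3, qux => 4 };
    sub get_common_widgets { return [ @{$widgets}{qw(foo bar baz)} ] }
    $INC{'Widget/Cache.pm'} = __FILE__;    # mark the module as loaded
}

use Widget::Cache;

sub build_response {
    # "my" creates a new lexical that hides the imported global
    # for the rest of this scope.
    my $widgets = get_common_widgets();
    return scalar @$widgets;
}

my $count = build_response();
print "common widgets: $count, cache is still a ", ref($widgets), " reference\n";
```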

Global variables are a terrible idea. Only use them if it’s absolutely necessary (and, even then, you’re probably wrong about it being absolutely necessary).

So now you all know how I nearly broke a client’s site this week and how the problem was only caught at the last minute in QA. Remember this when I’m next asking you to employ me.

(And, yes, I know I make the codebase sound very “procedural”. Lots of it is, I’m afraid, and we just have to do the best with what we have.)

Categories
Conferences

The Perl Conference in Glasgow

Yesterday (despite the best efforts of Virgin Trains to stop me) I came home from The Perl Conference in Glasgow. I had a great week up in Glasgow, and I thought I’d better write about it before I forgot anything important.

Pre-Conference

I arrived on Sunday evening. This was the last day of the European Championships which were jointly hosted in Glasgow and Berlin. As I was checking into my hotel, the receptionist happened to mention that there was some celebration of the championships in George Square so, once I had unpacked, I went off to explore. And I found a free festival with lots of great music. It had been going all day, so I only got to see the last couple of acts (Fatoumata Diawara and Shooglenifty) but it was a great way to spend my first couple of hours in the City.

The following day, I had agreed to meet Andrew Solomon (of Geek Uni) at the conference venue so we could see what the place was like before our workshops on Tuesday. Having done that, we went off to explore the city a bit. Following lunch (which was at the excellent Pizza Punks) we retired to our respective hotel rooms to ensure that our workshops were ready. That, at least, was the plan. When I got back to my hotel, I found that my room hadn’t been cleaned, so I set out for another walk. This time I found what’s left of the Glasgow School of Art and King Tut’s.

Workshops

On Tuesday, I spent the day at the venue, running two half-day workshops. In the morning I introduced a smallish class to the joys of on-page SEO and in the afternoon a slightly larger class sat through my rather experimental “The Professional Programmer” workshop. This was a quick look at some of the things that a professional programmer needs to understand other than programming itself. Both workshops seemed to go pretty well and I’m looking forward to reading the feedback.

That evening, the pre-conference social was held in the venue, so following my workshop, I just had to wander upstairs to meet loads of old friends. Much good food and conversation was enjoyed over the following few hours.

Day 1

Traditionally, the first order of business on the first day of the conference is the announcement of next year’s venue. Thomas of the YAPC Europe Foundation announced that we’ll be going back to Riga in 2019. I’m already looking forward to it.

Then the talks started. Ruth Holloway’s keynote “Discourse Without Drama” proposed a world where we can talk about things (even disagree about things) without every conversation ending in anger and shouting. I’m sure it’s a world that most of us would love to see.

For the rest of the morning, I saw Salve Nilsen talking about the state of Perl Mongering in Europe, Choroba discussing the inconsistencies in type handling between the different versions of various JSON libraries and Leon Timmermans explaining how the Perl 5 Porters have handled language design in recent years. For the last slot of the morning, I saw Makoto Nozaki give an update on what is happening in The Perl Foundation.

After a really good lunch (the venue staff were good at many things – catering was top of the list) I saw Curtis Poe talking about the best ways to sell a legacy code clean-up project to your managers. This was followed by Ben Edwards explaining how Pirum keep their CVS and Git code repositories in step.

The conference day ended with the lightning talks. This included me giving an overview of the twenty year history of London Perl Mongers in five minutes. I think it worked.

Then I made a mistake. The conference dinner was in a restaurant about a forty minute walk or twenty minute taxi ride away. I was due there at about 7:30pm. I went back to my hotel room and got there at about 6pm and lay on my bed for a few minutes. When I woke up and looked at the time, it was 8pm. I could have rushed around and made a late appearance at the dinner, but I decided to take note of what my body obviously wanted and had a quiet night in.

Day 2

Feeling refreshed after a long sleep, I attacked day two of the conference. I started by listening to Thomas Klausner explaining how he has written an asynchronous web application without using any of the usual frameworks. It looked interesting and I’ll be investigating his code in more detail. Following that, I saw André Walker talking about how error messages can be so much easier to understand. He had an example from a module that he had written. It did look good. I then saw Mohammed Anwar encouraging people to follow his lead and submit pull requests to CPAN modules. As someone who has been on the receiving end of many of Mohammed’s pull requests, I can only agree that I’d love to see more people sending me improvements to my code. I try to see at least one Perl 6 talk at every conference I go to and this time I chose my former colleague, Simon Proctor, explaining function signatures, multi-methods and things like that.

I didn’t see much of lunch on Thursday as Andrew Solomon had arranged a “Birds of a Feather” session on “Growing a Perl Team”. A large group of like-minded people (but from many different parts of the industry) talked about the problems they have attracting and keeping good Perl developers. I’m not sure we came up with a clear way forward, but it was certainly good to share ideas.

The afternoon started with Mark Fowler talking through the entries from last year’s Perl Advent Calendar (and asking us to propose articles for this year’s calendar) and I followed that with Andrew Solomon telling us about his experience of running on-the-job Perl training for various companies. Following a coffee break, I saw Tom Hukins explaining what WebDriver is and how to use it with Perl. After that was my talk about the Line of Succession web site. I didn’t get a huge audience, but those that were there seemed to enjoy it.

Then there was a slightly different session. Every year, Larry Wall goes to lots of Perl conferences. And his wife, Gloria, always comes with him. Normally, Larry gives a keynote and Gloria watches from the audience. When he was organising this conference, Mark Keating decided to turn that on its head. He didn’t invite Larry to speak and, instead, asked Gloria to talk to the conference. So we had fifty minutes of Gloria Wall in conversation with Curtis Poe. And it was a really interesting conversation. I recommend you look for the video.

Once again, the day ended with an interesting selection of lightning talks.

Day 3

Friday was good. Friday was the day that I wasn’t speaking. And that meant that Friday was the day I didn’t need to carry my laptop around all day.

I started by watching Wesley Schwengle talking about using Docker with Perl. I keep going to talks like this. Every time I get that little bit closer to understanding how Docker is going to make my life easier. Then I switched to something far less technical and saw Joelle Maslak explaining how mistakes that we all make can make our applications less usable for many people. I followed that with Lee Johnson introducing Burp Scanner and explaining how it finds insecurities in web applications and Ruth Holloway talking about accessibility at conferences and events. I think the people who attended it all found it interesting – it’s a shame that, as far as I could see, not many of them were event organisers.

After lunch I watched Matt Trout introduce Babble, a Perl module that can help you deal with syntax differences between different versions of Perl. Then the final keynote was Curtis Poe talking about some of the possible futures for Perl 5 and 6. After that, there was just another great selection of lightning talks before Mark Keating closed the conference by thanking everyone.

I couldn’t stay around for the post-conference goodbyes as I had to get to Edinburgh to see Amanda Palmer in concert. That was excellent too.

When I heard that Mark would be organising the conference, I knew we’d be in safe hands. Mark has plenty of experience of this and he’s great at it. Of course, he had a great team working with him too and I think it really helped that he chose a professional conference venue to host it.

So huge thanks to Mark and his team. But thanks, also, to all of the speakers and the sponsors. I’m sure all the attendees will agree that this year’s conference was outstanding.

See you all in Riga.