Fixing a Bug

I fixed a bug earlier this week. Ok, actually, I introduced a bug and then spent the next few hours tracking it down and fixing it – but that doesn’t sound quite so positive, does it?

I thought it might be interesting to talk you through the bug and the fix. I should point out that this is client code, so all identifiers have been changed to protect the innocent.

We had some code that was being called in a loop while building the response to an API call. Usually, the loop would have a few dozen iterations, but in a few cases it could have something like 6,000 iterations. And in those cases we wanted to speed things up. One of the approaches I took was to eliminate all unnecessary database calls from the code in the loop. So I set DBIC_TRACE and made a request. And saw about a dozen database queries for each iteration. They had to go, so I started looking at the code.

One line I saw, looked like this:

The $widgets variable ends up storing a reference to an array containing details of three commonly used widget types. But they are always the same three widget types. So we can cache this.

This is basically the same code, but we now only run one query against the database. So that’s a big win. There was one other step. It turned out I was building the same cache in two different subroutines, so I moved the cache to a package-level lexical variable.

I made a few more similar changes and then submitted the code for review by some other developers in the team before releasing it. One of them asked why I hadn’t used the existing Widget::Cache module instead of inventing my own cache. The answer was that I hadn’t heard of it but on investigation it seemed to be just what I wanted. As the server starts, this module reads details of all of the widgets and stores them in a hash. It ends up looking like this:

The module also exports helper functions which gave access to the various parts of the cache – things like get_widget_by_code() and get_widget_by_id(). I added a new subroutine called get_common_widgets() and called it from my code like this:

I also removed the package-level lexical variable that I had previously been using to store my widget cache.

At this point, I had introduced the bug. But it wasn’t found until a few hours later when the fix was in QA. The API response that I had been working on was still working fine, but a few other unrelated pages on the site had stopped working. And the error messages all came from functions in the Widget::Cache module. They all complained that the cache variable, $widgets, was being used as a hash reference when it was actually an array reference.

It was clear that I’d broken something. But what? Can you see it?

Eventually, I resorted to checking every piece of code that touched the $widgets widgets variable. And there were two places that set a variable called $widgets. One was the initialisation code that I hadn’t touched and the other was my code where I get the list of common widgets. Remember? It looks like this:

And get_common_widgets() returns an array reference – so that would certainly explain the behaviour I’m seeing. But this variable isn’t the cache variable from Widget::Cache. It’s not even in the same file. Oh. Wait!

Remember I said that Widget::Cache exports some helper functions that give access to parts of the cache? Well, it also exports the $widget variable itself. I guess that’s in case you want to access a bit of it that isn’t exposed by a helper function. But in that situation, you should do what I did – you should add a new helper function. You shouldn’t just go rummaging in the innards of a global variable. In fact you shouldn’t have access to that global variable at all.

So, the fix was simple.

We now have a local, lexical variable and we don’t go trashing a global variable that various parts of the codebase rely on. The next step would be to remove that dangerous export and check that all code that uses it is replaced with code that uses helper functions instead. But it’s a large codebase and I’m not sure I want to open that particular can of worms.

Global variables are a terrible idea. Only use them if it’s absolutely necessary (and, even then, you’re probably wrong about it being absolutely necessary).

So now you all know how I nearly broke a client’s site this week and how the problem was only caught at the last minute in QA. Remember this when I’m next asking you to employ me.

(And, yes, I know I make the codebase sound very “procedural”. Lots of it is, I’m afraid, and we just have to do the best with what we have.)