The Joy of Prefetch

If you heard me speak at YAPC or you’ve had any kind of conversation with me over the last few weeks then it’s likely you’ve heard me mention the secret project that I’ve been writing for my wife’s school.

To give you a bit of background, there’s one afternoon a week where the students at the school don’t follow the normal academic timetable. On that afternoon, the teachers all offer classes on wider topics. This year’s topics include Acting, Money Management and Quilt-Making. It’s a wide-ranging selection. Each student chooses one class per term.

This year I offered to write a web app that allowed the students to make their selections. This seemed better than the spreadsheet-based mechanisms that have been used in the past. Each student registers with their school-based email address and then on a given date, they can log in and make their selections.

I wrote the app in Dancer2 (my web framework of choice) and the site started allowing students to make their selections last Thursday morning. In the run-up to the go-live time, Google Analytics showed me that about 180 students were on the site waiting to make their selections. At 7am the selections part of the site went live.

And immediately stopped working. Much to my embarrassment.

It turned out that a disk failed on the server moments after the site went live. It’s the kind of thing that you can’t predict. But it leads to lots of frustrated teenagers and doesn’t give a very good impression.

To give me time to rebuild and stress-test the site we’ve decided to relaunch at 8pm this evening. I’ve spent the weekend rebuilding the app on a new (and more powerful) server.

I’m pretty sure that the timing of the failure was coincidental. I don’t think that my app caused the disk failure. But a failure of this magnitude makes you paranoid, so I spent a lot of yesterday tuning the code.

The area I looked at most closely was the number of database queries that the app was making. There are two main actions that might be slow – the page that builds the list of courses that a student can choose from and the page which saves a student’s selections.

I started with the first of these. I set DBIC_TRACE to 1 and fired up a development copy of the app. I was shocked to see the app run about 120 queries – many of which were identical.

Of course I should have tested this before. And, yes, it’s an idiotic way to build an application. But I’m afraid that using an ORM like DBIx::Class can make it all too easy to write code like this. Fortunately, it makes it easy to fix it too. The secret is “prefetch”.

“Prefetch” is an option you can pass to the “search” method on a resultset. Here’s an example of the difference it can make.

There are seven year groups in a British secondary school. Most schools call them Year 7 to Year 13 (the earlier years are in primary school). Each year group will have a number of forms. So there’s a one to many relationship between years and forms. In database terms, the form table holds a foreign key to the year table. In DBIC terms, the Year result class has a “has_many” relationship with the Form result class and the Form result class has a “belongs_to” relation with the Year result class.
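
In DBIC code, those relationship declarations look something like this (a sketch – the class names, accessor names and foreign key column are assumptions, not the app’s real schema):

    # In the Year result class (assumed to be MyApp::Schema::Result::Year)
    __PACKAGE__->has_many(
        forms => 'MyApp::Schema::Result::Form',
        'year_id',                               # foreign key in the form table
    );

    # In the Form result class
    __PACKAGE__->belongs_to(
        year => 'MyApp::Schema::Result::Year',
        'year_id',
    );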

A naive way to list the years and their associated forms would look like this:
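
(A sketch of the idea – it assumes a connected schema object in $schema, the relationships described above and a “name” column on both tables.)

    use feature 'say';

    my $years = $schema->resultset('Year');

    while (my $year = $years->next) {
        say $year->name;

        # This call runs a new query for every single year
        for my $form ($year->forms) {
            say '  ', $form->name;
        }
    }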

Run code like that with DBIC_TRACE turned on and you’ll see the proliferation of database queries. There’s one query that selects all of the years and then for each year, you get another query to get all of its associated forms.

Of course, if you were writing raw SQL, you wouldn’t do that. You’d write one query that joins the year and form tables and pulls all of the data back at once. And the “prefetch” option gives you a way to do that in DBIC as well.
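
Here’s the same loop with the “prefetch” option added (same assumed names as before):

    use feature 'say';

    my $years = $schema->resultset('Year')->search(
        undef,
        { prefetch => 'forms' },    # join to the form table and inflate the rows
    );

    while (my $year = $years->next) {
        say $year->name;

        # No extra queries here; the form data arrived with the years
        for my $form ($year->forms) {
            say '  ', $form->name;
        }
    }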

All we have done here is to interpose a call to “search” which adds the “prefetch” option. If you run this code with DBIC_TRACE turned on, then you’ll see that there’s only one database query and it’ll be very similar to the raw SQL that you would have written – it brings back the data from both of the tables at the same time.

But that’s not all of the cleverness of the “prefetch” option. You might be wondering what the difference is between “prefetch” and the rather similar-sounding “join” option. Well, with “join” the columns from the joined table would be added to your main table’s result set. This would, for example, create some kind of mutant Year resultset object that you could ask for Form data using calls like “get_column(‘forms.name’)”. [Update: I was trying to simplify this explanation and I ended up over-simplifying to the point of complete inaccuracy – joined columns only get added to your result set if you use the “columns” or “select/as” attributes. And the argument to “get_column()” needs to be the column name that you have defined using those options.] And that’s useful sometimes, but often I find it easier to use “prefetch” as that uses the data from the form table to build Form result objects which look exactly as they would if you pulled them directly from the database.
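
To make that concrete, here’s roughly what the two approaches look like side by side (a sketch – the “form_name” alias is an invented example):

    use feature 'say';

    # "join" plus explicit select/as: the extra column is read back with
    # get_column(), and you get one row per year/form combination
    my $rows = $schema->resultset('Year')->search(
        undef,
        {
            join      => 'forms',
            '+select' => ['forms.name'],
            '+as'     => ['form_name'],
        },
    );
    while (my $row = $rows->next) {
        say $row->name, ': ', $row->get_column('form_name');
    }

    # "prefetch": you get back real Form result objects instead
    my $years = $schema->resultset('Year')->search(
        undef,
        { prefetch => 'forms' },
    );
    while (my $year = $years->next) {
        say $year->name, ': ', $_->name for $year->forms;
    }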

So that’s the kind of change that I made in my code. By prefetching a lot of associated tables I was able to drastically cut down the number of queries made to build that course selection page. Originally, it was about 120 queries. I got it down to three. Of course, each of those queries is a lot larger and is doing far more work. But there’s a lot less time spent compiling SQL and pulling data from the database.
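
For what it’s worth, “prefetch” also accepts arrayrefs and hashrefs, so a single search can pull in several related tables at once (the relationship names below are invented for illustration, not the app’s real ones):

    my $courses = $schema->resultset('Course')->search(
        undef,
        {
            # two relationships, one of them nested
            prefetch => [ 'teacher', { sessions => 'term' } ],
        },
    );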

The other page I looked at – the one that saves a student’s selections – wasn’t quite so impressive. Originally it was about twenty queries and I got it down to six.

Reducing the number of database queries is a really useful way to make your applications more efficient and DBIC’s “prefetch” option is a great tool for enabling that. I recommend that you take a close look at it.

After crowing about my success on Twitter I got a reply from a colleague pointing me at Test::DBIC::ExpectedQueries which looks like a great tool for monitoring the number of queries in your app.

Beginners Perl Tutorial

A few weeks ago I got an interesting email from someone at Udemy. They were looking for someone to write a beginners Perl tutorial that they would make available for free on their web site. I think I wasn’t the only person that they got in touch with but, after a brief email conversation, they asked me to go ahead and write it.

It turned out to be harder than I thought it would be. I expected that I could write about 6,000 words over a weekend. In the end it took two weekends and it stretched to over 8,000 words. The problem is not in the writing, it’s in deciding what to omit. I’m sure that if you read it you’ll find absolutely essential topics that I haven’t included – but I wonder what you would have dropped to make room for them.

But eventually I finished it, delivered it to them (along with an invoice – hurrah!) and waited to hear that they had published it.

Yesterday I heard that it was online. Not from Udemy (they had forgotten to tell me that it was published two weeks ago) but from a friend.

Unfortunately, some gremlins had crept in at some point during their publication pipeline. Some weird character substitutions had taken place (which had disastrous consequences for some of the Perl code examples) and a large number of paragraph breaks had vanished. But I reported those all to Udemy yesterday and I see they have all been fixed overnight.

So finally I can share the tutorial with you. Please feel free to share it with people who might find it useful.

Although it’s 8,000 words long, it really only scratches the surface of the language. Udemy have added a link to one of their existing Perl courses, but unfortunately it’s not a very good Perl course (Udemy don’t seem to have any very good Perl courses). I understand why they have done that (that is, after all, the whole point of commissioning this tutorial – to drive more people to pay for Perl courses on Udemy) but it’s a shame that there isn’t anything of higher quality available.

So there’s an obvious hole in Udemy’s offerings. They don’t have a high quality Perl course. That might be a hole that I try to fill when I next get some free time.

Unless any other Perl trainers want to beat me to it.

Oh, and please let me know what you think of the tutorial.

Penetration Testing with Perl

I was sent a review copy of Penetration Testing with Perl by Douglas Berdeaux. I really didn’t like it. Here’s the review I’ve been sharing on Amazon and Goodreads.

I’ve been wanting to learn a bit about Penetration Testing for a while and as Perl is my programming language of choice this seemed like a great book to choose. Unfortunately, it wasn’t.

I have no doubt that the author knows what he is talking about when it comes to Penetration Testing, but there were several things that prevented this book from transferring much of that knowledge to me.

Firstly, the typesetting in the book is terrible. I was reading the Amazon eBook edition – it’s possible that the printed version is better. For example, there’s a lot of code in this book and it’s in a proportional font. In order for code to be readable, it needs to be in a fixed-width font. Also, there are two or three places where an equation appears in the text, but it appears in an unreadably small font. I have just checked in the PDF version of the book and neither of these problems appears there. It would seem that this is down to a problem in Packt’s eBook creation process.

Secondly, it’s obvious that English is not the author’s first language. At times this really prevents him from getting his point across clearly. I’m very happy to see non-native speakers publishing books in English. But the publishers need to provide high quality proof-readers to ensure that the language is good enough.

Thirdly, the organisation of the book is a little haphazard. The first couple of chapters are introductions to Perl and Linux, but after that we are dropped immediately into a discussion of network sniffing. Later in the book there are chapters on intelligence gathering, social engineering and password cracking. These are all far simpler topics which could have served as a gentle introduction to the book, getting people up to speed on Perl before delving into the complex internals of network packets. Once again, I think this should be the responsibility of the publisher. There should be a good editor working on the book alongside the author and shaping the manuscript so that the story it tells guides the user through the subject as easily as possible.

Finally, the book falls short in its technical content. I can’t comment on the author’s explanations of Penetration Testing (I was, after all, reading the book to learn about that topic), but the Perl code that he uses throughout the book is really bad. He is obviously someone who only ever learned enough Perl to get his job done and never bothered to learn how Perl really works or to keep his knowledge up to date. As a result, the book is full of the kind of code that gives Perl its reputation as a write-only language. The idioms that he uses are often out of date (using ‘-w’ instead of ‘use warnings’, for example), confusing (predeclaring subroutines unnecessarily, using ampersands on function calls) or just plain wrong (‘my ($x, $y, $z) = 0 x 3’ just doesn’t do what he thinks it does). Actually, it’s worse than that. It’s not just Perl he doesn’t understand, it’s the fundamentals of good software engineering. His code is a confusing mess of global variables and bad design. This is another failure by the publisher. There should have been a competent technical editor checking this stuff.
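
In case that last example isn’t obvious, the problem is that without parentheses the repetition operator works on a string rather than a list:

    use strict;
    use warnings;
    use feature 'say';

    # Presumably meant to set all three variables to zero...
    my ($x, $y, $z) = 0 x 3;           # right-hand side is the string "000"
    say defined $y ? $y : 'undef';     # prints "undef"

    # The list form does what was intended
    my ($p, $q, $r) = (0) x 3;         # right-hand side is the list (0, 0, 0)
    say $r;                            # prints "0"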

I’ve read four or five Packt books now. They’re all of this standard. None of them should have been published. But Packt seem to have hit on a good business model. They find unknown authors and produce books as cheaply as possible. Their publishing process omits all of the editing and checking that more reputable publishers use. The books that come out of this process are, of course, terrible. But, for reasons I can’t understand, people still buy them.

Packt books – just say no.

Upcoming Training

I have a few training courses coming up in the next few weeks which I thought you might be interested in.

Firstly, the London Perl Workshop is on 8th November. I’ll be giving a two-hour talk on “Perl in the Internet of Things”. As always, the workshop is free, but please register on the site and star my talk if you’re planning on attending.

Then, the week after, I’m running two two-day courses in conjunction with FLOSS UK. On Tuesday 11th and Wednesday 12th it’s “Intermediate Perl” and on Thursday 13th and Friday 14th it’s “Advanced Perl Techniques”. Full details and a booking form are on the FLOSS UK web site.

Note: If you’re interested in the FLOSS UK courses, then please don’t pay the eye-watering non-member price (£720!). Simply join FLOSS UK (which costs £42) and then pay the member price of £399.

Hope to see you at one of these courses.

Perl’s Problems

It’s been over six weeks since I wrote my blog post on Perl usage. I really didn’t mean to leave it so long to write the follow-up. But real life intervened and I haven’t had time for much blogging. That’s still the case (I should be writing a talk right now) but I thought it was worth jotting down some quick notes about what I think is causing Perl’s decline.

Reputation

We have a lot to thank Matt Wright for. And I don’t mean that sarcastically. A lot of the popularity of Perl in the mid-90s stems directly from people like Matt and Selena Sol making their collections of CGI programs available really early on. The popularity of their programs made Perl the de-facto standard for CGI programming.

But that was a double-edged sword. People searching the web for examples of CGI programming found Matt or Selena’s code and assumed they represented best practice. Which, of course, they didn’t. While people were blithely copying Matt’s programming style, good Perl programmers were using CGI.pm to parse their incoming parameters and separating their HTML generation out into templates.

In my previous post, I mentioned that fifteen or twenty years ago Perl was the programming language of choice for internet start-ups. That’s true, but a lot of the code written at that time was in the Matt Wright style. Matt’s style just about works for a guestbook or a form mailer. But when you try to build a business on top of code like that, it quickly becomes obvious that it’s an unmaintainable mess.

Many of the technical architects and CTOs who are making decisions about technology in companies today are the programmers who spent too many late nights battling those balls of mud in the 1990s. They were never really Perl programmers; they were only using it because it was fashionable. And they haven’t kept up with recent advances in Perl, so it’s not surprising that they often choose to avoid it.

Complexity

A lot of Perl’s reputation as executable line noise is completely unwarranted. The people who were writing those 1990s balls of mud were under such pressure to deliver that they would have almost certainly delivered something just as unmaintainable whatever language they were using. But some of that reputation is fair. I’ve been teaching Perl for almost fifteen years and I know that there are some parts of Perl that people find confusing. Here are some examples:

Sigils – I can explain things like @array, $array[$key] and even @array[@keys] to people. And most of them get it. But it takes them a while. And then it all goes to pieces again when I have to explain the difference between $array[$key] and $array->[$key].
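
For anyone who hasn’t wrestled with this, the distinction looks like this:

    use strict;
    use warnings;
    use feature 'say';

    my @array = (10, 20, 30);
    my $aref  = \@array;            # a scalar holding a reference to that array

    say $array[1];                  # one element of @array: 20
    say join ' ', @array[0, 2];     # an array slice: "10 30"
    say $aref->[1];                 # one element of the array $aref refers to: 20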

Context – Does any other programming language have the concept of context? Yes, when used correctly it’s a powerful tool. But it’s hard to explain and a good source of hard-to-find bugs. Can anyone honestly say that they haven’t been bitten by a context bug at some point in the last year?
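
A small example of the sort of thing that trips people up:

    use strict;
    use warnings;
    use feature 'say';

    my @words = ('foo', 'bar', 'baz');

    my @copy  = @words;    # list context: @copy gets all three elements
    my $count = @words;    # scalar context: $count gets 3
    say $count;            # prints "3"

    # localtime is a classic trap: a list of numbers in list context,
    # a human-readable string in scalar context
    my ($sec, $min, $hour) = localtime;
    my $stamp = localtime;
    say $stamp;            # something like "Mon Nov 10 08:15:42 2014"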

Data Structures – Is the difference between arrays and array references really necessary? Think of all the complexity that is added because you can’t just pass arrays and hashes into subroutines without being bitten by list flattening. As experienced Perl programmers we know the problems and our brains are hard-wired to work around them. But other languages treat all aggregate data structures as references and it all becomes a lot easier.
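
Here’s the flattening problem in miniature:

    use strict;
    use warnings;
    use feature 'say';

    my @names  = ('Alice', 'Bob');
    my @scores = (10, 20);

    sub broken {
        my (@n, @s) = @_;                     # @n swallows everything, @s is empty
        say scalar(@n), ' / ', scalar(@s);    # prints "4 / 0"
    }
    broken(@names, @scores);                  # the two arrays flatten into one list

    sub fixed {
        my ($n, $s) = @_;                     # two array references arrive intact
        say scalar(@$n), ' / ', scalar(@$s);  # prints "2 / 2"
    }
    fixed(\@names, \@scores);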

I know that each of these features (and half a dozen other examples I could list) makes Perl a richer and more expressive language. But this comes at the cost of learnability and readability. Perhaps that trade-off once seemed like a good idea. When you’re trying to encourage people to look at your language then the advantages seem less obvious.

Of course, none of these features can be changed as they would break pretty much every existing Perl codebase. Which would be a terrible idea. But you can get away with a lot more breakage when you increase your major version number. Which Perl hasn’t been able to do for fourteen years.

Perl 6

I need to be clear here. I think that Perl 6 looks like a great language. I am really looking forward to using it on production systems. And it looks like the current Perl 6 team are doing great work towards making that possible. In fact I think that our best approach to reviving Perl’s fortunes is to get a production-ready version of Perl 6 out and to make a big noise about that.

However, that name has been a big problem.

Looking from outside the Perl echo chamber, it’s easy to believe that Perl hasn’t had a major release for twenty years. And that can probably explain a lot of Perl’s current problems.

I know that people who believe that are wrong. The current version of Perl (5.20.1 as I write this) is a lot different to the version that was current when Perl 6 was first announced (which was 5.6.0, I think). Perl has gone through huge changes in the last fourteen years. But the version number hides that.

I also know that we no longer tell people that Perl 6 is the next version of Perl. The Wikipedia page makes it clear in its first sentence that “Perl 6 is a member of the Perl family of programming languages”. So why do people continue to think it’s the next version of Perl? Well, probably because people assume that they know how software version numbers work and don’t bother to check the web site to see if a particular project has changed the standard meaning that has worked well for decades.

So Perl 6 has been simultaneously both good and bad for Perl. Good because a lot of Perl 6 ideas have been backported into Perl 5. But bad because Perl 5 has been unable to change its major version number in order to advertise these improvements to the wider software-using world.

Nothing can be done about this now. The damage is done. As I said at the start of this section, it’s likely that the only thing we can do is to bet heavily on Perl 6 and get it out as soon as possible. Perl 5 will continue to exist. People will continue to maintain and improve it. Some companies will continue to use it. But its usage will continue to fall. I really think it’s too late to do anything about that.