Category Archives: Programming

Dots and Perl

I was running a training course this week, and a conversation I had with the class reminded me that I have been planning to write this article for many months.

There are a number of operators in Perl that are made up of nothing but dots. How many of them can you name?

There are actually five dot operators in Perl. If the people on my training courses are any guide, most Perl programmers can only name two or three of them. Here’s a list. It’s in approximate order of how well-known I think the operators are (which is, coincidentally, also the order of increasing length).

One Dot

Everyone knows the one dot operator, right? It’s the concatenation operator. It converts its two operands to strings and concatenates them.

# $string contains 'one stringanother string'
my $string = 'one string' . 'another string';

It’s also sometimes useful to use it to force scalar context onto an expression. Consider the difference between these two statements.

say "Time: ", localtime;
say "Time: " . localtime;

The difference between the two statements is tiny, but the output is very different.

There’s not much more to say about the single dot.

Two Dots

Things start to get a little more interesting when we look at two dots. The two dot operator is actually two different operators depending on how it is used. Most people know that it can be used to generate a range of values.

my @numbers = (1 .. 100);
my @letters = ('a' .. 'z');

You can also use it in something like a for loop. This is an easy way to execute an operation a number of times.

do_something() for 1 .. 3;

In older versions of Perl, this could potentially eat up a lot of memory as a temporary array was created to contain the range and therefore something like

do_something() for 1 .. 1000000;

could be a problem. But in all modern versions of Perl, that temporary array is not created and the expression causes no problems.

Two dots acts as a range operator when it is used in list context. In scalar context, its behaviour is different. It becomes a different operator – the flip-flop operator.

The flip-flop operator is so-called because it flip-flops between two states. It starts as returning false and continues to do so until something “flips” it into its true state. It then continues to return true until something else “flops” it back into a false state. The cycle then repeats.

So what causes it to flip-flop between its two states? It’s the evaluation of its left and right operands. Imagine you are processing a file that contains text that you are interested in and other text that you can ignore. The start of a block that you want to process is marked with a line containing “START” and the end of the block is marked with “END”. Between and “END” marker and the next “START”, there can be lots of text that you want to ignore.

A naive way to process this would involve some kind of “process this” flag.

my $process_this = 0;
while (<$file>) {
  $process_this = 1 if /START/;
  $process_this = 0 if /END/;
  process_this_line($_) if $process_this;
}

I’ve often seen code that looks like this. The author of that code didn’t know about the flip-flop operator. The flip-flop operator encapsulates all of that logic for you. Using the flip-flop operator, the previous code becomes this:

while (<$file>) {
  process_this_line($_) if /START/ .. /END/;
}

The flip-flop operator returns false until its left operand (/START/) evaluates as true. It then returns true until its right operand (/END/) evaluates as true. And then the cycle repeats.

The flip-flop operator has one more trick up its sleeve. One common requirement is to only process certain line numbers in a file (perhaps we just want to process lines 20 to 40). If one of the operands is a constant number, then it is compared the the current record number (in $.) and the operand is fired if the line number matches. So processing lines 20 to 40 of a file is a simple as:

while (<$file>) {
  process_this_line($_) if 20 .. 40;
}

Three Dots

It was the three dot operator that triggered the conversation which reminded me to write this article. A three dot operator was added in Perl 5.12. It’s called the “yada-yada” operator. It is used to stand for unimplemented code.

sub reverse_polarity {
  # TODO: Must write this
  ...
}

Programmers have been leaving “TODO” comments in their code for decades. And they’ve been using ellipsis (a.k.a. three dots) to signify unimplemented code for just as long. But now you can actually put three dots in your source code and it will compile and run. So what’s the benefit of this over just leaving a TODO comment? Well, what happens if you call a function that just contains a comment? The function executes and does nothing. You might not realise that you haven’t implemented that function yet. With the yada-yada operator standing in for the unimplemented code, Perl throws an “unimplemented” error and you are reminded that you need to write that code before calling it.

But the yada-yada operator wasn’t the first three dot operator in Perl. There has been another one in the language for a very long time (you might say since the year dot!) And I bet very few of you know what it is.

The original three dot operator is another flip-flop operator. And the difference between it and the two dot version is subtle. It’s all to do with how many tests can be run against the same line. With the two dot version, when the operator flips to true it also checks the right-hand operand in the same iteration – meaning that it can flip from false to true and back to false again as part of the same iteration. If you use the three dot version, then once the operator flips to true, then it won’t check the right-hand operand until the next iteration – meaning that the operator can only flip once per iteration.

When does this matter? Well if our text file sometimes contains empty records that contain START and END on the same line, the difference between the two and three dot flip-flops will determine exactly which lines are processed.

If the two dot version encounters one of these empty records, it flips to true (because it matches /START/) and then flops back to false (because it matches (/END/). However, the flop to false doesn’t happen until after the expression returns a value (which is true). The net effect, therefore is that the line is printed, but the flip-flop is left in the false state and the following lines won’t be printed until one contains a START.

If the three dot version encounters one of these empty records, it also flips to true (because it matches /START/) but then doesn’t check the right-hand operand. So the line gets printed and the flip-flop remains in its true state so the following lines will continue to be printed until one contains an end.

Which is correct? Well, of course, it depends on your requirements. In this case, I expect that the two dot version gives the results that most people would expect. But the three dot version is also provided for the cases when its behaviour is required. As always, Perl gives you the flexibility to do exactly what you want.

But I suspect that the relatively small number of people who seem to know about the three dot flip-flip would indicate that its behaviour isn’t needed very often.

So, there you have them. Perl’s five dot operators – concatenation, range, flip-flop, yada-yada and another flip-flop. I hope that helps you impress people in your next Perl job interview.

Update: dakkar points out that yada-yada isn’t actually an operator. He’s absolutely right, of course, it stands in place of a statement and has no operands. But being slightly loose with our terminology here makes for a more succinct article.

Perl APIs

For a lot of programmers out there, Perl has become largely invisible. They just never come across it. That might seem strange to you as you sit inside the Perl community echo chamber reading the Perl Ironman or p5p, but try this simple experiment.

Think of a web site that you use and that supplies an API. Now go to that API’s documentation and look at the example code. What languages are the examples written in? PHP? Almost certainly. Ruby? Probably. Python? Probably. C#? Quite possibly. Perl? Almost certainly not[1].

Perl has fallen so far off the radar of most people that when web sites write example code for these APIs, they very rarely consider Perl as a language worth including. And because they don’t bother including Perl then any random programmer coming to that API will assume that the API doesn’t support Perl, or (at the very least) that the lack of examples will make using Perl harder than it would be with plenty of examples to copy from.

This then becomes a self-fulfilling prophecy as Github fills with more and more projects using other languages to talk to these APIs. And the chance of anyone who isn’t already a Perl user ever trying to interact with these APIs using Perl falls and falls.

Of course, this is all completely wrong. These APIs are just going to be a series of HTTP requests using REST or XML-RPC (or, if you’re really unlucky, SOAP). Perl has good support for all of that. You might need to use something like OAuth to get access to the API – well Perl does that too.

Of course, in some cases good Perl support exists already – Net::Twitter is a good example. And to be fair to Twitter, their API documentation doesn’t seem to give any examples in specific languages – so Perl isn’t excluded here. But in many other cases, the Perl version languishes unnoticed on CPAN while other languages get mentioned on the API page.

I think that we can try to address this in 2014. And I’d like to ask you to help me. I’ve set up a mailing list called perl-api-squad where we can discuss this. In a nutshell, I think that the plan should be something like this:

  1. Identify useful APIs where there is either no Perl API or no Perl examples
  2. Write CPAN API wrappers where they are missing
  3. Approve API owners and offer them Perl examples to add to their web site

That doesn’t sound too complicated to me. And I think (or, perhaps, hope) that most API owners will be grateful to add more examples of API usage to their site – particularly if it involves next to no effort on their part.

I also expect that the Perl API Squad will produce a web site that lists Perl API support. We might even move towards producing a framework that makes it easy to write a basic Perl wrapper around any new API.

What do you think? Is this a worthwhile project? Who’s interested in joining in?

[1] Yes, I know there are exceptions. But they are just that – exceptions.

Misunderstanding Context

Over the last few days I’ve been involved in a discussion on LinkedIn[1]. It has been interesting as it shows how many people still misunderstand many of the intricacies of context and, in particular, how it ties in with the values returned from subroutines.

The original question asked why these two pieces of code acted differently:

my ($index) = grep {
  $array[$_] == $variable
} 0 .. $#array;
my $index = grep {
  $array[$_] == $variable
} 0 .. $#array;

The first one gives the first index where the element equals $variable, the second one gives the number of indexes where the element equals $variable.

The answer, of course, comes down to context.  And most people who answered seemed to understand that, but many of their explanations were still way off the mark. Here’s the first answer:

grep returns an array – so the first one will return the value of the first match, but 2nd one in scalar context will return the size of the array, the number of matches. I believe.

The first statement here – “grep returns an array” – is wrong, so any explanation built on that fact is going to be fundamentally flawed.

After a couple of similar answers, I jumped in and pointed out that the only way to know how grep will work in different contexts is to read the documentation; which says:

returns the list value consisting of those elements for which the expression evaluated to true. In scalar context, returns the number of times the expression was true.

At that point it got a bit weird. People started telling me all sorts of strange things in order to show that the earlier answers were better than mine. I was told that lists and arrays are the same thing, that there was something called “array context” and that it was possible for function to return arrays.

I think I’ve worked out what most of the misconceptions in the discussion were. Here’s a list.

1. Arrays are not lists

I know this is a very common misunderstanding. I come across it all the time. People use the terms “list” and “array” interchangeably and end up thinking that they are the same thing. They aren’t. A list is a data value and an array is a variable. They act the same way a lot of the time but unless you understand the difference, you will make mistakes.

Mike Friedman wrote a great blog post that explains the difference in considerable detail. But the core difference is this – arrays are persistent (at least while they are in scope) data structures; lists are ephemeral.

On training courses, I tell people that if they understand the difference between an array and a list they’ll be in the top 20% of Perl programmers. In my experience, that’s pretty close to the truth.

2. Subroutines return lists

A subroutine can only ever return a list. Never an array. It can be an empty list. It can be a list with only one item. But it’s always a list.

There’s one small “gotcha” here. I know this bit me a few times in the past. If you read the documentation for return, it says this:

Evaluation of EXPR may be in list, scalar, or void context, depending on how the return value will be used

This means that when you have code like return @array, it’s @array that is evaluated in the context of the subroutine call. So, depending on  the context of the call, this will return either a list consisting of all the elements in @array, or a list with one element which is the number of elements in @array.

3. There is no “array context”

When you confuse lists and arrays, then it’s not surprising that you might also decide that you can also invent a new context called “array context”. There are only list context, scalar context and void context. Ok, actually there are a few more specialised contexts that you can use (see the Want module for details) but most of the time you’ll only be dealing with those three.

Of course, the name of the wantarray function doesn’t help at all here. But it’s worth noting that the documentation for wantarray ends by saying:

This function should have been named wantlist() instead.

4. You  can’t guess contextual behaviour

One common argument I got when pointing this out on LinkedIn ran along the lines of “but grep acts like it returns an array, so it’s a useful mental model – it helps people to understand context”.

The first part of this is true – grep does act like it returns an array. It returns a list (which you might often store in an array) in list context and it returns the length of that list in scalar context. I can see how you would mistake that for returning an array. But  see my point 2 above. Subroutines do not ever return arrays.

But is it a useful mental model? Does it help people understand how subroutines work in different contexts?

No. It helps people to understand how this particular function works. And there are several other Perl functions that work the same way (keys is one example). But there are plenty of other functions  that don’t work that way. The canonical example is localtime. In list context, it returns a list of nine values; in scalar context it returns a single value (which isn’t the number 9).  Another good example is caller. In list context, it returns a list of items (which can contain either three or eleven elements); in scalar context it returns the first item of that list. There are many more examples.

So the problem with a mental model that assumes that functions work as though they return arrays is that it only works for a subset of functions. And you’d need to waste effort remembering which functions it works for and which ones it doesn’t work for. You’d be far better off (in my opinion) realising that there is no rule that works for all functions and just remembering how each function works (or, more practically, remembering to check the documentation whenever you need to know).

 

If you’ve seen me giving a lightning talk at a conference this  year, you’ll know that I’ve been encouraging more people from the Perl community to get involved in discussions in places outside the echo chamber, places like LinkedIn. It’s interesting because you see how people outside of the community see Perl. You see the misunderstandings that they work with and you might see how we can improve the Perl documentation to help them get around those misunderstandings.

This morning, this comment was posted to the LinkedIn discussion that I’ve been talking about:

This stuff should be included in a default perl tutorial to avoid common mistake.

And, you know, I think he’s probably right.

[1] Because of the way LinkedIn works, you won’t be able to see this discussion unless you’re a member of LinkedIn and, probably, a member of the Perl group as well.

Parallel Universe Perl 6

Last night was the monthly London Perl Mongers social meeting. I hadn’t been for far too long, but I went last night and enjoyed myself.

The talk was as varied as it always is, but one conversation in particular got me thinking. We were talking about YAPC Europe and someone asked if I had seen the Future Perl Versioning Panel. I said I had and that I was slightly disappointed with the make-up of the panel. In my opinion having three people on the panel who were all strong advocates for Perl 6 remaining Perl 6 didn’t really lead to much of a discussion.

In the end, though, any discussion on this subject is pretty pointless. Larry’s word is law and he has made it very clear that he wants things to remain the way they are. And, of course, any discussion of what might have happened differently if Perl 6 had been given a different name or any of the other alternatives is all completely hypothetical.

But hypothetical discussions can be fun.

So lets turn the discussion round and look at it from a slightly different angle.

Imagine you’re in an alternative universe. One where Jon Orwant never threw those coffee cups and the Perl 6 project was never announced. But also imagine that Perl 5 development in this universe had proceeded along the same lines as it has in our universe (I know that’s unlikely as a lot of Perl 5 development in the last ten years has come out of people wanting to implement ideas from Perl 6 – but let’s ignore that inconvenient fact).

My question to you, then is this…

In this parallel universe, at which point in the last thirteen years of Perl 5 development should we have changed the major version number to 6?

I have an answer to the question, but I’d like to hear some other opinions before sharing it.

Unicode and Perl

Over the last couple of days I’ve been involved in a couple of discussions where it is clear that other people don’t understand how Perl deals with Unicode. The documentation is clear and detailed (there’s even a good tutorial) but for some reason people still persist in misunderstanding it.

Here’s a quick quiz. Can you explain (in detail) what is going on with all of these four command-line programs? And for bonus points, which one should we be emulating in our code?

$ perl -E'say "£"'
£
$ perl -Mutf8 -E'say "£"'
�
$ perl -C -E'say "£"'
£
$ perl -C -Mutf8 -E'say "£"'
£

In all cases, assume that my locale is set to en_US.UTF-8.

I’ll post explanations in a few days time.

Update: Coincidentally, Miyagawa posted something very similar on his blog.

Just Build Something

The Political Web

About a month ago, JT Smith suggested that we should all stop talking about Perl and just build something. And, purely coincidentally, over the last few weeks I resurrected a project that I have been poking at for about five years and have finally turned it into something that I’m happy to show the world.

The Political Web is a site which aggregates all of the information I can find on the web about individual British MPs. I say “all of the information”, but that’s obviously a bit of a work in progress. But I think that what I already have is useful and interesting – well, for people who are interested in British politics. I have plans to bring in more information in the future.

Although I’ve been working on the site for five years, I pretty much rebuilt it from scratch when I recently returned to it. Actually getting something useful up and running took about four hours. That’s because I was building it using Perl and, more specifically, Dancer.

Pebble and Perl

I’ve been wearing a Pebble watch for a couple of months now. I really like it but, to be honest, it’s the potential that has me most excited. The number of apps currently available is a bit disappointing and the API is taking its time appearing.

But even when the API is published, I wonder if I’ll have the time to learn all the necessary technologies in order to write Pebble-aware apps. All I want is to have some way to send a notification that pops up on the phone. Surely there must be an easy way to achieve that. Preferably one that I can use from Perl.

Of course, it turns out that there is. The secret is an app called Pushover. Pushover is a web service that sends notifications to your Android (or iOS if you’re that way inclined) device. There’s an app that you install on your device (it’s not free – I think it cost me £3.30) and you need to sign up for a free account. Then you can send notifications to your device either from their web site or using their API. The API is a simple HTTP request-based system. There’s an example on the Pushover web site that uses LWP::UserAgent, but you can make it even simpler using WebService::Pushover.

That’s all well and good. We can now send arbitrary notification from to our phone. How do we get from there to the watch?

The standard Pebble Android app is currently a little disappointing. In particular, it only supports pushing notifications from a tiny number of apps from the phone to the watch. But there’s another alternative. There’s an app called Pebble Notifier which will forward notifications from any app on the phone to the watch. When you install Pebble Notifier you can choose which apps you want to forward notifications for.

So, in summary, sign up for a Pushover account and install Pushover and Pebble Notifier on your phone. Then install WebService::Pushover on your computer. Then you can write code like this:

use WebService::Pushover;

my $push = WebService::Pushover->new
or die( "Can't create Pushover interface.\n" );

my $status = $push->message(
message => 'Hello Pebble',
# Signing up to Pushover gives you these tokens
token => 'PUSHOVER API TOKEN',
user => 'PUSHOVER USER TOKEN',
);

And that will send a message to your Pebble.

Now all I need is something useful to do with it.

Update: I’ve just noticed that there’s also an IFTTT channel for Pushover. It has nothing to do with Perl, obviously, but would be an easy way to trigger Pebble notifications for certain triggers.

What New(ish) Perl features Do You Use?

Over on LinkedIn, someone asked me “What core PERL[sic] features do you use regularly that are new since 95?” It’s hard to be sure as the perldelta files only seem to go back to 1997 (for example, when were qw(...), q(...) and qq(...) added?), but here’s a quick list off the top of my head.

  • my was, of course, added in 5.0. But 5.004 added the ability to use it in control expressions – while (my $foo = <>) – and in foreach loops – foreach my $foo (@foos)
  • use VERSION
  • Regex extensions – (?<=RE) and similar. Oh, and qr/.../
  • Data::Dumper (added in 5.005)
  • Unicode support – first added in 5.6.0 and improved in every release since
  • our
  • Three-argument open
  • Omission of intermediate arrows in data structure lookups – $foo[$x][$y] instead of $foo[$x]->[$y]
  • use warnings
  • Memoize
  • Test::More and Test::Simple
  • say
  • defined-or
  • use base (or, more recently, use parent)
  • yada-yada operator

Have I missed anything obvious? What new Perl features do you use most?

Texinfo 5.0 in Perl

There was a story on Slashdot on Sunday saying that the new version of Texinfo had replaced the old, C, implementation of makeinfo with one written in Perl.

I thought it would be interesting to look at the Perl they’ve written. This is, after all, a reasonably large example of Perl code that will be getting a bit of attention in the open source[1] world. If you want to look too, you can download the tarball or browse the CVS repository. In both cases, the Perl code is in the directory called ‘tp’.

This first thing to note is that this is obviously code which has been written by programmers who know their craft. This has not been written by script kiddies. There are, however, some rather bizarre touches which imply that the authors don’t know Perl as well as they might hope.

  • The code is nicely partitioned into modules. And many of the modules are really classes. But some of the modules (the ones in the init directory) have no package statement. So they are more like Perl4-style libraries than what we’d recognise as modules.
  • Many (in fact it might be all) of the subroutines in the modules have prototypes. In many cases, that doesn’t do any harm (although Perl prototypes don’t really address the issues that most people assume they address). But many of these subroutines are called as methods. And prototypes have no effect on method calls at all.
  • There is rather more use of package variables (instead of lexical variables) than I’d be comfortable using in my code.
  • I can see no use of CPAN modules. Perhaps there are no CPAN modules that help with this code. But I’d find that surprising in a project of this size.

Then I started to wonder which version of Perl they were targetting. So I searched for “use 5.xxx” statements. And found quite a range. Many of the files insist on 5.006 and most of the rest want 5.00405 – which I consider a scarily old version of Perl to try and support. There was one file that wanted 5.007_003. Then in one file (Texinfo::Parser) I found this:

# We need the unicode stuff.
use 5.006;

Reading the original Slashdot story, it claimed that one of the improvements in this new version was its Unicode support. And suggesting that you need Perl 5.006 for decent Unicode shows some major misunderstanding. Perl 5.006 was probably the point at which the Perl 5 Porters started to take Unicode seriously. It took until 5.12 or 5.14 before they got it right. Trying to support Unicode properly on anything earlier is almost certainly doomed to failure.

It’s great that another heavily-used project has started to use Perl. But it’s a shame that people might mistake this 5.6-era code for state of the art Perl. I can’t help wondering if this is a symptom of the “Perl 5 can’t have a new version number” problem that I’ve been reading about recently

I’ll get in touch with the Texinfo team and make some suggestions to them, of course.

Update: I emailed the Texinfo mailing list with a link to this blog post. I got a reply from Patrick Dumas who wrote most of this Perl code. His reply is available on the mailing list archives.

[1] Sorry, it’s a GNU project so obviously I mean “free”, not “open source”.

Give Me MetaCPAN

Ever since MetaCPAN launched I’ve been getting increasingly irritated with people who still use links to search.cpan.org. Isn’t it obvious that MetaCPAN is better? Why do people still insist on sharing links to the older site?

Of course they do it for various reasons. Perhaps they aren’t as in touch with the modern Perl world as I am. Perhaps they are wary about changing to use the new shiny toys because they know that a newer shinier one will be along soon. Perhaps I’m reading a web page from five years ago and they can be forgiven for not linking to a site that didn’t exist at the time.

Eventually I realised that there was no point in getting annoyed. I had a computer. Surely I could do something that would fix this problem.

My first idea was to write a GreaseMonkey script. Then I realised that would be hard. MetaCPAN and CPAN have slightly different URL schemes. I’d need to recognise a CPAN URL and convert it to the equivalent MetaCPAN URL. Not an impossible task at all, but not something I could knock up in an hour or so. Especially not in Javascript.

So I asked on the #metacpan IRC channel. Surely I couldn’t be the first person to have this problem. And someone there introduced me to mcpan.org. There’s no point in clicking on that link. You’ll just end up on MetaCPAN. Because that’s what mcpan does. It’s a URL rewriting service. You give it a CPAN URL (with the cpan.org changed to mcpan.org) and it redirects you to the equivalent page on MetaCPAN.

That’s the hard bit of the problem. The bit I didn’t want to write. And someone else has already written it. But they seem to have kept it very quiet. This deserves more publicity. I wish I could remember who wrote it. If you know, please leave a comment.

So we’re now most of the way there. Now if I click on a CPAN link, I can just edit the location bar to add an ‘m’ and I’ll be redirected to the right place. But we can do better than that. I’d like to be automatically redirected. That’s when I discovered the Redirector extension for Firefox. Once it’s installed you can configure it to redirect certain URLs to other ones. I have it configured so that http://search.cpan.org* redirects to http://mcpan.org$1. See how you can use wildcards to match URLs and then use whatever they matched in the replacement URL. It’s a lot like regexes in Perl.

And we’re done. Now whenever I click on an old-style CPAN link, I’m automatically redirected to mcpan, And that, in turn, redirects me to MetaCPAN. And, best of all, I didn’t have to write any code. It was just a case of putting together tools that already existed.

I did all this a few months ago. I meant to write this blog post at the time, but I forgot. I was reminded this morning when Chisel mentioned a GreaseMonkey script he had written to do all of this. See, Chisel isn’t afraid of parsing URLs in Javascript like I was. He just went ahead and did it.

But having alternative solutions to the problem is good, right?

Update: I’ve just been talking about this on the #metacpan IRC channel and it seems that I was rather misunderstanding what was going on here. Here are the details.

Firstly, it was dpetrov who told me about mcpan.org. It’s his domain. I asked him for the code, and he pointed out that there is no code. mcpan.org just redirects everything to a domain called sco.metacpan.org which is where the magic happens. He just got tired of editing search.cpan.org to sco.metacpan.org so he registered another, simpler, domain.

So the actual cleverness happens over on sco.metacpan.org. And that’s really just a list of rewrite rules (the code is on github). I say “just”, but I still wouldn’t want to write them myself.

All this means that mcpan,org is only a convenient tool for when you’re manually editing your location bar. When you’re using the Redirector extension for Firefox you can miss out the middle man and redirect straight to sco.metacpan.org. So I’ve updated my Redirector rule appropriately.