Over the last few days I’ve been involved in a couple of discussions where it was clear that other people don’t understand how Perl deals with Unicode. The documentation is clear and detailed (there’s even a good tutorial), but for some reason people still misunderstand it.
Here’s a quick quiz. Can you explain (in detail) what is going on in each of these four command-line programs? And for bonus points, which one should we be emulating in our code?
$ perl -E'say "£"'
$ perl -Mutf8 -E'say "£"'
$ perl -C -E'say "£"'
$ perl -C -Mutf8 -E'say "£"'
In all cases, assume that my locale is set to en_US.UTF-8.
I’ll post explanations in a few days’ time.
Update: Coincidentally, Miyagawa posted something very similar on his blog.