Tag Archives: utf8

Unicode and Perl

Over the last couple of days I’ve been involved in a couple of discussions where it is clear that other people don’t understand how Perl deals with Unicode. The documentation is clear and detailed (there’s even a good tutorial) but for some reason people still persist in misunderstanding it.

Here’s a quick quiz. Can you explain (in detail) what is going on with all of these four command-line programs? And for bonus points, which one should we be emulating in our code?

$ perl -E'say "£"'
£
$ perl -Mutf8 -E'say "£"'
�
$ perl -C -E'say "£"'
£
$ perl -C -Mutf8 -E'say "£"'
£

In all cases, assume that my locale is set to en_US.UTF-8.

I’ll post explanations in a few days time.

Update: Coincidentally, Miyagawa posted something very similar on his blog.