Stay away from the utf8 mess in Perl

06 May 2010

You are writing a Perl script that reads, manipulates, or writes data and/or text files containing utf8 characters. Perl sometimes gives you weird warnings and error messages (such as "Wide character in print"), funny characters show up in your output, and searching online only turns up confusing and contradictory advice.

If you think that everything should be utf8 and you don’t care about anybody using any other encoding, simply include the following pragmas at the top of your script:

use open qw/:std :utf8/; # STDIN/STDOUT/STDERR and files you open() are read/written as utf8
use utf8;                # your program file (string literals, identifiers) is utf8
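A minimal sketch of what the two pragmas buy you, assuming the script file itself is saved as utf8. It uses the core File::Temp module only to get a scratch file (that choice is an illustration, not part of the advice above). Because File::Temp opens its handle outside this lexical scope, the sketch closes that handle and reopens the path with a plain open(), so the default :utf8 layer from use open actually applies.

```perl
use strict;
use warnings;
use utf8;                  # the literals below are decoded from the utf8 source
use open qw/:std :utf8/;   # default :utf8 layer for STD* and for open() in this file
use File::Temp qw/tempfile/;

my $word = "café";
print length($word), "\n";     # 4: characters, not the 5 bytes in the source file

# File::Temp opens its handle outside this lexical scope, so the `use open`
# default does not apply to it; close it and reopen the path ourselves.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

open my $out, '>', $path or die "open $path: $!";
print {$out} "$word\n";        # encoded as utf8 by the default layer
close $out or die "close $path: $!";

open my $in, '<', $path or die "open $path: $!";
chomp(my $read = <$in>);       # decoded from utf8 by the default layer
close $in;

print "$read\n";                            # café, printed as utf8 on STDOUT
print "on disk: ", -s $path, " bytes\n";    # 6: c, a, f, two bytes for é, newline
```

Perl sees 4 characters in $word, while the file on disk holds 6 bytes; the pragmas take care of the translation in both directions.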

In most cases this is enough, and you can forget about this stupid encoding business.

If, however, you somehow end up with a $string that you know holds utf8-encoded bytes but perl doesn’t seem to realize it (for example, length() counts too many characters), then call

utf8::decode($string);

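so that perl decodes those bytes in place into a proper character string (this also turns on the string’s internal UTF8 flag when it contains non-ASCII characters). utf8::decode returns false, and leaves $string unchanged, if the bytes are not valid utf8.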
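To see what utf8::decode actually changes, here is a small sketch that builds the byte string by hand with \x escapes (so no use utf8 is needed) and inspects it with utf8::is_utf8, another function the Perl core provides in the utf8:: package. The Latin-1 "café" used for the failure case is just an illustrative example of bytes that are not utf8.

```perl
use strict;
use warnings;

# Five raw bytes: "caf" plus C3 A9, the two-byte utf8 encoding of "é".
my $string = "caf\xc3\xa9";
printf "before: length=%d, utf8 flag %s\n",
    length($string), utf8::is_utf8($string) ? "on" : "off";   # length=5, flag off

# Decode in place; utf8::decode returns false if the bytes are not valid utf8.
utf8::decode($string) or die "not valid utf8\n";
printf "after:  length=%d, utf8 flag %s\n",
    length($string), utf8::is_utf8($string) ? "on" : "off";   # length=4, flag on

# The decoded string now compares equal to U+00E9 spelled out directly.
print "matches caf\\x{e9}\n" if $string eq "caf\x{e9}";

# Latin-1 "café" ends in a lone E9 byte, which is not valid utf8:
# decode fails and the string is left as it was.
my $latin1 = "caf\xe9";
my $ok = utf8::decode($latin1);
print "latin-1 input rejected\n" unless $ok;
```

Since the conversion happens in place, pass a variable rather than a literal, and check the return value if the data might not be utf8 after all.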