Monthly Archives: June 2010

Unicode in Python

Now that exams are finally over, I can spend more time on GNOMEy things. One problem which has been sitting on my to-do list for a while is that of translatable Unicode strings in Python. It appears that my patch in bug #591496 to get Hamster to use Unicode em-dashes inadvertently broke translation of the strings. Whoops.

It turns out that for gettext to properly match and translate a C-locale string which contains non-ASCII characters, two things are needed: the encoding of the Python file must be declared using a coding: line at the top of the file, and the string in question must be a Unicode object. For example:

# -*- coding: utf-8 -*-
# …
import gettext
gettext.textdomain('myapp')
# …
# The u prefix makes the msgid a Unicode object rather than a byte string.
my_translated_string = gettext.gettext(u'My Unicode string…')
# …
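
To make the matching problem a bit more concrete, here is a rough sketch of my understanding. It is not Hamster's code: the DictTranslations class, the msgid and the German text are all made up for illustration, and the dictionary merely stands in for a compiled message catalogue whose msgids have been decoded to Unicode.

```python
# -*- coding: utf-8 -*-
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalogue standing in for a compiled .mo file."""

    def __init__(self, catalog):
        gettext.NullTranslations.__init__(self)
        self._catalog = catalog

    def gettext(self, message):
        # Like real gettext, fall back to the untranslated message on a miss.
        return self._catalog.get(message, message)


# Made-up catalogue entry, keyed by the decoded (Unicode) msgid.
catalog = {
    u'No activity \u2014 start tracking\u2026': u'Keine Aktivit\u00e4t\u2026',
}
_ = DictTranslations(catalog).gettext

# A Unicode object matches the catalogue key and gets translated.
msgid = u'No activity — start tracking…'
translated = _(msgid)

# The same text as UTF-8 bytes is a different object, so the lookup misses
# and the original bytes come back untranslated.
missed = _(msgid.encode('utf-8'))
```

The point of the sketch is only that the lookup key has to be the same kind of object as the catalogue's msgids. With a real catalogue the strings would come from the .po file rather than a hand-written dictionary.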

I don't think this is a very common problem, and I've checked that it doesn't affect any of the other Python modules I've fiddled with, but hopefully this will be useful to someone. As far as I understand it, all translatable strings in Python modules should ideally be u'Unicode objects rather than normal strings' anyway, but don't take my word for it, because my Python-fu is weak.