Unicode in Python

Now that exams are finally over, I can spend more time on GNOMEy things. One problem which has been sitting on my to-do list for a while is that of translatable Unicode strings in Python. It appears that my patch in bug #591496 to get Hamster to use Unicode em-dashes inadvertently broke translation of the strings. Whoops.

It turns out that in order for gettext to properly match and translate a C-locale string which contains Unicode characters, the encoding of the Python file must be specified using a coding: line at the top of the file, and the string in question must be a Unicode object. For example:

# -*- coding: utf-8 -*-
import gettext
my_translated_string = gettext.gettext(u'My Unicode string…')

I don't think this is too common a problem, and I've checked that it doesn't affect any of the other Python modules I've fiddled with, but hopefully this will be useful to someone. As far as I understand it, all translatable strings in Python modules should be u'Unicode objects rather than normal strings' anyway, ideally, but don't take my word on it because my Python-fu is weak.

One thought on “Unicode in Python

  1. ZeD

    If you prefer, you can stick to plain-ascii source python files using unicode escapes

    in this example, using u'My Unicode string\u2026'

Comments are closed.