Monthly Archives: October 2009

Unicode in GNOME

This is something I’ve been meaning to write about for a while and, I must admit, something I should have written about before I started pushing through changes in GNOME applications. I’m talking about the use of Unicode in GNOME: the use of the proper ellipsis character (“…”), proper en- and em-dashes (“–” and “—”, respectively) and fancy quotation marks.

This is something which has been brought up before, so I’ll try not to reignite the old arguments, and instead concentrate on the unresolved issues. Here are the main points:

  • Proper Unicode characters look nicer than the ASCII versions which substitute for them. The ellipsis is correctly spaced (if one were to use full stops instead, they should technically have non-breaking spaces between them), and the quotation marks are pleasantly curved. This looks nicer, to my eye at least. The difference between en- and em-dashes and the ASCII hyphens used to simulate them is considerable.
  • They’re harder to type on a conventional keyboard, though they are easily accessible through the compose key.
  • There are questions about the level of font support for such characters. On my Fedora 11 system, all the fonts except one (“PakTypeTehreer”) have the expected characters (ellipsis, dashes and quotation marks) at the right codepoints, although many of the glyphs are ugly and unloved (e.g. in Hershey and Khmer). DejaVu and Bitstream have excellent support for these characters. There is a suggestion that Pango should be extended to support decomposing the Unicode characters into their ASCII equivalents if a font doesn’t support them.
  • There was confusion over what exactly was allowed in source code, and whether UTF-8 characters were allowed in C-locale strings (regardless of their representation in source code). It was decided that they were, but that the most portable way to represent them in C was to use octal backslash escapes (e.g. “\342\200\246” instead of “…”). We’ve had Unicode characters in source code since GNOME 2.22, and (apparently) there have been no bug reports on the matter, but there was no conclusive answer about how embedded C compilers (and other, less well-known compilers) cope with such things.

Obviously, I’m thoroughly in the pro-Unicode camp. I believe it would make our desktop look more professional, and improve legibility of the interface in places. I’ve spoken to Calum Benson of HIG fame and he has no particular objections to mandating use of the appropriate Unicode characters by the HIG.

In the meantime, I’ve been filing bugs against applications to convert them to using proper Unicode characters; this probably wasn’t the best way to go about things, but at least it is a move in the right direction (in my view anyway). Unfortunately, this has come at the cost of inconsistency in the desktop. Most of the changes have been applied after branching for gnome-2-28, however, so if we can work out some guidelines about use of Unicode characters early in the 2.30 cycle (i.e. now), consistency could be maintained in the desktop for the 2.30 release. We might even be able to brag about nice typography for (dare I say it?) GNOME 3.0!

So, should we be expending effort on dealing with fonts which don’t support various Unicode characters, extending Pango to support the appropriate decompositions? Are there any problems with embedded C compilers and Unicode string literals? If we decide to go with a uniform usage of certain Unicode characters, what guidelines shall we go with, and how can we educate translators on how to type them?

Sources: