Skip to content


Release-o-clock

I've arrived about a week late to the party, but there are now some shiny new releases of Almanah, Hitori, Totem/totem-pl-parser (with my I'm-Bastien-Nocera hatbeard on) and libfolks for people to use and abuse. (Versions: 0.9.0, 0.3.2, 3.3.92, 3.3.92 and 0.6.8, respectively.) Many thanks to all the people who've done all the work on them this cycle (including all the bug fixing and running around which I was supposed to do; sorry about that).

Why so late? I was busy. University has been keeping my work queue full up for the past few months, and it's not going to ease off until June. Doing all these releases has made me realise what a bad maintainer I am for projects like Almanah. It's used by many people, but has seen few releases recently, and even less development work (basically none of which has been done by me). There's an ongoing plan to modernise its design, and I'm sad that I can't put any time into making this happen.

I'm putting Almanah up for adoption. Wanted: an enthusiastic new maintainer with big ideas for the project's future. It's a reasonably stable project, with a well documented, if a little crusty, code base. There's plenty of scope for new features, and I'll try and guide you as appropriate. Let me know if you're interested! If not, Almanah will continue to limp along as best I can manage.

Posted in GNOME.

Tagged with , , , , .


All aboard the Bendy Bus

Have you ever written a client for a D-Bus service, then had difficulty in testing it because you need to find a way to set up the state of the D-Bus service, run your test, then set up the service in a completely different manner, rinse and repeat…all while running a modified version of the service’s binary in parallel with your system installation of it, and without doing anything which might cause your personal data to be accidentally eaten?

Having done some hacking on Telepathy and EDS clients I can, unfortunately, say “yes” to all of the above. Since problems are problematic, I’ve been hacking on a tool called Bendy Bus, which will hopefully alleviate some of this pain.

Bendy Bus is a project I’ve been working on as my final year university project, but I intend for it to be most useful outside of (hopefully) getting me marks for my degree. The basic idea of Bendy Bus is that you fuel it up with a description of a nondeterministic finite state automaton which represents the D-Bus service you’re using, plus a D-Bus introspection XML file describing all the relevant D-Bus interfaces. Bendy Bus will use the FSM description to simulate the D-Bus service, and run as a wrapper around the client program you’re trying to test. It’ll set up a private dbus-daemon instance for your client program, and expose all the simulated D-Bus objects on this bus.

Bendy Bus will listen for D-Bus method calls and property changes made by your client program, and execute transitions within the FSM as coded in your FSM description file. These transitions may, for example, change the FSM’s state, change data stored in the FSM (technically making it a nondeterministic DFSM, but that’s immaterial), emit D-Bus signals, throw D-Bus errors, etc. Why do I say it’s a nondeterministic FSM? Because you may specify several transitions between the same pair of states which are triggered by the same (for example) D-Bus method call. Bendy Bus will randomly choose one of the transitions to take. For example, if your client program calls a frobnicate : string → string D-Bus method, you could code one transition which successfully replies to the method call with a string return value, and another transition which simulates a failure in the D-Bus service by throwing a D-Bus error instead.

It’s in this fashion that Bendy Bus is actually designed as a fuzzing tool. You can code up a full description of every possible state and transition in your D-Bus service, then set your client program running in the Bendy Bus wrapper, and it’ll randomly explore the service’s state space until a termination condition is met. For example, the client program could crash (in which case a bug’s been found!), a D-Bus activity timeout could be reached (if your client program hasn’t made any D-Bus calls for a few seconds, for example), or a global timeout could be reached. At this point, the test harness can restart your client program and start the whole thing all over again with a different random seed value, causing different execution paths to be explored, and more of your client’s code to be covered.

Of course, Bendy Bus is still young, so features are missing, there are plenty of bugs, and documentation is basically non-existent. A couple of the big features on the list are to implement support for unit testing (which would tone down the fuzz testing aspect of Bendy Bus, and allow deterministic unit tests to be written for D-Bus client programs), implement better error reporting in the machine description parser and better logging during simulation, write a language specification and GtkSourceView highlighter, and write documentation. Help on any of these (except the unit testing stuff, which I have to keep for myself for my university project) would be greatly appreciated.

More than anything, it would be great if people could play with Bendy Bus and see if it’s useful for them (and if not, what could be done to improve it). In the repository at the moment are a couple of example machine description files for Telepathy. They can be used to get a randomly-generated contact list to appear in Empathy, using the following command:

bendy-bus machines/telepathy-cm.machine machines/telepathy-cm.xml \
--test-program-log-file=test-program.log \
--dbus-daemon-log-file=dbus-daemon.log \
--simulator-log-file=simulator.log \
-E FOLKS_DEBUG=telepathy -E EMPATHY_DEBUG=all -E G_MESSAGES_DEBUG=all \
-- empathy

That’s all from me at the moment. The Bendy Bus git repository is on gitorious, and all bug reports should be e-mailed to me: philip {at} tecnocode.co(.)uk.

Posted in GNOME.

Tagged with , , , .


Character encoding and locales

Recently, I've been looking into how character encoding and locales work on Linux, and I thought it might be worthwhile to write down my findings; partly so that I can look them up again later, and partly so that people can correct all the things I've got wrong.

To begin with, let's define some terminology:

  • Character set: a set of symbols which can be used together. This defines the symbols and their semantics, but not how they're encoded in memory. For example: Unicode. (Update: As noted in the comments, the character set doesn't define the appearance of symbols; this is left up to the fonts.)
  • Character encoding: a mapping from a character set to an representation of the characters in memory. For example: UTF-8 is one encoding of the Unicode character set.
  • Nul byte: a single byte which has a value of zero. Typically represented as the C escape sequence ‘\0’.
  • NULL character: the Unicode NULL character (U+0000) in the relevant encoding. In UTF-8, this is just a single nul byte. In UTF-16, however, it's a sequence of two nul bytes.

Now, the problem: if I'm writing a (command line) C program, how do strings get from the command line to the program, and how do strings get from the program to the terminal? More concretely, what actually happens with argv[] and printf()?

Let's consider the input direction first. When the main() function of a C program is called, it's passed an array of pointers to char arrays, i.e. strings. These strings can be arbitrary byte sequences (for example, file names), but are generally intended/assumed to be encoded in the user's environment's character encoding. This is set using the LC_ALL, LC_CTYPE or LANG environment variables. These variables specify the user's locale which (among other things) specifies the character encoding they use.

So the program receives as input a series of strings which are in an arbitrary encoding. This means that all programs have to be able to handle all possible character encodings, right? Wrong. A standard solution to this already exists in the form of libiconv. iconv() will convert between any two character encodings known to the system, so we can use it to convert from the user's environment encoding to, for example, UTF-8. How do we find out the user's environment encoding without parsing environment variables ourselves? We use setlocale() and nl_langinfo().

setlocale() parses the LC_ALL, LC_CTYPE and LANG environment variables (in that order of precedence) to determine the user's locale, and hence their character encoding. It then stores this locale, which will affect the behaviour of various C runtime functions. For example, it will change the formatting of numbers outputted by printf() to use the locale's decimal separator. Just calling setlocale() doesn't have any effect on character encodings, though. It won't, for example, cause printf() to magically convert strings to the user's environment encoding. More on this later.

nl_langinfo() is one function affected by setlocale(). When called with the CODESET type, it will return a string identifying the character encoding set in the user's environment. This can then be passed to iconv_open(), and we can use iconv() to convert strings from argv[] to our internal character encoding (which will typically be UTF-8).

At this point, it's worth noting that most people don't need to care about any of this. If using a library such as GLib – and more specifically, using its GOption command line parsing functionality – all this character encoding conversion is done automatically, and the strings it returns to you are guaranteed to be UTF-8 unless otherwise specified.

So we now have our input converted to UTF-8, our program can go ahead and do whatever processing it likes on it, safe in the knowledge that the character encoding is well defined and, for example, there aren't any unexpected embedded nul bytes in the strings. (This could happen if, for example, the user's environment character encoding was UTF-16; although this is really unlikely and might not even be possible on Linux — but that's a musing for another blog post).

Having processed the input and produced some output (which we'll assume is in UTF-8, for simplicity), many programs would just printf() the output and be done with it. printf() knows about character encodings, right? Wrong. printf() outputs exactly the bytes which are passed to its format parameter (ignoring all the fancy conversion specifier expansion), so this will only work if the program's internal character encoding is equal to the user's environment character encoding, for the characters being outputted. In many cases, the output of programs is just ASCII, so programs get away with just using printf() because most character encodings are supersets of ASCII. In general, however, more work is required to do things properly.

We need to convert from UTF-8 to the user's environment encoding so that what appears in their terminal is correct. We could just use iconv() again, but that would be boring. Instead, we should be able to use gettext(). This means we get translation support as well, which is always good.

gettext() takes in a msgid string and returns a translated version in the user's locale, if possible. Since these translations are done using message catalogues which may be in a completely different character encoding to the user's environment or the program's internal character encoding (UTF-8), gettext() helpfully converts from the message catalogue encoding to the user's environment encoding (the one returned by nl_langinfo(), discussed above). Great!

But what if no translation exists for a given string? gettext() returns the msgid string, unmodified and unconverted. This means that translatable string literals in our program need to magically be written in the user's environment encoding…and we're back to where we were before we introduced gettext(). Bother.

I see three solutions to this:

  • The gettext() solution: declare that all msgid strings should be in US-ASCII, and thus not use any Unicode characters. This works, provided we make the (reasonable) assumption that the user's environment encoding is a superset of ASCII. This requires that if a program wants to use Unicode characters in its translatable strings, it has to provide an en-US message catalogue to translate the American English msgid strings to American English (with Unicode). Not ideal.
  • The gettext()++ solution: declare that all msgid strings should be in UTF-8, and assume that anybody who's running without message catalogues is using UTF-8 as their environment encoding (this is a big assumption). Also not ideal, but a lot less work.
  • The iconv() solution: instruct gettext() to not return any strings in the user's environment encoding, but to return them all in UTF-8 instead (using bind_textdomain_codeset()), and use UTF-8 for the msgid strings. The program can then pass these translated (and untranslated) strings through iconv() as it did with the input, converting from UTF-8 to the user's environment encoding. More effort, but this should work properly.

An additional complication is that of combining translatable printf() format strings with UTF-8 string output from the program. Since printf() isn't encoding-aware, this requires that both the format string and the parameters are in the same encoding (or we get into a horrible mess with output strings which have substrings encoded in different ways). In this case, since our program output is in UTF-8, we definitely want to go with option 3 from above, and have gettext() return all translated messages in UTF-8. This also means we get to use UTF-8 in msgid strings. Unfortunately, it means that we now can't use printf() directly, and instead have to sprintf() to a string, use iconv() to convert that string from UTF-8 to the user's environment encoding, and then printf() it. Whew.

Here's a diagram which hopefully makes some of the journey clearer (click for a bigger version):

Diagram of the processing of strings from input to output in a C program.

So what does this mean for you? As noted above, in most cases it will mean nothing. Libraries such as GLib should take care of all of this for you, and the world will be a lovely place with ponies (U+1F3A0) and cats (U+1F431) everywhere. Still, I wanted to get this clear in my head, and hopefully it's useful to people who can't make use of libraries like GLib (for whatever reason).

Exploring exactly what GLib does is a matter for another time. Similarly, exploring how Windows does things is also best left to a later post (hint: Windows does things completely differently to Linux and other Unices, and I'm not sure it's for the better).

Posted in Tutorials.

Tagged with , , , , , , .


Back from the Desktop Summit

The Desktop Summit's over for another year. Berlin was great, the parties were good (thanks to Intel and Collabora!), and it was good to see everyone again — and also meet some new people. My thanks, as always, go to the GNOME Foundation for arranging and sponsoring my accommodation.

Unfortunately, the Summit wasn't all parties and sightseeing. We managed to belatedly push out folks 0.6, with Travis putting in far too much of his holiday time to tar it all up. Raúl somehow got away with sneaking off and doing interesting things with the Sugar people instead.

My libgdata BoF went down well, with a reasonable amount of interest in pushing libgdata's support for Google Documents forward, for use by both GNOME Documents and Zeitgeist. I'll get round to tidying up and pushing forward with the libgdata 0.12 roadmap in the near future.

I'm looking forward to being the lucky recipient of a load of GSoC patches to review for using GXml in libgdata. I love reviewing patches.

A Coruña next year!

Posted in GNOME.

Tagged with , , , , .


BoF sessions at the Desktop Summit

Just a quick bit of promotion: anyone who's interested in libfolks or libgdata, both as users or developers, should come along to the BoF sessions for them on Tuesday and Wednesday, respectively.

  • libfolks hacking BoF: meeting at 10:00 on Tuesday (2011-08-09) in the lobby in front of Kinosaal. We'll then relocate to a yet-to-be-revealed room with tables and things.
  • libgdata roadmap BoF: 13:00–14:00 on Wednesday (2011-08-10) in room 1.405/1.

Posted in GNOME.

Tagged with , , .


Avatars for Google Contacts in Evolution

Following on from group/category support, I've just merged support for setting and retrieving avatars/photos on Google Contacts from Evolution. There are still a few rough edges, but hopefully I'll get time to tidy those up before GNOME 3.2.

“I'm going to Desktop Summit 2011 in Berlin” bannerI'll be arriving on the 5th, spending a day hurriedly seeing absolutely everything there is to see in Berlin, then diving into the conference on Saturday. I'm leaving on the 12th.

For anybody interested in contacts, meta-contacts, individuals, personas and inventing words to make your software sound more impressive, there's a folks BoF happening at some point on Wednesday (not Thursday as the schedule says, since Travis has to leave beforehand). More details will follow, hopefully, in the libfolks talk on Saturday, 11:20–11:50 in Room 2002.

Posted in GNOME.

Tagged with , , , , .


End of the hackfest

The IM, Contacts & Social hackfest is over and people are variously heading back to their respective corners of the world. A lot of good discussion was had, and I think everyone's got a clearer vision of where we need to be with IMs and contacts and what needs to be done to get there.

Travis, Raúl and I clarified a lot of the changes needed to libfolks for this shiny new future, and while I didn't get as much hacking done as I would've liked, there's plenty of time for that later.

Thanks to Collabora for providing space for the hackfest and sponsoring plenty of food, the GNOME Foundation for enabling the hackfest to happen, and Nyan cat for entertaining Bastien.

Posted in GNOME.

Tagged with , , , .


IM, Contacts & Social hackfest

I'll be heading along to the IM, Contacts & Social hackfest next week at Collabora's offices. There, a plan for world domination by libfolks will be forged, along with plotting around GNOME's new SSO overlord system and work on the much-awaited GNOME Contacts.

Should be fun!

Relatedly, I've just released libgdata 0.9.0, which has sprouted support for OAuth 1.0 — so hopefully some GNOME Online Accounts goodness will soon make it into Evolution's Google Contacts and Calendar backends.

Posted in GNOME.

Tagged with , , , , , , , .


Group/Category support for Google Contacts in Evolution

My work to add support for groups/categories to Evolution's Google Contacts address book backend has just landed, thanks to Milan Crha for reviewing it all. It should be available in GNOME 3.0, so people can categorise their contacts to their hearts' content!

I'm now working on making the backend properly asynchronous and cancellable, and also looking into adding support for contact photos.

Posted in GNOME.

Tagged with , , .


More progress on folks' EDS backend

Raúl (who is unfortunately not on Planet GNOME) continues to plough ahead on the evolution-data-server backend for libfolks. Great work!

Posted in GNOME.

Tagged with , .