Category Archives: Tutorials

How-tos, tutorials and guides (mainly on how to code for the web).

Thoughts about reviewing large patchsets

I have recently been involved in reviewing some large feature patchsets for a project at work, and thought it might be interesting to discuss some of the principles we have been trying to stick to when going about these reviews.

These are just suggestions which will not apply verbatim to every project, but they might provoke some ideas which are relevant to your project. In many cases, a developer might find a checklist is not useful for them — it is too rigid, or slows down the pace of development too much. Checklists are probably best treated as strong suggestions, rather than strict requirements for a review to pass; developers need to use them as reminders for things they should think about when submitting a patch, rather than a box-ticking exercise. Checklists probably work better in larger projects with lots of infrequent or inexperienced contributors, who may not be aware of all of the conventions of the project.

Overall principles

In order to break a set of reviews down into chunks, there a few key principles to stick to:

  • Review patches which are as small as possible, but no smaller (see here, here and here)
  • Learn from each review so the same review comments do not need to be made more than once
  • Use automated tools to eliminate many of the repetitive and time consuming parts of patch review and rework
  • Do high-level API review first, ideally before implementing those APIs, to avoid unnecessary rework

Pre-submission checklist

(A rationale for each of these points is given in the section below to avoid cluttering this one.)

This is the pre-submission checklist we have been using to check patches against before submission for review:

  1. All new code follows the coding guidelines, especially the namespacing guidelines, memory management guidelines, pre- and post-condition guidelines, and introspection guidelines — some key points from these are pulled out below, but these are not the only points to pay attention to.
  2. All new public API must be namespaced correctly.
  3. All new public API must have a complete and useful documentation comment.
  4. All new public API documentation comments must have GObject Introspection annotations where appropriate; g-ir-scanner (part of the build process) must emit no warnings when run with --warn-all --warn-error (which should be set by $(WARN_SCANNERFLAGS) from AX_COMPILER_FLAGS).
  5. All new public methods must have pre- and post-conditions to enforce constraints on the accepted parameter values.
  6. The code must compile without warnings, after ensuring that AX_COMPILER_FLAGS is used and enabled in configure.ac (if it is correctly enabled, compiling the module should fail if there are any compiler warnings) — remember to add $(WARN_CFLAGS), $(WARN_LDFLAGS) and $(WARN_SCANNERFLAGS) to new Makefile.am targets as appropriate.
  7. The introduction documentation comment for each new object must give a usage example for each of the main ways that object is intended to be used.
  8. All new code must be formatted as per the project’s coding guidelines.
  9. If possible, there should be an example program for each new feature or object, which can be used to manually test that functionality — these examples may be submitted in a separate patch from the object implementation, but must be submitted at the same time as the implementation in order to allow review in parallel. Example programs must be usable when installed or uninstalled, so they can be used during development and on production machines.
  10. There must be automated tests (using the GTest framework in GLib) for construction of each new object, and for getting and setting each of its properties.
  11. The code coverage of the automated tests must be checked (using make check-code-coverage and AX_CODE_COVERAGE) before submission, and if it’s possible to add more automated tests (and for them to be reliable) to improve the coverage, this should be done; the final code coverage figure for the object should be mentioned in a comment on the patch review, and it would be helpful to have the lcov reports for the object saved somewhere for analysis as part of the review.
  12. There must be no definite memory leaks reported by Valgrind when running the automated tests under it (using AX_VALGRIND_CHECK and make check-valgrind).
  13. All automated tests must be installed as installed-tests so they can be run as integration tests on a production system.
  14. make distcheck must pass before submission of any patch, especially if it touches the build system.
  15. All new code has been checked to ensure it doesn’t contradict review comments from previous reviews of other patches (i.e. we want to avoid making the same review comments on every submitted patch).
  16. Commit messages must explain why they make the changes they do.

Rationales

  1. Each coding guideline has its own rationale for why it’s useful, and many of them significantly affect the structure of a patch, so are important to get right early on.
  2. Namespacing is important for the correct functioning of a lot of the developer tools (for example, GObject Introspection), and to avoid symbol collisions between libraries — checking it is a very mechanical process which it is best to not have to spend review time on.
  3. Documentation comments are useful to both the reviewer and to end users of the API — for the reviewer, they act as an explanation of why a particular API is necessary, how it is meant to be used, and can provide insight into implementation choices. These are questions which the reviewer would otherwise have to ask in the review, so writing them up lucidly in a documentation comment saves time in the long run.
  4. GObject Introspection annotations are a requirement for the platform’s language bindings (to JavaScript or Python, for example) to work, so must be added at some point. Fixing the error messages from g-ir-scanner is sufficient to ensure that the API can be introspected.
  5. Pre- and post-conditions are a form of assertion in the code, which check for programmer errors at runtime. If they are used consistently throughout the code on every API entry point, they can catch programmer errors much nearer their origin than otherwise, which speeds up debugging both during development of the library, and when end users are using the public APIs. They also act as a loose form of documentation of what each API will allow as its inputs and outputs, which helps review (see the comments about documentation above).
  6. The set of compiler warnings enabled by AX_COMPILER_FLAGS have been chosen to balance false positives against false negatives in detecting bugs in the code. Each compiler warning typically identifies a single bug in the code which would otherwise have to be fixed later in the life of the library — fixing bugs later is always more expensive in terms of debugging time.
  7. Usage examples are another form of documentation (as discussed above), which specifically make it clearer to a reviewer how a particular feature is intended to be used. In writing usage examples, the author of a patch can often notice awkwardnesses in their API design, which can then be fixed before review — this is faster than them being caught in review and sent back for modification.
  8. Well formatted code is a lot easier to read and review than poorly formatted code. It allows the reviewer to think about the function of the code they are reviewing, rather than (for example) which function call a given argument actually applies to, or which block of code a statement is actually part of.
  9. Example programs are a loose form of testing, and also act as usage examples and documentation for the feature (see above). They provide an easy way for the reviewer to test a feature, especially if it affects a UI or has some interactive element; this is very hard to do by simply looking at the code in a patch. Their biggest benefit will be when the feature is modified in future — the example programs can be used to test changes to the feature and ensure that its behaviour changes (or does not) as expected.
  10. For each unit test for a piece of code, the behaviour checked by that unit test can be guaranteed to be unchanged across modifications to the code in future. This prevents regressions (especially if the unit tests for the project are set up to be run automatically on each commit by a continuous integration system). The value of unit tests when initially implementing a feature is in the way they guide API design to be testable in the first place. It is often the case that an API will be written without unit tests, and later someone will try to add unit tests and find that the API is untestable; typically because it relies on internal state which the test harness cannot affect. By that point, the API is stable and cannot be changed to allow testing.
  11. Looking at code coverage reports is a good way to check that unit tests are actually checking what they are expected to check about the code. Code coverage provides a simple, coarse-grained metric of code quality — the quality of untested code is unknown.
  12. Every memory leak is a bug, and hence needs to be fixed at some point. Checking for memory leaks in a code review is a very mechanical, time-consuming process. If memory leaks can be detected automatically, by using valgrind on the unit tests, this reduces the amount of time needed to catch them during review. This is an area where higher code coverage provides immediate benefits. Another way to avoid leaks is to use g_autoptr() to automatically free memory when leaving a control block.
  13. If all automated tests are available, they can be run as part of system-wide integration tests, to check that the project behaviour doesn’t change when other system libraries (its dependencies) are changed. This is one of the motivations behind installed-tests. This is a one-time setup needed for your project, and once it’s set up, does not need to be done for each commit.
  14. make distcheck ensures that a tarball can be created successfully from the code, which entails building it, running all the unit tests, and checking that examples compile.
  15. If each patch is updated to learn from the results of previous patch reviews, the amount of time spent making and explaining repeated patch review comments should be significantly reduced, which saves everyone’s time.
  16. Commit messages are a form of documentation of the changes being made to a project. They should explain the motivation behind the changes, and clarify any design decisions which the author thinks the reviewer might question. If a commit message is inadequate, the reviewer is going to ask questions in the review which could have been avoided otherwise.

A detailed look at GSource

Another post in this sporadic mini-series on GMainContext, this time looking at GSource and writing your own type of event source. This post is actually adapted from a page I wrote about it for developer.gnome.org, which includes fully-compilable and unit tested source code;  future tweaks and clarifications can be found there. If you find a problem, please file a bug.

tl;dr: Write a custom GSource if you have a non-file-descriptor-based event source to integrate with a GMainContext. It’s a matter of writing a few virtual functions.

What is GSource?

A GSource is an expected event with an associated callback function which will be invoked when that event is received. An event could be a timeout or data being received on a socket, for example.

GLib contains various types of GSource, but also allows applications to define their own, allowing custom events to be integrated into the main loop.

The structure of a GSource and its virtual functions are documented in detail in the GLib API reference.

A message queue source

As a running example, a message queue source will be used which dispatches its callback whenever a message is enqueued to a queue internal to the source (potentially from another thread).

This type of source is useful for efficiently transferring large numbers of messages between main contexts. The alternative is transferring each message as a separate idle GSource using g_source_attach(). For large numbers of messages, this means a lot of allocations and frees of GSources.

Structure

Firstly, a structure for the source needs to be declared. This must contain a GSource as its parent, followed by the private fields for the source: the queue and a function to call to free each message once finished with.

typedef struct {
  GSource         parent;
  GAsyncQueue    *queue;  /* owned */
  GDestroyNotify  destroy_message;
} MessageQueueSource

Prepare function

Next, the prepare function for the source must be defined. This determines whether the source is ready to be dispatched. As this source is using an in-memory queue, this can be determined by checking the queue’s length: if there are elements in the queue, the source can be dispatched to handle them.

return (g_async_queue_length (message_queue_source->queue) > 0);

Check function

As this source has no file descriptors, the prepare and check functions essentially have the same job, so a check function is not needed. Setting the field to NULL in GSourceFuncs bypasses the check function for this source type.

Dispatch function

For this source, the dispatch function is where the complexity lies. It needs to dequeue a message from the queue, then pass that message to the GSource’s callback function. No messages may be queued: even through the prepare function returned true, another source wrapping the same queue may have been dispatched in the mean time and taken the final message from the queue. Further, if no callback has been set for the GSource (which is allowed), the message must be destroyed and silently dropped.

If both a message and callback are set, the callback can be invoked on the message and its return value propagated as the return value of the dispatch function. This is FALSE to destroy the GSource and TRUE to keep it alive, just as for GSourceFunc — these semantics are the same for all dispatch function implementations.

/* Pop a message off the queue. */
message = g_async_queue_try_pop (message_queue_source->queue);

/* If there was no message, bail. */
if (message == NULL)
  {
    /* Keep the source around to handle the next message. */
    return TRUE;
  }

/* @func may be %NULL if no callback was specified.
 * If so, drop the message. */
if (func == NULL)
  {
    if (message_queue_source->destroy_message != NULL)
      {
        message_queue_source->destroy_message (message);
      }

    /* Keep the source around to consume the next message. */
    return TRUE;
  }

return func (message, user_data);

Callback functions

The callback from a GSource does not have to have type GSourceFunc. It can be whatever function type is called in the source’s dispatch function, as long as that type is sufficiently documented.

Normally, g_source_set_callback() is used to set the callback function for a source instance. With its GDestroyNotify, a strong reference can be held to keep an object alive while the source is still alive:

g_source_set_callback (source, callback_func,
                       g_object_ref (object_to_strong_ref),
                       (GDestroyNotify) g_object_unref);

However, GSource has a layer of indirection for retrieving this callback, exposed as g_source_set_callback_indirect(). This allows GObject to set a GClosure as the callback for a source, which allows for sources which are automatically destroyed when an object is finalized — a weak reference, in contrast to the strong reference above:

g_source_set_closure (source,
                      g_cclosure_new_object (callback_func,
                                             object_to_weak_ref));

It also allows for a generic, closure-based ‘dummy’ callback, which can be used when a source needs to exist but no action needs to be performed in its callback:

g_source_set_dummy_callback (source);

Constructor

Finally, the GSourceFuncs definition of the GSource can be written, alongside a construction function. It is typical practice to expose new source types simply as GSources, not as the subtype structure; so the constructor returns a GSource*.

The example constructor here also demonstrates use of a child source to support cancellation conveniently. If the GCancellable is cancelled, the application’s callback will be dispatched and can check for cancellation. (The application code will need to make a pointer to the GCancellable available to its callback, as a field of the callback’s user data set in g_source_set_callback()).

GSource *
message_queue_source_new (GAsyncQueue    *queue,
                          GDestroyNotify  destroy_message,
                          GCancellable   *cancellable)
{
  GSource *source;  /* alias of @message_queue_source */
  MessageQueueSource *message_queue_source;  /* alias of @source */

  g_return_val_if_fail (queue != NULL, NULL);
  g_return_val_if_fail (cancellable == NULL ||
                        G_IS_CANCELLABLE (cancellable), NULL);

  source = g_source_new (&message_queue_source_funcs,
                         sizeof (MessageQueueSource));
  message_queue_source = (MessageQueueSource *) source;

  /* The caller can overwrite this name with something more useful later. */
  g_source_set_name (source, "MessageQueueSource");

  message_queue_source->queue = g_async_queue_ref (queue);
  message_queue_source->destroy_message = destroy_message;

  /* Add a cancellable source. */
  if (cancellable != NULL)
    {
      GSource *cancellable_source;

      cancellable_source = g_cancellable_source_new (cancellable);
      g_source_set_dummy_callback (cancellable_source);
      g_source_add_child_source (source, cancellable_source);
      g_source_unref (cancellable_source);
    }

  return source;
}

Complete example

The complete source code is available in gnome-devel-docs’ git repository, along with unit tests.

Further examples

Sources can be more complex than the example given above. In libnice, a custom GSource is needed to poll a set of sockets which changes dynamically. The implementation is given as ComponentSource in component.c and demonstrates a more complex use of the prepare function.

Another example is a custom source to interface GnuTLS with GLib in its GTlsConnection implementation. GTlsConnectionGnutlsSource synchronizes the main thread and a TLS worker thread which performs the blocking TLS operations.

Ensuring functions are called in the right context

This article has been tweaked and upstreamed to developer.gnome.org. The original is kept below, but future updates will be made there. If you find a problem, please file a bug.

Continuing in this fledgling series of examining GLib’s GMainContext, this post looks at ensuring that functions are called in the right main context when programming with multiple threads.

tl;dr: Use g_main_context_invoke_full() or GTask. See the end of the post for some guidelines about multi-threaded programming using GLib and main contexts.

To begin with, what is ‘the right context’? Taking a multi-threaded GLib program, let’s assume that each thread has a single GMainContext running in a main loop — this is the thread default main context.((Why use main contexts? A main context effectively provides a work or message queue for a thread — something which the thread can periodically check to determine if there is work pending from another thread. It’s not possible to pre-empt a thread’s execution without using hideous POSIX signalling). I’m ignoring the case of non-default contexts, but their use is similar.)) So ‘the right context’ is the one in the thread you want a function to execute in. For example, if I’m doing a long and CPU-intensive computation I will want to schedule this in a background thread so that it doesn’t block UI updates from the main thread. The results from this computation, however, might need to be displayed in the UI, so some UI update function has to be called in the main thread once the computation’s complete. Furthermore, if I can limit a function to being executed in a single thread, it becomes easy to eliminate the need for locking a lot of the data it accesses((Assuming that other threads are implemented similarly and hence most data is accessed by a single thread, with threads communicating by message passing, allowing each thread to update its data at its leisure.)), which makes multi-threaded programming a whole lot simpler.

For some functions, I might not care which context they’re executed in, perhaps because they’re asynchronous and hence do not block the context. However, it still pays to be explicit about which context is used, since those functions may emit signals or invoke callbacks, and for reasons of thread safety it’s necessary to know which threads those signal handlers or callbacks are going to be invoked in. For example, the progress callback in g_file_copy_async() is documented as being called in the thread default main context at the time of the initial call.

The core principle of invoking a function in a specific context is simple, and I’ll walk through it as an example before demonstrating the convenience methods which should actually be used in practice. A GSource has to be added to the specified GMainContext, which will invoke the function when it’s dispatched. This GSource should almost always be an idle source created with g_idle_source_new(), but this doesn’t have to be the case. It could be a timeout source so that the function is executed after a delay, for example.

As described previously, this GSource will be added to the specified GMainContext and dispatched as soon as it’s ready((In the case of an idle source, this will be as soon as all sources at a higher priority have been dispatched — this can be tweaked using the idle source’s priority parameter with g_source_set_priority(). I’m assuming the specified GMainContext is being run in a GMainLoop all the time, which should be the case for the default context in a thread.)), calling the function on the thread’s stack. The source will typically then be destroyed so the function is only executed once (though again, this doesn’t have to be the case).

Data can be passed between threads in this manner in the form of the user_data passed to the GSource’s callback. This is set on the source using g_source_set_callback(), along with the callback function to invoke. Only a single pointer is provided, so if multiple bits of data need passing, they must be packaged up in a custom structure first.

Here’s an example. Note that this is to demonstrate the underlying principles, and there are convenience methods explained below which make this simpler.

/* Main function for the background thread, thread1. */
static gpointer
thread1_main (gpointer user_data)
{
	GMainContext *thread1_main_context = user_data;
	GMainLoop *main_loop;

	/* Set up the thread’s context and run it forever. */
	g_main_context_push_thread_default (thread1_main_context);

	main_loop = g_main_loop_new (thread1_main_context, FALSE);
	g_main_loop_run (main_loop);
	g_main_loop_unref (main_loop);

	g_main_context_pop_thread_default (thread1_main_context);
	g_main_context_unref (thread1_main_context);

	return NULL;
}

/* A data closure structure to carry multiple variables between
 * threads. */
typedef struct {
	gchar *some_string;  /* owned */
	guint some_int;
	GObject *some_object;  /* owned */
} MyFuncData;

static void
my_func_data_free (MyFuncData *data)
{
	g_free (data->some_string);
	g_clear_object (&data->some_object);
	g_slice_free (MyFuncData, data);
}

static void
my_func (const gchar *some_string, guint some_int,
         GObject *some_object)
{
	/* Do something long and CPU intensive! */
}

/* Convert an idle callback into a call to my_func(). */
static gboolean
my_func_idle (gpointer user_data)
{
	MyFuncData *data = user_data;

	my_func (data->some_string, data->some_int, data->some_object);

	return G_SOURCE_REMOVE;
}

/* Function to be called in the main thread to schedule a call to
 * my_func() in thread1, passing the given parameters along. */
static void
invoke_my_func (GMainContext *thread1_main_context,
                const gchar *some_string, guint some_int,
                GObject *some_object)
{
	GSource *idle_source;
	MyFuncData *data;

	/* Create a data closure to pass all the desired variables
	 * between threads. */
	data = g_slice_new0 (MyFuncData);
	data->some_string = g_strdup (some_string);
	data->some_int = some_int;
	data->some_object = g_object_ref (some_object);

	/* Create a new idle source, set my_func() as the callback with
	 * some data to be passed between threads, bump up the priority
	 * and schedule it by attaching it to thread1’s context. */
	idle_source = g_idle_source_new ();
	g_source_set_callback (idle_source, my_func_idle, data,
	                       (GDestroyNotify) my_func_data_free);
	g_source_set_priority (idle_source, G_PRIORITY_DEFAULT);
	g_source_attach (idle_source, thread1_main_context);
	g_source_unref (idle_source);
}

/* Main function for the main thread. */
static void
main (void)
{
	GThread *thread1;
	GMainContext *thread1_main_context;

	/* Spawn a background thread and pass it a reference to its
	 * GMainContext. Retain a reference for use in this thread
	 * too. */
	thread1_main_context = g_main_context_new ();
	g_thread_new ("thread1", thread1_main,
	              g_main_context_ref (thread1_main_context));

	/* Maybe set up your UI here, for example. */

	/* Invoke my_func() in the other thread. */
	invoke_my_func (thread1_main_context,
	                "some data which needs passing between threads",
	                123456, some_object);

	/* Continue doing other work. */
}

That’s a lot of code, and it doesn’t look fun. There are several points of note here:

  • This invocation is uni-directional: it calls my_func() in thread1, but there’s no way to get a return value back to the main thread. To do that, the same principle needs to be used again, invoking a callback function in the main thread. It’s a straightforward extension which isn’t covered here.
  • Thread safety: This is a vast topic, but the key principle is that data which is potentially accessed by multiple threads must have mutual exclusion enforced on those accesses using a mutex. What data is potentially accessed by multiple threads here? thread1_main_context, which is passed in the fork call to thread1_main; and some_object, a reference to which is passed in the data closure. Critically, GLib guarantees that GMainContext is thread safe, so sharing thread1_main_context between threads is fine. The other code in this example must ensure that some_object is thread safe too, but that’s a topic for another blog post. Note that some_string and some_int cannot be accessed from both threads, because copies of them are passed to thread1, rather than the originals. This is a standard technique for making cross-thread calls thread safe without requiring locking. It also avoids the problem of synchronising freeing some_string. Similarly, a reference to some_object is transferred to thread1, which works around the issue of synchronising destruction of the object.
  • Specificity: g_idle_source_new() was used rather than the simpler g_idle_add() so that the GMainContext the GSource is attached to could be specified.

With those principles and mechanisms in mind, let’s take a look at a convenience method which makes this a whole lot easier: g_main_context_invoke_full().((Why not g_main_context_invoke()? It doesn’t allow a GDestroyNotify function for the user data to be specified, limiting its use in the common case of passing data between threads.)) As stated in its documentation, it invokes a callback so that the specified GMainContext is owned during the invocation. In almost all cases, the context being owned is equivalent to it being run, and hence the function must be being invoked in the thread for which the specified context is the thread default.

Modifying the earlier example, the invoke_my_func() function can be replaced by the following:

static void
invoke_my_func (GMainContext *thread1_main_context,
                const gchar *some_string, guint some_int,
                GObject *some_object)
{
	MyFuncData *data;

	/* Create a data closure to pass all the desired variables
	 * between threads. */
	data = g_slice_new0 (MyFuncData);
	data->some_string = g_strdup (some_string);
	data->some_int = some_int;
	data->some_object = g_object_ref (some_object);

	/* Invoke the function. */
	g_main_context_invoke_full (thread1_main_context,
	                            G_PRIORITY_DEFAULT, my_func_idle,
	                            data,
	                            (GDestroyNotify) my_func_data_free);
}

That’s a bit simpler. Let’s consider what happens if invoke_my_func() were to be called from thread1, rather than from the main thread. With the original implementation, the idle source would be added to thread1’s context and dispatched on the context’s next iteration (assuming no pending dispatches with higher priorities). With the improved implementation, g_main_context_invoke_full() will notice that the specified context is already owned by the thread (or can be acquired by it), and will call my_func_idle() directly, rather than attaching a source to the context and delaying the invocation to the next context iteration. This subtle behaviour difference doesn’t matter in most cases, but is worth bearing in mind since it can affect blocking behaviour (i.e. invoke_my_func() would go from taking negligible time, to taking the same amount of time as my_func() before returning).

How can I be sure a function is always executed in the thread I expect? Since I’m now thinking about which thread each function could be called in, it would be useful to document this in the form of an assertion:

g_assert (g_main_context_is_owner (expected_main_context));

If that’s put at the top of each function, any assertion failure will highlight a case where a function has been called directly from the wrong thread. This technique was invaluable to me recently when writing code which used upwards of four threads with function invocations between all of them. It’s a whole lot easier to put the assertions in when initially writing the code than it is to debug the race conditions which easily result from a function being called in the wrong thread.

This can also be applied to signal emissions and callbacks. As well as documenting which contexts a signal or callback will be emitted in, assertions can be added to ensure that this is always the case. For example, instead of using the following when emitting a signal:

guint param1;  /* arbitrary example parameters */
gchar *param2;
guint retval = 0;

g_signal_emit_by_name (my_object, "some-signal",
                       param1, param2, &retval);

it would be better to use the following:

static guint
emit_some_signal (GObject *my_object, guint param1,
                  const gchar *param2)
{
	guint retval = 0;

	g_assert (g_main_context_is_owner (expected_main_context));

	g_signal_emit_by_name (my_object, "some-signal",
	                       param1, param2, &retval);

	return retval;
}

As well as asserting emission happens in the right context, this improves type safety. Bonus! Note that signal emission via g_signal_emit() is synchronous, and doesn’t involve a main context at all. As signals are a more advanced version of callbacks, this approach can be applied to those as well.

Before finishing, it’s worth mentioning GTask. This provides a slightly different approach to invoking functions in other threads, which is more suited to the case where you want your function to be executed in some background thread, but don’t care exactly which one. GTask will take a data closure, a function to execute, and provide ways to return the result from this function; and will then handle everything necessary to run that function in a thread belonging to some thread pool internal to GLib. Although, by combining g_main_context_invoke_full() and GTask, it should be possible to run a task in a specific context and effortlessly return its result to the current context:

/* This will be invoked in thread1. */
static gboolean
my_func_idle (gpointer user_data)
{
	GTask *task = G_TASK (user_data);
	MyFuncData *data;
	gboolean retval;

	/* Call my_func() and propagate its returned boolean to
	 * the main thread. */
	data = g_task_get_task_data (task);
	retval = my_func (data->some_string, data->some_int,
	                  data->some_object);
	g_task_return_boolean (task, retval);

	return G_SOURCE_REMOVE;
}

/* Whichever thread is invoked in, the @callback will be invoked in
 * once my_func() has finished and returned a result. */
static void
invoke_my_func_with_result (GMainContext *thread1_main_context,
                            const gchar *some_string, guint some_int,
                            GObject *some_object,
                            GAsyncReadyCallback callback,
                            gpointer user_data)
{
	MyFuncData *data;

	/* Create a data closure to pass all the desired variables
	 * between threads. */
	data = g_slice_new0 (MyFuncData);
	data->some_string = g_strdup (some_string);
	data->some_int = some_int;
	data->some_object = g_object_ref (some_object);

	/* Create a GTask to handle returning the result to the current
	 * thread default main context. */
	task = g_task_new (NULL, NULL, callback, user_data);
	g_task_set_task_data (task, data,
	                      (GDestroyNotify) my_func_data_free);

	/* Invoke the function. */
	g_main_context_invoke_full (thread1_main_context,
	                            G_PRIORITY_DEFAULT, my_func_idle,
	                            task,
	                            (GDestroyNotify) g_object_unref);
}

So in summary:

  • Use g_main_context_invoke_full() to invoke functions in other threads, under the assumption that every thread has a thread default main context which runs throughout the lifetime of that thread.
  • Use GTask if you only want to run a function in the background and don’t care about the specifics of which thread is used.
  • In any case, liberally use assertions to check which context is executing a function, and do this right from the start of a project.
  • Explicitly document contexts a function is expected to be called in, a callback will be invoked in, or a signal will be emitted in.
  • Beware of g_idle_add() and similar functions which use the global default main context.

(const gchar*) vs. (gchar*) and other memory management stories

This article now been upstreamed to the GNOME Programming Guidelines, where future updates will be made to it.

Memory management in C is often seen as a minefield, and while it’s not as simple as in more modern languages, if a few conventions are followed it doesn’t have to be hard at all. Here are some pointers about heap memory management in C using GObject and the usual GNOME types and coding conventions, including GObject introspection. A basic knowledge of C is assumed, but no more. Stack memory allocation and other obscure patterns aren’t covered, since they’re much less commonly used in a typical GObject application.

The most important rule when thinking about memory management is: which code has ownership of this memory?

Ownership can be defined as the responsibility to free or deallocate a piece of memory. If code which does own some memory doesn’t deallocate that memory, that’s a memory leak. If code which doesn’t own some memory deallocates it (in addition to the owner doing so), that’s a double-free. Both are bad.

For example, if a function returns a newly allocated string and doesn’t retain a pointer to it, ownership of the string is transferred to the caller. Note that when thinking about ownership transfer, malloc()/free() and reference counting are treated the same: in the former case, a newly allocated piece of heap memory is being transferred; in the latter, a newly incremented reference.

gchar *transfer_full_function (void) {
	return g_strdup ("Newly allocated string.");
}

/* Ownership is transferred here. */
gchar *my_string = transfer_full_function ();

Conversely, if the function does retain a pointer (e.g. inside an object), ownership is not transferred to the caller.

const gchar *transfer_none_function (void) {
	return "Static string.";
}

const gchar *transfer_none_method (MyObject *self) {
	gchar *new_string = g_strdup ("Newly allocated string.");
	/* Ownership is retained by the object here. */
	self->priv->saved_string = new_string;
	return new_string;
}

/* In both of these cases, ownership is not transferred. */
const gchar *my_string1 = transfer_none_function ();
const gchar *my_string2 = transfer_none_method (some_object);

In all of these examples, you may notice the return type of the functions reflects whether they transfer ownership of their return values. const return types indicate no transfer, and non-const return types indicate some transfer.

While this is useful, it’s by no means completely clear. For example, what if a function returns a non-const GList*? Is ownership of the list elements transferred, or just the list? What if the function’s author forgot to make the return type const, and actually there’s no transfer? This is where documentation comments are useful.

There’s a convention in GNOME documentation comments to specify the function which should be used to free a returned value. If such a function is mentioned, ownership of the returned memory is transferred. If no function is mentioned, ownership probably isn’t transferred, but it’s hard to be sure. That’s why it’s good to always be explicit when writing documentation comments.

/**
 * my_object_get_some_string:
 * @self: a #MyObject
 *
 * Gets the value of the #MyObject:some-string property.
 *
 * Return value: (transfer none): some string, owned by the
 * object and must not be freed
 */
const gchar *my_object_get_some_string (MyObject *self) {
	return self->priv->some_string;
}

/**
 * my_object_build_result:
 * @self: a #MyObject
 *
 * Builds a result string which probably represents something
 * meaningful.
 *
 * Return value: (transfer full): a newly allocated result
 * string; free with g_free()
 */
gchar *my_object_build_result (MyObject *self) {
	return g_strdup_printf ("%s %s",
	                        self->priv->some_string,
	                        self->priv->some_other_string);
}

/**
 * my_object_dup_controller:
 * @self: a #MyObject
 *
 * Gets the value of the #MyObject:controller property,
 * incrementing the controller's reference count.
 *
 * Return value: (transfer full): the object's
 * controller; unref with g_object_unref()
 */
MyController *my_object_dup_controller (MyObject *self) {
	return g_object_ref (self->priv->controller);
}

When GObject introspection was introduced, these kinds of documentation comments were formalised as introspection annotations: (transfer full), (transfer container) or (transfer none), as documented on wiki.gnome.org. These allow the runtimes of language bindings to manage memory correctly. Since a C programmer is essentially doing the same job as a language runtime when writing C code, the information provided by transfer annotations is sufficient for perfect memory management. Unfortunately, not all parameters and return types of all functions have these annotations added. The examples above do, however.

Finally, a few libraries use a function naming convention. Functions named *_get_* do not transfer ownership, whereas functions named *_dup_* (for ‘duplicate’) do transfer ownership. This can be seen in the examples above, or with json_node_get_array() vs. json_node_dup_array(). Be aware, though, that only a few libraries use this convention. Other libraries use *_get_* for both functions which do and do not transfer ownership of their results. Other code, such as that generated by gdbus-codegen, uses a different and incompatible convention: *_get_* methods signify full transfer, and *_peek_* methods signify no transfer. For example, goa_object_get_manager() vs. goa_object_peek_manager(). For this reason, going by function naming conventions only works within libraries, not between different libraries.

Memory management of parameters is analogous to return values: look at whether the parameter is const and whether there are any introspection annotations for it.

In summary, here are a set of guidelines one can follow to determine whether ownership of a return value is transferred, and hence whether the caller needs to free it:

  1. If the type has an introspection (transfer) annotation, look at that.
  2. Otherwise, if the type is const, there is no transfer.
  3. Otherwise, if the function documentation explicitly specifies the return value must be freed, there is full or container transfer.
  4. Otherwise, if the function is named *_dup_*, there is full or container transfer.
  5. Otherwise, if the function is named *_peek_*, there is no transfer.
  6. Otherwise, you need to look at the function’s code to determine whether it intends ownership to be transferred. Then file a bug against the documentation for that function, and ask for an introspection annotation to be added.

Some common pitfalls:

  • If you’re using an explicit typecast (e.g. casting a (const gchar*) return value to (gchar*)), it’s likely something’s wrong.
  • Generally, return values which are not transferred (such as (const gchar*)) are freed when the owning object is destroyed — so if you need such a value to persist, you must copy it or increase its reference count. You then have ownership of the copy or new reference, and are responsible for freeing it (not the original).

How can one check for incorrect memory handling? Use Valgrind. It will detect leaks and double-frees, and is simple to use:

valgrind --tool=memcheck --leak-check=full my-program-name

Or, if running your program from the source directory, use the following to avoid running leak checking on the libtool helper scripts:

libtool --mode=execute valgrind --tool=memcheck --leak-check=full ./my-program-name

Valgrind lists each memory problem it detects, along with a short backtrace (if you’ve compiled your program with debug symbols), allowing the cause of the memory error to be pinpointed and fixed!

Character encoding and locales

Recently, I've been looking into how character encoding and locales work on Linux, and I thought it might be worthwhile to write down my findings; partly so that I can look them up again later, and partly so that people can correct all the things I've got wrong.

To begin with, let's define some terminology:

  • Character set: a set of symbols which can be used together. This defines the symbols and their semantics, but not how they're encoded in memory. For example: Unicode. (Update: As noted in the comments, the character set doesn't define the appearance of symbols; this is left up to the fonts.)
  • Character encoding: a mapping from a character set to an representation of the characters in memory. For example: UTF-8 is one encoding of the Unicode character set.
  • Nul byte: a single byte which has a value of zero. Typically represented as the C escape sequence ‘\0’.
  • NULL character: the Unicode NULL character (U+0000) in the relevant encoding. In UTF-8, this is just a single nul byte. In UTF-16, however, it's a sequence of two nul bytes.

Now, the problem: if I'm writing a (command line) C program, how do strings get from the command line to the program, and how do strings get from the program to the terminal? More concretely, what actually happens with argv[] and printf()?

Let's consider the input direction first. When the main() function of a C program is called, it's passed an array of pointers to char arrays, i.e. strings. These strings can be arbitrary byte sequences (for example, file names), but are generally intended/assumed to be encoded in the user's environment's character encoding. This is set using the LC_ALL, LC_CTYPE or LANG environment variables. These variables specify the user's locale which (among other things) specifies the character encoding they use.

So the program receives as input a series of strings which are in an arbitrary encoding. This means that all programs have to be able to handle all possible character encodings, right? Wrong. A standard solution to this already exists in the form of libiconv. iconv() will convert between any two character encodings known to the system, so we can use it to convert from the user's environment encoding to, for example, UTF-8. How do we find out the user's environment encoding without parsing environment variables ourselves? We use setlocale() and nl_langinfo().

setlocale() parses the LC_ALL, LC_CTYPE and LANG environment variables (in that order of precedence) to determine the user's locale, and hence their character encoding. It then stores this locale, which will affect the behaviour of various C runtime functions. For example, it will change the formatting of numbers outputted by printf() to use the locale's decimal separator. Just calling setlocale() doesn't have any effect on character encodings, though. It won't, for example, cause printf() to magically convert strings to the user's environment encoding. More on this later.

nl_langinfo() is one function affected by setlocale(). When called with the CODESET type, it will return a string identifying the character encoding set in the user's environment. This can then be passed to iconv_open(), and we can use iconv() to convert strings from argv[] to our internal character encoding (which will typically be UTF-8).

At this point, it's worth noting that most people don't need to care about any of this. If using a library such as GLib – and more specifically, using its GOption command line parsing functionality – all this character encoding conversion is done automatically, and the strings it returns to you are guaranteed to be UTF-8 unless otherwise specified.

So we now have our input converted to UTF-8, our program can go ahead and do whatever processing it likes on it, safe in the knowledge that the character encoding is well defined and, for example, there aren't any unexpected embedded nul bytes in the strings. (This could happen if, for example, the user's environment character encoding was UTF-16; although this is really unlikely and might not even be possible on Linux — but that's a musing for another blog post).

Having processed the input and produced some output (which we'll assume is in UTF-8, for simplicity), many programs would just printf() the output and be done with it. printf() knows about character encodings, right? Wrong. printf() outputs exactly the bytes which are passed to its format parameter (ignoring all the fancy conversion specifier expansion), so this will only work if the program's internal character encoding is equal to the user's environment character encoding, for the characters being outputted. In many cases, the output of programs is just ASCII, so programs get away with just using printf() because most character encodings are supersets of ASCII. In general, however, more work is required to do things properly.

We need to convert from UTF-8 to the user's environment encoding so that what appears in their terminal is correct. We could just use iconv() again, but that would be boring. Instead, we should be able to use gettext(). This means we get translation support as well, which is always good.

gettext() takes in a msgid string and returns a translated version in the user's locale, if possible. Since these translations are done using message catalogues which may be in a completely different character encoding to the user's environment or the program's internal character encoding (UTF-8), gettext() helpfully converts from the message catalogue encoding to the user's environment encoding (the one returned by nl_langinfo(), discussed above). Great!

But what if no translation exists for a given string? gettext() returns the msgid string, unmodified and unconverted. This means that translatable string literals in our program need to magically be written in the user's environment encoding…and we're back to where we were before we introduced gettext(). Bother.

I see three solutions to this:

  • The gettext() solution: declare that all msgid strings should be in US-ASCII, and thus not use any Unicode characters. This works, provided we make the (reasonable) assumption that the user's environment encoding is a superset of ASCII. This requires that if a program wants to use Unicode characters in its translatable strings, it has to provide an en-US message catalogue to translate the American English msgid strings to American English (with Unicode). Not ideal.
  • The gettext()++ solution: declare that all msgid strings should be in UTF-8, and assume that anybody who's running without message catalogues is using UTF-8 as their environment encoding (this is a big assumption). Also not ideal, but a lot less work.
  • The iconv() solution: instruct gettext() to not return any strings in the user's environment encoding, but to return them all in UTF-8 instead (using bind_textdomain_codeset()), and use UTF-8 for the msgid strings. The program can then pass these translated (and untranslated) strings through iconv() as it did with the input, converting from UTF-8 to the user's environment encoding. More effort, but this should work properly.

An additional complication is that of combining translatable printf() format strings with UTF-8 string output from the program. Since printf() isn't encoding-aware, this requires that both the format string and the parameters are in the same encoding (or we get into a horrible mess with output strings which have substrings encoded in different ways). In this case, since our program output is in UTF-8, we definitely want to go with option 3 from above, and have gettext() return all translated messages in UTF-8. This also means we get to use UTF-8 in msgid strings. Unfortunately, it means that we now can't use printf() directly, and instead have to sprintf() to a string, use iconv() to convert that string from UTF-8 to the user's environment encoding, and then printf() it. Whew.

Here's a diagram which hopefully makes some of the journey clearer (click for a bigger version):

Diagram of the processing of strings from input to output in a C program.

So what does this mean for you? As noted above, in most cases it will mean nothing. Libraries such as GLib should take care of all of this for you, and the world will be a lovely place with ponies (U+1F3A0) and cats (U+1F431) everywhere. Still, I wanted to get this clear in my head, and hopefully it's useful to people who can't make use of libraries like GLib (for whatever reason).

Exploring exactly what GLib does is a matter for another time. Similarly, exploring how Windows does things is also best left to a later post (hint: Windows does things completely differently to Linux and other Unices, and I'm not sure it's for the better).

Reference count debugging with systemtap

I got some really helpful comments on yesterday's post about reference count debugging with gdb which enabled me to get systemtap working.

Getting systemtap working (on Fedora 13)

Install the systemtap-* and kernel-devel packages as per the instructions on the systemtap wiki. Note that the kernel packages need to be for the same version as the kernel you're currently running. I got caught out by this since I hadn't rebooted since I last downloaded an updated kernel package. You then need to add yourself to the stapdev and stapusr groups. Run the command stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}' to test whether everything's installed and working properly. systemtap might ask you to run a make command at this point, which you need to do.

Writing systemtap probes

The probe I'm using to sort out referencing issues is the following, based off the examples Alex Larsson gave when static probes were initially added to GLib and GObject. I've saved it as refs.stp:

global alive
global my_object = "FooObject"

probe gobject.object_new {
	if (type == my_object)
		alive++
}

probe gobject.object_ref {
	if (type == my_object) {
		printf ("%s %p ref (%u)\n", type, object, refcount)
		print_ubacktrace_brief ()
		printf ("\n")
	}
}

probe gobject.object_unref {
	if (type == my_object) {
		printf ("%s %p unref (%u)\n", type, object, old_refcount)
		print_ubacktrace_brief ()
		printf ("\n")

		if (old_refcount == 1)
			alive--
	}
}

probe end {
	printf ("Alive objects: \n")
	if (alive > 0)
		printf ("%d\t%s\n", alive, my_object)
}

This counts how many instances of the FooObject class are created (using a probe on g_object_new()) and destroyed (probing on g_object_unref() and decrementing the alive counter when the last reference is dropped). References and dereferences are also counted, with a short backtrace being outputted for each, which is the key thing I was looking for when debugging reference counting problems.

Using the probes

I was debugging problems in Empathy, so I had to use the following command:

stap refs.stp \
-d ${libdir}/libfolks.so \
-d ${libdir}/libfolks-telepathy.so \
-d ${libdir}/libglib-2.0.so \
-d ${libdir}/libgobject-2.0.so \
-d ${libdir}/libgee.so \
-d ${libdir}/libgtk-x11-2.0.so \
-d ${bindir}/empathy \
-c "${bindir}/empathy"

Each -d option tells systemtap to load unwind data from the given library or executable, which is the key thing I was missing yesterday; these options are necessary for the backtraces to be useful, since systemtap stops unwinding a backtrace at the first frame it can't map to a symbol name. Note that it's necessary to explicitly tell systemtap to load data from the empathy executable, even though it then runs Empathy to trace it.

This gives output like the following when tracing the EmpathyMainWindow object:

EmpathyIndividualStore 0x09c05a10 ref (2)
g_object_ref+0x138
g_value_object_collect_value+0xe0
g_value_set_instance+0x190
.L1016+0x1e0
g_signal_emit_by_name+0x165
gtk_tree_sortable_sort_column_changed+0x78
gtk_tree_store_set_sort_column_id+0xde
gtk_tree_sortable_set_sort_column_id+0xe6
empathy_individual_store_set_sort_criterium+0x108
individual_store_setup+0x162
empathy_individual_store_init+0xb0
g_type_create_instance+0x1c3
g_object_constructor+0x1d
g_object_newv+0x438
.L345+0xfd
g_object_new+0x8d
empathy_individual_store_new+0xb6
empathy_main_window_init+0x890
g_type_create_instance+0x1c3
g_object_constructor+0x1d
empathy_main_window_constructor+0x4c

EmpathyIndividualStore 0x09c05a10 unref (2)
g_object_unref+0x13f
g_value_object_free_value+0x2a
g_value_unset+0x6d
.L1041+0x100
g_signal_emit_by_name+0x165
gtk_tree_sortable_sort_column_changed+0x78
gtk_tree_store_set_sort_column_id+0xde
gtk_tree_sortable_set_sort_column_id+0xe6
empathy_individual_store_set_sort_criterium+0x108
individual_store_setup+0x162
empathy_individual_store_init+0xb0
g_type_create_instance+0x1c3
g_object_constructor+0x1d
g_object_newv+0x438
.L345+0xfd
g_object_new+0x8d
empathy_individual_store_new+0xb6
empathy_main_window_init+0x890
g_type_create_instance+0x1c3
g_object_constructor+0x1d
empathy_main_window_constructor+0x4c

The only thing I need to do now is to figure out how to script systemtap so that it indents each backtrace nicely according to the reference count of the object.

Reference count debugging with gdb

As I was hacking today, I ran into some hard-to-debug reference counting problems with one of my classes. The normal smattering of printf()s didn't help, and neither did this newfangled systemtap, which was a bit disappointing.

It worked, in that my probes were correctly run and correctly highlighted each reference/dereference of the class I was interested in, but printing a backtrace only extended to the g_object_ref()/g_object_unref() call, and no further. I'm guessing this was a problem with the location of the debug symbols for my code (since it was in a development prefix, whereas systemtap was not), but it might be that systemtap hasn't quite finished userspace stuff yet. That's what I read, at least.

In the end, I ended up using conditional breakpoints in gdb. This was a lot slower than systemtap, but it worked. It's the sort of thing I would've killed to know a few years (or even a few months) ago, so hopefully it's useful for someone (even if it's not the most elegant solution out there).

set pagination off
set $foo=0
break main
run

break g_object_ref
condition 2 _object==$foo
commands
	silent
	bt 8
	cont
	end

break g_object_unref
condition 3 _object==$foo
commands
	silent
	bt 8
	cont
	end

break my_object_init
commands
	silent
	set $foo=my_object
	cont
	end
enable once 4
cont

The breakpoint in main() is to stop gdb discarding our breakpoints out of hand because the relevant libraries haven't been loaded yet. $foo contains the address of the first instance of MyObject in the program; if you need to trace the n+1th instance, use ignore 4 n to only fire the my_object_init breakpoint on the n+1th MyObject instantiation.

This can be extended to track (a fixed number of) multiple instances of the object, by using several $fooi variables and gdb's if statements to set them as appropriate. This is left as an exercise to the reader!

I welcome the inevitable feedback and criticism of this approach. It's hacky, ugly and slower than systemtap, but at least it works.

Reviewing and applying a patch

I've been fortunate enough to have been reviewing a lot of patches recently. Fortunate, because it means other people are contributing to my library. However, few of these contributions are without their problems; as with all contributions, each patch generally goes through two or three iterations before I think it's near enough to being ready that it's easier for me to apply the patch than it is to comment on it and request an updated version.

Generally, when patches get to this stage, it's just the really small, nitpicky things which are still wrong. Rogue whitespace, quirky indentation, typos in documentation…these are all really minor things, but they take time to check through and correct. One of the latest libgdata patches was, I think, a bit of a rushed job; and so there were more of these niggly problems than usual. Instead of going through and fixing all the problems and never giving detailed feedback about them to the patch contributor – as is usually the way, since the changes are so trivial, albeit cumulatively not inconsiderate – I decided to make a screencast of the process I go through when reviewing a patch.

Reviewing and applying a patch

It's about half an hour long, unscripted and unedited, so there are a couple of mistakes and omissions in there (for example, I later noticed I'd forgotten to mention the lack of input validation on the new function). However, I think it's quite comprehensive. It's aimed at those who are getting used to the open-source patch submission and development process, though those more talented than me welcome to watch it and lambast me for not using Emacs. Hopefully it's useful to someone, anyway. It's licenced under cc-by-sa 2.0.

Update on JS string concatenation

Somebody over at Slashdot has just pointed out a method of easily concatenating numbers. Although this is helpful, it can't address the underlying problem, which is that JavaScript shouldn't be using the + operator for two things, and it still doesn't address my main problem of JS concatenating instead of adding two numeric strings together. 🙁

JavaScript string concatenation

I've been coding JavaScript quite a lot for ABXY recently, and one thing which has got me really annoyed (apart from JavaScript's odd and overly-flexible "OOP" architecture) is its + operator.

In theory, the + operator is brilliant. Instead of being boring, and only doing one thing, it can be used for both numerical addition, and string concatenation. That's fine, you might say, but tell me; have you ever considered what happens when you want to add together two numerical strings, or concatenate two numbers?

The answer is that in the first instance, JavaScript will concatenate the strings, and in the second, it will add the two numbers together. That's perfect, according to the design, but it's not helpful. :( Most input and other variables in JavaScript are strings, so every time you want to add two numerical strings together, you have to cast them as numbers, using the parseInt or the parseFloat function, which is easy to forget at best, and downright inefficient at worst.

This is awkward, and unlike the rest of JavaScript; in other situations, JavaScript will happily cast between types to find two which match the operation you're trying to perform. Why is it different here? It's different because JavaScript doesn't know whether you're wanting to concatenate, or to add, and this is the underlying problem. JavaScript should follow PHP's example, and have separate addition and concatenation operators (+ and ., respectively).