This article has been tweaked and upstreamed to developer.gnome.org. The original is kept below, but future updates will be made there. If you find a problem, please file a bug.
Continuing in this fledgling series of examining GLib’s GMainContext, this post looks at ensuring that functions are called in the right main context when programming with multiple threads.
tl;dr: Use g_main_context_invoke_full() or GTask. See the end of the post for some guidelines about multi-threaded programming using GLib and main contexts.
To begin with, what is ‘the right context’? Taking a multi-threaded GLib program, let’s assume that each thread has a single GMainContext running in a main loop: this is the thread default main context.((Why use main contexts? A main context effectively provides a work or message queue for a thread, something which the thread can periodically check to determine if there is work pending from another thread. It’s not possible to pre-empt a thread’s execution without using hideous POSIX signalling. I’m ignoring the case of non-default contexts, but their use is similar.)) So ‘the right context’ is the one in the thread you want a function to execute in. For example, if I’m doing a long and CPU-intensive computation I will want to schedule this in a background thread so that it doesn’t block UI updates from the main thread. The results from this computation, however, might need to be displayed in the UI, so some UI update function has to be called in the main thread once the computation’s complete. Furthermore, if I can limit a function to being executed in a single thread, it becomes easy to eliminate the need for locking a lot of the data it accesses((Assuming that other threads are implemented similarly and hence most data is accessed by a single thread, with threads communicating by message passing, allowing each thread to update its data at its leisure.)), which makes multi-threaded programming a whole lot simpler.
For some functions, I might not care which context they’re executed in, perhaps because they’re asynchronous and hence do not block the context. However, it still pays to be explicit about which context is used, since those functions may emit signals or invoke callbacks, and for reasons of thread safety it’s necessary to know which threads those signal handlers or callbacks are going to be invoked in. For example, the progress callback in g_file_copy_async() is documented as being called in the thread default main context at the time of the initial call.
The core principle of invoking a function in a specific context is simple, and I’ll walk through it as an example before demonstrating the convenience methods which should actually be used in practice. A GSource has to be added to the specified GMainContext, which will invoke the function when it’s dispatched. This GSource should almost always be an idle source created with g_idle_source_new(), but this doesn’t have to be the case. It could be a timeout source so that the function is executed after a delay, for example.
As described previously, this GSource will be added to the specified GMainContext and dispatched as soon as it’s ready((In the case of an idle source, this will be as soon as all sources at a higher priority have been dispatched — this can be tweaked using the idle source’s priority parameter with g_source_set_priority(). I’m assuming the specified GMainContext is being run in a GMainLoop all the time, which should be the case for the default context in a thread.)), calling the function on the thread’s stack. The source will typically then be destroyed so the function is only executed once (though again, this doesn’t have to be the case).
Data can be passed between threads in this manner in the form of the user_data passed to the GSource’s callback. This is set on the source using g_source_set_callback(), along with the callback function to invoke. Only a single pointer is provided, so if multiple bits of data need passing, they must be packaged up in a custom structure first.
Here’s an example. Note that this is to demonstrate the underlying principles, and there are convenience methods explained below which make this simpler.
    #include <glib.h>
    #include <glib-object.h>

    /* Main function for the background thread, thread1. */
    static gpointer
    thread1_main (gpointer user_data)
    {
      GMainContext *thread1_main_context = user_data;
      GMainLoop *main_loop;

      /* Set up the thread’s context and run it forever. */
      g_main_context_push_thread_default (thread1_main_context);

      main_loop = g_main_loop_new (thread1_main_context, FALSE);
      g_main_loop_run (main_loop);
      g_main_loop_unref (main_loop);

      g_main_context_pop_thread_default (thread1_main_context);
      g_main_context_unref (thread1_main_context);

      return NULL;
    }

    /* A data closure structure to carry multiple variables between
     * threads. */
    typedef struct {
      gchar *some_string;  /* owned */
      guint some_int;
      GObject *some_object;  /* owned */
    } MyFuncData;

    static void
    my_func_data_free (MyFuncData *data)
    {
      g_free (data->some_string);
      g_clear_object (&data->some_object);
      g_slice_free (MyFuncData, data);
    }

    static void
    my_func (const gchar *some_string,
             guint        some_int,
             GObject     *some_object)
    {
      /* Do something long and CPU intensive! */
    }

    /* Convert an idle callback into a call to my_func(). */
    static gboolean
    my_func_idle (gpointer user_data)
    {
      MyFuncData *data = user_data;

      my_func (data->some_string, data->some_int, data->some_object);

      return G_SOURCE_REMOVE;
    }

    /* Function to be called in the main thread to schedule a call to
     * my_func() in thread1, passing the given parameters along. */
    static void
    invoke_my_func (GMainContext *thread1_main_context,
                    const gchar  *some_string,
                    guint         some_int,
                    GObject      *some_object)
    {
      GSource *idle_source;
      MyFuncData *data;

      /* Create a data closure to pass all the desired variables
       * between threads. */
      data = g_slice_new0 (MyFuncData);
      data->some_string = g_strdup (some_string);
      data->some_int = some_int;
      data->some_object = g_object_ref (some_object);

      /* Create a new idle source, set my_func() as the callback with
       * some data to be passed between threads, bump up the priority
       * and schedule it by attaching it to thread1’s context. */
      idle_source = g_idle_source_new ();
      g_source_set_callback (idle_source, my_func_idle, data,
                             (GDestroyNotify) my_func_data_free);
      g_source_set_priority (idle_source, G_PRIORITY_DEFAULT);
      g_source_attach (idle_source, thread1_main_context);
      g_source_unref (idle_source);
    }

    /* Main function for the main thread. */
    int
    main (void)
    {
      GThread *thread1;
      GMainContext *thread1_main_context;
      GObject *some_object;

      /* Spawn a background thread and pass it a reference to its
       * GMainContext. Retain a reference for use in this thread
       * too. */
      thread1_main_context = g_main_context_new ();
      thread1 = g_thread_new ("thread1", thread1_main,
                              g_main_context_ref (thread1_main_context));

      /* Maybe set up your UI here, for example. */

      /* A placeholder object to pass to the other thread. */
      some_object = g_object_new (G_TYPE_OBJECT, NULL);

      /* Invoke my_func() in the other thread. */
      invoke_my_func (thread1_main_context,
                      "some data which needs passing between threads",
                      123456, some_object);

      /* Continue doing other work. */

      return 0;
    }

That’s a lot of code, and it doesn’t look fun. There are several points of note here:
- This invocation is uni-directional: it calls my_func() in thread1, but there’s no way to get a return value back to the main thread. To do that, the same principle needs to be used again, invoking a callback function in the main thread. It’s a straightforward extension which isn’t covered here.
- Thread safety: This is a vast topic, but the key principle is that data which is potentially accessed by multiple threads must have mutual exclusion enforced on those accesses using a mutex. What data is potentially accessed by multiple threads here? thread1_main_context, which is passed to thread1_main() in the g_thread_new() call which spawns the thread; and some_object, a reference to which is passed in the data closure. Critically, GLib guarantees that GMainContext is thread safe, so sharing thread1_main_context between threads is fine. The other code in this example must ensure that some_object is thread safe too, but that’s a topic for another blog post. Note that some_string and some_int cannot be accessed from both threads, because copies of them are passed to thread1, rather than the originals. This is a standard technique for making cross-thread calls thread safe without requiring locking. It also avoids the problem of synchronising freeing some_string. Similarly, a reference to some_object is transferred to thread1, which works around the issue of synchronising destruction of the object.
- Specificity: g_idle_source_new() was used rather than the simpler g_idle_add() so that the GMainContext the GSource is attached to could be specified.
With those principles and mechanisms in mind, let’s take a look at a convenience method which makes this a whole lot easier: g_main_context_invoke_full().((Why not g_main_context_invoke()? It doesn’t allow a GDestroyNotify function for the user data to be specified, limiting its use in the common case of passing data between threads.)) As stated in its documentation, it invokes a callback so that the specified GMainContext is owned during the invocation. In almost all cases, the context being owned is equivalent to it being run, and hence the function is invoked in the thread for which the specified context is the thread default.
Modifying the earlier example, the invoke_my_func() function can be replaced by the following:
    static void
    invoke_my_func (GMainContext *thread1_main_context,
                    const gchar  *some_string,
                    guint         some_int,
                    GObject      *some_object)
    {
      MyFuncData *data;

      /* Create a data closure to pass all the desired variables
       * between threads. */
      data = g_slice_new0 (MyFuncData);
      data->some_string = g_strdup (some_string);
      data->some_int = some_int;
      data->some_object = g_object_ref (some_object);

      /* Invoke the function. */
      g_main_context_invoke_full (thread1_main_context,
                                  G_PRIORITY_DEFAULT, my_func_idle,
                                  data,
                                  (GDestroyNotify) my_func_data_free);
    }

That’s a bit simpler. Let’s consider what happens if invoke_my_func() were to be called from thread1, rather than from the main thread. With the original implementation, the idle source would be added to thread1’s context and dispatched on the context’s next iteration (assuming no pending dispatches with higher priorities). With the improved implementation, g_main_context_invoke_full() will notice that the specified context is already owned by the thread (or can be acquired by it), and will call my_func_idle() directly, rather than attaching a source to the context and delaying the invocation to the next context iteration. This subtle behaviour difference doesn’t matter in most cases, but is worth bearing in mind since it can affect blocking behaviour (i.e. invoke_my_func() would go from taking negligible time, to taking the same amount of time as my_func() before returning).
How can I be sure a function is always executed in the thread I expect? Since I’m now thinking about which thread each function could be called in, it would be useful to document this in the form of an assertion:
    g_assert (g_main_context_is_owner (expected_main_context));

If that’s put at the top of each function, any assertion failure will highlight a case where a function has been called directly from the wrong thread. This technique was invaluable to me recently when writing code which used upwards of four threads with function invocations between all of them. It’s a whole lot easier to put the assertions in when initially writing the code than it is to debug the race conditions which easily result from a function being called in the wrong thread.
This can also be applied to signal emissions and callbacks. As well as documenting which contexts a signal or callback will be emitted in, assertions can be added to ensure that this is always the case. For example, instead of using the following when emitting a signal:
    guint param1;  /* arbitrary example parameters */
    gchar *param2;
    guint retval = 0;

    g_signal_emit_by_name (my_object, "some-signal",
                           param1, param2, &retval);

it would be better to use the following:
    static guint
    emit_some_signal (GObject     *my_object,
                      guint        param1,
                      const gchar *param2)
    {
      guint retval = 0;

      g_assert (g_main_context_is_owner (expected_main_context));

      g_signal_emit_by_name (my_object, "some-signal",
                             param1, param2, &retval);

      return retval;
    }

As well as asserting emission happens in the right context, this improves type safety. Bonus! Note that signal emission via g_signal_emit() is synchronous, and doesn’t involve a main context at all. As signals are a more advanced version of callbacks, this approach can be applied to those as well.
Before finishing, it’s worth mentioning GTask. This provides a slightly different approach to invoking functions in other threads, which is more suited to the case where you want your function to be executed in some background thread, but don’t care exactly which one. GTask will take a data closure, a function to execute, and provide ways to return the result from this function; and will then handle everything necessary to run that function in a thread belonging to some thread pool internal to GLib. Although, by combining g_main_context_invoke_full() and GTask, it should be possible to run a task in a specific context and effortlessly return its result to the current context:
    /* This will be invoked in thread1. */
    static gboolean
    my_func_idle (gpointer user_data)
    {
      GTask *task = G_TASK (user_data);
      MyFuncData *data;
      gboolean retval;

      /* Call my_func() and propagate its returned boolean to
       * the main thread. This assumes my_func() has been changed
       * to return a gboolean rather than void. */
      data = g_task_get_task_data (task);
      retval = my_func (data->some_string, data->some_int,
                        data->some_object);
      g_task_return_boolean (task, retval);

      return G_SOURCE_REMOVE;
    }

    /* Whichever thread this is invoked in, the @callback will be
     * invoked in that thread’s thread default main context once
     * my_func() has finished and returned a result. */
    static void
    invoke_my_func_with_result (GMainContext        *thread1_main_context,
                                const gchar         *some_string,
                                guint                some_int,
                                GObject             *some_object,
                                GAsyncReadyCallback  callback,
                                gpointer             user_data)
    {
      GTask *task;
      MyFuncData *data;

      /* Create a data closure to pass all the desired variables
       * between threads. */
      data = g_slice_new0 (MyFuncData);
      data->some_string = g_strdup (some_string);
      data->some_int = some_int;
      data->some_object = g_object_ref (some_object);

      /* Create a GTask to handle returning the result to the current
       * thread default main context. */
      task = g_task_new (NULL, NULL, callback, user_data);
      g_task_set_task_data (task, data,
                            (GDestroyNotify) my_func_data_free);

      /* Invoke the function. */
      g_main_context_invoke_full (thread1_main_context,
                                  G_PRIORITY_DEFAULT, my_func_idle,
                                  task,
                                  (GDestroyNotify) g_object_unref);
    }

So in summary:
- Use g_main_context_invoke_full() to invoke functions in other threads, under the assumption that every thread has a thread default main context which runs throughout the lifetime of that thread.
- Use GTask if you only want to run a function in the background and don’t care about the specifics of which thread is used.
- In any case, liberally use assertions to check which context is executing a function, and do this right from the start of a project.
- Explicitly document contexts a function is expected to be called in, a callback will be invoked in, or a signal will be emitted in.
- Beware of g_idle_add() and similar functions which use the global default main context.
Not sure if you've already got one planned, but an episode on GAsyncQueue would be great. For when threading isn't totally in your control, the worker-thread + GAsyncQueue model is great.