Tag Archives: Valgrind

GNOME Software performance in GNOME 40

tl;dr: Use callgrind to profile CPU-heavy workloads. In some cases, moving heap allocations to the stack helps a lot. GNOME Software startup time has decreased from 25 seconds to 12 seconds (-52%) over the GNOME 40 cycle.

To wrap up the sporadic blog series on the progress made with GNOME Software for GNOME 40, I’d like to look at some further startup time profiling which has happened this cycle.

This profiling has focused on libxmlb, which gnome-software uses extensively to query the appstream data which provides all the information about apps which it shows in the UI. The basic idea behind libxmlb is that it pre-compiles a ‘silo’ of information about an XML file, which is cached until the XML file next changes. The silo format encodes the tree structure of the XML, deduplicating strings and allowing fast traversal without string comparisons or parsing. It is memory mappable, so can be loaded quickly and shared (read-only) between multiple processes. It allows XPath queries to be run against the XML tree, and returns the results.

gnome-software executes a lot of XML queries on startup, as it loads all the information needed to show many apps to the user. It may be possible to eliminate some of these queries – and some earlier work did reduce the number by binding query parameters at runtime to pre-prepared queries – but it seems unlikely that we’ll be able to significantly reduce their number further, so better speed them up instead.

Profiling work which happens on a CPU

The work done in executing an XPath query in libxmlb is largely on the CPU — there isn’t much I/O to do as the compiled XML file is only around 7MB in size (see ~/.cache/gnome-software/appstream), so this time the most appropriate tool to profile it is callgrind. I ruled out using callgrind previously for profiling the startup time of gnome-software because it produces too much data, risks hiding the bigger picture of which parts of application startup were taking the most time, and doesn’t show time spent on I/O. However, when looking at a specific part of startup (XML queries) which are largely CPU-bound, callgrind is ideal.

valgrind --tool=callgrind --collect-systime=msec --trace-children=no gnome-software

It takes about 10 minutes for gnome-software to start up and finish loading the main window when running under callgrind, but eventually it’s shown, the process can be interrupted, and the callgrind log loaded in kcachegrind:

Here I’ve selected the main() function and the callee map for it, which shows a 2D map of all the functions called beneath main(), with the area of each function proportional to the cumulative time spent in that function.

The big yellow boxes are all memset(), which is being called on heap-allocated memory to set it to zero before use. That’s a low hanging fruit to optimise.

In particular, it turns out that the XbStack and XbOperands which libxmlb creates for evaluating each XPath query were being allocated on the heap. With a few changes, they can be allocated on the stack instead, and don’t need to be zero-filled when set up, which saves a lot of time — stack allocation is a simple increment of the stack pointer, whereas heap allocation can involve page mapping, locking, and updates to various metadata structures.

The changes are here, and should benefit every user of libxmlb without further action needed on their part. With those changes in place, the callgrind callee map is a lot less dominated by one function:

There’s still plenty left to go at, though. Contributions are welcome, and we can help you through the process if you’re new to it.

What’s this mean for gnome-software in GNOME 40?

Overall, after all the performance work in the GNOME 40 cycle, startup time has decreased from 25 seconds to 12 seconds (-52%) when starting for the first time since the silo changed. This is the situation in which gnome-software normally starts, as it sits as a background process after that, and the silo is likely to change every day or two.

There are plans to stop gnome-software running as a background process, but we are not there yet. It needs to start up in 1–2 seconds for that to give a good user experience, so there’s a bit more optimisation to do yet!

Aside from performance work, there’s a number of other improvements to gnome-software in GNOME 40, including a new icon, some improvements to parts of the interface, and a lot of bug fixes. But perhaps they should be explored in a separate blog post.

Many thanks to my fellow gnome-software developers – Milan, Phaedrus and Richard – for their efforts this cycle, and my employer the Endless OS Foundation for prioritising working on this.

Automatically Valgrinding code with AX_VALGRIND_CHECK

tl;dr: For `make check` with Valgrind support, add AX_VALGRIND_CHECK to configure.ac and @VALGRIND_CHECK_RULES@ to tests/Makefile.am. Use memcheck, helgrind, drd and sgcheck.

Once you’ve written a unit test suite with near-100% coverage, you can validate software with a lot more confidence than before, since any effort you put into validation is working across the whole code base. ((Normal caveats about the combinatorial complexity of dynamic testing apply.)) It’s quite common to check for memory leaks with Valgrind’s memcheck tool — but Valgrind has several additional tools which are useful in production: helgrind, drd and sgcheck. ((Although currently the threading tools may not work with GLib threads, due to its use of futexes rather than pthreads.)) All of these can be run over unit tests.

How can you run a test suite under these tools? The normal method is some kind of hard-to-remember rune involving libtool, and either something about TESTS_ENVIRONMENT or some kind of LOG_COMPILER (but why would I want to compile my test logs?). For that reason, I’ve written an autoconf macro which abstracts it all a bit: AX_VALGRIND_CHECK. It adds a check-valgrind target to your makefile which handles running the `make check` tests under all four Valgrind tools of interest. Various ancillary variables (such as VALGRIND_SUPPRESSIONS_FILES or VALGRIND_FLAGS) allow the Valgrind options to be changed. `make check-valgrind` will return an error exit code if any problems are found, with the idea that the target can be used for automated testing and continuous integration.

To use it, either add a dependency on autoconf-archive to your project, or copy the ax_valgrind_check.m4 macro in-tree (and add it to EXTRA_DIST), then follow the instructions at the top of the file for adding it to configure.ac and tests/Makefile.am.

With this macro, I’ve tried to bring together existing makefile snippets which exist in various projects to produce one macro which can be reused everywhere. Of course, I’ve probably missed something – some feature or specific use case which someone has – so if anyone has any suggestions for improvement, please let me know or submit a patch to the autoconf-archive! For example, at the moment the macro requires automake’s parallel test harness, and does not support older versions of automake.

Thanks to my employer, Collabora, for enabling me to work on (hopefully useful) stuff like this!

(const gchar*) vs. (gchar*) and other memory management stories

This article now been upstreamed to the GNOME Programming Guidelines, where future updates will be made to it.

Memory management in C is often seen as a minefield, and while it’s not as simple as in more modern languages, if a few conventions are followed it doesn’t have to be hard at all. Here are some pointers about heap memory management in C using GObject and the usual GNOME types and coding conventions, including GObject introspection. A basic knowledge of C is assumed, but no more. Stack memory allocation and other obscure patterns aren’t covered, since they’re much less commonly used in a typical GObject application.

The most important rule when thinking about memory management is: which code has ownership of this memory?

Ownership can be defined as the responsibility to free or deallocate a piece of memory. If code which does own some memory doesn’t deallocate that memory, that’s a memory leak. If code which doesn’t own some memory deallocates it (in addition to the owner doing so), that’s a double-free. Both are bad.

For example, if a function returns a newly allocated string and doesn’t retain a pointer to it, ownership of the string is transferred to the caller. Note that when thinking about ownership transfer, malloc()/free() and reference counting are treated the same: in the former case, a newly allocated piece of heap memory is being transferred; in the latter, a newly incremented reference.

gchar *transfer_full_function (void) {
	return g_strdup ("Newly allocated string.");
}

/* Ownership is transferred here. */
gchar *my_string = transfer_full_function ();

Conversely, if the function does retain a pointer (e.g. inside an object), ownership is not transferred to the caller.

const gchar *transfer_none_function (void) {
	return "Static string.";
}

const gchar *transfer_none_method (MyObject *self) {
	gchar *new_string = g_strdup ("Newly allocated string.");
	/* Ownership is retained by the object here. */
	self->priv->saved_string = new_string;
	return new_string;
}

/* In both of these cases, ownership is not transferred. */
const gchar *my_string1 = transfer_none_function ();
const gchar *my_string2 = transfer_none_method (some_object);

In all of these examples, you may notice the return type of the functions reflects whether they transfer ownership of their return values. const return types indicate no transfer, and non-const return types indicate some transfer.

While this is useful, it’s by no means completely clear. For example, what if a function returns a non-const GList*? Is ownership of the list elements transferred, or just the list? What if the function’s author forgot to make the return type const, and actually there’s no transfer? This is where documentation comments are useful.

There’s a convention in GNOME documentation comments to specify the function which should be used to free a returned value. If such a function is mentioned, ownership of the returned memory is transferred. If no function is mentioned, ownership probably isn’t transferred, but it’s hard to be sure. That’s why it’s good to always be explicit when writing documentation comments.

/**
 * my_object_get_some_string:
 * @self: a #MyObject
 *
 * Gets the value of the #MyObject:some-string property.
 *
 * Return value: (transfer none): some string, owned by the
 * object and must not be freed
 */
const gchar *my_object_get_some_string (MyObject *self) {
	return self->priv->some_string;
}

/**
 * my_object_build_result:
 * @self: a #MyObject
 *
 * Builds a result string which probably represents something
 * meaningful.
 *
 * Return value: (transfer full): a newly allocated result
 * string; free with g_free()
 */
gchar *my_object_build_result (MyObject *self) {
	return g_strdup_printf ("%s %s",
	                        self->priv->some_string,
	                        self->priv->some_other_string);
}

/**
 * my_object_dup_controller:
 * @self: a #MyObject
 *
 * Gets the value of the #MyObject:controller property,
 * incrementing the controller's reference count.
 *
 * Return value: (transfer full): the object's
 * controller; unref with g_object_unref()
 */
MyController *my_object_dup_controller (MyObject *self) {
	return g_object_ref (self->priv->controller);
}

When GObject introspection was introduced, these kinds of documentation comments were formalised as introspection annotations: (transfer full), (transfer container) or (transfer none), as documented on wiki.gnome.org. These allow the runtimes of language bindings to manage memory correctly. Since a C programmer is essentially doing the same job as a language runtime when writing C code, the information provided by transfer annotations is sufficient for perfect memory management. Unfortunately, not all parameters and return types of all functions have these annotations added. The examples above do, however.

Finally, a few libraries use a function naming convention. Functions named *_get_* do not transfer ownership, whereas functions named *_dup_* (for ‘duplicate’) do transfer ownership. This can be seen in the examples above, or with json_node_get_array() vs. json_node_dup_array(). Be aware, though, that only a few libraries use this convention. Other libraries use *_get_* for both functions which do and do not transfer ownership of their results. Other code, such as that generated by gdbus-codegen, uses a different and incompatible convention: *_get_* methods signify full transfer, and *_peek_* methods signify no transfer. For example, goa_object_get_manager() vs. goa_object_peek_manager(). For this reason, going by function naming conventions only works within libraries, not between different libraries.

Memory management of parameters is analogous to return values: look at whether the parameter is const and whether there are any introspection annotations for it.

In summary, here are a set of guidelines one can follow to determine whether ownership of a return value is transferred, and hence whether the caller needs to free it:

  1. If the type has an introspection (transfer) annotation, look at that.
  2. Otherwise, if the type is const, there is no transfer.
  3. Otherwise, if the function documentation explicitly specifies the return value must be freed, there is full or container transfer.
  4. Otherwise, if the function is named *_dup_*, there is full or container transfer.
  5. Otherwise, if the function is named *_peek_*, there is no transfer.
  6. Otherwise, you need to look at the function’s code to determine whether it intends ownership to be transferred. Then file a bug against the documentation for that function, and ask for an introspection annotation to be added.

Some common pitfalls:

  • If you’re using an explicit typecast (e.g. casting a (const gchar*) return value to (gchar*)), it’s likely something’s wrong.
  • Generally, return values which are not transferred (such as (const gchar*)) are freed when the owning object is destroyed — so if you need such a value to persist, you must copy it or increase its reference count. You then have ownership of the copy or new reference, and are responsible for freeing it (not the original).

How can one check for incorrect memory handling? Use Valgrind. It will detect leaks and double-frees, and is simple to use:

valgrind --tool=memcheck --leak-check=full my-program-name

Or, if running your program from the source directory, use the following to avoid running leak checking on the libtool helper scripts:

libtool --mode=execute valgrind --tool=memcheck --leak-check=full ./my-program-name

Valgrind lists each memory problem it detects, along with a short backtrace (if you’ve compiled your program with debug symbols), allowing the cause of the memory error to be pinpointed and fixed!