Startup time profiling of gnome-software

Following on from the heap profiling I did on gnome-software to try and speed it up for Endless, the next step was to try profiling the computation done when starting up gnome-software — which bits of code are taking time to run?

tl;dr: There is new tooling in sysprof and GLib from git which makes profiling the performance of high-level tasks simpler. Some fixes have landed in gnome-software as a result.

Approaches which don’t work

The two traditional tools for this – callgrind, and print statements – aren’t entirely suitable for gnome-software.

I tried running valgrind --tool=callgrind gnome-software, and then viewing the results in KCachegrind, but it slowed gnome-software down so much that it was unusable, and the test/retry cycle of building and testing changes would have been soul destroyingly slow.

callgrind works by simulating the CPU’s cache and looking at cache reads/writes/hits/misses, and then attributing costs for those back up the call graph. This makes it really good at looking at the cost of a certain function, or the total cost of all the calls to a utility function; but it’s not good at attributing the costs of higher-level dynamic tasks. gnome-software uses a lot of tasks like this (GsPluginJob), where the task to be executed is decided at runtime with some function arguments, rather than at compile time by the function name/call. For example “get all the software categories” or “look up and refine the details of these three GsApp instances”.

That said, it was possible to find and fix a couple of bits of low-hanging optimisation fruit using callgrind.

Print statements are the traditional approach to profiling higher-level dynamic tasks: print one line at the start of a high-level task with the task details and a timestamp, and print another line at the end with another timestamp. The problem comes from the fact that gnome-software runs so many high-level tasks (there are a lot of apps to query, categorise, and display, using tens of plugins) that reading the output is quite hard. And it’s even harder to compare the timings and output between two runs to see if a code change is effective.

Enter sysprof

Having looked at sysprof briefly for the heap profiling work, and discounted it, I thought it might make sense to come back to it for this speed profiling work. Christian had mentioned at GUADEC in Thessaloniki that the design of sysprof means apps and libraries can send their own profiling events down a socket, and those events will end up in the sysprof capture.

It turns out that’s remarkably easy: link against libsysprof-capture-4.a and call sysprof_capture_writer_add_mark() every time a high-level task ends, passing the task duration and details to it. There’s even an example app in the sysprof repository.

So I played around with this newly-instrumented version of gnome-software for a bit, but found that there were still empty regions in the profiling trace, where time passed and computation was happening, but nothing useful was logged in the sysprof capture. More instrumentation was needed.

sysprof + GLib

gnome-software does a lot of its computation in threads, bringing the results back into the main thread to be rendered in the UI using idle callbacks.

For example, the task to list the apps in a particular category in gnome-software will run in a thread, and then schedule an idle callback in the main thread with the list of apps. The idle callback will then iterate over those apps and add them to (for example) a GtkFlowBox to be displayed.

Adding items to a GtkFlowBox takes some time, and if there are a couple of hundred of apps to be added in a single idle callback, that can take several hundred milliseconds — a long enough time to block the main UI from being redrawn that the user will notice.

How do you find out which idle callback is taking too long? sysprof again! I added sysprof support to GLib so that GSource.dispatch events are logged (along with a few others), and now the long-running idle callbacks are displayed in the sysprof graphs. Thanks to Christian and Richard for their reviews and contributions to those changes.

This capture file was generated using sysprof-cli --gtk --use-trace-fd -- gnome-software, and the ‘gnome-software’ and ‘GLib’ lines in the ‘Timings’ row need to be made visible using the drop-down menu in the ‘Timings’ row.

It’s important to call g_task_set_source_tag() or g_task_set_name() on all the GTasks in your code, and to call g_source_set_name() on the GSources (like this), so that the marks in the capture file have helpful names.

In it, you can see the ‘get-updates’ plugin job on gnome-software’s flatpak plugin is taking 1.5 seconds (in a thread), and then 175ms to process the results in the main thread.

The selected row above that is showing it’s taking 110ms to process the results from a call to gs_plugin_loader_job_get_categories_async() in the main thread.

What’s next?

With the right tooling in place, it should be easier for me and others to find and fix performance issues like these, in gnome-software and in other projects.

I’ve submitted a few fixes, but there are more to do, and I need to shift my focus onto other things for now.

Please try out the new sysprof features, and add libsysprof-capture-4.a support to your project (if it would help you debug high-level performance problems). Ask questions on Discourse (and @ me).

To try out the new features, you’ll need the latest versions of sysprof and GLib from git.

Easily speed up CI by reducing download size

Every time a CI pipeline runs on GitLab, it downloads the git repository for your project. Often, pipeline jobs are set up to make further downloads (of dependencies or subprojects), which are also run on each job.

Assuming that you’ve built a Docker image containing all your dependencies, to minimise how often they’re re-downloaded (you really should do this, it speeds up CI a lot), you can make further improvements by:

  1. Limiting the clone depth of your repository in the GitLab settings: Settings ? CI/CD, and change it to use a ‘git shallow clone’ of depth 1.
  2. Adding --branch, --no-tags and --depth 1 arguments to every git clone call you make during a CI job. Here’s an example for GLib.
  3. Adding depth=1 to your Meson .wrap files to achieve the same thing when (for example) meson subprojects download is called. See the same example merge request.

For GLib, the difference between git clone https://gitlab.gnome.org/GNOME/glib.git and git clone --depth 1 https://gitlab.gnome.org/GNOME/glib.git is 66MB (reducing from 74MB to 8MB), or a factor of 10. It won’t be as much for younger or smaller projects, but still worthwhile.

URI parsing and building in GLib

Marc-André Lureau has landed GUri support in GLib, and it’ll be available in GLib 2.65.1 (due out in the next few days).

GUri is a new API for parsing and building URIs, roughly equivalent to SoupURI already provided by libsoup — but since URIs are so pervasive, and used even if you’re not actually doing HTTP conversations, it makes sense to have a structured representation for them in GLib.

To parse a URI, use g_uri_parse() or g_uri_split():

g_autoptr(GError) local_error = NULL;
const gchar *uri_str;
g_autoptr(GUri) uri = NULL;
g_autoptr(GHashTable) query_params = NULL;

uri_str = "https://discourse.gnome.org/search?q=search%20terms#ember372";
uri = g_uri_parse (uri_str,
                   G_URI_FLAGS_PARSE_STRICT |
                   G_URI_FLAGS_ENCODED_QUERY,
                   &local_error);
if (uri == NULL)
  {
    /* Handle the error */
    g_error ("Invalid URI: %s", uri_str);
    return;
  }

g_assert_cmpstr (g_uri_get_scheme (uri), ==, "https");
g_assert_cmpstr (g_uri_get_host (uri), ==, "discourse.gnome.org");
g_assert_cmpstr (g_uri_get_path (uri), ==, "/search");
g_assert_cmpstr (g_uri_get_query (uri), ==, "q=search%20terms");
g_assert_cmpstr (g_uri_get_fragment (uri), ==, "ember372");

/* Parse the params further. Using g_uri_parse_params() requires that we pass G_URI_FLAGS_ENCODED_QUERY to g_uri_parse() above, otherwise the %-encoded values could be decoded to create more separators */
query_params = g_uri_parse_params (g_uri_get_query (uri), -1,
                                   "&",
                                   G_URI_PARAMS_NONE,
                                   &local_error);
if (query_params == NULL)
  {
    /* Handle the error */
    g_error ("Invalid query: %s", g_uri_get_query (uri));
    return;
  }

g_assert_cmpstr (g_hash_table_lookup (query_params, "q"), ==, "search terms");

Building a URI is a matter of calling g_uri_build() or g_uri_join(), which should be self-explanatory.

Please try it out! The API is unstable until GLib makes its 2.66.0 stable release (freezing on 2020-08-08), so now is the time to comment on things which don’t make sense or are hard to use.

Heap profiling of gnome-software

The last week has been a fun process of starting to profile gnome-software with the aim of lowering its resource consumption and improving its startup speed. gnome-software is an important part of the desktop, so having it work speedily, especially on resource constrained computers, is important. This work is important for Endless OS, and is happening upstream.

To start with, I’ve looked at gnome-software’s use of heap memory, particularly during startup. While allocating lots of memory on the heap isn’t always a bad thing (caches are a good example of heap allocations being used to speed up a program overall), it’s often a sign of unnecessary work being done. Large heap allocations do take a few tens of milliseconds to be mapped through the allocator too. To do this profiling, I’ve been using valgrind’s massif tool, and massif-visualizer to explore the heap allocations. I could also have used heaptrack, or gobject-list, but they’re tools to explore another time.

Profile your app

Before diving into the process of optimising, the summary is that this work dropped gnome-software’s pixbuf heap usage by 24MB, and its non-pixbuf heap usage after initialisation (i.e. at the point when the main window is visible and ready to use) by 12%, from 15.7MB to 13.7MB (on my set of flatpak repositories on Fedora 32). I’ve been doing this work upstream, and it’ll trickle down to the downstream copy of gnome-software in Endless OS.

There is more low-hanging fruit to explore, and plenty of opportunities to dive in and trim more memory usage from gnome-software, or other apps. If you’re interested, please dive in! Get in touch if you have questions, or post them on GNOME’s Discourse instance and tag me. I’ll be happy to help!

How to profile heap usage

Profiling heap usage using massif is an iterative process: run your program under massif, do some actions in the program, quit, then open the resulting massif.out.pid file in massif-visualizer and see where allocations are coming from. Pick an allocation which looks large or unnecessary, find it in the code, optimise the code (if possible), and then repeat the process.

When running it, I wait for gnome-software to finish loading its main window, then I exit; so all this profiling work is for allocations during startup.

I run massif using this script, which I’ve put in ~/.local/bin/massif:

export G_SLICE=always-malloc
exec valgrind --tool=massif --num-callers=50 --suppressions=/usr/share/glib-2.0/valgrind/glib.supp --trace-children=no --threshold=0.1 --alloc-fn=g_malloc --alloc-fn=g_object_new --alloc-fn=g_malloc0 --alloc-fn=g_malloc0_n --alloc-fn=g_malloc_n --alloc-fn=g_realloc --alloc-fn=g_realloc_n --alloc-fn=g_slice_alloc --alloc-fn=g_slice_alloc0 --alloc-fn=g_type_create_instance --alloc-fn=g_object_new_internal --alloc-fn=g_object_new_with_properties --alloc-fn=g_object_newv --alloc-fn=g_object_new_valist --alloc-fn=g_try_malloc --alloc-fn=g_try_malloc_n --alloc-fn=g_hash_table_realloc_key_or_value_array --alloc-fn=realloc_arrays --alloc-fn=g_hash_table_resize --alloc-fn=g_hash_table_maybe_resize --alloc-fn=g_hash_table_insert_node --alloc-fn=g_hash_table_insert_internal --alloc-fn=g_hash_table_setup_storage "$@"

All the --alloc-fn arguments hide internal GLib functions so that the output is a little easier to interpret directly. There currently isn’t a way to store them in a config file or suppression file.

Some typical output from massif-visualizer before any code improvements:

massif-visualizer output from before code improvements to gnome-software, libxmlb or json-glib. The majority of the allocations are for pixbufs.

The window shows heap allocations against time in instructions executed. The breakdown of where each allocation came from is known in detail at key snapshots (which are expandable in the side pane), and the total heap usage is known in summary for the other snapshots, which allows the graph to be drawn. Allocations coming from different functions are coloured differently in the graph.

There are two sets of allocations to focus on: the red plateau between time 1e+10 and 1.6e+10, and the orange step from time 1e+10 onwards.

Selecting the red plateau shows the backtrace which led to its allocation in the side pane, and (despite some missing debug symbols, leading to the ‘???’ entries), it seems to have come from within libjpeg, as part of loading a JPEG pixbuf. gnome-software has various large JPEG images which are displayed in the featured app banners. It seems that libjpeg makes some big temporary allocations when loading a JPEG.

The orange step from 1e+10 onwards is another target. Looking at the backtraces, it seems it’s a series of similar allocations for pixbuf pixel storage for the featured app banners and for app icons. Some quick calculations show that each 1024×400 pixel banner will take around 6.5MB of memory to store its uncompressed pixels (at 16B per pixel).

From the graph and the backtraces, it seems that almost 100MB is used for pixbuf data for featured app banners. At 6.5MB per banner, that’s 15 banners, which seems reasonable. But actually gnome-software limits itself to 5 banners, so something’s amiss.

Style providers aren’t cheap

After adding some debug prints in GTK where it loads the pixbufs for CSS background properties, it became evident that the same few images were being loaded multiple times. CSS is used to style each featured app tile, including setting the background, since that allows a lot of artistic freedom quite easily. However, the CSS was being refreshed and set a few times for each tile, with a new GtkCssProvider each time. The old provider was staying in place, but with its properties overridden. This included the previously-loaded background image, which remained loaded but unused (essentially, leaked!). With that chased down, it was possible to fix the problem.

Back to profiling

With one issue investigated and fixed, the next step is to do another profiling run, find another target for reducing heap allocations, and repeat.

While we might have fixed one pixbuf bug in gnome-software, it does still use a lot of memory for pixbufs, since it displays a lot of high-resolution app icons. Those pixbuf allocations occupy a lot of space in the massif-visualizer view, and take up a large percentage of the ‘threshold’ of heap allocations which massif includes in its traces.

massif provides the --ignore-fn argument to allow certain allocations to be ignored, so that you can more easily profile others. So I did further profiling runs with a series of --ignore-fn arguments to ignore pixbuf allocations.

massif-visualizer output from before code improvements to gnome-software, libxmlb or json-glib, with pixbuf allocations ignored.

With the --ignore-fn arguments, and increasing the ‘Stacked diagrams’ level in the toolbar to show more individual areas on the graph, it’s now possible to see more detail on the largest non-pixbuf allocations, and hence easier to choose where to focus next.

massif-visualizer output from before code improvements to gnome-software, libxmlb or json-glib, with pixbuf allocations ignored, and more stacked diagrams shown.

From this screenshot, perhaps the next place to focus on would be GHashTable creation and insertions, since that totals around 1MB of the heap usage (once pixbufs are ignored).

Summary

I have iterated through the gnome-software massif profiles a few times, and have submitted various other fixes to gnome-software and libxmlb which are in the process of being reviewed and merged, but I won’t walk through each of them. There are still improvements to be made in future: gnome-software is quite complex!

In total, the changes reduced gnome-software’s heap usage at startup by 26MB, though the actual numbers will vary on other systems depending on how often feature tiles get refreshed, and how many apps and repositories you have configured.

These changes have not made a significant improvement to the startup time of gnome-software, which is more significantly influenced by network activity and file parsing (and the subject of some future work).

Hopefully this post gives a workable introduction to how to use massif on your own software. Please speak up if you have any questions. If you do profiling work on your software, please blog about it — it would be interesting to see what improvements are possible.

g_assert_no_errno() and GLib 2.65.1

It’s the start of a new GLib release cycle, and so it’s time to share what people have been contributing so far. GLib 2.65.1 will be out soon, and it will contain a new test macro, g_assert_no_errno(). This checks that a POSIX-style function (like, say, rmdir()) succeeds when run. If the function fails (and indicates that by returning a negative integer) then g_assert_no_errno() will print out the error message corresponding to the current value of errno.

This should slightly simplify tests which have to use POSIX-style functions which don’t support GError.

Thanks to Simon McVittie for his help in putting it together and getting it tested and merged.

Coding style checks in CI

For a few weeks now, we’ve had coding style and thinko checks running in CI for GLib, long enough to see that it’s working OK (check out this pipeline!) and is perhaps time to share with others.

There are two jobs in the checks, both of which run in a new style-check stage of our CI pipeline, which runs before anything else. One job checks the coding style of a merge request, using clang-format. The other job checks for any lines which introduce TODO comments (or similar). These jobs are intended to be fast, but also to not fail the pipeline if they produce warnings. Coding style is subjective, and nobody has yet come up with a mechanical style description which doesn’t have ugly corner cases. So the intention of these jobs is to help remind people of the coding style, to avoid reviewers having to check coding style manually, and hence to only end up thinking about it or discussing it when the CI says the style is wrong.

The first job, style-check-diff, uses a script to work out the diff from the merge request’s branch point, and then passes that to clang-format-diff.py, a script from LLVM which uses clang-format to reformat only the changed lines in a diff. If any reformatting occurs, that means the merge request differs from the GLib coding style (as described by .clang-format) and should be warned about.

There are some limitations to clang-format: it can’t completely describe how the GLib coding style aligns function arguments. That’s unfortunate, because GNOME Builder aligns function arguments by default (and it’s nice behaviour). Hopefully somebody will implement support in clang-format for this sometime, so that it can accurately describe the coding style which is used by a large number of GNOME projects.

These coding style checks are work by Pavlo Solntsev and Emmanuel Fleury. Thanks!

The second job, check-todos, also uses a script to work out the diff from the merge request’s branch point. It passes that to a little Python program which checks for banned words in the commit message and added lines in the diff. The aim is to prevent any comments containing TODO or WIP from accidentally getting merged, as these strings are often used by developers to indicate something they need to come back to before they’re finished — and it’s easy to miss them!

I’ve been using the convention that comments containing FIXME are OK to be merged, as they indicate something that will need updating in future, but can be merged as-is for now (such as a workaround).

Feedback, improvements, and copying welcome. These scripts should run on any CI image which has git, clang and Python.

Interested in a GUADEC remote attendance party in the UK, July 2020?

tl;dr: Fill out this 5 minute survey if you want to meet up in the UK to attend GUADEC remotely.

GUADEC is in Mexico this year, which is great! This means that, for once, the tables are turned and people in Europe will get to experience what everyone in the rest of the world normally experiences for GUADEC: long travel times. That’s no bad thing, but I suspect it means there’ll be more people from Europe who are taking a break from GUADEC this year.

I don’t want to travel to GUADEC, but do want to keep up with the conference and see people. So I’m looking at organising a UK remote attendance party for GUADEC, where anyone who isn’t going to Mexico is welcome to come along for a few days, follow the conference remotely, hack together, and socialise together.

This would be open to the same audience as GUADEC itself, but limited to people in the UK. It would be great if people in other countries who can’t make it to Mexico also got together to follow the conference, but I don’t want to encourage travel from Europe to the UK to attend this party. This is not (currently) an official GNOME event.

So, to get the ball rolling, here’s a 5 minute survey. Please fill it in if you’re at all interested in any of this. Thanks!

Testing D-Bus clients with libglib-testing

I’ve always found it a bit of a pain to write unit tests for D-Bus client libraries, where you’re testing that your code calls methods on a D-Bus service appropriately and, in particular, correctly handles a variety of return values and errors. Writing unit tests like this traditionally involves writing a mock D-Bus service for them to talk to, which validates the input it receives and provides appropriate responses. That often goes most of the way towards reimplementing the entirety of the real D-Bus service.

Part of the difficulty of testing D-Bus clients like this is synchronising the state of the mock D-Bus service with the test code, and part of the difficulty is the fact that you have to write mock service code for each D-Bus method before you can test it — which is a lot of investment in writing code before you can even start writing your unit tests themselves.

As an experiment in finding a better way of doing this kind of testing, I’ve written GtDBusQueue in libglib-testing, and I think it might be ready for some wider use. Thanks a lot to Endless for allowing me to work on such projects! I’ve used it in a couple of projects now, particularly in libmalcontent (which handles implementing parental controls policy on the desktop, and needs to talk to the accountsservice D-Bus service).

GtDBusQueue basically implements a queue for D-Bus messages received from your D-Bus client code. Each D-Bus message is typically a method call: your unit test can inspect the queue, and will typically pop messages off the front of the queue to assert they match a certain method call, and then send a reply to that call.

A key feature of GtDBusQueue is that it operates as a queue of D-Bus messages, rather than as a collection of D-Bus object proxies (typically GDBusObjectProxy), which means that it can be used to handle method calls to arbitrary D-Bus object paths without having to implement a new proxy class for each of them.

Message matching is typically implemented using gt_dbus_queue_assert_pop_message() (though other methods are available which give you finer-grained control over message matching and removal from the queue). It blocks until the queue is not empty, pops the first message off the front, asserts that its D-Bus object path, interface name and method name are as expected, and then returns the method call parameters to your unit test code using the same syntax as g_variant_get(). Your unit test code can then check the values of those parameters how it pleases.

If your D-Bus client code is asynchronous, GtDBusQueue can be used inline in your unit test. Your client code will start a method call asynchronously, then the test code will pop the method call off the GtDBusQueue, check it and reply, and then your client code will asynchronously finish its method call and handle the results. You can see an example of this in the test_app_filter_bus_get_error_disabled() test in libmalcontent which, in a single function, tests that the mct_manager_get_app_filter_async() API can correctly handle a D-Bus InvalidArgs error returned by the second D-Bus call it makes.

If your D-Bus client code is synchronous, GtDBusQueue needs to run in a thread using gt_dbus_queue_set_server_func(), since otherwise it would block your D-Bus client code. The unit test and the server thread take turns at blocking on pushing messages onto the queue or popping them off. You can see an example of this (which also works for asynchronous client code, testing both the synchronous and asynchronous code paths in a single test) in the GtDBusQueue usage example in its documentation.

That’s a brief introduction to GtDBusQueue; hopefully it’s given you a bit of an idea about where it’s appropriate and how it can be used. There’s documentation in the source code (including some usage examples), and a load of usage examples in libmalcontent. Feedback, questions and improvements are always welcome!

g_get_os_info() and GLib 2.63.1

GLib 2.63.1 has been released. The final new API to mention in this mini-series of blog posts about what’s in 2.63.1 is g_get_os_info().

g_get_os_info() is a way to get identifying information about the OS. On Linux, this is gathered from /etc/os-release. On other OSs, it’s gathered using platform-specific APIs (on other Unixes, this means falling back to uname()).

It was written by Robert Ancell, with contributions from Ruslan Izhbulatov, Ting-Wei Lan and Simon McVittie; and it came out of proposals from Robert at GUADEC.

To use it, pass it a key, like G_OS_INFO_KEY_PRETTY_NAME, and it’ll return a newly-allocated string with the corresponding value for the current OS, or NULL if there was none. Different OSs support different sets of keys, and the amount of support and set of keys may change over time.

An example:

g_autofree gchar *os_name = g_get_os_info (G_OS_INFO_KEY_PRETTY_NAME);
g_print ("OS: %s\n", os_name);
/* Prints “OS: Fedora 30 (Workstation Edition)” for me */

g_array_steal() and g_ptr_array_steal() in GLib 2.63.1

Another set of new APIs in the upcoming GLib 2.63.1 release allow you to steal all the contents of a GArray, GPtrArray or GByteArray, and continue using the array container to add more contents to in future.

This is work by Paolo Bonzini and Emmanuel Fleury, and will be available in the soon-to-be-released 2.63.1 release.

Here’s a quick example using GPtrArray — usage is similar in GArray and GByteArray:

g_autoptr(GPtrArray) chunk_buffer = g_ptr_array_new_with_free_func (g_bytes_unref);

/* Some part of your application appends a number of chunks to the pointer array. */
g_ptr_array_add (chunk_buffer, g_bytes_new_static ("hello", 5));
g_ptr_array_add (chunk_buffer, g_bytes_new_static ("world", 5));

…

/* Periodically, the chunks need to be sent as an array-and-length to some other part of the program. */
GBytes **chunks;
gsize n_chunks;

chunks = g_ptr_array_steal (chunk_buffer, &n_chunks);
for (gsize i = 0; i < n_chunks; i++)
  {
    /* Do something with each chunk here, and then free them, since g_ptr_array_steal() transfers ownership of all the elements and the array to the caller. */
    …

    g_bytes_unref (chunks[i]);
  }

g_free (chunks);

/* After calling g_ptr_array_steal(), the pointer array can be reused for the next set of chunks. */
g_assert (chunk_buffer->len == 0);