GNOME Software performance in GNOME 40

tl;dr: Use callgrind to profile CPU-heavy workloads. In some cases, moving heap allocations to the stack helps a lot. GNOME Software startup time has decreased from 25 seconds to 12 seconds (-52%) over the GNOME 40 cycle.

To wrap up the sporadic blog series on the progress made with GNOME Software for GNOME 40, I’d like to look at some further startup time profiling which has happened this cycle.

This profiling has focused on libxmlb, which gnome-software uses extensively to query the appstream data which provides all the information about apps which it shows in the UI. The basic idea behind libxmlb is that it pre-compiles a ‘silo’ of information about an XML file, which is cached until the XML file next changes. The silo format encodes the tree structure of the XML, deduplicating strings and allowing fast traversal without string comparisons or parsing. It is memory mappable, so can be loaded quickly and shared (read-only) between multiple processes. It allows XPath queries to be run against the XML tree, and returns the results.

gnome-software executes a lot of XML queries on startup, as it loads all the information needed to show many apps to the user. It may be possible to eliminate some of these queries – and some earlier work did reduce the number by binding query parameters at runtime to pre-prepared queries – but it seems unlikely that we’ll be able to significantly reduce their number further, so better speed them up instead.

Profiling work which happens on a CPU

The work done in executing an XPath query in libxmlb is largely on the CPU — there isn’t much I/O to do as the compiled XML file is only around 7MB in size (see ~/.cache/gnome-software/appstream), so this time the most appropriate tool to profile it is callgrind. I ruled out using callgrind previously for profiling the startup time of gnome-software because it produces too much data, risks hiding the bigger picture of which parts of application startup were taking the most time, and doesn’t show time spent on I/O. However, when looking at a specific part of startup (XML queries) which are largely CPU-bound, callgrind is ideal.

valgrind --tool=callgrind --collect-systime=msec --trace-children=no gnome-software

It takes about 10 minutes for gnome-software to start up and finish loading the main window when running under callgrind, but eventually it’s shown, the process can be interrupted, and the callgrind log loaded in kcachegrind:

Here I’ve selected the main() function and the callee map for it, which shows a 2D map of all the functions called beneath main(), with the area of each function proportional to the cumulative time spent in that function.

The big yellow boxes are all memset(), which is being called on heap-allocated memory to set it to zero before use. That’s a low hanging fruit to optimise.

In particular, it turns out that the XbStack and XbOperands which libxmlb creates for evaluating each XPath query were being allocated on the heap. With a few changes, they can be allocated on the stack instead, and don’t need to be zero-filled when set up, which saves a lot of time — stack allocation is a simple increment of the stack pointer, whereas heap allocation can involve page mapping, locking, and updates to various metadata structures.

The changes are here, and should benefit every user of libxmlb without further action needed on their part. With those changes in place, the callgrind callee map is a lot less dominated by one function:

There’s still plenty left to go at, though. Contributions are welcome, and we can help you through the process if you’re new to it.

What’s this mean for gnome-software in GNOME 40?

Overall, after all the performance work in the GNOME 40 cycle, startup time has decreased from 25 seconds to 12 seconds (-52%) when starting for the first time since the silo changed. This is the situation in which gnome-software normally starts, as it sits as a background process after that, and the silo is likely to change every day or two.

There are plans to stop gnome-software running as a background process, but we are not there yet. It needs to start up in 1–2 seconds for that to give a good user experience, so there’s a bit more optimisation to do yet!

Aside from performance work, there’s a number of other improvements to gnome-software in GNOME 40, including a new icon, some improvements to parts of the interface, and a lot of bug fixes. But perhaps they should be explored in a separate blog post.

Many thanks to my fellow gnome-software developers – Milan, Phaedrus and Richard – for their efforts this cycle, and my employer the Endless OS Foundation for prioritising working on this.

2 thoughts on “GNOME Software performance in GNOME 40

  1. Jano

    Really cool stuff! I think, the Software Center is one of the few parts where Gnome is lacking (compared as well to KDE as to proprietary OSes and their App Stores) and I really like how you systematically fight the performance bottlenecks! I really look forward to the bugfixing blog article - I can't exactly tell about any specific bugs in Software, but somehow the application always felt a little buggy to me (compared to apt or Discover). Maybe the next blog post will enlighten me!

Comments are closed.