Looking at project resource use and CI pipelines in GitLab

While at GUADEC I finished a small script which uses the GitLab API to estimate the resource use of a project on GitLab. It looks at the CI pipeline job durations and artifact storage for the project and its forks over a given period, and totals them.
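
To give a feel for the queries involved, here is a minimal sketch (assuming the python-gitlab library and a reasonably recent GitLab; the real gitlab-stats script may well be structured differently) which lists the pipelines run by a project and each of its forks in a given period:

```python
# Minimal sketch, assuming python-gitlab; gitlab-stats itself may differ.
import gitlab

gl = gitlab.Gitlab("https://gitlab.gnome.org")
root = gl.projects.get("GNOME/glib")

# The project itself, plus full project objects for each of its forks.
projects = [root] + [gl.projects.get(fork.id) for fork in root.forks.list(iterator=True)]

for project in projects:
    # Pipelines updated within the period of interest.
    pipelines = project.pipelines.list(
        updated_after="2022-06-23T00:00:00Z",
        updated_before="2022-07-23T00:00:00Z",
        iterator=True,
    )
    print(project.path_with_namespace, sum(1 for _ in pipelines))
```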

You might want to run it on your project!

It gives output something like the following:

Between 2022-06-23 00:00:00+00:00 and 2022-07-23 00:00:00+00:00, GNOME/glib and its 20 forks used:

  • 4592 CI jobs, totalling 17125 minutes (duration minimum 0.0, median 2.3, maximum 65.0)
  • Total energy use: 32.54kWh
  • Total artifact storage: 4426 MB (minimum 0.0, median 0.2, maximum 20.9)

This gives a rough overview of the CI resources used by a project, which can be handy for spotting low-hanging fruit for speeding things up or reducing resource waste.

What can I do with this information?

If total pipeline durations are long, either reduce the number of pipeline jobs or speed them up. Speeding them up almost always has no downsides. Reducing the number of jobs is a tradeoff between convenience of development and resource usage. Two ways to reduce the number of jobs are to make some jobs manual-only, if they are very unlikely to find problems, or to run them on a schedule rather than on every commit, if it’s OK for them to catch problems up to a week after they’re introduced.

If total artifact storage use is high, store fewer artifacts, or expire them after a week (or so). They are likely not so useful after that point anyway.

If artifacts are being used to cache build dependencies, consider moving those dependencies into a pre-built container image instead, as the image is likely to be cached more effectively between CI runners.

This script is rubbish, how do I improve it?

Merge requests welcome on https://gitlab.gnome.org/pwithnall/gitlab-stats, or perhaps you’d like to integrate it into cauldron.io so that the data could be visualised over time? The same query code should work for all GitLab instances, not just GNOME’s.

How does it work?

It queries the GitLab API in a few ways, and then applies a very simple model to the results.
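
As an illustration of what such a model might look like (the power figure below is a guess on my part, chosen so the arithmetic matches the example output above; it is not taken from the script or from GNOME’s runner hardware), the energy estimate can be as simple as total job time multiplied by an assumed average power draw:

```python
# Illustrative only: the assumed power draw per busy runner is a guess, not a
# figure from the real script or from GNOME's CI hardware.
ASSUMED_RUNNER_POWER_KW = 0.114  # roughly 114 W while a job is running


def estimated_energy_kwh(total_job_minutes: float) -> float:
    """Estimate energy use (kWh) from the total CI job duration in minutes."""
    return (total_job_minutes / 60.0) * ASSUMED_RUNNER_POWER_KW


# 17125 job-minutes is about 285 hours; at 0.114 kW that comes to roughly
# 32.5 kWh, in line with the example figure above.
print(f"{estimated_energy_kwh(17125):.2f} kWh")
```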

It can take a while to run when querying large projects, or periods of more than a couple of weeks, as it needs to make a REST request for each CI job individually.
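
To see why, here is a hedged sketch of the totalling step (again assuming python-gitlab and the standard fields from the GitLab jobs API; the real script may fetch these differently):

```python
# Hedged sketch of the slow part, assuming python-gitlab: each job is fetched
# individually to read its duration and artifact sizes, so large projects
# mean many REST round trips.
def pipeline_totals(project, pipelines):
    """Sum job durations (seconds) and artifact sizes (bytes) for some pipelines."""
    total_seconds = 0.0
    total_artifact_bytes = 0

    for pipeline in pipelines:
        for brief_job in pipeline.jobs.list(iterator=True):
            job = project.jobs.get(brief_job.id)  # one REST request per job
            total_seconds += job.duration or 0.0
            # 'artifacts' is a list of {file_type, size, ...} dicts when present.
            for artifact in getattr(job, "artifacts", None) or []:
                total_artifact_bytes += artifact.get("size") or 0

    return total_seconds, total_artifact_bytes
```

With 4592 jobs in the GNOME/glib example above, that adds up to thousands of round trips, which is where the time goes.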