Thoughts about reviewing large patchsets

I have recently been involved in reviewing some large feature patchsets for a project at work, and thought it might be interesting to discuss some of the principles we have been trying to stick to when going about these reviews.

These are just suggestions which will not apply verbatim to every project, but they might provoke some ideas which are relevant to your project. In many cases, a developer might find a checklist is not useful for them — too rigid, or too much of a drag on the pace of development. Checklists are probably best treated as strong suggestions, rather than strict requirements for a review to pass; developers need to use them as reminders of things they should think about when submitting a patch, rather than as a box-ticking exercise. Checklists probably work better in larger projects with lots of infrequent or inexperienced contributors, who may not be aware of all of the conventions of the project.

Overall principles

In order to break a set of reviews down into chunks, there are a few key principles to stick to:

  • Review patches which are as small as possible, but no smaller
  • Learn from each review so the same review comments do not need to be made more than once
  • Use automated tools to eliminate many of the repetitive and time-consuming parts of patch review and rework
  • Do high-level API review first, ideally before implementing those APIs, to avoid unnecessary rework

Pre-submission checklist

(A rationale for each of these points is given in the section below to avoid cluttering this one.)

This is the pre-submission checklist we have been using to check patches against before submission for review:

  1. All new code follows the coding guidelines, especially the namespacing guidelines, memory management guidelines, pre- and post-condition guidelines, and introspection guidelines — some key points from these are pulled out below, but these are not the only points to pay attention to.
  2. All new public API must be namespaced correctly.
  3. All new public API must have a complete and useful documentation comment.
  4. All new public API documentation comments must have GObject Introspection annotations where appropriate; g-ir-scanner (part of the build process) must emit no warnings when run with --warn-all --warn-error (which should be set by $(WARN_SCANNERFLAGS) from AX_COMPILER_FLAGS).
  5. All new public methods must have pre- and post-conditions to enforce constraints on the accepted parameter values (a short sketch illustrating items 2 to 5 follows this list).
  6. The code must compile without warnings, after ensuring that AX_COMPILER_FLAGS is used and enabled in configure.ac (if it is correctly enabled, compiling the module should fail if there are any compiler warnings) — remember to add $(WARN_CFLAGS), $(WARN_LDFLAGS) and $(WARN_SCANNERFLAGS) to new Makefile.am targets as appropriate.
  7. The introduction documentation comment for each new object must give a usage example for each of the main ways that object is intended to be used.
  8. All new code must be formatted as per the project’s coding guidelines.
  9. If possible, there should be an example program for each new feature or object, which can be used to manually test that functionality — these examples may be submitted in a separate patch from the object implementation, but must be submitted at the same time as the implementation in order to allow review in parallel. Example programs must be usable when installed or uninstalled, so they can be used during development and on production machines.
  10. There must be automated tests (using the GTest framework in GLib) for construction of each new object, and for getting and setting each of its properties.
  11. The code coverage of the automated tests must be checked (using make check-code-coverage and AX_CODE_COVERAGE) before submission, and if it’s possible to add more automated tests (and for them to be reliable) to improve the coverage, this should be done; the final code coverage figure for the object should be mentioned in a comment on the patch review, and it would be helpful to have the lcov reports for the object saved somewhere for analysis as part of the review.
  12. There must be no definite memory leaks reported by Valgrind when running the automated tests under it (using AX_VALGRIND_CHECK and make check-valgrind).
  13. All automated tests must be installed as installed-tests so they can be run as integration tests on a production system.
  14. make distcheck must pass before submission of any patch, especially if it touches the build system.
  15. All new code must be checked to ensure it doesn’t contradict review comments from previous reviews of other patches (i.e. we want to avoid making the same review comments on every submitted patch).
  16. Commit messages must explain why they make the changes they do.
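
To make items 2 to 5 concrete, here is a minimal sketch of what a new public method might look like under these guidelines. The FooFrobnicator class and its method are hypothetical; the pattern is what matters:

/**
 * foo_frobnicator_frobnicate:
 * @self: a #FooFrobnicator
 * @input: (nullable): a string to frobnicate, or %NULL to use the default
 * @error: return location for a #GError, or %NULL
 *
 * Frobnicates @input. The documentation comment explains why the method
 * exists and how it is meant to be used.
 *
 * Returns: (transfer full): the frobnicated string; free with g_free()
 */
gchar *
foo_frobnicator_frobnicate (FooFrobnicator  *self,
                            const gchar     *input,
                            GError         **error)
{
  /* Pre-conditions catch programmer errors at the API boundary. */
  g_return_val_if_fail (FOO_IS_FROBNICATOR (self), NULL);
  g_return_val_if_fail (error == NULL || *error == NULL, NULL);

  /* … implementation … */
}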

Rationales

  1. Each coding guideline has its own rationale for why it’s useful, and many of them significantly affect the structure of a patch, so are important to get right early on.
  2. Namespacing is important for the correct functioning of a lot of the developer tools (for example, GObject Introspection), and to avoid symbol collisions between libraries — checking it is a very mechanical process which it is best to not have to spend review time on.
  3. Documentation comments are useful to both the reviewer and to end users of the API — for the reviewer, they act as an explanation of why a particular API is necessary, how it is meant to be used, and can provide insight into implementation choices. These are questions which the reviewer would otherwise have to ask in the review, so writing them up lucidly in a documentation comment saves time in the long run.
  4. GObject Introspection annotations are a requirement for the platform’s language bindings (to JavaScript or Python, for example) to work, so must be added at some point. Fixing the error messages from g-ir-scanner is sufficient to ensure that the API can be introspected.
  5. Pre- and post-conditions are a form of assertion in the code, which check for programmer errors at runtime. If they are used consistently throughout the code on every API entry point, they can catch programmer errors much nearer their origin than otherwise, which speeds up debugging both during development of the library, and when end users are using the public APIs. They also act as a loose form of documentation of what each API will allow as its inputs and outputs, which helps review (see the comments about documentation above).
  6. The set of compiler warnings enabled by AX_COMPILER_FLAGS has been chosen to balance false positives against false negatives in detecting bugs in the code. Each compiler warning typically identifies a single bug in the code which would otherwise have to be fixed later in the life of the library — fixing bugs later is always more expensive in terms of debugging time.
  7. Usage examples are another form of documentation (as discussed above), which specifically make it clearer to a reviewer how a particular feature is intended to be used. In writing usage examples, the author of a patch can often notice awkwardnesses in their API design, which can then be fixed before review — this is faster than them being caught in review and sent back for modification.
  8. Well formatted code is a lot easier to read and review than poorly formatted code. It allows the reviewer to think about the function of the code they are reviewing, rather than (for example) which function call a given argument actually applies to, or which block of code a statement is actually part of.
  9. Example programs are a loose form of testing, and also act as usage examples and documentation for the feature (see above). They provide an easy way for the reviewer to test a feature, especially if it affects a UI or has some interactive element; this is very hard to do by simply looking at the code in a patch. Their biggest benefit will be when the feature is modified in future — the example programs can be used to test changes to the feature and ensure that its behaviour changes (or does not) as expected.
  10. For each unit test for a piece of code, the behaviour checked by that unit test can be guaranteed to be unchanged across modifications to the code in future. This prevents regressions (especially if the unit tests for the project are set up to be run automatically on each commit by a continuous integration system). The value of unit tests when initially implementing a feature is in the way they guide API design to be testable in the first place. It is often the case that an API will be written without unit tests, and later someone will try to add unit tests and find that the API is untestable; typically because it relies on internal state which the test harness cannot affect. By that point, the API is stable and cannot be changed to allow testing.
  11. Looking at code coverage reports is a good way to check that unit tests are actually checking what they are expected to check about the code. Code coverage provides a simple, coarse-grained metric of code quality — the quality of untested code is unknown.
  12. Every memory leak is a bug, and hence needs to be fixed at some point. Checking for memory leaks in a code review is a very mechanical, time-consuming process. If memory leaks can be detected automatically, by running the unit tests under Valgrind, this reduces the amount of time needed to catch them during review. This is an area where higher code coverage provides immediate benefits. Another way to avoid leaks is to use g_autoptr() to automatically free memory when leaving a control block (see the test sketch after this list).
  13. If all automated tests are installed, they can be run as part of system-wide integration tests, to check that the project behaviour doesn’t change when other system libraries (its dependencies) are changed. This is one of the motivations behind installed-tests. Setting this up is a one-time job for your project; once done, it does not need to be repeated for each commit.
  14. make distcheck ensures that a tarball can be created successfully from the code, which entails building it, running all the unit tests, and checking that examples compile.
  15. If each patch is updated to learn from the results of previous patch reviews, the amount of time spent making and explaining repeated patch review comments should be significantly reduced, which saves everyone’s time.
  16. Commit messages are a form of documentation of the changes being made to a project. They should explain the motivation behind the changes, and clarify any design decisions which the author thinks the reviewer might question. If a commit message is inadequate, the reviewer is going to ask questions in the review which could have been avoided otherwise.
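
To illustrate rationales 10 and 12, here is a minimal GTest case which covers construction and property access for the same hypothetical FooFrobnicator, using g_autoptr() so the test itself cannot leak:

static void
test_frobnicator_construction (void)
{
  g_autoptr(FooFrobnicator) frobnicator = NULL;
  g_autofree gchar *name = NULL;

  frobnicator = foo_frobnicator_new ();
  g_assert (FOO_IS_FROBNICATOR (frobnicator));

  /* Check the (hypothetical) "name" property can be set and got. */
  g_object_set (frobnicator, "name", "test", NULL);
  g_object_get (frobnicator, "name", &name, NULL);
  g_assert_cmpstr (name, ==, "test");
}

int
main (int argc, char *argv[])
{
  g_test_init (&argc, &argv, NULL);
  g_test_add_func ("/frobnicator/construction",
                   test_frobnicator_construction);
  return g_test_run ();
}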

GTK+ hackfest 2016

A dozen GNOME hackers invaded the Red Hat office in Toronto last week, to spend four days planning the next year of work on our favourite toolkit, GTK+; and to think about how Flatpak applications can best integrate with the rest of the desktop.

What did we do?

  • Worked out an approach for versioning GTK+ in future, to improve the balance between stability and speed of development. This has turned into a wiki page.
  • I demoed Dunfell and added support for visualising GTasks to it. I don’t know how much time I will have for it in the near future, so help and feedback are welcome.
  • There was a detailed discussion of portals for Flatpak, including lots of use cases, and the basics of a security design were decided which allows the most code reuse while also separating functionality. Simon has written more about this.
  • I missed some of the architectural discussion about the future of GTK+ (including moving some classes around, merging some things and stripping out some outdated things), but I believe Benjamin had useful discussions with people about it.
  • Allan, Philip, Mike and I looked at using hotdoc for developer.gnome.org, and possible layouts for a new version of the site. Christian spent some time thinking about integration of documentation into GNOME Builder.
  • Allison did a lot of blogging, and plotted with Alex to add some devious new GVariant functionality to make everyone’s lives easier when writing parsers — I’ll leave her to blog about it.

Thanks to Collabora for sending me along to take part!

After the hackfest, I spent a few days exploring Toronto, and as a result ended up very sunburned.

DX hackfest 2016 aftermath

The DX hackfest, and FOSDEM, are over. Thanks everyone for coming — and thanks to betacowork, ICAB, the GNOME Foundation, and the various companies who allowed people to come along. Thanks to Collabora for sending me along and sponsoring snacks and dinner one evening.

What did we do?

Instrumenting the GLib main loop with Dunfell

tl;dr: Visualise your main context and sources using Dunfell. Feedback and ideas welcome.

At the DX hackfest, I’ve been working on a new tool for instrumenting and visualising the behaviour of the GLib main context (or main contexts) in your program.

[Screenshot: Dunfell’s viewer, showing threads, main context dispatches and GSources]

It’s called Dunfell (because I’m a sucker for hills) and at a high level it works by using SystemTap to record various GMainContext interactions in your program, saving them to a log file. The log file can then be examined using a viewer program.

The source is available on GitLab or GitHub because I still haven’t decided which is better.

In the screenshot above, each vertical line is a thread, each blue box is one dispatch phase of the main context which is currently running on that thread, each orange blob is a new GSource being created, and the green blob is a GSource which has been selected for closer inspection.

At the moment, it requires a couple of GLib patches to add some more SystemTap probe points, and it also requires a recent version of GTK+. It needs SystemTap, and I’ve only tested it on Fedora, so it might need some patching to work with the SystemTap installed on other distributions.

[Screenshot: a Dunfell trace of GIO’s buffered-input-stream test]

This screenshot is of a trace of the buffered-input-stream test from GIO, showing I/O callbacks being made across threads as idle source callbacks.

More visualisation ideas are welcome! At the moment, what Dunfell draws is quite simplistic. I hope it will eventually be able to solve various common debugging problems, but suggestions for ways to do this intuitively, or for other problems to visualise, are welcome. Here are the use cases I was initially thinking about (from the README), followed by a contrived example of one of them:

  • Detect GSources which are never added to a GMainContext.
  • Detect GSources which are dispatched too often (i.e. every main context iteration).
  • Detect GSources whose dispatch function takes too long (and hence blocks the main context).
  • Detect GSources which are never removed from their GMainContext after being dispatched (but which are never dispatched again).
  • Detect GMainContexts which have GSources attached or (especially) events pending, but which aren’t being iterated.
  • Monitor the load on each GMainContext, such as how many GSources it has attached, and how many events are processed each iteration.
  • Monitor ongoing asynchronous calls and GTasks, giving insight into their nesting and dependencies.
  • Monitor unfinished or stalled asynchronous calls.
  • Allow users to record logs to send to the developers for debugging on a different machine. The users may have to install additional software to record these logs (some component of Dunfell, plus its dependencies), but should not have to recompile or otherwise modify the program being debugged.
  • Work with programs which purely use GLib, through to programs which use GLib, GIO and GTK+.
  • Allow visualisation of this data, both in a standalone program, and in an IDE such as GNOME Builder.
  • Allow visualising differences between two traces.
  • Minimise runtime overhead of logging a program, to reduce the risk of disturbing race conditions by enabling logging.
  • Connecting to an already-running program is not a requirement, since by the time you’ve decided there’s a problem with a program, it’s already in the wrong state.
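
To make the second of those use cases concrete, here is a contrived program exhibiting exactly that pathology: an idle source whose callback always asks to be dispatched again, so it runs on every iteration of the main context and keeps it spinning (this is plain GLib, not Dunfell API):

#include <glib.h>

static gboolean
busy_idle_cb (gpointer user_data)
{
  /* Returning G_SOURCE_CONTINUE means this source is dispatched on every
   * main context iteration, starving the context and burning CPU. */
  return G_SOURCE_CONTINUE;
}

int
main (void)
{
  GMainLoop *loop = g_main_loop_new (NULL, FALSE);

  g_idle_add (busy_idle_cb, NULL);
  g_main_loop_run (loop);

  g_main_loop_unref (loop);
  return 0;
}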

DX hackfest: 2016 edition

By this time tomorrow, the 2016 edition of the GNOME developer experience hackfest will have started. This year, it’s in Brussels, kindly hosted by betacowork and ICAB.


We will be spending 3 days looking at a variety of things on the agenda to improve the lives of developers on GNOME, and make plans for the rest of the year. Watch out for updates on planet.gnome.org.

Thanks to the GNOME Foundation for sponsoring the travel for various people who are coming.


Collabora is sponsoring snacks throughout, and is sending 5 of us along for the hackfest. Thank you also to the other companies who are sending or letting people come — I know of Red Hat, Endless Mobile, Codethink and Canonical (please let me know if I’ve forgotten anyone!).


See you at FOSDEM afterwards?

Checking JSON files for correctness

tl;dr: Write a Schema for your JSON format, and use Walbottle to validate your JSON files against it.

As JSON is used more and more in place of XML, we need a replacement for tools like xmllint, to check that JSON documents follow whatever format they are supposed to follow.

Walbottle is a tool to do this, which I’ve been working on as part of client work at Collabora. Firstly, a brief introduction to JSON Schema, then I will give an example of how to integrate Walbottle into an application. In a future post I hope to explain some of the theory behind its test vector generation.

JSON Schema is a standard for describing how a particular type of JSON document should be structured — for example, which properties should be in the top-level object in the document, and what their types should be. (There’s a good introduction on the Space Telescope Science Institute website.) It is entirely analogous to XML Schema (or RELAX NG). One slightly confusing point is that JSON Schema files are themselves JSON, which means that there is a JSON Schema file for validating that JSON Schema files are well-formed; this is the JSON meta-schema.

Here is an example JSON Schema file (taken from the JSON Schema website):

{
	"title": "Example Schema",
	"type": "object",
	"properties": {
		"firstName": {
			"type": "string"
		},
		"lastName": {
			"type": "string"
		},
		"age": {
			"description": "Age in years",
			"type": "integer",
			"minimum": 0
		}
	},
	"required": ["firstName", "lastName"]
}

Valid instances of this JSON schema are, for example:

{
	"firstName": "John",
	"lastName": "Smith"
}

or:

{
	"firstName": "Jessica",
	"lastName": "Smith",
	"age": 31
}

or even:

{
	"firstName": "Sandy",
	"lastName": "Sanderson",
	"country": "England"
}

The final example is important: by default, JSON object instances are allowed to contain properties which are not defined in the schema (because the default value for the JSON Schema additionalProperties keyword is an empty schema, rather than false).
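
If you want unknown properties to be rejected, set additionalProperties to false explicitly. For example, this variant of the schema above would reject the third instance:

{
	"title": "Example Schema",
	"type": "object",
	"properties": {
		"firstName": { "type": "string" },
		"lastName": { "type": "string" },
		"age": {
			"description": "Age in years",
			"type": "integer",
			"minimum": 0
		}
	},
	"required": ["firstName", "lastName"],
	"additionalProperties": false
}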

What does Walbottle do? It takes a JSON Schema as input, and can either:

  • check the schema is a valid JSON Schema (the json-schema-validate tool);
  • check that a JSON instance follows the schema (the json-validate tool); or
  • generate JSON instances from the schema (the json-schema-generate tool).

Why is the last option useful? Imagine you have written a library which interacts with a web API which returns JSON. You use json-glib to turn the HTTP responses into a JSON syntax tree (tree of JsonNodes), but you have your own code to navigate through that tree and extract the interesting bits of the response, such as success codes or new objects from the server. How do you know your code is correct?

Ideally, the web API author has provided a JSON Schema file which describes exactly what you should expect from one of their HTTP responses. You can use json-schema-generate to generate a set of example JSON instances which follow or subtly do not follow the schema. You can then run your code against these instances, and check whether it:

  • does not crash;
  • correctly accepts the valid JSON instances; and
  • correctly rejects the invalid JSON instances.

This should be a lot better than writing such unit tests by hand, because nobody wants to spend time doing that — and even if you do, you are almost guaranteed to miss a corner case, which leaves your code prone to crashing when given unexpected input. (Alarmists would say that it is vulnerable to attack, and that any such vulnerability of network-facing code is probably prone to escalation into arbitrary code execution.)

For the example schema above, json-schema-generate returns (amongst others) the following JSON instances:

{"0":null,"firstName":null}
{"lastName":[null,null],"0":null,"age":0}
{"firstName":[]}
{"lastName":"","0":null,"age":1,"firstName":""}
{"lastName":[],"0":null,"age":-1}

They include valid and invalid instances, which are designed to hit boundary conditions in typical json-glib-using code.

How do you integrate Walbottle into your project? Probably the easiest way is to use it to generate a C source or header file of JSON test vectors, and link or #include that into a simple test program which runs your code against each of them in turn.

Here is an example, straight from the documentation. Add the following to configure.ac:

AC_PATH_PROG([JSON_SCHEMA_VALIDATE],[json-schema-validate])
AC_PATH_PROG([JSON_SCHEMA_GENERATE],[json-schema-generate])

AS_IF([test "$JSON_SCHEMA_VALIDATE" = ""],
      [AC_MSG_ERROR([json-schema-validate not found])])
AS_IF([test "$JSON_SCHEMA_GENERATE" = ""],
      [AC_MSG_ERROR([json-schema-generate not found])])

Add this to the Makefile.am for your tests:

json_schemas = \
	my-format.schema.json \
	my-other-format.schema.json \
	$(NULL)

EXTRA_DIST += $(json_schemas)

check-json-schema: $(json_schemas)
	$(AM_V_GEN)$(JSON_SCHEMA_VALIDATE) $^
check-local: check-json-schema
.PHONY: check-json-schema

json_schemas_h = $(json_schemas:.schema.json=.schema.h)
BUILT_SOURCES += $(json_schemas_h)
CLEANFILES += $(json_schemas_h)

%.schema.h: %.schema.json
	$(AM_V_GEN)$(JSON_SCHEMA_GENERATE) \
		--c-variable-name=$(subst -,_,$(notdir $*))_json_instances \
		--format c $^ > $@

my_test_suite_SOURCES = my-test-suite.c
nodist_my_test_suite_SOURCES = $(json_schemas_h)

And add this to your test suite C file itself:

#include "my-format.schema.h"

…

// Test the parser with each generated test vector from the JSON schema.
static void
test_parser_generated (gconstpointer user_data)
{
  guint i;
  GObject *parsed = NULL;
  GError *error = NULL;

  i = GPOINTER_TO_UINT (user_data);

  parsed = try_parsing_string (my_format_json_instances[i].json,
                               my_format_json_instances[i].size, &error);

  if (my_format_json_instances[i].is_valid)
    {
      // Assert @parsed is valid.
      g_assert_no_error (error);
      g_assert (G_IS_OBJECT (parsed));
    }
  else
    {
      // Assert parsing failed.
      g_assert_error (error, SOME_ERROR_DOMAIN, SOME_ERROR_CODE);
      g_assert (parsed == NULL);
    }

  g_clear_error (&error);
  g_clear_object (&parsed);
}

…

int
main (int argc, char *argv[])
{
  guint i;

  …

  for (i = 0; i < G_N_ELEMENTS (my_format_json_instances); i++)
    {
      gchar *test_name = NULL;

      test_name = g_strdup_printf ("/parser/generated/%u", i);
      g_test_add_data_func (test_name, GUINT_TO_POINTER (i),
                            test_parser_generated);
      g_free (test_name);
    }

  …
}

Walbottle is heading towards maturity. There are some features of the JSON Schema standard it doesn’t yet support: $ref/definitions and format. Its main downside at the moment is speed: test vector generation is inherently complex, and the algorithms slow down considerably when a schema contains lots of nested sub-schemas (so try to design your schemas to avoid deep nesting if possible). json-schema-generate recently acquired a --show-timings option which reports, for each sub-schema in your schema, how many JSON instances it generates and how long that took — this gives some insight into how to optimise the schema.
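
For example (assuming the same invocation style as in the Makefile.am rules above; the exact output may differ):

json-schema-generate --show-timings my-format.schema.json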

G_OBJECT and G_IS_OBJECT allow NULL

tl;dr: G_OBJECT(NULL) evaluates to NULL with no side effects; G_IS_OBJECT(NULL) evaluates to FALSE with no side effects. The same is true for these macros for GObject subclasses.

Someone recently pointed out a commonly-misunderstood point about the GObject casting and type checking macros: they all happily accept NULL, without printing errors or causing assertion failures.

If you call G_OBJECT (NULL), it is a dynamically checked type cast of NULL, and evaluates to NULL, just as if you’d written (GObject *) NULL.

If you call G_IS_OBJECT (NULL), it evaluates to FALSE.

The misconception seems to be that they cause assertion failures. I think that arises from the fact that G_IS_OBJECT is commonly used with g_return_if_fail(), which does cause an assertion failure if G_IS_OBJECT returns FALSE.

Similarly, this all applies to the macros for GObject subclasses, like GTK_WIDGET and GTK_IS_WIDGET, or G_FILE and G_IS_FILE, etc.

Reference: g_type_check_instance_cast() and g_type_check_instance_is_a(), which are what these macros evaluate to.
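
In code, this all boils down to the following (a short sketch; my_function is hypothetical):

g_assert (G_OBJECT (NULL) == NULL);  /* cast of NULL is NULL, no warnings */
g_assert (!G_IS_OBJECT (NULL));      /* FALSE, no warnings */

/* The assertion failure people remember comes from g_return_if_fail(),
 * which logs a critical warning and returns if its condition is FALSE: */
void
my_function (GtkWidget *widget)
{
  g_return_if_fail (GTK_IS_WIDGET (widget));
  /* … */
}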

Hitori available as an xdg-app preview

I’ve been playing a bit with xdg-app recently, and just spent half an hour packaging Hitori as an xdg-app bundle. It was remarkably easy — in fact, it only took 7 commands.

If you want to install it, make sure you’ve got xdg-app installed, then:

# Install http://sdk.gnome.org/keys/gnome-sdk.gpg to /usr/share/ostree/trusted.gpg.d first
# Then set up the runtimes (if you haven’t already)
xdg-app add-remote --user gnome-sdk http://sdk.gnome.org/repo/
xdg-app install-runtime --user gnome-sdk org.gnome.Platform 3.18
xdg-app install-runtime --user gnome-sdk org.freedesktop.Platform 1.2
# Add the repository for Hitori, then install it
xdg-app add-remote --user --no-gpg-verify pwithnall-hitori https://people.collabora.co.uk/~pwith/xdg-app/repos/hitori/
xdg-app install-app --user pwithnall-hitori org.gnome.Hitori
# Play the game!
xdg-app run org.gnome.Hitori

What works?

  • Playing the game
  • Use of X11
  • Icons
  • Desktop file

What doesn’t work (yet)?

  • GSettings access (I may have misconfigured this, because it’s supposed to work)
  • Help manual

I don’t plan to support this repository especially well, since it was just for playing around, but it shows that xdg-app is coming along nicely!

GLib now has a datagram interface

For those who like their I/O packetised, GLib now has a companion for its GIOStream class — the GDatagramBased interface, which we’ve implemented as part of R&D work at Collabora. This is designed to be implemented by any class which does datagram-based I/O. GSocket implements it, essentially as an interface to recvmmsg() and sendmmsg(). The upcoming DTLS support in glib-networking will use it.

How is it shaped? Basically as a GLib-shaped version of recvmmsg() and sendmmsg(), where GInputMessage and GOutputMessage are effectively equivalent to struct mmsghdr, and GInputVector and GOutputVector are equivalent to struct iovec.

The interface is designed around polling, potentially-blocking I/O. What’s ‘potentially-blocking’ about it? The timeout parameter. Set it to zero for non-blocking behaviour, where the functions will return G_IO_ERROR_WOULD_BLOCK if they would block. Set it negative for blocking behaviour, where the functions will not return until they can do at least some I/O. Set it positive for timeout behaviour, where the functions will block for the given number of microseconds, then return G_IO_ERROR_TIMED_OUT if they managed to perform no I/O.

gint
g_datagram_based_receive_messages (GDatagramBased   *datagram_based,
                                   GInputMessage    *messages,
                                   guint             num_messages,
                                   gint              flags,
                                   gint64            timeout,
                                   GCancellable     *cancellable,
                                   GError          **error);

gint
g_datagram_based_send_messages    (GDatagramBased   *datagram_based,
                                   GOutputMessage   *messages,
                                   guint             num_messages,
                                   gint              flags,
                                   gint64            timeout,
                                   GCancellable     *cancellable,
                                   GError          **error);

GSource *
g_datagram_based_create_source    (GDatagramBased   *datagram_based,
                                   GIOCondition      condition,
                                   GCancellable     *cancellable);
GIOCondition
g_datagram_based_condition_check  (GDatagramBased   *datagram_based,
                                   GIOCondition      condition);
gboolean
g_datagram_based_condition_wait   (GDatagramBased   *datagram_based,
                                   GIOCondition      condition,
                                   gint64            timeout,
                                   GCancellable     *cancellable,
                                   GError          **error);
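
As a usage sketch (not from the documentation), a non-blocking receive of a single datagram might look like this, where datagram_based is assumed to be an existing GDatagramBased such as a connected GSocket:

guint8 buf[1024];
GInputVector vector = { buf, sizeof (buf) };
GInputMessage message = { NULL, &vector, 1, 0, 0, NULL, NULL };
GError *error = NULL;
gint n_received;

/* Zero timeout: non-blocking. */
n_received = g_datagram_based_receive_messages (datagram_based, &message, 1,
                                                G_SOCKET_MSG_NONE, 0,
                                                NULL, &error);

if (n_received < 0 &&
    g_error_matches (error, G_IO_ERROR, G_IO_ERROR_WOULD_BLOCK))
  {
    /* No datagrams ready; attach a GSource from
     * g_datagram_based_create_source() and retry when it fires. */
    g_clear_error (&error);
  }
else if (n_received > 0)
  {
    /* message.bytes_received bytes of the first datagram are now in buf. */
  }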

Currently, the API (particularly GInputMessage and GOutputMessage, due to the way they are used as in-out parameters) doesn’t support introspection. This can be added in future if needed by creating some convenience API for allocating and freeing the message structures as boxed types.

The grand, overarching plan is for this to appear in a libnice version near you, some time soon, exposing the whole of an ICE connection as a GDatagramBased.

If you think the name is bad, tough. It has been bikeshedded enough already.

gtk-doc knows your package version

tl;dr: gtk-doc will now bind autoconf PACKAGE_* variables for use in your documentation.

For various modules which use gtk-doc, it’s a bit of a rite of passage to copy some build machinery from somewhere to generate a version.xml file which contains your package version, so that you can include it in your generated documentation (“Documenting version X of package Y”).

This is a bit of a pain. It doesn’t have to happen any longer.

gtk-doc master (to become 1.25) now generates an xml/gtkdocentities.ent file containing a few PACKAGE_* variables as generated by autoconf. So you can now get rid of your version.xml machinery, and change your top-level documentation XML file’s DOCTYPE to the following:

<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
   "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"
[
 <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
 <!ENTITY % gtkdocentities SYSTEM "xml/gtkdocentities.ent">
 %gtkdocentities;
]>

You can now use &package_string;, &package_version;, &package_url;, etc. in your documentation. See gtk-doc.make for a full list.
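
For example, a DocBook header which previously relied on version.xml can now be written like this (a sketch):

<bookinfo>
  <title>&package_string; Reference Manual</title>
  <releaseinfo>
    Documenting version &package_version;. Project home page:
    <ulink role="online-location" url="&package_url;">&package_url;</ulink>.
  </releaseinfo>
</bookinfo>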

Thanks to Stefan for getting this feature in!