Checking JSON files for correctness

tl;dr: Write a Schema for your JSON format, and use Walbottle to validate your JSON files against it.

As JSON is used more and more in place of XML, we need a replacement for tools like xmllint to check that JSON documents follow whatever format they are supposed to follow.

Walbottle is a tool to do this, which I’ve been working on as part of client work at Collabora. Firstly, a brief introduction to JSON Schema, then I will give an example of how to integrate Walbottle into an application. In a future post I hope to explain some of the theory behind its test vector generation.

JSON Schema is a standard for describing how a particular type of JSON document should be structured. (There’s a good introduction from the Space Telescope Science Institute.) For example, it can specify which properties must appear in the top-level object of a document, and what their types should be. It is entirely analogous to XML Schema (or RELAX NG). Things become a little confusing because JSON Schema files are themselves JSON, which means there is a JSON Schema file for validating that JSON Schema files are well-formed: this is the JSON meta-schema.

Here is an example JSON Schema file (taken from the JSON Schema website):

{
	"title": "Example Schema",
	"type": "object",
	"properties": {
		"firstName": {
			"type": "string"
		},
		"lastName": {
			"type": "string"
		},
		"age": {
			"description": "Age in years",
			"type": "integer",
			"minimum": 0
		}
	},
	"required": ["firstName", "lastName"]
}

Valid instances of this JSON schema are, for example:

{
	"firstName": "John",
	"lastName": "Smith"
}

or:

{
	"firstName": "Jessica",
	"lastName": "Smith",
	"age": 31
}

or even:

{
	"firstName": "Sandy",
	"lastName": "Sanderson",
	"country": "England"
}

The final example is important: by default, JSON object instances are allowed to contain properties which are not defined in the schema (because the default value for the JSON Schema additionalProperties keyword is an empty schema, rather than false).

What does Walbottle do? It takes a JSON Schema as input, and can either:

  • check the schema is a valid JSON Schema (the json-schema-validate tool);
  • check that a JSON instance follows the schema (the json-validate tool); or
  • generate JSON instances from the schema (the json-schema-generate tool).

Why is the last option useful? Imagine you have written a library which interacts with a web API which returns JSON. You use json-glib to turn the HTTP responses into a JSON syntax tree (tree of JsonNodes), but you have your own code to navigate through that tree and extract the interesting bits of the response, such as success codes or new objects from the server. How do you know your code is correct?
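
For example, hand-written navigation code often looks something like this minimal sketch (get_first_name() is a hypothetical helper, and error handling is deliberately naïve, to show the kind of code which needs exercising):

#include <json-glib/json-glib.h>

static gchar *
get_first_name (const gchar *json, gssize length, GError **error)
{
  JsonParser *parser = NULL;
  gchar *first_name = NULL;

  parser = json_parser_new ();

  if (json_parser_load_from_data (parser, json, length, error))
    {
      JsonNode *root;
      JsonObject *obj;

      /* This is exactly the kind of code which needs exercising: what if
       * the root node is not an object, or firstName is missing, or is
       * not a string? */
      root = json_parser_get_root (parser);
      obj = json_node_get_object (root);
      first_name = g_strdup (json_object_get_string_member (obj, "firstName"));
    }

  g_object_unref (parser);

  return first_name;
}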

Ideally, the web API author has provided a JSON Schema file which describes exactly what you should expect from one of their HTTP responses. You can use json-schema-generate to generate a set of example JSON instances which follow or subtly do not follow the schema. You can then run your code against these instances, and check whether it:

  • does not crash;
  • correctly accepts the valid JSON instances; and
  • correctly rejects the invalid JSON instances.

This should be a lot better than writing such unit tests by hand, because nobody wants to spend time doing that — and even if you do, you are almost guaranteed to miss a corner case, which leaves your code prone to crashing when given unexpected input. (Alarmists would say that it is vulnerable to attack, and that any such vulnerability of network-facing code is probably prone to escalation into arbitrary code execution.)

For the example schema above, json-schema-generate returns (amongst others) the following JSON instances:

{"0":null,"firstName":null}
{"lastName":[null,null],"0":null,"age":0}
{"firstName":[]}
{"lastName":"","0":null,"age":1,"firstName":""}
{"lastName":[],"0":null,"age":-1}

They include valid and invalid instances, which are designed to try and hit boundary conditions in typical json-glib-using code.

How do you integrate Walbottle into your project? Probably the easiest way is to use it to generate a C or H file of JSON test vectors, and link or #include that into a simple test program which runs your code against each of them in turn.

Here is an example, straight from the documentation. Add the following to configure.ac:

AC_PATH_PROG([JSON_SCHEMA_VALIDATE],[json-schema-validate])
AC_PATH_PROG([JSON_SCHEMA_GENERATE],[json-schema-generate])

AS_IF([test "$JSON_SCHEMA_VALIDATE" = ""],
      [AC_MSG_ERROR([json-schema-validate not found])])
AS_IF([test "$JSON_SCHEMA_GENERATE" = ""],
      [AC_MSG_ERROR([json-schema-generate not found])])

Add this to the Makefile.am for your tests:

json_schemas = \
	my-format.schema.json \
	my-other-format.schema.json \
	$(NULL)

EXTRA_DIST += $(json_schemas)

check-json-schema: $(json_schemas)
	$(AM_V_GEN)$(JSON_SCHEMA_VALIDATE) $^
check-local: check-json-schema
.PHONY: check-json-schema

json_schemas_h = $(json_schemas:.schema.json=.schema.h)
BUILT_SOURCES += $(json_schemas_h)
CLEANFILES += $(json_schemas_h)

%.schema.h: %.schema.json
	$(AM_V_GEN)$(JSON_SCHEMA_GENERATE) \
		--c-variable-name=$(subst -,_,$(notdir $*))_json_instances \
		--format c $^ > $@

my_test_suite_SOURCES = my-test-suite.c
nodist_my_test_suite_SOURCES = $(json_schemas_h)

And add this to your test suite C file itself:

#include "my-format.schema.h"

…

// Test the parser with each generated test vector from the JSON schema.
static void
test_parser_generated (gconstpointer user_data)
{
  guint i;
  GObject *parsed = NULL;
  GError *error = NULL;

  i = GPOINTER_TO_UINT (user_data);

  parsed = try_parsing_string (my_format_json_instances[i].json,
                               my_format_json_instances[i].size, &error);

  if (my_format_json_instances[i].is_valid)
    {
      // Assert @parsed is valid.
      g_assert_no_error (error);
      g_assert (G_IS_OBJECT (parsed));
    }
  else
    {
      // Assert parsing failed.
      g_assert_error (error, SOME_ERROR_DOMAIN, SOME_ERROR_CODE);
      g_assert (parsed == NULL);
    }

  g_clear_error (&error);
  g_clear_object (&parsed);
}

…

int
main (int argc, char *argv[])
{
  guint i;

  …

  for (i = 0; i < G_N_ELEMENTS (my_format_json_instances); i++)
    {
      gchar *test_name = NULL;

      test_name = g_strdup_printf ("/parser/generated/%u", i);
      g_test_add_data_func (test_name, GUINT_TO_POINTER (i),
                            test_parser_generated);
      g_free (test_name);
    }

  …
}

Walbottle is heading towards maturity. There are some features of the JSON Schema standard it doesn’t yet support: $ref/definitions and format. Its main downside at the moment is speed: test vector generation is complex, and generation time grows rapidly with the number of nested sub-schemas, so try to design your schemas to avoid deep nesting if possible. json-schema-generate recently acquired a --show-timings option which gives debug information about each of the sub-schemas in your schema, how many JSON instances it generates and how long that took, which gives some insight into how to optimise the schema.

G_OBJECT and G_IS_OBJECT allow NULL

tl;dr: G_OBJECT(NULL) evaluates to NULL with no side effects; G_IS_OBJECT(NULL) evaluates to FALSE with no side effects. The same is true for these macros for GObject subclasses.

Someone recently highlighted a commonly misunderstood behaviour of the GObject casting and type checking macros: they all happily accept NULL, without printing errors or causing assertion failures.

If you call G_OBJECT (NULL), it is a dynamically checked type cast of NULL, and evaluates to NULL, just as if you’d written (GObject *) NULL.

If you call G_IS_OBJECT (NULL), it evaluates to FALSE.
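
A standalone sketch demonstrating both behaviours (compile against gobject-2.0; both assertions pass, and nothing is printed):

#include <glib-object.h>

int
main (void)
{
  gpointer obj = NULL;

  /* Dynamically checked cast of NULL: evaluates to NULL, no warnings. */
  g_assert (G_OBJECT (obj) == NULL);

  /* Type check of NULL: evaluates to FALSE, no warnings. */
  g_assert (!G_IS_OBJECT (obj));

  return 0;
}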

The misconception seems to be that they cause assertion failures. I think that arises from the fact that G_IS_OBJECT is commonly used with g_return_if_fail(), which does log a critical ‘assertion failed’ warning and return early if G_IS_OBJECT returns FALSE.

Similarly, this all applies to the macros for GObject subclasses, like GTK_WIDGET and GTK_IS_WIDGET, or G_FILE and G_IS_FILE, etc.

Reference: g_type_check_instance_cast() and g_type_check_instance_is_a(), which are what these macros evaluate to.

Hitori available as an xdg-app preview

I’ve been playing a bit with xdg-app recently, and just spent half an hour packaging Hitori as an xdg-app bundle. It was remarkably easy — in fact, it only took 7 commands.

If you want to install it, make sure you’ve got xdg-app installed, then:

# Install http://sdk.gnome.org/keys/gnome-sdk.gpg to /usr/share/ostree/trusted.gpg.d first
# Then set up the runtimes (if you haven’t already)
xdg-app add-remote --user gnome-sdk http://sdk.gnome.org/repo/
xdg-app install-runtime --user gnome-sdk org.gnome.Platform 3.18
xdg-app install-runtime --user gnome-sdk org.freedesktop.Platform 1.2
# Add the repository for Hitori, then install it
xdg-app add-remote --user --no-gpg-verify pwithnall-hitori https://people.collabora.co.uk/~pwith/xdg-app/repos/hitori/
xdg-app install-app --user pwithnall-hitori org.gnome.Hitori
# Play the game!
xdg-app run org.gnome.Hitori

What works?

  • Playing the game
  • Use of X11
  • Icons
  • Desktop file

What doesn’t work (yet)?

  • GSettings access (I may have misconfigured this, because it’s supposed to work)
  • Help manual

I don’t plan to support this repository especially well, since it was just for playing around, but it shows that xdg-app is coming along nicely!

GLib now has a datagram interface

For those who like their I/O packetised, GLib now has a companion for its GIOStream class — the GDatagramBased interface, which we’ve implemented as part of R&D work at Collabora. This is designed to be implemented by any class which does datagram-based I/O. GSocket implements it, essentially as an interface to recvmmsg() and sendmmsg(). The upcoming DTLS support in glib-networking will use it.

How is it shaped? Basically as a GLib-shaped version of recvmmsg() and sendmmsg(), where GInputMessage and GOutputMessage are effectively equivalent to struct mmsghdr, and GInputVector and GOutputVector are equivalent to struct iovec.

The interface is designed around polling, potentially-blocking I/O. What’s ‘potentially-blocking’ about it? The timeout parameter. Set it to zero for non-blocking behaviour, where the functions will return G_IO_ERROR_WOULD_BLOCK if they would block. Set it negative for blocking behaviour, where the functions will not return until they can do at least some I/O. Set it positive for timeout behaviour, where the functions will block for the given number of microseconds, then return G_IO_ERROR_TIMED_OUT if they managed to perform no I/O.

gint
g_datagram_based_receive_messages (GDatagramBased   *datagram_based,
                                   GInputMessage    *messages,
                                   guint             num_messages,
                                   gint              flags,
                                   gint64            timeout,
                                   GCancellable     *cancellable,
                                   GError          **error);

gint
g_datagram_based_send_messages    (GDatagramBased   *datagram_based,
                                   GOutputMessage   *messages,
                                   guint             num_messages,
                                   gint              flags,
                                   gint64            timeout,
                                   GCancellable     *cancellable,
                                   GError          **error);

GSource *
g_datagram_based_create_source    (GDatagramBased   *datagram_based,
                                   GIOCondition      condition,
                                   GCancellable     *cancellable);
GIOCondition
g_datagram_based_condition_check  (GDatagramBased   *datagram_based,
                                   GIOCondition      condition);
gboolean
g_datagram_based_condition_wait   (GDatagramBased   *datagram_based,
                                   GIOCondition      condition,
                                   gint64            timeout,
                                   GCancellable     *cancellable,
                                   GError          **error);
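
To make the timeout semantics concrete, here is a minimal sketch (try_receive_one() is my name; it assumes a GSocket, or any other implementation, set up elsewhere) of a non-blocking receive of a single datagram:

#include <gio/gio.h>

static void
try_receive_one (GDatagramBased *datagram_based)
{
  guint8 buffer[1024];
  GInputVector vector = { buffer, sizeof (buffer) };
  GInputMessage message = { NULL, };
  gint n_messages;
  GError *error = NULL;

  message.vectors = &vector;
  message.num_vectors = 1;

  /* Zero timeout: non-blocking. Negative would block indefinitely;
   * positive would block for that many microseconds. */
  n_messages = g_datagram_based_receive_messages (datagram_based, &message, 1,
                                                  G_SOCKET_MSG_NONE, 0,
                                                  NULL, &error);

  if (n_messages == 1)
    g_print ("Received %" G_GSIZE_FORMAT " bytes\n", message.bytes_received);
  else if (g_error_matches (error, G_IO_ERROR, G_IO_ERROR_WOULD_BLOCK))
    g_print ("Nothing to receive yet\n");

  g_clear_error (&error);
}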

Currently, the API (particularly GInputMessage and GOutputMessage, due to the way they are used as in-out parameters) doesn’t support introspection. This can be added in future if needed by creating some convenience API for allocating and freeing the message structures as boxed types.

The grand, overarching plan is for this to appear in a libnice version near you, some time soon, exposing the whole of an ICE connection as a GDatagramBased.

If you think the name is bad, tough. It has been bikeshedded enough already.

gtk-doc knows your package version

tl;dr: gtk-doc will now bind autoconf PACKAGE_* variables for use in your documentation.

For various modules which use gtk-doc, it’s a bit of a rite of passage to copy some build machinery from somewhere to generate a version.xml file which contains your package version, so that you can include it in your generated documentation (“Documenting version X of package Y”).

This is a bit of a pain. It doesn’t have to happen any longer.

gtk-doc master (to become 1.25) now generates an xml/gtkdocentities.ent file containing a few PACKAGE_* variables as generated by autoconf. So, you can now get rid of your version.xml machinery, and change your top-level documentation XML file’s DOCTYPE to the following:

<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
   "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"
[
 <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
 <!ENTITY % gtkdocentities SYSTEM "xml/gtkdocentities.ent">
 %gtkdocentities;
]>

You can now use &package_string;, &package_version;, &package_url;, etc. in your documentation. See gtk-doc.make for a full list.
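
For example, the bookinfo section of the same documentation file might then use the entities like this (a sketch; ‘My Library’ is a placeholder):

<bookinfo>
  <title>My Library Reference Manual</title>
  <releaseinfo>for &package_string; (version &package_version;)</releaseinfo>
</bookinfo>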

Thanks to Stefan for getting this feature in!

gnome-common deprecation, round 2

tl;dr: gnome-common is deprecated, but will be hanging around for a while. If you care about modernising your build system, migrate to autoconf-archive macros.

This GNOME release cycle (3.18), we plan to do the last ever release of gnome-common. A lot of its macros for deprecated technologies (scrollkeeper?!) have been removed, and the remainder of its macros have found better replacements in autoconf-archive, where they can be used by everyone, not just GNOME.

We plan to make one last release, and people are welcome to depend on it for as long as they like. However, if you want new hotness, port to the autoconf-archive versions of the macros; but please do it in your own time. There will be no flag day port away from gnome-common.

Note that, for example, porting to AX_COMPILER_FLAGS is valuable, but will probably require fixing a number of new compiler warnings in your code due to increased warning flags. We hope this will make your code better in the long run.
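
For most modules, the port itself is a one-line change in configure.ac (a sketch; see the migration guide for the macro’s optional arguments):

dnl A sketch of the typical replacement for gnome-common’s
dnl GNOME_COMPILE_WARNINGS; the chosen flags end up in the WARN_CFLAGS
dnl and WARN_LDFLAGS variables for use in Makefile.am.
AX_COMPILER_FLAGS([WARN_CFLAGS],[WARN_LDFLAGS])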

There’s a migration guide here: https://wiki.gnome.org/Projects/GnomeCommon/Migration.

We’ve tried to make the transition as easy and smooth as possible, but there will inevitably be hiccups. Please let me know about anything which breaks or doesn’t make sense, or discuss it on the desktop development list thread. First person to complain about -Wswitch-enum gets a prize.

For developers

When building from a tarball of a module which uses the new macros, you will no longer need gnome-common installed. (Although you may not have needed it before.)

When building from git, you will need m4-common or autoconf-archive installed.

JHBuild bootstrap installs m4-common automatically, as does gnome-continuous, so you don’t need to worry about that.

For packagers

In the 3.14.0 release, gnome-common installed some early versions of the autoconf-archive macros which conflicted with what autoconf-archive itself installs. It now has a --[with|without]-autoconf-archive configure option to control this. We suggest that all packagers pass --with-autoconf-archive if (and only if) autoconf-archive is packaged on the distribution. See bug #747920.

m4-common must not be packaged. See its README. m4-common is essentially a caching subset of autoconf-archive.

For continuous integrators

Modules which use the new AX_COMPILER_FLAGS macro gain a new standard --disable-Werror configure flag, which should be used in CI systems (and any other system where spurious compiler warnings should not cause total failure of a build) to disable -Werror. The idea here is that -Werror is enabled by default when building from git, and disabled by default when building from release tarballs and in buildbots.

For further discussion

See the thread on the desktop development mailing list.

libnice is now mirrored on GitHub

libnice, everyone’s favourite ICE networking library, is now mirrored on GitHub (and GitLab), to make contributing to it easier — just submit a pull request. The canonical git repository is still on freedesktop.org.

Bug tracking is now on phabricator.freedesktop.org, which is where all new bugs should be reported. Existing open bugs have been migrated; existing closed bugs have not, and are archived on bugs.freedesktop.org.

We’ve been slowly working on some interesting changes to how GSockets are handled, so watch out for news on that.

Checking D-Bus API stability

Announcing dbus-deviation, a small tool and set of libraries for automatically checking whether a D-Bus interface has broken API between two releases of a piece of software, developed as part of my work at Collabora.

Why?

If you have a large software project, worked on by multiple developers, it might not be clear when D-Bus interfaces change. For example, they might be pulled in from another repository, or might be accidentally changed without anyone noticing.

Breaks in the D-Bus API of a project (when it’s supposed to be stable) are potentially worse than breaks in its C API, because they can only be detected at runtime — when client applications suddenly error out half-way through an operation because they’ve called a D-Bus method with the wrong argument type. At least with C API breaks, the compiler will catch the break.

(In this respect, I guess D-Bus APIs are actually a form of ABI — a runtime interface, rather than a compile-time interface.)

How?

dbus-deviation provides a utility called dbus-interface-diff, plus some GNU Make glue to plug it into your build system. It only works with git: for each tagged release of your project, it uses git-notes to store copies of all the D-Bus interfaces you care about, in their state at the time of that release. They’re stored as introspection XML; if you have that XML committed into the repository anyway, the git-note becomes a ref to the existing file blob, and takes up virtually no space at all. The dbus-interface-diff tool then does a diff between two XML files (for example, one stored for the most recent release, and the one currently in your working tree), and flags up any forwards- or backwards-incompatibilities.
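
At its core, a comparison is a single invocation along these lines (a sketch with hypothetical file names; the README documents the exact options and the make glue):

# Compare the introspection XML from the last release against the copy in
# the current working tree.
dbus-interface-diff \
    com.example.MyService1.xml.old \
    com.example.MyService1.xml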

What?

A backwards-incompatibility, as far as dbus-deviation is concerned, is one where existing clients will not work against new versions of the D-Bus service, for example because a method they use has been removed.

A forwards-incompatibility is one where new versions of clients may not work against old versions of the D-Bus service, for example because they use a method which has been added in a new version of the API.

Traditionally, projects care about preventing backwards-incompatible API changes, and don’t care so much about forwards-incompatibilities. dbus-deviation lets you set your desired stability policy.

Where?

dbus-deviation has a spartan website, a git repository, and bugs are stored using Bugs Everywhere in git; contact me in the comments or by e-mail if you want to report something. API documentation is available for the Python libraries underpinning it, which provide an AST and diff methods for D-Bus APIs.

To get using it, follow the instructions in the README file!

All feedback is very much welcome. One area I feel is still a little awkward is how dbus-deviation integrates with make dist — it forces use of a pre-push git hook to update the remote git-notes for the API signatures of newly pushed release tags. That needs to be set up by each developer who releases a project (using make dist) — any suggestions for improving this are welcome.

What next?

API stability checking for GIR APIs, perhaps? This one needs some more work.

A detailed look at GSource

Another post in this sporadic mini-series on GMainContext, this time looking at GSource and writing your own type of event source. This post is actually adapted from a page I wrote about it for developer.gnome.org, which includes fully compilable and unit-tested source code; future tweaks and clarifications can be found there. If you find a problem, please file a bug.

tl;dr: Write a custom GSource if you have a non-file-descriptor-based event source to integrate with a GMainContext. It’s a matter of writing a few virtual functions.

What is GSource?

A GSource is an expected event with an associated callback function which will be invoked when that event is received. An event could be a timeout or data being received on a socket, for example.

GLib contains various types of GSource, but also allows applications to define their own, allowing custom events to be integrated into the main loop.

The structure of a GSource and its virtual functions are documented in detail in the GLib API reference.

A message queue source

As a running example, a message queue source will be used which dispatches its callback whenever a message is enqueued to a queue internal to the source (potentially from another thread).

This type of source is useful for efficiently transferring large numbers of messages between main contexts. The alternative is transferring each message as a separate idle GSource using g_source_attach(). For large numbers of messages, this means a lot of allocations and frees of GSources.

Structure

Firstly, a structure for the source needs to be declared. This must contain a GSource as its parent, followed by the private fields for the source: the queue and a function to call to free each message once finished with.

typedef struct {
  GSource         parent;
  GAsyncQueue    *queue;  /* owned */
  GDestroyNotify  destroy_message;
} MessageQueueSource;

Prepare function

Next, the prepare function for the source must be defined. This determines whether the source is ready to be dispatched. As this source is using an in-memory queue, this can be determined by checking the queue’s length: if there are elements in the queue, the source can be dispatched to handle them.

return (g_async_queue_length (message_queue_source->queue) > 0);
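
In context, the whole prepare function is only a few lines (a sketch following the prepare signature from GSourceFuncs; the complete version is in the linked documentation):

static gboolean
message_queue_source_prepare (GSource *source,
                              gint    *timeout_)
{
  MessageQueueSource *message_queue_source = (MessageQueueSource *) source;

  /* No periodic wake-up is needed: readiness depends only on the queue. */
  *timeout_ = -1;

  return (g_async_queue_length (message_queue_source->queue) > 0);
}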

Check function

As this source has no file descriptors, the prepare and check functions essentially have the same job, so a check function is not needed. Setting the field to NULL in GSourceFuncs bypasses the check function for this source type.

Dispatch function

For this source, the dispatch function is where the complexity lies. It needs to dequeue a message from the queue, then pass that message to the GSource’s callback function. There may be no messages in the queue: even though the prepare function returned TRUE, another source wrapping the same queue may have been dispatched in the meantime and taken the final message. Further, if no callback has been set for the GSource (which is allowed), the message must be destroyed and silently dropped.

If both a message and callback are set, the callback can be invoked on the message and its return value propagated as the return value of the dispatch function. This is FALSE to destroy the GSource and TRUE to keep it alive, just as for GSourceFunc — these semantics are the same for all dispatch function implementations.

/* Pop a message off the queue. */
message = g_async_queue_try_pop (message_queue_source->queue);

/* If there was no message, bail. */
if (message == NULL)
  {
    /* Keep the source around to handle the next message. */
    return TRUE;
  }

/* @func may be %NULL if no callback was specified.
 * If so, drop the message. */
if (func == NULL)
  {
    if (message_queue_source->destroy_message != NULL)
      {
        message_queue_source->destroy_message (message);
      }

    /* Keep the source around to consume the next message. */
    return TRUE;
  }

return func (message, user_data);

Callback functions

The callback from a GSource does not have to have type GSourceFunc. It can be whatever function type is called in the source’s dispatch function, as long as that type is sufficiently documented.
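
For the message queue source, the documented callback type might look like this sketch (the name and the @message ownership rule are illustrative choices):

/* Invoked once per message dequeued from the source's queue; ownership
 * of @message is transferred to the callback. Return G_SOURCE_REMOVE to
 * destroy the source, or G_SOURCE_CONTINUE to keep it alive. */
typedef gboolean (*MessageQueueSourceFunc) (gpointer message,
                                            gpointer user_data);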

Normally, g_source_set_callback() is used to set the callback function for a source instance. With its GDestroyNotify, a strong reference can be held to keep an object alive while the source is still alive:

g_source_set_callback (source, callback_func,
                       g_object_ref (object_to_strong_ref),
                       (GDestroyNotify) g_object_unref);

However, GSource has a layer of indirection for retrieving this callback, exposed as g_source_set_callback_indirect(). This allows GObject to set a GClosure as the callback for a source, which allows for sources which are automatically destroyed when an object is finalized — a weak reference, in contrast to the strong reference above:

g_source_set_closure (source,
                      g_cclosure_new_object (callback_func,
                                             object_to_weak_ref));

It also allows for a generic, closure-based ‘dummy’ callback, which can be used when a source needs to exist but no action needs to be performed in its callback:

g_source_set_dummy_callback (source);

Constructor

Finally, the GSourceFuncs definition of the GSource can be written, alongside a construction function. It is typical practice to expose new source types simply as GSources, not as the subtype structure; so the constructor returns a GSource*.

The example constructor here also demonstrates use of a child source to support cancellation conveniently. If the GCancellable is cancelled, the application’s callback will be dispatched and can check for cancellation. (The application code will need to make a pointer to the GCancellable available to its callback, as a field of the callback’s user data set in g_source_set_callback()).
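
The constructor below references message_queue_source_funcs. A sketch of that definition, together with the finalize function it needs in order to release the queue (message_queue_source_dispatch being a wrapper around the dispatch fragment shown earlier), might look like this:

static void
message_queue_source_finalize (GSource *source)
{
  MessageQueueSource *message_queue_source = (MessageQueueSource *) source;

  /* Drop the queue reference taken in the constructor below. */
  g_async_queue_unref (message_queue_source->queue);
}

static GSourceFuncs message_queue_source_funcs =
  {
    message_queue_source_prepare,
    NULL,  /* check: not needed, see above */
    message_queue_source_dispatch,
    message_queue_source_finalize,
    NULL, NULL  /* private GLib fields */
  };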

GSource *
message_queue_source_new (GAsyncQueue    *queue,
                          GDestroyNotify  destroy_message,
                          GCancellable   *cancellable)
{
  GSource *source;  /* alias of @message_queue_source */
  MessageQueueSource *message_queue_source;  /* alias of @source */

  g_return_val_if_fail (queue != NULL, NULL);
  g_return_val_if_fail (cancellable == NULL ||
                        G_IS_CANCELLABLE (cancellable), NULL);

  source = g_source_new (&message_queue_source_funcs,
                         sizeof (MessageQueueSource));
  message_queue_source = (MessageQueueSource *) source;

  /* The caller can overwrite this name with something more useful later. */
  g_source_set_name (source, "MessageQueueSource");

  message_queue_source->queue = g_async_queue_ref (queue);
  message_queue_source->destroy_message = destroy_message;

  /* Add a cancellable source. */
  if (cancellable != NULL)
    {
      GSource *cancellable_source;

      cancellable_source = g_cancellable_source_new (cancellable);
      g_source_set_dummy_callback (cancellable_source);
      g_source_add_child_source (source, cancellable_source);
      g_source_unref (cancellable_source);
    }

  return source;
}

Complete example

The complete source code is available in gnome-devel-docs’ git repository, along with unit tests.

Further examples

Sources can be more complex than the example given above. In libnice, a custom GSource is needed to poll a set of sockets which changes dynamically. The implementation is given as ComponentSource in component.c and demonstrates a more complex use of the prepare function.

Another example is a custom source to interface GnuTLS with GLib in its GTlsConnection implementation. GTlsConnectionGnutlsSource synchronizes the main thread and a TLS worker thread which performs the blocking TLS operations.

D-Bus API design guidelines

D-Bus has just gained some upstream guidelines on how to write APIs which best use its features and follow its conventions. Hopefully they draw together all the best practices from the various articles which have been published about writing D-Bus APIs in the past. If anybody has suggestions, criticism or feedback about them, or ideas for extra guidelines to include, please get in touch!