Tag Archives: JSON

Checking JSON files for correctness

tl;dr: Write a Schema for your JSON format, and use Walbottle to validate your JSON files against it.

As JSON becomes used more and more in place of XML, we need a replacement for tools like xmllint to check that JSON documents follow whatever format they are supposed to be following.

Walbottle is a tool to do this, which I’ve been working on as part of client work at Collabora. Firstly, a brief introduction to JSON Schema, then I will give an example of how to integrate Walbottle into an application. In a future post I hope to explain some of the theory behind its test vector generation.

JSON Schema is a standard for describing how a particular type of JSON document should be structured. (There’s a good introduction on the Space Telescope Science Institute.) For example, what properties should be in the top-level object in the document, and what their types should be. It is entirely analogous to XML Schema (or Relax NG). It becomes a little confusing in the fact that JSON Schema files are themselves JSON, which means that there is a JSON Schema file for validating that JSON Schema files are well-formed; this is the JSON meta-schema.

Here is an example JSON Schema file (taken from the JSON Schema website):

{
	"title": "Example Schema",
	"type": "object",
	"properties": {
		"firstName": {
			"type": "string"
		},
		"lastName": {
			"type": "string"
		},
		"age": {
			"description": "Age in years",
			"type": "integer",
			"minimum": 0
		}
	},
	"required": ["firstName", "lastName"]
}

Valid instances of this JSON schema are, for example:

{
	"firstName": "John",
	"lastName": "Smith"
}

or:

{
	"firstName": "Jessica",
	"lastName": "Smith",
	"age": 31
}

or even:

{
	"firstName": "Sandy",
	"lastName": "Sanderson",
	"country": "England"
}

The final example is important: by default, JSON object instances are allowed to contain properties which are not defined in the schema (because the default value for the JSON Schema additionalProperties keyword is an empty schema, rather than false).

What does Walbottle do? It takes a JSON Schema as input, and can either:

  • check the schema is a valid JSON Schema (the json-schema-validate tool);
  • check that a JSON instance follows the schema (the json-validate tool); or
  • generate JSON instances from the schema (the json-schema-generate tool).

Why is the last option useful? Imagine you have written a library which interacts with a web API which returns JSON. You use json-glib to turn the HTTP responses into a JSON syntax tree (tree of JsonNodes), but you have your own code to navigate through that tree and extract the interesting bits of the response, such as success codes or new objects from the server. How do you know your code is correct?

Ideally, the web API author has provided a JSON Schema file which describes exactly what you should expect from one of their HTTP responses. You can use json-schema-generate to generate a set of example JSON instances which follow or subtly do not follow the schema. You can then run your code against these instances, and check whether it:

  • does not crash;
  • correctly accepts the valid JSON instances; and
  • correctly rejects the invalid JSON instances.

This should be a lot better than writing such unit tests by hand, because nobody wants to spend time doing that — and even if you do, you are almost guaranteed to miss a corner case, which leaves your code prone to crashing when given unexpected input. (Alarmists would say that it is vulnerable to attack, and that any such vulnerability of network-facing code is probably prone to escalation into arbitrary code execution.)

For the example schema above, json-schema-generate returns (amongst others) the following JSON instances:

{"0":null,"firstName":null}
{"lastName":[null,null],"0":null,"age":0}
{"firstName":[]}
{"lastName":"","0":null,"age":1,"firstName":""}
{"lastName":[],"0":null,"age":-1}

They include valid and invalid instances, which are designed to try and hit boundary conditions in typical json-glib-using code.

How do you integrate Walbottle into your project? Probably the easiest way is to use it to generate a C or H file of JSON test vectors, and link or #include that into a simple test program which runs your code against each of them in turn.

Here is an example, straight from the documentation. Add the following to configure.ac:

AC_PATH_PROG([JSON_SCHEMA_VALIDATE],[json-schema-validate])
AC_PATH_PROG([JSON_SCHEMA_GENERATE],[json-schema-generate])

AS_IF([test "$JSON_SCHEMA_VALIDATE" = ""],
      [AC_MSG_ERROR([json-schema-validate not found])])
AS_IF([test "$JSON_SCHEMA_GENERATE" = ""],
      [AC_MSG_ERROR([json-schema-generate not found])])

Add this to the Makefile.am for your tests:

json_schemas = \
	my-format.schema.json \
	my-other-format.schema.json \
	$(NULL)

EXTRA_DIST += $(json_schemas)

check-json-schema: $(json_schemas)
	$(AM_V_GEN)$(JSON_SCHEMA_VALIDATE) $^
check-local: check-json-schema
.PHONY: check-json-schema

json_schemas_h = $(json_schemas:.schema.json=.schema.h)
BUILT_SOURCES += $(json_schemas_h)
CLEANFILES += $(json_schemas_h)

%.schema.h: %.schema.json
	$(AM_V_GEN)$(JSON_SCHEMA_GENERATE) \
		--c-variable-name=$(subst -,_,$(notdir $*))_json_instances \
		--format c $^ > $@

my_test_suite_SOURCES = my-test-suite.c
nodist_my_test_suite_SOURCES = $(json_schemas_h)

And add this to your test suite C file itself:

#include "my-format.schema.h"

…

// Test the parser with each generated test vector from the JSON schema.
static void
test_parser_generated (gconstpointer user_data)
{
  guint i;
  GObject *parsed = NULL;
  GError *error = NULL;

  i = GPOINTER_TO_UINT (user_data);

  parsed = try_parsing_string (my_format_json_instances[i].json,
                               my_format_json_instances[i].size, &error);

  if (my_format_json_instances[i].is_valid)
    {
      // Assert @parsed is valid.
      g_assert_no_error (error);
      g_assert (G_IS_OBJECT (parser));
    }
  else
    {
      // Assert parsing failed.
      g_assert_error (error, SOME_ERROR_DOMAIN, SOME_ERROR_CODE);
      g_assert (parsed == NULL);
    }

  g_clear_error (&error);
  g_clear_object (&parsed);
}

…

int
main (int argc, char *argv[])
{
  guint i;

  …

  for (i = 0; i < G_N_ELEMENTS (my_format_json_instances); i++)
    {
      gchar *test_name = NULL;

      test_name = g_strdup_printf ("/parser/generated/%u", i);
      g_test_add_data_func (test_name, GUINT_TO_POINTER (i),
                            test_parser_generated);
      g_free (test_name);
    }

  …
}

Walbottle is heading towards being mature. There are some features of the JSON Schema standard it doesn’t yet support: $ref/definitions and format. Its main downside at the moment is speed: test vector generation is complex, and the algorithms slow down due to computational complexity with lots of nested sub-schemas (so try to design your schemas to avoid this if possible). json-schema-generate recently acquired a --show-timings option which gives debug information about each of the sub-schemas in your schema, how many JSON instances it generates, and how long that took, which gives some insight into how to optimise the schema.

DX hackfest 2015: day 1

It’s a sunny Sunday here in Cambridge, UK, and GNOMErs have been arriving from far and wide for the first day of the 2015 developer experience hackfest. This is a week-long event, co-hosted with the winter docs hackfest (which Kat has promised to blog about!) in the Collabora offices.

Today was a bit of a slow start, since people were still arriving throughout the day. Regardless, there have been various discussions, with Ryan, Emmanuele and Christian discussing performance improvements in GLib, Christian and Allan plotting various different approaches to new UI in Builder, Cosimo and Carlos silently plugging away at GTK+, and Emmanuele muttering something about GProperty now and then.

Tomorrow, I hope we can flesh out some of these initial discussions a bit more and get some roadmapping down for GLib development for the next year, amongst other things. I am certain that Builder will feature heavily in discussions too, and apps and sandboxing, now that Alex has arrived.

I’ve spent a little time finishing off and releasing Walbottle, a small library and set of utilities I’ve been working on to implement JSON Schema, which is the equivalent of XML Schema or RELAX-NG, but for JSON files. It allows you to validate JSON instances against a schema, to validate schemas themselves and, unusually, to automatically generate parser unit tests from a schema. That way, you can automatically test json-glib–based JsonReader/JsonParser code, just by passing the JSON schema to Walbottle’s json-schema-generate utility.

It’s still a young project, but should be complete enough to be useful in testing JSON code. Please let me know of any bugs or missing features!

Tomorrow, I plan to dive back in to static analysis of GObject code with Tartan