Controlling safety vs speed when writing files

GLib 2.65.1 has been released with a new g_file_set_contents_full() API which you should consider using instead of g_file_set_contents() for writing out a file — it’s a drop-in replacement. It provides two additional arguments, one to control the trade-off between safety and performance when writing the file, and one to set the file’s mode (permissions).

What’s wrong with g_file_set_contents()?

g_file_set_contents() has worked fine for many years (and will continue to do so). However, it doesn’t provide much flexibility. When writing a file out on Linux there are various ways to do it, some slower but safer — and some faster, but less safe, in the sense that if your program or the system crashes part-way through writing the file, the file might be left in an indeterminate state. It might be garbled, missing, empty, or contain only the old contents.

g_file_set_contents() chose a fairly safe (but not the fastest) approach to writing out files: write the new contents to a temporary file, fsync() it, and then atomically rename() the temporary file over the top of the old file. This approach means that other processes only ever see the old file contents or the new file contents (but not the partially-written new file contents); and it means that if there’s a crash, either the old file will exist or the new file will exist. However, it doesn’t guarantee that the new file will be safely stored on disk by the time g_file_set_contents() returns. It also has fewer guarantees if the old file didn’t exist (i.e. if the file is being written out for the first time).

In most situations, this is the right compromise. But not in all of them — so that’s why g_file_set_contents_full() now exists, to let the caller choose their own compromise.

Choose your own tradeoff

The level of safety/speed of g_file_set_contents() can be chosen using GFileSetContentsFlags.

Situations where your code might want a looser set of guarantees from the defaults might be when writing out cache files (where it typically doesn’t matter if they’re lost or corrupted), or when writing out large numbers of files where you’re going to call fsync() once after the whole lot (rather than once per file).

In these situations, you might choose G_FILE_SET_CONTENTS_NONE.

Conversely, your code might want a tighter set of guarantees when writing out files which are well-formed-but-incorrect when empty or filled with zeroes (as filling a file with zeroes is one of the failure modes of the existing g_file_set_contents() defaults, if the file is being created), or when writing valuable user data.

In these situations, you might choose G_FILE_SET_CONTENTS_CONSISTENT | G_FILE_SET_CONTENTS_DURABLE.

The default flags used by g_file_set_contents() are G_FILE_SET_CONTENTS_CONSISTENT | G_FILE_SET_CONTENTS_ONLY_EXISTING, which makes its definition:

gboolean
g_file_set_contents (const gchar  *filename,
                     const gchar  *contents,
                     gssize        length,
                     GError      **error)
{
  return g_file_set_contents_full (filename, contents, length,
                                   G_FILE_SET_CONTENTS_CONSISTENT |
                                   G_FILE_SET_CONTENTS_ONLY_EXISTING,
                                   0666, error);
}

Check your code

So, maybe now is the time to quickly grep your code for g_file_set_contents() calls, and see whether the default tradeoff is the right one in all the places you call it?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.