What’s the difference between int and size_t?

Passing variable-sized buffers around is a common task in C, especially in networking code. Ignoring GLib convenience API like GBytes, GInputVector and GOutputVector, a buffer is simply an offset and length describing a block of memory. This is just a void* and int, right? What’s the problem?

tl;dr: I suggest using (uint8_t*, size_t) to describe buffers in C code, or (guint8*, gsize) if you’re using GLib. This article is aimed at those who are still getting to grips with C idioms. See also: (const gchar*) vs. (gchar*) and other memory management stories.

There are two problems here: this effectively describes an array of elements, but void* doesn’t describe the width of each element; and (assuming each element is a byte) an int isn’t wide enough to index every element in memory on modern systems.

But void* could be assumed to refer to an array of bytes, and an int is probably big enough for any reasonable situation, right? Probably, but not quite: a 32-bit signed integer can address 2 GiB (assuming negative indices are ignored) and an unsigned integer can address 4 GiB. (int doesn’t have a guaranteed width (C11 standard, §6.2.5¶5), which is another reason to use size_t; however, on any relevant modern platform it is at least 32 bits.) Fine for network processing, but not when handling large files.

How is size_t better? It’s defined as being big enough to refer to every addressable byte of memory in the current computer system (caveat: this means it’s architecture-specific and not suitable for direct use in network protocols or file formats). Better yet, it’s unsigned, so negative indices aren’t wasted.

What about the offset of the buffer — the void*? Better to use a uint8_t*, I think. This has two advantages: it explicitly defines the width of each element to be one byte; and it’s distinct from char*, being unsigned rather than signed (technically, it’s architecture-dependent whether char is signed or unsigned (C11 standard, §6.2.5¶15), and whether it’s actually eight bits wide, though in practice it always is; another reason to use uint8_t), which makes it more obvious that the data being handled is not necessarily human-readable or nul-terminated.

Why is it important to define the width of each element? This is a requirement of C: it’s impossible to do pointer arithmetic without knowing the size of an element, and hence the C standard forbids arithmetic on void* pointers, since void doesn’t have a width (C11 standard, §6.2.5¶¶19,1). So if you use void* as your offset type, your code will end up casting to uint8_t* as soon as an offset into the buffer is needed anyway.

A note about GLib: guint8 and uint8_t are equivalent, as are gsize and size_t — so if you’re using GLib, you may want to use those type aliases instead.

For a more detailed explanation with some further arguments, see this nice article about size_t and ptrdiff_t by Karpov Andrey.

4 thoughts on “What’s the difference between int and size_t?”

  1. Tom

    @2: If char isn't 8 bits wide, there is no uint8_t defined on the platform. char is guaranteed to be the smallest type, and sizeof returns sizes in chars (= bytes), so sizeof(uint8_t) couldn't be defined. If you want to be multiplatform, you should use uint_least8_t instead.
    POSIX requires char to be 8 bits, but who cares about POSIX. If you write code in C, do it right and in a multiplatform way, as much as possible. If it can't be done, document why. If you don't, you are a bad programmer, writing wrong code full of bugs.

    1. daniels

      Given that no-one on the planet has a system where unsigned char is not eight bits wide, any code you write to handle that case will be totally untested and thus buggy by definition, as well as a waste of everyone's time.

  2. Benjamin Otte

    There is 1 reason for and 1 reason against using void * as a data type over uint8_t and they both have to do with gcc compiler support.

    gcc can warn you if you do arithmetic on void pointers (not sure what flags you need, and if it's on by default). This is incredibly useful when you have a tendency to not do proper alignment when reading from unaligned data. So if someone does something like *(uint32_t *) (ptr + 25) you'll get an access violation on ARM but nobody tells you about it on x86. However if ptr is a void *, gcc will complain about the addition.

    gcc will however not warn you about casting a void * to or from another pointer type. It's why, for example, g_object_ref() takes a void pointer as an argument even though it requires a GObject. This is very dangerous when calling functions, because you might do conversions that you shouldn't do and the compiler won't tell you. And then you pass the wrong memory address. And then you get very weird program behavior.

    This was just a FYI. Byte arrays should always be uint8_t, because that's what byte arrays are.
