While merrily compiling something a little while ago, my linker threw me this gem of an error message (using GNU gold):
error: libmumble.a(libmumble.o): requires dynamic R_X86_64_PC32 reloc against 'g_strdup' which may overflow at runtime; recompile with -fPIC
or, if you’re using GNU ld (the two linkers have different error messages for the same problem):
error: mumble.o: relocation R_X86_64_PC32 against symbol `g_strdup' can not be used when making a shared object; recompile with -fPIC
I recompiled everything with -fPIC, and magically the problem went away. But I didn’t understand why. I finally got a bit of time to investigate, so here we go.
tl;dr: This is caused by linking a shared library (which requires position-independent code, PIC) to a static library (which has not been compiled with PIC). You need to either link the shared library against a shared version of the static code (such as is produced automatically by libtool), or re-compile the static library with PIC enabled (-fPIC or -fpic).
To understand this, we need a brief introduction to the different types of linking, and how static objects and libraries differ from shared (or dynamic) objects and libraries. Let’s run with a minimal working example: two C files, shared.c and static.c. static.c is compiled to a static archive, libstatic.a (without position-independent code, PIC), and shared.c is compiled to a shared object, libshared.so, which links against libstatic.a.
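The two source files are not reproduced in this post. As a minimal sketch, assuming both functions take no arguments and return nothing (which is what the disassembly further down suggests), they might look like the following. The two files are shown as one listing, and the bodies are an assumption rather than the original code.

```c
/* static.c: archived into libstatic.a (assumed contents) */
void my_static_function(void)
{
    /* Intentionally empty: only the call to this function matters here. */
}

/* shared.c: linked into libshared.so (assumed contents) */
void my_static_function(void);  /* would normally come from a header */

void my_shared_function(void)
{
    /* This call is the symbol reference the linker has to relocate. */
    my_static_function();
}
```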
What is a static object? It’s one where all symbol references are resolved at link time, when the final binary is built. What’s a dynamic object? One where symbol references can be resolved at runtime. This means that dynamic objects have to have relocations performed as they’re loaded, which incurs a load-time penalty, but allows for shared libraries and symbol interposition.
It is these relocations which cause the problem hinted at by the error message above. Each relocation is effectively a note to the runtime loader instructing it to replace a symbol reference in the dynamic object being loaded, with an address calculated at load time.
There are various types of relocations, defined by the platform ABI, as they are specific to the processor’s instruction set. For a more in-depth account of them, see Relocations, Relocations by Michael Guyver. In this case, the R_X86_64_PC32 relocation was chosen by the compiler, which is defined by the AMD64 ABI (Table 4.10). What does that mean? Each relocation type is essentially a mathematical function to define the address of a relocated symbol, given the information in various symbol, section and relocation tables in the dynamic object. The ABI defines R_X86_64_PC32 as \(S+A-P\). Less succinctly, it is the offset of the referenced symbol, plus a constant adjustment (the addend) minus the offset of the relocation. This is all explained brilliantly by Michael Guyver on his blog.
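To make the formula concrete, here is a rough sketch of the S + A - P calculation, including the check that the result fits in the signed 32-bit field the relocation fills in. The function name pc32_value and its signature are hypothetical, made up for illustration; this is not how any particular linker implements it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute S + A - P as for an R_X86_64_PC32 relocation.
 *   S: address of the referenced symbol
 *   A: addend stored with the relocation
 *   P: address of the 4-byte field being patched
 * Returns false if the result does not fit in a signed 32-bit field,
 * which is the "overflow" the linker error warns about.
 */
bool pc32_value(uint64_t S, int64_t A, uint64_t P, int32_t *out)
{
    /* Do the arithmetic modulo 2^64 to avoid signed overflow. */
    uint64_t raw = S + (uint64_t)A - P;
    int64_t value = (int64_t)raw;

    if (value < INT32_MIN || value > INT32_MAX)
        return false;

    *out = (int32_t)value;
    return true;
}
```

For a call instruction the addend is typically -4, because the processor adds the displacement to the address of the next instruction, which lies 4 bytes past the start of the operand. Plugging in the numbers from the libshared.so disassembly later in this post (symbol at 0x5f0, operand at 0x6ed, addend -4) gives -0x101, which is the ff fe ff ff operand seen there. A symbol many gigabytes away from the call site would make the function return false.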
So, with our example, we get the error:
$ make libshared.so
cc -Wall -c -o shared.o shared.c
cc -Wall -c -o static.o static.c
ar rcs libstatic.a static.o
cc -shared -o libshared.so shared.o libstatic.a
/usr/bin/ld: error: shared.o: requires dynamic R_X86_64_PC32 reloc against 'my_static_function' which may overflow at runtime; recompile with -fPIC
collect2: error: ld returned 1 exit status
make: *** [libshared.so] Error 1
If we look at the disassembly of shared.o (the object file compiled from shared.c):
$ objdump -d shared.o
shared.o: file format elf64-x86-64
Disassembly of section .text:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <my_shared_function+0x9>
9: 5d pop %rbp
a: c3 retq
we can see at offset 4 that the callq instruction (calling my_static_function()) leaves 4 bytes for the address of the function to call (actually, callq is instruction-pointer-relative, so the 4 bytes are for the offset of the function from the RIP register).
As the code in libstatic.a is not PIC, it has to be loaded at a fixed offset in a process’ address space. The shared library, libshared.so, must be capable of being loaded anywhere in an address space. This would be fine if the callq instruction could take an absolute address to call, as the linker could substitute in the absolute address of my_static_function() (as is done on 32-bit systems). However, it cannot – it only has 4 bytes of operand to play with, rather than the 8 needed for a 64-bit address – so linking has to fail. And that’s why we get an error which talks about overflow.
What happens if libstatic.a is compiled with PIC enabled? Not a whole lot changes, actually. The disassembly of libstatic.a remains unchanged. shared.o gains a global offset table (GOT) section and its relocation for the my_static_function() call changes from R_X86_64_PC32 to R_X86_64_PLT32, a procedure linkage table (PLT) relocation using the GOT. We can see that in action in the disassembly of the successfully-linked libshared.so (with irrelevant bits omitted):
$ objdump --disassemble libshared.so
libshared.so: file format elf64-x86-64
Disassembly of section .plt:
5f0: ff 25 fa 13 00 00 jmpq *0x13fa(%rip) # 19f0 <_GLOBAL_OFFSET_TABLE_+0x28>
5f6: 68 02 00 00 00 pushq $0x2
5fb: e9 c0 ff ff ff jmpq 5c0 <_init+0x20>
Disassembly of section .text:
6e8: 55 push %rbp
6e9: 48 89 e5 mov %rsp,%rbp
6ec: e8 ff fe ff ff callq 5f0 <my_static_function@plt>
6f1: 5d pop %rbp
6f2: c3 retq
6f3: 90 nop
6f4: 55 push %rbp
6f5: 48 89 e5 mov %rsp,%rbp
6f8: 5d pop %rbp
6f9: c3 retq
6fa: 66 90 xchg %ax,%ax
Firstly, the callq instruction in my_shared_function() has acquired a non-zero operand. This is a constant offset from the instruction pointer at that instruction which references the entry for my_static_function() in the PLT, which we can see as my_static_function@plt in the .plt section. Rather than being the code for my_static_function(), this is actually a ‘trampoline’ which loads the address of my_static_function() from the GOT, then jumps to it. The GOT is set up by the runtime loader, and allows for the address of my_static_function() to be changed; for example when relocating it, or when interposing a different version using LD_PRELOAD. By default, the GOT entry for my_static_function() will point to the implementation in the .text section, as linked in from libstatic.a.
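The operands in the disassembly above can be checked by hand. The following sketch decodes a little-endian 32-bit displacement and adds it to the address of the next instruction, which is how instruction-pointer-relative operands are resolved on x86-64. The helper names le32 and rip_relative_target are hypothetical, chosen for this illustration only.

```c
#include <stdint.h>

/* Decode four little-endian operand bytes as a signed 32-bit displacement. */
int32_t le32(const uint8_t b[4])
{
    uint32_t u = (uint32_t)b[0]
               | ((uint32_t)b[1] << 8)
               | ((uint32_t)b[2] << 16)
               | ((uint32_t)b[3] << 24);
    return (int32_t)u;
}

/*
 * Resolve an instruction-pointer-relative operand: the displacement is
 * added to the address of the *next* instruction, i.e. the address of
 * this instruction plus its length in bytes.
 */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

With the bytes from the listing, the 5-byte callq at 0x6ec with operand ff fe ff ff resolves to 0x6f1 - 0x101 = 0x5f0, the PLT entry. The 6-byte jmpq at 0x5f0 with operand fa 13 00 00 resolves to 0x5f6 + 0x13fa = 0x19f0, the GOT slot named in objdump's comment.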
This trampolining through a PLT and GOT is the standard solution for producing position independent code, and demonstrates three things:
- Exported functions incur a runtime cost (in the PLT) on every call. This can be eliminated for private symbols, but not (easily) for public ones, as explained by Ian Lance Taylor. This cost is only three instructions; as they change control flow, they could be relatively expensive, but are probably also catered specifically for in modern superscalar 64-bit processors, as the majority of the code they execute will do indirect function calls this way. So the cost can be safely ignored for all but rather specific use cases.
- Position independent code is easy to achieve, and the indirection it requires brings other benefits like the ever-useful LD_PRELOAD, used by developer tools everywhere.
- Marking internal functions as static is important, because ELF exports functions by default, so internal function calls end up being indirected through the PLT if you omit the static modifier. (Though note that none of the functions here could have been marked as such, as they were all in different compilation units.)
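As a sketch of the last point, the listing below shows the two usual ways of keeping a function out of a library's exported symbols. All function names are hypothetical. The visibility attribute is a GCC and Clang extension, so treat it as an assumption about your toolchain; it covers the case static cannot, namely an internal function which has to be called from several compilation units of the same library.

```c
/* Internal linkage: only visible inside this compilation unit, so calls
 * to it are resolved at compile time and need no PLT entry. */
static int add_one(int x)
{
    return x + 1;
}

/* Callable from other compilation units of the same library, but not
 * exported from the final shared object (GCC/Clang extension). */
__attribute__((visibility("hidden")))
int mumble_internal_double(int x)
{
    return 2 * x;
}

/* Public API: exported by default, so external callers go via the PLT. */
int mumble_public(int x)
{
    return mumble_internal_double(add_one(x));
}
```

After building a shared library from code like this, running nm -D on it should list mumble_public but neither of the other two functions.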
So in summary:
- The “requires dynamic R_X86_64_PC32 reloc against ‘mumble’ which may overflow at runtime; recompile with -fPIC” error is caused by attempting to link a shared library against a static object.
- One solution is to compile a position-independent version of the static object. libtool does this automatically, so why aren’t you using libtool?
- Another (highly related) solution is to link against a shared version of the static object.
- This isn’t an issue on 32-bit systems because PIC is possible by default on those systems, since instruction operands are wide enough to contain absolute symbol addresses.
- Compiling with position independent code introduces a procedure linkage table (PLT) and global offset table (GOT) for each object file, which are very hard to eliminate if you want to avoid the (small) function call overhead they introduce.
- So you should avoid PIC if compiling for constrained targets like embedded devices.
- But use it otherwise (e.g. on desktop systems) for the flexibility (the use of shared libraries!) and security (address space layout randomisation) it affords.
Source code for the example here is available on gitorious in the public domain.