As I was hacking today, I ran into some hard-to-debug reference counting problems with one of my classes. The normal smattering of printf()s didn't help, and neither did this newfangled systemtap, which was a bit disappointing.
It worked, in that my probes were correctly run and correctly highlighted each reference/dereference of the class I was interested in, but printing a backtrace only extended to the g_object_ref()/g_object_unref() call, and no further. I'm guessing this was a problem with the location of the debug symbols for my code (since it was in a development prefix, whereas systemtap was not), but it might be that systemtap hasn't quite finished userspace stuff yet. That's what I read, at least.
In the end, I used conditional breakpoints in gdb. This was a lot slower than systemtap, but it worked. It's the sort of thing I would've killed to know a few years (or even a few months) ago, so hopefully it's useful for someone (even if it's not the most elegant solution out there).
set pagination off
set $foo=0
break main
run
break g_object_ref
condition 2 _object==$foo
commands
silent
bt 8
cont
end
break g_object_unref
condition 3 _object==$foo
commands
silent
bt 8
cont
end
break my_object_init
commands
silent
set $foo=my_object
cont
end
enable once 4
cont
The breakpoint in main() is to stop gdb discarding our breakpoints out of hand because the relevant libraries haven't been loaded yet. $foo contains the address of the first instance of MyObject in the program; if you need to trace the n+1th instance, use ignore 4 n to only fire the my_object_init breakpoint on the n+1th MyObject instantiation.
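For instance, to follow the third MyObject rather than the first (n = 2), the tail of the script might change roughly as sketched here. This assumes the my_object_init breakpoint is still number 4, as in the script above.

```gdb
# Sketch: trace the third MyObject instance (n = 2).
# Assumes my_object_init is breakpoint 4, as in the script above.

# Skip the first two hits of breakpoint 4...
ignore 4 2
# ...then let it fire once, on the third hit, and disable itself.
enable once 4
cont
```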
This can be extended to track (a fixed number of) multiple instances of the object, by using several $fooi variables and gdb's if statements to set them as appropriate. This is left as an exercise to the reader!
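One possible shape for that exercise, tracking the first two instances, is sketched below. The names $foo1 and $foo2 are made up for illustration, the breakpoint numbers assume the commands are entered in exactly this order in a fresh gdb session, and my_object is assumed to be the name of the instance parameter of my_object_init, as in the script above. Treat it as a starting point rather than a tested recipe.

```gdb
set pagination off
# Hypothetical slots for the two instances we want to follow.
set $foo1 = 0
set $foo2 = 0
# Breakpoint 1: stop in main so the libraries are loaded.
break main
run
# Breakpoint 2: fire only for one of the tracked instances.
break g_object_ref
condition 2 _object != 0 && (_object == $foo1 || _object == $foo2)
commands
silent
bt 8
cont
end
# Breakpoint 3: same condition for unref.
break g_object_unref
condition 3 _object != 0 && (_object == $foo1 || _object == $foo2)
commands
silent
bt 8
cont
end
# Breakpoint 4: fill the first free slot, then stop listening.
break my_object_init
commands
silent
if $foo1 == 0
  set $foo1 = my_object
else
  if $foo2 == 0
    set $foo2 = my_object
    # Both slots are taken, so this breakpoint is no longer needed.
    disable 4
  end
end
cont
end
cont
```

Note that "enable once 4" is dropped here, since the my_object_init breakpoint has to fire twice before it disables itself.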
I welcome the inevitable feedback and criticism of this approach. It's hacky, ugly and slower than systemtap, but at least it works.
It's an old technique. The trick is to try to figure out how to choose the breakpoints so that gdb runs as little as possible, since it's slow. If there is a breakpoint with a condition, gdb wakes up every time, checks the condition, and restarts the program if it isn't satisfied, which can be expensive (though it's a lot faster than doing it by hand). Enabling and disabling breakpoints can sometimes help.
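Applied to the script in the post, one way to cut down on wake-ups might be to keep the two conditional breakpoints disabled until the object of interest actually exists. The fragment below is only a sketch of the changed parts; it assumes breakpoints 2 and 3 (g_object_ref and g_object_unref) have already been defined exactly as in the post, and that my_object_init becomes breakpoint 4.

```gdb
# Assumes breakpoints 2 and 3 were defined as in the post's script.
# Keep them off while there is nothing to trace, so gdb is not woken
# up for every g_object_ref()/g_object_unref() call during startup.
disable 2 3

# Breakpoint 4: record the instance, then switch the tracing on.
break my_object_init
commands
silent
set $foo = my_object
enable 2 3
cont
end
enable once 4
cont
```

Once the object has been finalised, "disable 2 3" can be issued by hand to let the rest of the program run at full speed.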
"[systemtap] worked, in that my probes were correctly run and correctly highlighted each reference/dereference of the class I was interested in, but printing a backtrace only extended to the g_object_ref()/g_object_unref() call, and no further."
Some major improvements to backtracing are coming soon to stap land. It's possible, though, that the only thing you were lacking was some '-d /path/to/shlib -d /bin/foo' options to preload unwind data into the systemtap probe module.
That's good to hear; I look forward to being able to use them. Thanks for all the great work on systemtap.
The soon-to-be-released SystemTap 1.3 should at least print the "module" (shared library name) of the last frame of the backtrace (plus the address, of course). That way you can at least see why SystemTap couldn't unwind further. As Frank says, you could then give SystemTap that shared library as a hint through -d. 1.3 also has --ldd, which makes SystemTap pick up everything ldd knows about a program. There were also a couple of plain bug fixes in the unwinder that should improve the output.
If that isn't enough, maybe we can teach systemtap about pkg-config files so it can pick up which shared libraries are likely to be used in a gnome program. Or are there other hints about dynamically loaded libraries that systemtap should know about?
You can compile glib with --enable-debug and then set the g_trap_object_ref variable in gdb. That'll make glib do the breakpoints for you. See http://git.gnome.org/browse/glib/tree/gobject/gobject.c#n2483 for details.
gdb couldn't find the symbol when I tried that (and I had compiled GLib with --enable-debug=yes). 🙁
Maybe I got what you're doing wrong, but I usually set a watchpoint on the object's refcount. Something like:
p object (fugly, but this makes a gdb variable, say $1, with the address of object, so your watchpoint is not tied to the symbol 'object')
watch ((GObject*)$1)->ref_count
Just for completeness - the point of using the watchpoint is that most hardware supports this construct, so it works a lot faster.
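Put together, the watchpoint approach might look roughly like the following sketch. It assumes gdb is already stopped somewhere a variable named object is in scope, and that the print command happens to produce $1; in a longer session the history number will differ, so use whatever gdb actually prints.

```gdb
# Stopped in a frame where `object` is in scope.
# Record the pointer value in the value history (assumed to become $1).
print object
# Watch the refcount field through the recorded address, so the
# watchpoint does not depend on the symbol `object` staying in scope.
watch ((GObject*)$1)->ref_count
# Print a short backtrace on every change and keep running.
commands
silent
bt 8
cont
end
cont
```

Unlike the conditional breakpoints, this also catches refcount changes that do not go through g_object_ref()/g_object_unref() directly, as long as they write to that field.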