Reference count debugging with systemtap

I got some really helpful comments on yesterday's post about reference count debugging with gdb which enabled me to get systemtap working.

Getting systemtap working (on Fedora 13)

Install the systemtap-* and kernel-devel packages as per the instructions on the systemtap wiki. Note that the kernel packages need to be for the same version as the kernel you're currently running. I got caught out by this since I hadn't rebooted since I last downloaded an updated kernel package. You then need to add yourself to the stapdev and stapusr groups. Run the command stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}' to test whether everything's installed and working properly. systemtap might ask you to run a make command at this point, which you need to do.

Writing systemtap probes

The probe I'm using to sort out referencing issues is the following, based off the examples Alex Larsson gave when static probes were initially added to GLib and GObject. I've saved it as refs.stp:

global alive
global my_object = "FooObject"

probe gobject.object_new {
	if (type == my_object)
		alive++
}

probe gobject.object_ref {
	if (type == my_object) {
		printf ("%s %p ref (%u)\n", type, object, refcount)
		print_ubacktrace_brief ()
		printf ("\n")
	}
}

probe gobject.object_unref {
	if (type == my_object) {
		printf ("%s %p unref (%u)\n", type, object, old_refcount)
		print_ubacktrace_brief ()
		printf ("\n")

		if (old_refcount == 1)
			alive--
	}
}

probe end {
	printf ("Alive objects: \n")
	if (alive > 0)
		printf ("%d\t%s\n", alive, my_object)
}

This counts how many instances of the FooObject class are created (using a probe on g_object_new()) and destroyed (probing on g_object_unref() and decrementing the alive counter when the last reference is dropped). References and dereferences are also counted, with a short backtrace being outputted for each, which is the key thing I was looking for when debugging reference counting problems.

Using the probes

I was debugging problems in Empathy, so I had to use the following command:

stap refs.stp \
-d ${libdir}/libfolks.so \
-d ${libdir}/libfolks-telepathy.so \
-d ${libdir}/libglib-2.0.so \
-d ${libdir}/libgobject-2.0.so \
-d ${libdir}/libgee.so \
-d ${libdir}/libgtk-x11-2.0.so \
-d ${bindir}/empathy \
-c "${bindir}/empathy"

Each -d option tells systemtap to load unwind data from the given library or executable, which is the key thing I was missing yesterday; these options are necessary for the backtraces to be useful, since systemtap stops unwinding a backtrace at the first frame it can't map to a symbol name. Note that it's necessary to explicitly tell systemtap to load data from the empathy executable, even though it then runs Empathy to trace it.

This gives output like the following when tracing the EmpathyMainWindow object:

EmpathyIndividualStore 0x09c05a10 ref (2)
g_object_ref+0x138
g_value_object_collect_value+0xe0
g_value_set_instance+0x190
.L1016+0x1e0
g_signal_emit_by_name+0x165
gtk_tree_sortable_sort_column_changed+0x78
gtk_tree_store_set_sort_column_id+0xde
gtk_tree_sortable_set_sort_column_id+0xe6
empathy_individual_store_set_sort_criterium+0x108
individual_store_setup+0x162
empathy_individual_store_init+0xb0
g_type_create_instance+0x1c3
g_object_constructor+0x1d
g_object_newv+0x438
.L345+0xfd
g_object_new+0x8d
empathy_individual_store_new+0xb6
empathy_main_window_init+0x890
g_type_create_instance+0x1c3
g_object_constructor+0x1d
empathy_main_window_constructor+0x4c

EmpathyIndividualStore 0x09c05a10 unref (2)
g_object_unref+0x13f
g_value_object_free_value+0x2a
g_value_unset+0x6d
.L1041+0x100
g_signal_emit_by_name+0x165
gtk_tree_sortable_sort_column_changed+0x78
gtk_tree_store_set_sort_column_id+0xde
gtk_tree_sortable_set_sort_column_id+0xe6
empathy_individual_store_set_sort_criterium+0x108
individual_store_setup+0x162
empathy_individual_store_init+0xb0
g_type_create_instance+0x1c3
g_object_constructor+0x1d
g_object_newv+0x438
.L345+0xfd
g_object_new+0x8d
empathy_individual_store_new+0xb6
empathy_main_window_init+0x890
g_type_create_instance+0x1c3
g_object_constructor+0x1d
empathy_main_window_constructor+0x4c

The only thing I need to do now is to figure out how to script systemtap so that it indents each backtrace nicely according to the reference count of the object.

6 thoughts on “Reference count debugging with systemtap

  1. Frank Ch. Eigler

    How to indent based upon the nearby refcount variable? For the header lines,

    printf ("%-*s %s %p ref (%u)\n", refcount, "", type, object, refcount)

    (see the %-*s part). For indenting the print_*backtrace() ... not quite so easy, since
    each line would need to be manipulated. But something like this could work:

    bt = ubacktrace() # results in a string of hex addresses
    addr = tokenize(bt, " ")
    while (addr != "") {
    printf("%-*s %s\n", refcount, usymdata(strtol(addr,16)))
    addr = tokenize ("", " ")
    }

    which could be plopped into a separate function.

    BTW, if you're not probing the kernel itself, you won't need the kernel-debuginfo package.

  2. Mark Wielaard

    Very cool!

    So with the soon to be released SystemTap 1.3 you get two improvements that should make your life easier:
    - A new --ldd switch, which directs stap to slurp in all libraries that a binary links against, so you can drop the large -d -d -d list and just do: stap --ldd -d "${bindir}/empathy" refs.stp -c "${bindir}/empathy"
    - Some backtrace as string functions (sprint_[u]backtrace() and sprint_[u]stack()) which make the above trick Frank posted above even simpler (no need to convert by hand from hex addresses to symbols/modules, etc).

    I see you are using the print_ubacktrace_brief() variant instead of the print_ubacktrace() variant. Personally I thought the brief variant was a mistake since it was "too brief" and we needed something in between. What kinds of information do you want/need in a backtrace? address, module name, symbol name, offsets, function size, ...? We might be able to provide something even more flexible. The internal systemtap _stp_addr_sprint() function can already provide all or nothing, prefixes, suffixes, etc. But this isn't exposed to the script level (and I admit I don't yet know how to do that naturally, you quickly end up in some printf like syntax which might be overkill).

    1. Philip Withnall Post author

      For debugging reference count problems, usually just the function name is what I need. If systemtap can't figure out the function name, the module name would definitely be useful. Other than that, a line number would be the most useful thing, although I don't know how possible that is.

      The ability to set a maximum number of frames to print out in a backtrace would be useful, since I've found that I usually only need the first eight-or-so frames for this kind of work; any more frames just get in the way. It would be nice if the memory limit for each probe was raised as well (I guess there might currently be an option for it, but I haven't looked into it), since I ran into that when I was testing this out last night — seems that a particularly long backtrace made systemtap unhappy.

      Thanks!

  3. Eero Tamminen

    "The ability to set a maximum number of frames to print out in a backtrace would be useful, since I’ve found that I usually only need the first eight-or-so frames for this kind of work; any more frames just get in the way."

    Fewer frames will also be faster, unwinding a lot of calls can get slow...

  4. iain

    I did something similar by writing a valgrind plugin. The downside was that it was quite verbose, but with some filtering options it worked quite nicely. I meant to write a logfile analyser, but never got round to it and then found something else to do.

Comments are closed.