Tag Archives: debugging

Tip for debugging refcounting issues: change ref calls to copy

Over the last couple of days I’ve been looking at a refcounting issue in GLib’s D-Bus implementation.

As with many things in GLib, the allocations in this code use refcounting, rather than new/free, to manage object lifecycles. This makes it quite hard to debug lifecycle issues, because the location of a bug (a ref or unref call which isn’t correct) can be quite far removed (in time and code) from where the effects of that bug become visible. This is because the effects of refcounting problems only become visible when an object’s refcount reaches zero, or when the program ends and its refcount still hasn’t reached zero.

While debugging this code, I tried an approach I haven’t before: changing some of the ref calls on the buggy object to be copy calls instead. (Specifically, changing g_object_ref() to g_dbus_message_copy().) That split up the lifecycle of the object into smaller pieces, narrowing down the sets of ref/unref calls which could be buggy. Ultimately, this allowed me to find some bugs in the code, and hopefully those are the bugs causing the refcounting issue. Since the issue is intermittent, it’s a bit hard to be sure.

This debugging approach was possible in this case because the object I was debugging is immutable, so passing around copies of it doesn’t affect the behaviour of other bits of code vs passing around the original. Hence this approach is only applicable in some situations. But it’s another good reason why using immutable objects is quite helpful when writing code, and it’s certainly an approach I’m going to be using again when I can.

Runtime control of debug output: Endless Orange Week

Recently at Endless we had a week of focused working on projects which are not our day-to-day work. It was called ‘Endless Orange Week’, and everyone was encouraged to explore a topic of their choosing.

I chose to look at two projects, both of which included a D-Bus/API component. My thinking was that review of the new interfaces on each project might take a while, so it would make sense to have two projects running in parallel so I could switch between them when blocked.

I’m going to blog about the two projects separately, to avoid one mega-long post.

The first project was to add a D-Bus debug interface for applications. This would allow debug output from an application to be turned on and off at runtime, rather than just being set with a command line argument or environment variable when the application is first started.

This would allow users and developers to get debug output from long-running applications without having to restart them, as quite often restarting a process will destroy the state you were hoping to debug.

What I came up with is GDebugController, which is awaiting review in GLib now. It’s an interface, currently implemented by GDebugControllerDBus. When instantiated, GDebugControllerDBus creates a new D-Bus object and interface for controlling the debug output from the application. It hooks into the standard g_debug() message functions, and can be hooked into a custom log writer function if your application uses one of those.

It essentially exists to expose one D-Bus property and allow that to be hooked in to your log writer. It has to be a bit more complex than that, though, as it needs to be able to handle authorisation: checking that the D-Bus peer who’s requesting to enable debug output on your application is actually allowed to do so. For services in particular, this is important, as allowing any peer to enable debug output could count as a privilege escalation. They might be able to slow your process down due to the volume of debug output it produces; fill the disk up; or look at the outputted logs and see private information. GDebugControllerDBus has an authorize signal to support this, and it works quite similarly to the GDBusInterfaceSkeleton::g-authorize-method signal.

Using it in an application

Firstly, you need to wait for it to be reviewed and land in GLib. The API might change during review.

Once it’s landed, assuming nothing changes, you just need to create an instance of GDebugControllerDBus. It will create the D-Bus object and hook it all up. When peers change the debug level in your application, the default handler will call g_log_set_debug_enabled() which will change the behaviour of GLib’s default log writer function.

If you have a custom log writer function, you will need to change it to check g_log_get_debug_enabled() and output debug messages if it’s true.

Using it in a service

Using it in a service will typically involve hooking up authorisation. I’ve implemented support for it in libgsystemservice, so that it will be enabled for any user of libgsystemservice after version 0.2.0.

To use polkit for authorisation, set the GssService:debug-controller-action-id property to the ID of the polkit action you want to use for authorising enabling/disabling debug mode. libgsystemservice will handle the polkit checks using that. Here’s an example.

If that property is not set, a default policy will be used, where debug requests will be accepted unconditionally if your service is running on the session bus, and rejected unconditionally if it’s running on the system bus. The thinking is that there’s no security boundary on the session bus (all peers are equally trusted), whereas there are a lot of security boundaries on the system bus so libgsystemservice is best to fail closed and force you to write your own security policy.

That’s it! Reviews and feedback welcome. Many thanks to Endless for running this week and actively encouraging everyone to make use of it.

Debugging critical warnings from GLib code

tl;dr: G_DEBUG=fatal-warnings gdb ./my-program

If you have some code which uses GLib, and it emits a critical warning, for example if a g_return_if_fail() check fails or if a g_warning() message is emitted, how do you track it down and debug it?

Run your code under gdb with G_DEBUG=fatal-warnings (for g_return_if_fail() and g_warning()) or G_DEBUG=fatal-criticals (for g_return_if_fail()), and gdb will break execution when the failing precondition or warning is reached. If there are multiple warnings and you want to skip through to get to a particular one, just use the continue command in gdb until you reach the one you want.

Where are messages on the terminal coming from?

From a discussion on #gtk+ this morning: if you’re using recent versions of GLib with structured logging support, and you want to work out which bit of your code is causing a certain message to be printed to the terminal, run your application in gdb and add a breakpoint on g_log_writer_standard_streams.

(This assumes you’re using the default log writer function; if not, you need to add a breakpoint on something in your writer function.)

Reference count debugging with systemtap

I got some really helpful comments on yesterday's post about reference count debugging with gdb which enabled me to get systemtap working.

Getting systemtap working (on Fedora 13)

Install the systemtap-* and kernel-devel packages as per the instructions on the systemtap wiki. Note that the kernel packages need to be for the same version as the kernel you're currently running. I got caught out by this since I hadn't rebooted since I last downloaded an updated kernel package. You then need to add yourself to the stapdev and stapusr groups. Run the command stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}' to test whether everything's installed and working properly. systemtap might ask you to run a make command at this point, which you need to do.

Writing systemtap probes

The probe I'm using to sort out referencing issues is the following, based off the examples Alex Larsson gave when static probes were initially added to GLib and GObject. I've saved it as refs.stp:

global alive
global my_object = "FooObject"

probe gobject.object_new {
	if (type == my_object)
		alive++
}

probe gobject.object_ref {
	if (type == my_object) {
		printf ("%s %p ref (%u)\n", type, object, refcount)
		print_ubacktrace_brief ()
		printf ("\n")
	}
}

probe gobject.object_unref {
	if (type == my_object) {
		printf ("%s %p unref (%u)\n", type, object, old_refcount)
		print_ubacktrace_brief ()
		printf ("\n")

		if (old_refcount == 1)
			alive--
	}
}

probe end {
	printf ("Alive objects: \n")
	if (alive > 0)
		printf ("%d\t%s\n", alive, my_object)
}

This counts how many instances of the FooObject class are created (using a probe on g_object_new()) and destroyed (probing on g_object_unref() and decrementing the alive counter when the last reference is dropped). References and dereferences are also counted, with a short backtrace being outputted for each, which is the key thing I was looking for when debugging reference counting problems.

Using the probes

I was debugging problems in Empathy, so I had to use the following command:

stap refs.stp \
-d ${libdir}/libfolks.so \
-d ${libdir}/libfolks-telepathy.so \
-d ${libdir}/libglib-2.0.so \
-d ${libdir}/libgobject-2.0.so \
-d ${libdir}/libgee.so \
-d ${libdir}/libgtk-x11-2.0.so \
-d ${bindir}/empathy \
-c "${bindir}/empathy"

Each -d option tells systemtap to load unwind data from the given library or executable, which is the key thing I was missing yesterday; these options are necessary for the backtraces to be useful, since systemtap stops unwinding a backtrace at the first frame it can't map to a symbol name. Note that it's necessary to explicitly tell systemtap to load data from the empathy executable, even though it then runs Empathy to trace it.

This gives output like the following when tracing the EmpathyMainWindow object:

EmpathyIndividualStore 0x09c05a10 ref (2)
g_object_ref+0x138
g_value_object_collect_value+0xe0
g_value_set_instance+0x190
.L1016+0x1e0
g_signal_emit_by_name+0x165
gtk_tree_sortable_sort_column_changed+0x78
gtk_tree_store_set_sort_column_id+0xde
gtk_tree_sortable_set_sort_column_id+0xe6
empathy_individual_store_set_sort_criterium+0x108
individual_store_setup+0x162
empathy_individual_store_init+0xb0
g_type_create_instance+0x1c3
g_object_constructor+0x1d
g_object_newv+0x438
.L345+0xfd
g_object_new+0x8d
empathy_individual_store_new+0xb6
empathy_main_window_init+0x890
g_type_create_instance+0x1c3
g_object_constructor+0x1d
empathy_main_window_constructor+0x4c

EmpathyIndividualStore 0x09c05a10 unref (2)
g_object_unref+0x13f
g_value_object_free_value+0x2a
g_value_unset+0x6d
.L1041+0x100
g_signal_emit_by_name+0x165
gtk_tree_sortable_sort_column_changed+0x78
gtk_tree_store_set_sort_column_id+0xde
gtk_tree_sortable_set_sort_column_id+0xe6
empathy_individual_store_set_sort_criterium+0x108
individual_store_setup+0x162
empathy_individual_store_init+0xb0
g_type_create_instance+0x1c3
g_object_constructor+0x1d
g_object_newv+0x438
.L345+0xfd
g_object_new+0x8d
empathy_individual_store_new+0xb6
empathy_main_window_init+0x890
g_type_create_instance+0x1c3
g_object_constructor+0x1d
empathy_main_window_constructor+0x4c

The only thing I need to do now is to figure out how to script systemtap so that it indents each backtrace nicely according to the reference count of the object.

Reference count debugging with gdb

As I was hacking today, I ran into some hard-to-debug reference counting problems with one of my classes. The normal smattering of printf()s didn't help, and neither did this newfangled systemtap, which was a bit disappointing.

It worked, in that my probes were correctly run and correctly highlighted each reference/dereference of the class I was interested in, but printing a backtrace only extended to the g_object_ref()/g_object_unref() call, and no further. I'm guessing this was a problem with the location of the debug symbols for my code (since it was in a development prefix, whereas systemtap was not), but it might be that systemtap hasn't quite finished userspace stuff yet. That's what I read, at least.

In the end, I ended up using conditional breakpoints in gdb. This was a lot slower than systemtap, but it worked. It's the sort of thing I would've killed to know a few years (or even a few months) ago, so hopefully it's useful for someone (even if it's not the most elegant solution out there).

set pagination off
set $foo=0
break main
run

break g_object_ref
condition 2 _object==$foo
commands
	silent
	bt 8
	cont
	end

break g_object_unref
condition 3 _object==$foo
commands
	silent
	bt 8
	cont
	end

break my_object_init
commands
	silent
	set $foo=my_object
	cont
	end
enable once 4
cont

The breakpoint in main() is to stop gdb discarding our breakpoints out of hand because the relevant libraries haven't been loaded yet. $foo contains the address of the first instance of MyObject in the program; if you need to trace the n+1th instance, use ignore 4 n to only fire the my_object_init breakpoint on the n+1th MyObject instantiation.

This can be extended to track (a fixed number of) multiple instances of the object, by using several $fooi variables and gdb's if statements to set them as appropriate. This is left as an exercise to the reader!

I welcome the inevitable feedback and criticism of this approach. It's hacky, ugly and slower than systemtap, but at least it works.