Using GDB to fix software-related issues


It is often said that only developers, testers and debugging specialists use GDB. That is simply false. Any IT professional can use this tool to find out why a piece of software fails. Tools such as gdb, strace and ltrace let a technician trace a failure down to its root cause. A program dying with a segmentation fault no longer has to remain a mystery. Let’s look at a real case to show how gdb can help solve this kind of problem.

The issue: A segfault whenever opening OpenOffice

When we tried to run the OpenOffice suite on a 64-bit openSUSE GNU/Linux box, a segmentation fault appeared right after the initial splash screen. This started happening right after the computer had been migrated to another user, and after some issues with the fglrx ATI driver. The computer was otherwise running fine, but without that proprietary video driver; it was using the radeonhd open-source community driver instead.

Using gdb

To find the real problem affecting the OpenOffice suite, the best tool is the debugger, GDB. First of all, we had to know that the “soffice” command is just a wrapper written entirely in Bash (it is a shell script). A shell script cannot be debugged with gdb (the shell itself can trace it, using set -x). So we had to find out where the actual binary was located and run that under GDB. It turned out to be here: /usr/lib64/ooo3/program/soffice.bin. We then ran the program directly inside a GDB debugging session this way:

gdb ./soffice.bin


(gdb) run
Starting program: /usr/lib64/ooo3/program/soffice.bin

Program received signal SIGSEGV, Segmentation fault.
0x00007fffe97f2e2e in XF86DRIQueryVersion () from /usr/lib64/

Evidently, the segmentation fault appeared and the program crashed.
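
As a side note on the wrapper script mentioned earlier: shell tracing is usually the quickest way to see which binary a wrapper finally launches. The sketch below is only an illustration. The wrapper it writes is a hypothetical three-line stand-in (the real soffice script is longer and is not reproduced here), and it prints the binary path instead of executing it.

```shell
#!/bin/bash
# Write a tiny stand-in wrapper (hypothetical; not the real soffice script).
wrapper=$(mktemp)
cat > "$wrapper" <<'EOF'
#!/bin/bash
program_dir=/usr/lib64/ooo3/program
exec echo "$program_dir/soffice.bin" "$@"
EOF

# bash -x prints each command to stderr, prefixed with "+ ", before running it.
# Capture only that trace and discard the wrapper's normal output.
trace=$(bash -x "$wrapper" 2>&1 >/dev/null)
rm -f "$wrapper"

# The traced "exec" line shows which binary the wrapper hands control to.
binary=$(printf '%s\n' "$trace" | sed -n 's/^+ exec echo //p')
echo "wrapper would launch: $binary"
```

Against a real wrapper, the same idea is simply bash -x followed by the script path, reading the trace for the final exec line.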

Looking at the address where the segmentation fault happened, we could see it occurred during a call to XF86DRIQueryVersion (), located in a shared library under /usr/lib64/. This library is in charge of the OpenGL routines, which give the software hardware-accelerated graphics. As mentioned earlier, we were using the non-proprietary radeonhd driver on this computer, after some serious problems with the proprietary one (developed by ATI).

Therefore, the crash had to be directly related to that. And it was, indeed. The XF86DRIQueryVersion () call was trying to find out what kind of hardware graphics acceleration, if any, our X server offered, in order to use it. That is where the software crashed. The library was the obvious suspect. We got more information from the program backtrace using gdb:

(gdb) bt
#0  0x00007fffe97f2e2e in XF86DRIQueryVersion () from /usr/lib64/
#1  0x00007fffe97f2fc9 in XF86DRIQueryExtension () from /usr/lib64/
#2  0x00007fffe97f28dc in ?? () from /usr/lib64/
#3  0x00007fffe97cd7ff in ?? () from /usr/lib64/
#4  0x00007fffe97c6f43 in glXGetConfig () from /usr/lib64/

#5  0x00007fffea40d6f6 in ?? () from /usr/lib64/ooo3/basis3.0/program/
#6  0x00007fffea43416e in SalDisplay::BestVisual(_XDisplay*, int, XVisualInfo&) ()
from /usr/lib64/ooo3/basis3.0/program/
#7  0x00007fffea43738b in SalDisplay::initScreen(int) const () from /usr/lib64/ooo3/basis3.0/program/
#8  0x00007fffeb08a07a in ?? () from /usr/lib64/ooo3/basis3.0/program/
#9  0x00007fffea443569 in vcl_sal::WMAdaptor::WMAdaptor(SalDisplay*) ()
from /usr/lib64/ooo3/basis3.0/program/
#10 0x00007fffea44499b in ?? () from /usr/lib64/ooo3/basis3.0/program/
#11 0x00007fffea44507e in vcl_sal::WMAdaptor::createWMAdaptor(SalDisplay*) ()
from /usr/lib64/ooo3/basis3.0/program/
#12 0x00007fffea438c8a in SalDisplay::Init() () from /usr/lib64/ooo3/basis3.0/program/
#13 0x00007fffeb08a96c in ?? () from /usr/lib64/ooo3/basis3.0/program/
#14 0x00007fffeb08b230 in create_SalInstance () from /usr/lib64/ooo3/basis3.0/program/
#15 0x00007ffff3e37d44 in ?? () from /usr/lib64/ooo3/program/../basis-link/program/
#16 0x00007ffff3e38f00 in ?? () from /usr/lib64/ooo3/program/../basis-link/program/
#17 0x00007ffff3bde920 in InitVCL(com::sun::star::uno::Reference<com::sun::star::lang::XMultiServiceFactory> const&) ()
from /usr/lib64/ooo3/program/../basis-link/program/
#18 0x00007ffff3bdecc7 in ?? () from /usr/lib64/ooo3/program/../basis-link/program/
#19 0x00007ffff3bdeec5 in SVMain() () from /usr/lib64/ooo3/program/../basis-link/program/
#20 0x00007ffff796663c in soffice_main () from /usr/lib64/ooo3/program/../basis-link/program/
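
A long backtrace like the one above can be easier to read once it is reduced to a count of frames per shared object. The following is a minimal sketch, assuming the backtrace text is available on standard input; the function name summarize_bt is made up for this example, and the sample frames are copied from the session above with their paths exactly as shown there.

```shell
#!/bin/sh
# Count how many backtrace frames come from each shared object.
# Reads gdb "bt" output on stdin; frames without a "from <path>" part are ignored.
summarize_bt() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "from") count[$(i + 1)]++
    }
    END { for (lib in count) printf "%d %s\n", count[lib], lib }' | sort -rn
}

# Sample input: a few frames from the session above (paths as they appear there).
summary=$(summarize_bt <<'EOF'
#0  0x00007fffe97f2e2e in XF86DRIQueryVersion () from /usr/lib64/
#1  0x00007fffe97f2fc9 in XF86DRIQueryExtension () from /usr/lib64/
#4  0x00007fffe97c6f43 in glXGetConfig () from /usr/lib64/
#12 0x00007fffea438c8a in SalDisplay::Init() () from /usr/lib64/ooo3/basis3.0/program/
EOF
)
printf '%s\n' "$summary"
```

One possible way to feed it real data is to run gdb non-interactively, for example gdb -batch -ex run -ex bt ./soffice.bin, and pipe that output into the function. The summary only says where the frames live; the innermost frames still need to be read by hand.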

Well, a lot of functions had been called, but the innermost frames (#0 to #4, the most recent calls) were the ones that deserved attention. Something went wrong during the call to XF86DRIQueryVersion (), implemented in the OpenGL shared library. So, that was the issue. We reasoned this way: either the wrong library was installed, or something else was happening. We decided to check the first possibility:

ls -l /usr/lib64/
lrwxrwxrwx 1 root root 29 2011-04-07 13:54 /usr/lib64/ -> /usr/lib64/fglrx/

So, the library belonged to the fglrx ATI proprietary driver, even though we were no longer using it.

We had to use the Mesa one instead. We reinstalled it and then checked again:
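
The ls -l check above shows only the first hop of a symlink. When links are chained, readlink -f follows them to the final file. The sketch below demonstrates this on a throwaway directory; the helper name resolve_lib and every file name in it (vendor, libdemo.so.1 and so on) are invented for the example and do not correspond to files on the machine described here.

```shell
#!/bin/sh
# Print the fully resolved target of a path, following every symlink in the chain.
resolve_lib() {
    readlink -f -- "$1"
}

# Demo in a temporary directory; all names below are made up for illustration.
demo=$(mktemp -d)
mkdir "$demo/vendor"
: > "$demo/vendor/libdemo.so.1.2"
ln -s "$demo/vendor/libdemo.so.1.2" "$demo/libdemo.so.1"

# Classify the library by the directory its real file lives in.
target=$(resolve_lib "$demo/libdemo.so.1")
case "$target" in
    */vendor/*) verdict="vendor copy" ;;
    *)          verdict="other copy" ;;
esac
echo "$verdict: $target"
rm -rf "$demo"
```

On an RPM-based system such as openSUSE, rpm -qf followed by the resolved path should additionally report which package owns the file, which helps decide what to reinstall.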

ls -l /usr/lib64/
lrwxrwxrwx 1 root root 12 2011-04-29 12:17 /usr/lib64/ ->

We ran the software again, and this time it worked perfectly. So far, so good.