Using gdb to solve yet more software-related problems

The issue

Opening a Microsoft Excel 2007+ spreadsheet in Libre Office Calc (Build ID: 350m1(Build:2)) on a 64-bit Debian Wheezy triggers a segmentation fault. The document contains three sheets with some formulas and links between them:

[936028.103160] soffice.bin[3495]: segfault at 200030000 ip 0000000200030000 sp 00007fffff42baa8 error 14 in[7f4e6bfb9000+a2000]
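
The kernel line above already tells us quite a bit. As a minimal sketch (Python, not part of the original session), the following parses it and decodes the error field. It assumes the usual x86 page-fault error-code bits and that the kernel prints that field in hexadecimal; the function and key names are our own, chosen for illustration.

```python
import re

# Matches the x86 "segfault at ..." line that the Linux kernel writes to its log.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Split a kernel segfault line into its fields (illustrative helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m.group("err"), 16)  # assumption: the error code is printed in hex
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),      # faulting address
        "ip": int(m.group("ip"), 16),          # instruction pointer
        "page_present": bool(err & 0x01),      # bit 0: page was mapped
        "write": bool(err & 0x02),             # bit 1: write access
        "user_mode": bool(err & 0x04),         # bit 2: fault raised in user mode
        "instruction_fetch": bool(err & 0x10), # bit 4: fault on instruction fetch
    }


line = ("[936028.103160] soffice.bin[3495]: segfault at 200030000 "
        "ip 0000000200030000 sp 00007fffff42baa8 error 14 in[7f4e6bfb9000+a2000]")
info = decode_segfault(line)
print(info)
```

Here the faulting address equals the instruction pointer, and the error code decodes to a user-mode instruction fetch from an unmapped page. That is consistent with the process jumping to a bogus address rather than reading or writing one.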

Let’s try to determine exactly where it crashes

We need to run the program within a gdb session to determine exactly where it triggers the segmentation fault. This is the usual approach when fixing software-related bugs, of course, and also when checking whether a crash has already been reported. So, the first thing to do is to install the debug symbols for the Libre Office package:

# apt-get install libreoffice-dbg

After that, we can run Libre Office’s Calc inside gdb, setting up its parameters first:

~ gdb /usr/lib/libreoffice/program/soffice.bin

(gdb) set args -o file_that_segfaults_libreoffice.xlsx

Finally, we run the program and get the exact location inside Libre Office’s Calc where the memory access violation takes place:

(gdb) r

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000041 in ?? ()

(gdb) bt

#0 0x0000000000000041 in ?? ()
#1 0x00007fffd7627225 in ScFormulaCell::Compile (this=0x7fffe000a770, rFormula=…, bNoListening=false,
at /home/rene/Debian/Pakete/LibreOffice/libreoffice-3.5.4+dfsg2/sc/source/core/data/cell.cxx:1076

According to the previous listing, the segmentation fault happens at sc/source/core/data/cell.cxx, line 1076, in the Libre Office sources. As frame #0 shows, the program ended up executing at 0x0000000000000041, which is not a valid code address, and that is why the software crashes. A jump to an address like this usually points to memory corruption. So in frame #1, it is quite likely that some pointer or structure has been overwritten past its bounds, or that some memory has been freed when it should not have been.

Patching the code in a quick and dirty way

Let’s have a look at the sc/source/core/data/cell.cxx file. In order to do that, we need to install Libre Office’s sources first:

 # apt-get source libreoffice

Using a text editor we can now open the C++ source file and go to the ScFormulaCell::Compile method, where the segfault occurs. This is shown below:

ScColumn::CompileDBFormula showing the line where the segfault takes place

The line that triggers the segmentation fault frees the memory pointed to by pCodeOld, of type ScTokenArray *. We cannot see pCode defined in this method, so we assume it is defined elsewhere and is somehow shared within the class. At the time of the crash, the ScTokenArray structure that pCodeOld points to holds these values:

(gdb) frame 1

(gdb) p pCodeOld

$4 = (ScTokenArray *) 0x1b5b9d0
(gdb) p *pCodeOld
$5 = {
<formula::FormulaTokenArray> = {
_vptr.FormulaTokenArray = 0x7fffd24f43c0,
pCode = 0x1b5e5e0,
pRPN = 0x1b61490,
nLen = 21840,
nRPN = 53839,
nIndex = 32767,
nError = 0,
nRefs = 21856,
nMode = 79 'O',
bHyperLink = 210
}, <No data fields>}

However, whenever this delete pCodeOld executes correctly and no crash occurs, these values are quite different:

 (gdb) p *pCodeOld
$6 = {
<formula::FormulaTokenArray> = {
_vptr.FormulaTokenArray = 0x7fffd8403090,
pCode = 0x0,
pRPN = 0x0,
nLen = 0,
nRPN = 0,
nIndex = 0,
nError = 0,
nRefs = 0,
nMode = 1 '\001',
bHyperLink = false
}, <No data fields>}
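
To compare the two dumps a bit more systematically, here is a rough sketch (Python, outside gdb) that parses the text gdb printed and shows the non-zero counters in hexadecimal. The dumps are copied from the two listings above; the helper names are hypothetical. The last helper rests on an assumption we have not checked against the sources: that nLen, nRPN, nIndex and nError are adjacent 16-bit fields on a little-endian machine.

```python
import re

# Field values copied from the two gdb listings (crashing and non-crashing case).
CRASHING = r"""
_vptr.FormulaTokenArray = 0x7fffd24f43c0,
pCode = 0x1b5e5e0,
pRPN = 0x1b61490,
nLen = 21840,
nRPN = 53839,
nIndex = 32767,
nError = 0,
nRefs = 21856,
nMode = 79 'O',
bHyperLink = 210
"""
HEALTHY = r"""
_vptr.FormulaTokenArray = 0x7fffd8403090,
pCode = 0x0,
pRPN = 0x0,
nLen = 0,
nRPN = 0,
nIndex = 0,
nError = 0,
nRefs = 0,
nMode = 1 '\001',
bHyperLink = false
"""

# One "name = value" pair per line; only the first token of the value is kept.
FIELD_RE = re.compile(r"^\s*([\w.]+)\s*=\s*([^,\s]+)", re.MULTILINE)
COUNTERS = ("nLen", "nRPN", "nIndex", "nError", "nRefs")


def parse_fields(dump):
    """Return {field name: value text} for a gdb structure dump."""
    return dict(FIELD_RE.findall(dump))


def nonzero_counters(dump):
    """Return the integer counters that are not zero, rendered in hex."""
    fields = parse_fields(dump)
    return {k: hex(int(fields[k])) for k in COUNTERS if int(fields[k]) != 0}


def as_one_word(dump):
    """ASSUMPTION: nLen, nRPN, nIndex, nError are adjacent 16-bit fields
    (little-endian); read them back as a single 64-bit value."""
    fields = parse_fields(dump)
    parts = [int(fields[k]) for k in ("nLen", "nRPN", "nIndex", "nError")]
    return sum(value << (16 * i) for i, value in enumerate(parts))


print(nonzero_counters(CRASHING))
print(nonzero_counters(HEALTHY))
print(hex(as_one_word(CRASHING)))
```

In hexadecimal, nRPN and nIndex read 0xd24f and 0x7fff, which look like fragments of an address in the same range as the 0x7fffd24f43c0 seen in _vptr. Under the layout assumption above, the four counters together would spell 0x7fffd24f5550. That hints, without proving it, that the object's memory was holding something else by the time it was deleted.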

Now, we could go from stack frame to stack frame, analysing the methods and their parameters and reading the sources in order to understand exactly where the issue lies. But we do not know much about the implementation of Libre Office’s Calc, so we can try another approach that lets us open the document and save it in another format. We now know that the crash happens as soon as the program tries to free pCodeOld, and that pCodeOld turns out to be invalid whenever some values held by the structure, for example pCodeOld->nRPN, are greater than zero. Let’s try, then, to avoid freeing the invalid pointer and see what happens.

Changing the software’s behaviour without recompiling

Now, we can use gdb to alter the program’s control path so that it does not free pCodeOld when it holds odd values. To accomplish this, we will use a gdb breakpoint with a conditional list of commands attached to it. Whenever pCodeOld holds, so to speak, big values, we set it to NULL, because deleting a NULL pointer does nothing. We can write this trivial gdb command set and then re-run the program with it:

set args -o file-that-fires-the-segfault.xls
set pagination off
b cell.cxx:1075
set $hits = 0
commands 1
set $check = pCodeOld->nRPN
printf "Check is: %d\n", $check
if $check > 0
printf "Patching pCodeOld to avoid the crash ...\n"
set $hits = $hits + 1
set var pCodeOld = 0x0
end
continue
end

Now, if we run the program again, it does not crash and the problematic document opens. We can save it in another format, say Libre Office Calc’s own format. We verified that the document saved in this other format opens with no issues at all.
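
The same workaround can also be replayed without typing the commands by hand. The following is only a sketch, not something run against this Libre Office build: it writes a gdb command file and prints a batch-mode invocation using gdb's -batch, -x and --args options. The helper name, the temporary file, and the extra "set breakpoint pending on" line (so that the breakpoint can be set before the Calc library is loaded) are our own additions.

```python
import shlex
import tempfile

# Binary path as used earlier in this post.
SOFFICE = "/usr/lib/libreoffice/program/soffice.bin"

# gdb command file: same idea as the interactive command set, plus "run".
GDB_COMMANDS = """\
set pagination off
set breakpoint pending on
break cell.cxx:1075
set $hits = 0
commands
  if pCodeOld->nRPN > 0
    set $hits = $hits + 1
    set var pCodeOld = 0
  end
  continue
end
run
print $hits
"""


def build_gdb_invocation(script_path, document):
    """Return the argv list for a non-interactive gdb run (hypothetical helper)."""
    return ["gdb", "-batch", "-x", script_path, "--args", SOFFICE, "-o", document]


# Store the commands in a temporary file that gdb can read with -x.
with tempfile.NamedTemporaryFile("w", suffix=".gdb", delete=False) as fh:
    fh.write(GDB_COMMANDS)
    script_path = fh.name

argv = build_gdb_invocation(script_path, "file-that-fires-the-segfault.xls")
print(shlex.join(argv))  # copy and paste this line into a shell to run it
```

The sketch only prints the command line instead of launching gdb, so it is safe to run anywhere. The program would still need to be closed by hand once the document has been saved, at which point gdb prints the final value of $hits.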

We can infer, from the $hits counter kept by the command set above, that the corrupted structure shows up at least twice:

(gdb) p $hits
$1 = 2

Even without knowing too much about the internals of Libre Office’s Calc, and by using a quick-and-dirty approach, we could in the end get the document opened and save it in another format that allowed us to use it with no more problems.

Demonstration videos

Below, a couple of videos show this issue and its workaround. The first video shows how, whilst opening the problematic document, Libre Office just crashes, leaving a segmentation fault error message in /var/log/messages. The second one shows how I ran the program inside a gdb session, making use of gdb’s breakpoint commands to alter the program’s control path and get the document opened, as previously described in this post.

Libre Office crashes whilst opening the problematic XLSX document.

Getting the document opened through gdb.