Xmgr compiled from sources segfaults on Debian Squeeze/Ubuntu 12.04.1 LTS


On modern GNU/Linux distros it is still feasible to use the Xmgr package from pre-compiled binaries. There’s a post in this blog where you can even download some pre-compiled packages for old Debian distros. However, it is sometimes more practical to compile Xmgr from its sources, so that it can be used no matter which GNU/Linux flavour is installed. The Xmgr project is discontinued, so there are no more updates, and another package, called Grace, replaces it. Despite this, the old Xmgr is still often needed because there are important differences in its script language and behaviour.

Compiling Xmgr is an easy task. However, the result does not work at all: although the compilation finishes without errors, the xmgr binary fires a segmentation fault as soon as it is run. This post analyses this issue and presents a trivial solution.

Compiling Xmgr from sources

Before going any further, there are some basic steps we have to take in order to compile the Xmgr sources. First, we need to install its build dependencies, that is, the libraries used by Xmgr. Xmgr is quite similar to Grace, so we can assume that most of the libraries Xmgr uses are the same ones Grace uses. Therefore, we can install all the Grace build dependencies to compile Xmgr:

# apt-get build-dep grace

Now, if we proceed with the basic steps so as to obtain a binary file for xmgr and run the program, we’ll get a segfault:

# wget ftp://plasma-gate.weizmann.ac.il/pub/xmgr4/src/xmgr-4.1.2.tar.gz
# tar xvfz xmgr-4.1.2.tar.gz
# cd xmgr-4.1.2/
# ./configure
# make
# cd src/
# ./xmgr
xmgr v4.1.2
(C) Copyright 1991-1995 Paul J Turner
(C) Copyright 1996-1998 ACE/gr Development Team
All Rights Reserved
Segmentation fault

It’s time to use GDB again!

We have to re-compile xmgr adding the debugging symbols in order to use gdb and try to locate the issue. Thus:

# ./configure --enable-debug
# make

Running xmgr inside a gdb session shows:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff741c60c in _XtCountVaList () from /usr/lib/libXt.so.6

So we now know that the function firing the segmentation fault is _XtCountVaList(). We are not acquainted with this function at all, so we need more information about it. According to GDB, it is implemented inside the libXt.so.6 library:

# dpkg -S libXt.so
libxt6: /usr/lib/libXt.so.6.0.0
libxt6: /usr/lib/libXt.so.6
libxt-dev: /usr/lib/libXt.so

# dpkg -p libxt-dev

Description: X11 toolkit intrinsics library (development headers)
libXt provides the X Toolkit Intrinsics, an abstract widget library upon
which other toolkits are based.  Xt is the basis for many toolkits, including
the Athena widgets (Xaw), and LessTif (a Motif implementation).

It seems quite unusual that this library could actually be the culprit, because other parts of the system work pretty well. But still we need to nail the problem down, so we could do with the debugging symbols of the X11 Intrinsics Toolkit:

# apt-get install libxt6-dbg

Now, another xmgr execution inside a new gdb session shows more information:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff741c60c in _XtCountVaList (var=0x7fffffffd400, total_count=0x7fffffffd41c, typed_count=0x7fffffffd418)
at ../../src/Varargs.c:80

It’s time to have a look at the _XtCountVaList function. As GDB has just shown, it is implemented in the Varargs.c source file of the libxt6 package. So the next step seems fairly evident: we are going to install the libxt sources:

# apt-get source libxt

Now, we can analyse this function properly:

 70 void
 71 _XtCountVaList(va_list var, int* total_count, int* typed_count)
 72 {
 73     String          attr;
 75     *total_count = 0;
 76     *typed_count = 0;
 78     for(attr = va_arg(var, String) ; attr != NULL;
 79                         attr = va_arg(var, String)) {
 80         if (strcmp(attr, XtVaTypedArg) == 0) {
 81             (void)va_arg(var, String);
 82             (void)va_arg(var, String);
 83             (void)va_arg(var, XtArgVal);
 84             (void)va_arg(var, int);
 85             ++(*total_count);
 86             ++(*typed_count);
 87         } else if (strcmp(attr, XtVaNestedList) == 0) {
 88             _XtCountNestedList(va_arg(var, XtTypedArgList), total_count,
 89                 typed_count);
 90         } else {
 91             (void)va_arg(var, XtArgVal);
 92             ++(*total_count);
 93         }
 94     }
 95 }

The segmentation fault occurs at line 80. As the previous code snippet shows, this function iterates over a va_list variable to consume all its arguments. Line 80 compares attr, of type String, with the predefined name XtVaTypedArg, and strcmp() has to dereference attr in order to do so. This fires the segfault, so we can infer that attr does not point to valid memory. It is certainly not NULL, otherwise the for loop would have ended before line 80. The var argument itself is not NULL either, according to gdb:

0x00007ffff741c60c in _XtCountVaList (var=0x7fffffffd400, total_count=0x7fffffffd41c, typed_count=0x7fffffffd418)
at ../../src/Varargs.c:80

First hypothesis: “va_arg(), then, is returning an invalid pointer at some point inside the for loop.”
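
To make the hypothesis concrete, here is a minimal sketch of the same loop pattern in plain C, with no Xt involved. The function name count_pairs and the use of long for the values are assumptions made for illustration only (they are not part of libXt). The point is simply that nothing but a null pointer in a name position stops the loop.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/*
 * Hypothetical, simplified analogue of _XtCountVaList(): counts the
 * (name, value) pairs that follow `widget_name`. The list carries no
 * length, so the loop stops only when a name slot holds a null pointer.
 */
int count_pairs(const char *widget_name, ...)
{
    va_list ap;
    const char *name;
    int total = 0;

    assert(widget_name != NULL);

    va_start(ap, widget_name);
    for (name = va_arg(ap, const char *); name != NULL;
         name = va_arg(ap, const char *)) {
        (void) va_arg(ap, long);   /* skip the value paired with this name */
        ++total;
    }
    va_end(ap);

    return total;
}
```

A call such as count_pairs("fileMenu", "width", 100L, "height", 50L, (char *) NULL) returns 2. Without the final null pointer, the loop would keep reading whatever happens to follow the arguments.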

Testing our hypothesis

va_arg() has no way of knowing how many arguments were actually passed: according to its man page, calling it when there is no next argument leads to random errors. This is why the Xt varargs functions expect the caller to end the argument list with a NULL element. Otherwise, the loop shown above will simply continue fetching arguments. In order to test our first hypothesis, we need to go back through the gdb backtrace to determine which functions call _XtCountVaList(), and then check that the argument list they pass does end with a NULL element:

#0  0x00007ffff741c60c in _XtCountVaList () from /usr/lib/libXt.so.6
#1  0x00007ffff741b4c0 in XtVaCreateWidget () from /usr/lib/libXt.so.6
#2  0x000000000047887f in CreateMenu (parent=0x7b4480, name=0x4cfa1c "fileMenu", label=<value optimized out>,
mnemonic=<value optimized out>, cascade=0x0, help_anchor=0x0) at motifutils.c:1293
#3  0x000000000049c1bb in CreateMainMenuBar () at xmgr.c:871
#4  do_main_winloop () at xmgr.c:1304
#5  0x0000000000407d5d in do_main_loop (argc=1, argv=0x7fffffffe108) at main.c:1138
#6  main (argc=1, argv=0x7fffffffe108) at main.c:996

Apparently, CreateMenu is the function we are looking for, because the frame between it and _XtCountVaList, that is, XtVaCreateWidget, also belongs to the X11 Intrinsics Toolkit. Our hypothesis is still the same one: the argument list is not ending with a NULL element. Let’s have a look at the CreateMenu function, implemented in the xmgr sources:

1285 Widget CreateMenu(Widget parent, char *name, char *label, char mnemonic,
1286     Widget *cascade, char *help_anchor)
1287 {
1288     Widget menu, cascadeTmp;
1289     XmString str;
1291     str = XmStringCreateSimple(label);
1292     menu = XmCreatePulldownMenu(parent, name, NULL, 0);
1293     cascadeTmp = XtVaCreateWidget((String) name, xmCascadeButtonWidgetClass, parent,
1294         XmNlabelString, str,
1295         XmNmnemonic, mnemonic,
1296         XmNsubMenuId, menu, 
1297         0);
1298     XmStringFree(str);  
1299     if (help_anchor) {
1300         XtAddCallback(menu, XmNhelpCallback, (XtCallbackProc) HelpCB,
1301                 (XtPointer) help_anchor); 
1302     }
1303     XtManageChild(cascadeTmp);
1304     if (cascade != NULL) {
1305         *cascade = cascadeTmp;
1306     }
1308     return menu;
1309 }

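It is exactly so. The terminator on line 1297 is a plain 0, not a NULL value. In C, a literal 0 passed through a variable argument list is an int, whereas _XtCountVaList() fetches every name with va_arg(var, String), that is, as a pointer. On a 64-bit system such as the one in this gdb session, an int is 4 bytes wide and a pointer is 8, so only half of what va_arg() reads is guaranteed to be zero; the other half may hold whatever was there before. This means that instead of marking the end of the argument list with a null pointer, we may, in fact, be telling XtVaCreateWidget that there are more arguments in the list! A NULL value, on the other hand, is pointer-sized. According to the /usr/include/linux/stddef.h header file, it is defined as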

#define NULL ((void *)0)

Well, this 0 value means that the call to XtVaCreateWidget does not reliably mark the end of the argument list, so _XtCountVaList() keeps reading beyond it. Sooner or later it fetches something that is not a valid pointer, and the strcmp() call on line 80 triggers the segmentation fault.
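
The width mismatch between the two terminators can be checked directly. The following sketch (the function name sentinel_widths is made up for this post) reports the size of a literal 0 and of a null pointer. On a typical 64-bit GNU/Linux system we would expect 4 and 8 bytes respectively, although the C standard does not fix those numbers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: stores the width, in bytes, of the two candidate
 * list terminators discussed above.
 */
void sentinel_widths(size_t *int_width, size_t *ptr_width)
{
    assert(int_width != NULL && ptr_width != NULL);

    *int_width = sizeof(0);            /* a literal 0 has type int */
    *ptr_width = sizeof((void *) 0);   /* a null pointer, as in NULL above */
}
```

When both widths are equal, as on most 32-bit systems, a trailing 0 happens to look like a null pointer, which may be why the original code went unnoticed for so long.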

Fixing it

All we have to do is replace this 0 value with NULL in the src/motifutils.c source file:

1292     menu = XmCreatePulldownMenu(parent, name, NULL, 0);
1293     cascadeTmp = XtVaCreateWidget((String) name, xmCascadeButtonWidgetClass, parent,
1294         XmNlabelString, str,
1295         XmNmnemonic, mnemonic,
1296         XmNsubMenuId, menu,
1297         NULL);

Then, we have to recompile the xmgr sources and the program will work just fine :-). We can test Xmgr’s behaviour by running all the demos included in its sources:

# cd xmgr-4.1.2/examples


The tests are then run by calling the ./dotest script.

Let’s face another problem on Ubuntu distros

After making that small change in the original Xmgr sources, we ran into another issue on Ubuntu 12.04.1 LTS: whenever we tried to open the “Legends” dialog box, another segmentation fault was triggered.

Armed with gdb and valgrind, we decided to continue with our analysis in order to find out where this new bug was located. Running Xmgr inside a new gdb session on an Ubuntu 12.04.1 LTS system, we got:

Program received signal SIGSEGV, Segmentation fault.
XrmStringToQuark (name=0x7fff00000000 <Address 0x7fff00000000 out of bounds>)
at ../../src/Quarks.c:360
360 ../../src/Quarks.c: No such file or directory.
(gdb) c

Oops! Got SIGSYS. Please use "Help/Comments" to report the bug.
[Inferior 1 (process 5146) exited with code 01]

Thus, we had a look at the backtrace, filtering out the calls that do not involve the actual Xmgr sources in order to narrow down the issue:

#6 0x0000000000478187 in CreatePanelChoice (parent=0x821bc0,
labelstr=0x4cbfaf "Font:", nchoices=10) at motifutils.c:125
#7 0x0000000000492256 in define_legend_popup (w=<optimized out>,
client_data=<optimized out>, call_data=<optimized out>) at symwin.c:787

Okay, this time the problem appeared whenever CreatePanelChoice was called from the define_legend_popup() function, in the symwin.c source file. We had a look at that C source file: CreatePanelChoice takes a variable argument list, as discussed earlier in this post, and all the calls to it ended with a 0 element instead of a NULL. The faulting address reported by gdb, 0x7fff00000000, fits this picture: its lower 32 bits are zero, as if a 32-bit 0 had been read back as a 64-bit pointer together with 32 bits of leftover data. We altered all those calls to ensure that the last element was a NULL instead of a zero. After doing that, we recompiled Xmgr and it finally worked perfectly well!
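
As an aside, GCC can help to spot this kind of mistake at compile time. When a variadic function is declared with the sentinel attribute, the compiler warns about any call that does not end with a null pointer (the check is enabled by -Wformat, which -Wall includes). The sketch below uses a hypothetical count_labels function rather than the real CreatePanelChoice, and the NULL_TERMINATED macro is likewise made up for this post; it only illustrates the attribute.

```c
#include <stdarg.h>
#include <stddef.h>

/* The sentinel attribute is a GCC extension (Clang accepts it too). */
#if defined(__GNUC__)
#define NULL_TERMINATED __attribute__((sentinel))
#else
#define NULL_TERMINATED
#endif

/* Hypothetical variadic function taking a null-terminated list of labels. */
int count_labels(const char *first, ...) NULL_TERMINATED;

int count_labels(const char *first, ...)
{
    va_list ap;
    const char *label;
    int n = 0;

    va_start(ap, first);
    for (label = first; label != NULL; label = va_arg(ap, const char *))
        ++n;
    va_end(ap);

    return n;
}
```

With this declaration, count_labels("Font:", "Size:", (char *) NULL) compiles cleanly and returns 2, whereas ending the same call with a plain 0 should make GCC report a missing sentinel.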

You can get the Xmgr sources with these small alterations right here.