Latest Tweets

X11 tunneling excessive delays running emacs21-x and xmgrace

Preamble

Recently, we’ve experienced an annoying problem concerning emacs21 and xmgrace. Running either of them through an X11 ssh tunneling connection caused excessive delays during their start-up process. However, on another computer running exactly the same OS, Secure Shell server and emacs or xmgrace versions, no strange behaviour was detected. Curiously, any GTK+ based application had no delay.

Xt, Motiff

According to this article, the main difference between emacs21 or xmgrace  and any GTK+ based application was related to the fact that GTK+ does not use Xt (X-toolkit), as clearly shown in the next figure:

So, at first I thought the problem had to be directly related to the Xt library, that is: libXt.so. In order to figure out what was happening behind the scenes, I was in need of tracing any library calls. I used ltrace command this way:

ltrace -c /usr/bin/emacs-21.4-x > emacs_ltrace_output

I ran this command on two different systems, running the same OS, sshd, emacs21 and xmgrace versions. One of them, named “A” had the delays, the other, let us call it “B”,  worked fine.

XSync

Reading the ltrace output files, I came to realize there was an obvious time difference when it came to calling XSync() function. According to the X11 developer’s manual, this call is used to update any pending graphic events on the X-Server, waiting for a response just before sending more X11 commands. Clearly, this call was involved in TCP sockets communication through the secure channel previously established between host “A” or host “B”, that is: the X11 client, and the ssh-client, that is: the X11 server. Don’t forget when it refers to X11 communication, we have to think the other way around ;-).

The delta was startling, nearly 5,173157 seconds.

Below, the ltrace results for host “A” and host “B”:

Host A (which was affected by the delay):

1 % time     seconds  usecs/call     calls      function
2 —— ———– ———– ——— ——————–
3  44.97   31.069490         151    205136 _IO_getc
4  16.40   11.331884         538     21062 memmove
5   7.58    5.234543       85812        61 XSync
6   4.93    3.407104         115     29415 malloc
7   4.13    2.853131      713282         4 XGetImage
8   4.07    2.812340     2812340         1 XRegisterIMInstantiateCallback
9   3.96    2.736333         131     20816 free
10   3.46    2.388880     2388880         1 XOpenIM
11   3.05    2.106637         282      7462 ungetc
12   0.89    0.617405          21     28956 mallopt
13   0.81    0.562518       62502         9 select
14   0.68    0.471008       94201         5 XtCreateWidget
15   0.62    0.429039        4009       107 XQueryColors
16   0.53    0.369573        3588       103 kill
17   0.51    0.351702       13026        27 XpmReadFileToPixmap

(…)

Host B (which was not affected by the delay):

% time     seconds  usecs/call     calls      function
—— ———– ———– ——— ——————–
31.97   53.248228         717     74227 mallopt
19.92   33.175573         436     75984 malloc
18.11   30.159029         140    214680 _IO_getc
15.21   25.336675         490     51639 free
6.00    9.997167     9997167         1 XRegisterIMInstantiateCallback
2.45    4.086560       38192       107 XQueryColors
1.53    2.549606         118     21515 memmove
1.40    2.327414         173     13427 strlen
1.21    2.019241        2031       994 XrmQGetResource
0.30    0.503367       71909         7 select
0.21    0.355900      355900         1 XCreateFontSet
0.19    0.321360          25     12531 strcmp
0.17    0.285254          37      7619 gettimeofday
0.13    0.215274        7973        27 XpmReadFileToPixmap
0.11    0.175393          37      4725 setitimer
0.10    0.163919          20      7869 ungetc
0.10    0.158577         612       259 XAllocColor
0.09    0.144658        1418       102 kill
0.08    0.133403      133403         1 XtOpenDisplay
0.06    0.104499        8038        13 qsort
0.06    0.094230         395       238 XParseColor
0.05    0.079065          19      3997 __strtol_internal
0.05    0.077507          22      3381 strcpy
0.04    0.065410        5031        13 XCreateFontCursor
0.04    0.061386        1023        60 XSync

(…)

I show only any library call over 0.50 seconds, to avoid too many information. And, obviously, because of the fact it was merely a delay-problem, I focussed all my attention on those large-time calls. I was one hundred per cent sure the problem was directly related to the XSync() call,in charge of updating the X-Server screen. But it seemed not an Xt problem after all. It was not possible, due to the fact I’ve got another computer with the same configuration: “Host B”. So, I came to realize it had to be something related to the network layer.

Tracing the network layer

I captured some packets comming back from host “A” and host “B” to the X-Server host using tcpdump this way:

tcpdump -i eth1 ‘src host A or dst host A’ -w output.tcpdump -pe

tcpdump -i eth1 ‘src host B or dst host B’ -w output.tcpdump -pe

So, I got some traffic I was capable of analysing using tcptrace and xplot utilities. Because it was a delay problem, I decided to generate time-sequence-graphs. What I got was truly interesting, as shown in the next figures:

Host "A" and "Host B" time-sequence-graphic running emacs21-x

Looking at the graphs, one can easily come to a conclusion: it’s a network-layer problem indeed. The “Host A” seems to have a lot of problems sending TCP packets, and there are a lot of retransmissions. At the same time, the graph shows us an important delay comparing both. Reading all captured data using wireshark, there were excessive small packets coming in and out from “Host A” and the X-Server.

Then, I decided I had to avoid all those small packets being transmitted. To do that, what I did need was to increase the ssh compression level on “Host A”.

Secure Shell Compression Level

According to the Secure Shell man page, we can define up to 9 different compression levels. So, I tried to run emacs21-x on “Host A” demanding precisely this very hight compression-level using the command line this way:

ssh -Y -C -o CompressionLevel=9 user@host

Then, I generated the same graphic and traced all calls to the XSync() function on “Host A”. Gotcha! Take a look at the  new graphs on “Host A”. The first one is without  begging for level 9 compression, and the second one demanding it explicitly through the CLI:

"Host A" without begging for Compression , and "Host A" demanding it with -o CompressionLevel=9

Clearly, our second time-sequence-graph is superior even to “Host B”, that’s because we used compression level 9, the highest available for ssh.

At the same time, the XSync() calls, now, are truly normal and apparently very close to “Host B”:

1 % time     seconds  usecs/call     calls      function
2 —— ———– ———– ——— ——————–
3  45.01   14.913325          72    205130 _IO_getc
4  12.28    4.067767         324     12524 strcmp
5   7.87    2.609310         125     20802 free
6   7.58    2.513034      314129         8 select
7   6.45    2.136086        8975       238 XParseColor
8   6.13    2.030367         820      2474 strcat
9   2.99    0.991628          33     29400 malloc
10   2.61    0.866303          29     28947 mallopt
11   2.46    0.815759      815759         1 XRegisterIMInstantiateCallback
12   0.99    0.326420          15     21056 memmove
13   0.66    0.218044        3574        61 XSync
14   0.63    0.207135        7671        27 XpmReadFileToPixmap
15   0.58    0.191090         737       259 XAllocColor
16   0.56    0.187009          13     13577 strlen
(…)

Approximately 0.218044 seconds! That’s a delta of  5,016499 seconds!

Conclusions

Well, it was merely a network-delay problem. But it was quite difficult to work it out, so I decided to use simply more compression than, say, in Host B, which clearly had no need of using more than “the default ssh compression level determined by sshd_config file”. This way, it is feasible to run emacs21, xmgrace or whichever non-GTK+, non-QT programs, the user is in need of, even on those so-congested networks, with a normal behaviour when it comes to delays or lags. Of course, the best solution is to get rid of X11 redirection and use other tools, like NX, for example.