
GNU libc 2.7 name resolution BUG concerning HEADER->ra=0 bit

Preamble

In modern GNU/Linux systems, names are resolved to IP addresses by the GNU C Library, specifically by its resolver code in /lib/libresolv.so. User-space programs that need to resolve host names typically call the gethostbyname() routine, which is implemented inside the GNU C Library.

When this call is issued, libresolv.so is in charge of building and sending the queries, and of receiving and processing the answers, exchanged with the DNS servers configured in the file /etc/resolv.conf.
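To make the "building the query" step concrete, here is a minimal sketch in C of how an A-record query is laid out on the wire. The function name build_a_query is hypothetical and exists only for this illustration; glibc has its own query builder and does not use this code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Encode a DNS query for an A record (QTYPE=1, QCLASS=IN) into out.
 * Returns the packet length, or 0 if the name is malformed or the
 * buffer is too small. Hypothetical helper, for illustration only.
 */
size_t build_a_query(const char *name, uint16_t id,
                     unsigned char *out, size_t cap)
{
    size_t n = 12;                       /* the DNS header is 12 bytes */
    if (cap < n)
        return 0;

    memset(out, 0, n);
    out[0] = (unsigned char)(id >> 8);   /* query ID, big-endian */
    out[1] = (unsigned char)(id & 0xff);
    out[2] = 0x01;                       /* flags: RD (recursion desired) */
    out[5] = 0x01;                       /* QDCOUNT = 1 */

    /* QNAME: each label is prefixed by its length byte */
    while (*name) {
        const char *dot = strchr(name, '.');
        size_t len = dot ? (size_t)(dot - name) : strlen(name);
        if (len == 0 || len > 63 || n + 1 + len > cap)
            return 0;
        out[n++] = (unsigned char)len;
        memcpy(out + n, name, len);
        n += len;
        name += len;
        if (*name == '.')
            name++;
    }

    if (n + 5 > cap)
        return 0;
    out[n++] = 0;                        /* root label ends the name */
    out[n++] = 0; out[n++] = 1;          /* QTYPE  = A  */
    out[n++] = 0; out[n++] = 1;          /* QCLASS = IN */
    return n;
}
```

For the name mail3.upc.es this returns 30: 12 header bytes, 14 bytes for the encoded name, and 4 bytes for the type and class.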

If any of these name servers fails, libresolv.so will use the next one. Depending on the options configured in /etc/resolv.conf, that may or may not be the next nameserver line in the file.
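As a rough illustration of the "next name server" order, the sketch below collects the nameserver entries from resolv.conf-style text in file order. The function name list_nameservers is hypothetical, and this is not how glibc parses the file. It only shows the default order, which the rotate option can change.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_NS 3   /* glibc's MAXNS limit in <resolv.h> is also 3 */

/*
 * Collect "nameserver <addr>" entries from resolv.conf-style text,
 * in file order. Returns the number of entries stored in out.
 * Hypothetical helper, for illustration only.
 */
size_t list_nameservers(const char *text, char out[][46], size_t max)
{
    size_t count = 0;

    while (*text && count < max) {
        size_t linelen = strcspn(text, "\n");

        if (linelen >= 11 && strncmp(text, "nameserver", 10) == 0 &&
            (text[10] == ' ' || text[10] == '\t')) {
            char line[128];
            char addr[46];
            size_t n = linelen < sizeof line - 1 ? linelen : sizeof line - 1;

            memcpy(line, text, n);
            line[n] = '\0';
            /* %45s skips leading blanks and stops at the next blank */
            if (sscanf(line + 10, "%45s", addr) == 1)
                strcpy(out[count++], addr);
        }

        text += linelen;
        if (*text == '\n')
            text++;
    }
    return count;
}
```

Fed a file that lists 147.83.2.3, 147.83.2.10 and 192.168.2.1 in that order (the servers that appear later in this post; the exact file contents are an assumption), it returns them in the same order.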

The issue

GNU C Library 2.7 does not cope well with a valid DNS response from a name server that has recursion disabled and only answers queries for the domains it manages. When this happens, libresolv.so does not try the next server listed in /etc/resolv.conf, even if the response it received and stored in its buffers contains no IP address for the requested T_A query type. This is the case, for example, when the queried host is an alias (CNAME).

Thus, the host ip-address cannot be determined, and the communication fails irremediably.

The BUG

We use the UPC DNS name servers from home: no VPN, no tunneling, just our DSL connection. These servers let us resolve host names inside their own domain, that is, upc.es. When we ask for the IP address of, say, www.upc.es, we get the right answer:

PING www.upc.es (147.83.2.135) 56(84) bytes of data.
64 bytes from upc.cat (147.83.2.135): icmp_seq=1 ttl=51 time=73.6 ms
64 bytes from upc.es (147.83.2.135): icmp_seq=2 ttl=51 time=67.5 ms

However, when we ask for, say, mail3.upc.es, which is a CNAME, this is what happens:

ping mail3.upc.es
ping: unknown host mail3.upc.es

Clearly, there’s some uncommon behaviour right there, isn’t there? Okay, let’s try using nslookup:

nslookup
> mail3.upc.es
;; Got recursion not available from 147.83.2.3, trying next server
;; Got recursion not available from 147.83.2.10, trying next server
;; Got recursion not available from 147.83.2.3, trying next server
;; Got recursion not available from 147.83.2.10, trying next server
Server:         192.168.2.1
Address:        192.168.2.1#53

Non-authoritative answer:
mail3.upc.es    canonical name = carulli.upc.es.
carulli.upc.es  canonical name = carulli.upcnetadm.upcnet.es.
Name:   carulli.upcnetadm.upcnet.es
Address: 147.83.2.99

Gotcha! This command does not use the GNU C Library to resolve host names to IP addresses, because it ships with its own resolver implementation from BIND. Thanks to nslookup, we can see that this bug could be solved by checking the DNS server response to determine whether recursion is available, and hopping to the next name server when it is not. Okay, let’s go!

Using strace

To be completely sure about this, we can use strace to trace every connection that libresolv.so makes to the name servers. So I ran the ping command, linked against the system GNU C Library, under strace. These are the results:

connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("147.83.2.3")}, 28) = 0
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
poll([{fd=3, events=POLLOUT}], 1, 0)    = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "P\206\1\0\0\1\0\0\0\0\0\0\5mail3\3upc\2es\0\0\1\0\1"..., 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN}], 1, 5000)  = 1 ([{fd=3, revents=POLLIN}])
ioctl(3, FIONREAD, [129])               = 0
recvfrom(3, "P\206\205\0\0\1\0\2\0\1\0\1\5mail3\3upc\2es\0\0\1\0\1\300\f\0"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("147.83.2.3")}, [16]) = 129
close(3)                                = 0
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=148918, ...}) = 0
mmap(NULL, 148918, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b6718619000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/libnss_mdns4.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\v\0\0\0\0\0\0@"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=9736, ...}) = 0
mmap(NULL, 2105024, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b6719393000
mprotect(0x2b6719395000, 2093056, PROT_NONE) = 0
mmap(0x2b6719594000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x2b6719594000
close(3)                                = 0
munmap(0x2b6718619000, 148918)          = 0
open("/etc/mdns.allow", O_RDONLY)       = -1 ENOENT (No such file or directory)
write(2, "ping: unknown host mail3.upc.es\n"..., 32) = 32
exit_group(2)                           = ?

Obviously, I was right. There is only one connection, and it goes to the first name server (147.83.2.3) listed in /etc/resolv.conf. libresolv.so does not contact the next one; it accepts the DNS response although it contains no IP address at all!
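The first twelve bytes of the recvfrom() buffer in the trace above are the DNS header, so the flags can be read straight off the capture. The sketch below decodes them by hand; the struct and function names are made up for this illustration and are not glibc's.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the fixed 12-byte DNS header (illustrative names). */
struct dns_flags {
    uint16_t id;
    unsigned qr, aa, tc, rd, ra, rcode;
    uint16_t qdcount, ancount, nscount, arcount;
};

/* Decode the header; returns 0 on success, -1 if the buffer is too short. */
int decode_dns_header(const unsigned char *p, size_t len, struct dns_flags *h)
{
    if (len < 12)
        return -1;
    h->id      = (uint16_t)(p[0] << 8 | p[1]);
    h->qr      = (p[2] >> 7) & 1;   /* 1 = this is a response */
    h->aa      = (p[2] >> 2) & 1;   /* authoritative answer */
    h->tc      = (p[2] >> 1) & 1;   /* truncated */
    h->rd      = p[2] & 1;          /* recursion desired (echoed back) */
    h->ra      = (p[3] >> 7) & 1;   /* recursion available */
    h->rcode   = p[3] & 0x0f;       /* response code, 0 = no error */
    h->qdcount = (uint16_t)(p[4] << 8 | p[5]);
    h->ancount = (uint16_t)(p[6] << 8 | p[7]);
    h->nscount = (uint16_t)(p[8] << 8 | p[9]);
    h->arcount = (uint16_t)(p[10] << 8 | p[11]);
    return 0;
}

/* "P\206\205\0\0\1\0\2\0\1\0\1" from the recvfrom() line, as hex bytes. */
const unsigned char captured_reply[12] = {
    0x50, 0x86, 0x85, 0x00, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x01
};
```

Decoding captured_reply gives QR=1, AA=1, RD=1, RA=0, RCODE=0 and an answer count of 2. In other words, this is an authoritative, error-free reply from a server that does not offer recursion. Judging by the nslookup output, the two answer records are likely the CNAME chain rather than an address record.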

Debugging glibc 2.7

The libresolv routines live in the resolv/ directory of the glibc source tree. Browsing with cscope, I found where the code fails: the function send_dg, implemented in the file resolv/res_send.c, contains no code at all that tests the recursion bit stored in the DNS response buffer. To hop to the next name server, we need to return with an error in this particular case, that is, whenever the name server’s response says that recursion is not available.

Looking at the sources, the bit we need to check can be accessed through the anhp pointer, this way:

anhp->ra

Well, in case the server does not allow us to use recursion, this bit will be 0, and 1 otherwise. Then:

1025         if (anhp->rcode == SERVFAIL ||
1026             anhp->rcode == NOTIMP ||
1027             anhp->rcode == REFUSED || anhp->ra==0) {
1028
1029             /* TCG patch : */
1030             printf("[TCG glibc-libresolv patch]: %s:%d , HEADER->ra is %d, using next ns entry.     \n",
1031                      __FILE__ , __LINE__ , anhp->ra);

I decided to put my patch right here because, for our purposes, there is no practical difference between recursion not being available and a query being refused or not allowed, as our beloved nslookup showed us a few lines earlier.
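The patched condition can be tried outside glibc with a small standalone sketch. It assumes a glibc-style <arpa/nameser.h> that provides the HEADER bit-field struct and the SERVFAIL, NOTIMP and REFUSED constants. The function name try_next_server is hypothetical and is not part of send_dg.

```c
#include <string.h>
#include <arpa/nameser.h>         /* HEADER, SERVFAIL, NOTIMP, REFUSED */
#include <arpa/nameser_compat.h>  /* harmless if nameser.h already pulled it in */

/*
 * Return 1 if the reply should be discarded and the next name server
 * tried, 0 if the reply is acceptable. Mirrors the patched condition;
 * the function itself is a hypothetical stand-in for illustration.
 */
int try_next_server(const unsigned char *reply, size_t len)
{
    HEADER h;

    if (len < sizeof h)
        return 1;                  /* too short to hold a DNS header */

    /* Copy into a properly aligned struct before reading bit-fields. */
    memcpy(&h, reply, sizeof h);

    return h.rcode == SERVFAIL ||
           h.rcode == NOTIMP ||
           h.rcode == REFUSED ||
           h.ra == 0;              /* the added test: no recursion offered */
}
```

Note that the rule is deliberately as broad as the patch: any reply with RA=0 is skipped, including one from an authoritative-only server that did answer the question. A narrower variant could also check whether the answer section holds a record of the requested type, but that goes beyond what the patch does.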

Now, it’s time to test our code. After compiling the entire GNU C Library and installing it in a separate location, /usr/local/glibc-tcg, I wrote a trivial wrapper script so that my binaries use it:

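#!/bin/bash
# Run a command against the patched glibc installed in /usr/local/glibc-tcg.
# Put the patched library directory first in the loader's search path.
export LD_LIBRARY_PATH="/usr/local/glibc-tcg/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# Show which shared libraries the command will actually load.
ldd "$(command -v "$1")"
# Run the command with its original arguments.
"$@"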

Testing my patch

Now, let’s see what happens when we try to connect to mail3.upc.es using telnet and the system GNU C Library:

telnet mail3.upc.es 993
telnet: could not resolve mail3.upc.es/993: Name or service not known

And now, to conclude our discussion, we’ll try with our patched GNU C Library:

tonicas@catxarru:~/dev/glibc$ ./wrapper.sh telnet mail3.upc.es 993
linux-vdso.so.1 =>  (0x00007fff0ddfe000)
libncurses.so.5 => /lib/libncurses.so.5 (0x00002b8b9d070000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00002b8b9d2af000)
libm.so.6 => /usr/local/glibc-tcg/lib/libm.so.6 (0x00002b8b9d5bb000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00002b8b9d83f000)
libc.so.6 => /usr/local/glibc-tcg/lib/libc.so.6 (0x00002b8b9da56000)
libdl.so.2 => /usr/local/glibc-tcg/lib/libdl.so.2 (0x00002b8b9ddaa000)
/lib64/ld-linux-x86-64.so.2 (0x00002b8b9ce53000)
[TCG glibc-libresolv patch]: res_send.c:1030 , HEADER->ra is 0
[TCG glibc-libresolv patch]: res_send.c:1030 , HEADER->ra is 0
[TCG glibc-libresolv patch]: res_send.c:1030 , HEADER->ra is 0
[TCG glibc-libresolv patch]: res_send.c:1030 , HEADER->ra is 0
[TCG glibc-libresolv patch]: res_send.c:1030 , HEADER->ra is 0
[TCG glibc-libresolv patch]: res_send.c:1030 , HEADER->ra is 0
Trying 147.83.2.99…

Connected to carulli.upcnetadm.upcnet.es.

It works !

Now we can run the ping command under strace again, as we did earlier in this post, this time with the patched library. This is what we get:

connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("147.83.2.3")}, 28) = 0
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
poll([{fd=3, events=POLLOUT}], 1, 0)    = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "t\326\1\0\0\1\0\0\0\0\0\0\5mail3\3upc\2es\0\0\1\0\1"..., 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN}], 1, 5000)  = 1 ([{fd=3, revents=POLLIN}])
ioctl(3, FIONREAD, [129])               = 0
recvfrom(3, "t\326\205\0\0\1\0\2\0\1\0\1\5mail3\3upc\2es\0\0\1\0\1\300\f\0"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("147.83.2.3")}, [16]) = 129
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b8e3fc98000
write(1, "[TCG glibc-libresolv patch]: res_"..., 63) = 63
close(3)                                = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("147.83.2.10")}, 28) = 0
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
poll([{fd=3, events=POLLOUT}], 1, 0)    = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "t\326\1\0\0\1\0\0\0\0\0\0\5mail3\3upc\2es\0\0\1\0\1"..., 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN}], 1, 3000)  = 1 ([{fd=3, revents=POLLIN}])
ioctl(3, FIONREAD, [181])               = 0
recvfrom(3, "t\326\205\0\0\1\0\3\0\2\0\2\5mail3\3upc\2es\0\0\1\0\1\300\f\0"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("147.83.2.10")}, [16]) = 181
write(1, "[TCG glibc-libresolv patch]: res_"..., 63) = 63
close(3)                                = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.2.1")}, 28) = 0

(…)

This time libresolv.so contacts the first name server (147.83.2.3) and gets no recursion available; it tries the next one (147.83.2.10) and gets no recursion available again; finally it contacts the last one (192.168.2.1), which not only has recursion available but also returns a valid IP address for the given host. Now we can connect to mail3.upc.es at last!
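The before-and-after behaviour can be summarised with a toy model of the failover loop. It ignores retries and timeouts, and the names are hypothetical. Each server is represented only by the 16-bit flags word of its reply: 0x8500 for the first two servers, as seen in the traces. The third server's reply is not shown in the trace, so 0x8180 (a typical recursive answer with RA=1) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* The flags word is bytes 2-3 of the DNS header. */
#define DNS_FLAG_RA   0x0080u
#define DNS_RCODE(f)  ((f) & 0x000fu)

/* SERVFAIL (2), NOTIMP (4) and REFUSED (5) are rejected either way;
 * with check_ra set, a reply with RA=0 is rejected as well. */
static int reply_rejected(uint16_t flags, int check_ra)
{
    unsigned rcode = DNS_RCODE(flags);

    if (rcode == 2 || rcode == 4 || rcode == 5)
        return 1;
    return check_ra && !(flags & DNS_FLAG_RA);
}

/*
 * Return the index of the first server whose reply is accepted,
 * or -1 if every reply is rejected. check_ra = 0 models the stock
 * rule, check_ra = 1 models the patched rule.
 */
int first_accepted_server(const uint16_t *reply_flags, size_t nservers,
                          int check_ra)
{
    for (size_t i = 0; i < nservers; i++)
        if (!reply_rejected(reply_flags[i], check_ra))
            return (int)i;
    return -1;
}
```

With the flags {0x8500, 0x8500, 0x8180}, the stock rule stops at index 0, the first UPC server, while the patched rule moves on to index 2, the local router at 192.168.2.1.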

This bug has been reported to the GNU community through Bugzilla, where it has number 11156.