Bug 59897

Summary: Buffer Overflow in FD_SET in nb_connect (jk_connect.c) leading to apache2 crash
Product: Tomcat Connectors Reporter: Michael Diener <mdiener>
Component: mod_jk Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: audiotone, fredrik.carpio, kwilde
Priority: P2 Keywords: PatchAvailable
Version: 1.2.41   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: [PATCH] Use poll(2) in posix nb_connect

Description Michael Diener 2016-07-25 12:33:57 UTC
mod_jk occasionally crashes Apache due to a buffer overflow.



mod_jk 1.2.41 (happens also for 1.2.37)
Apache 2.4.7
Tomcat 6.0.39
Java 1.6.0_45 x86
Linux Ubuntu 14.04 x64 (3.13.0-91-generic)



Here is the error log from Apache:

*** buffer overflow detected ***: /usr/sbin/apache2 terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7329f)[0x7fe9aa7de29f]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7fe9aa875bbc]
/lib/x86_64-linux-gnu/libc.so.6(+0x109a90)[0x7fe9aa874a90]
/lib/x86_64-linux-gnu/libc.so.6(+0x10ab07)[0x7fe9aa875b07]
/usr/lib/apache2/modules/mod_jk.so(jk_open_socket+0x8d8)[0x7fe9a7c60cb8]
/usr/lib/apache2/modules/mod_jk.so(ajp_connect_to_endpoint+0x65)[0x7fe9a7c7bf75]
/usr/lib/apache2/modules/mod_jk.so(+0x36422)[0x7fe9a7c7d422]
/usr/lib/apache2/modules/mod_jk.so(+0x1674c)[0x7fe9a7c5d74c]
/usr/sbin/apache2(ap_run_handler+0x40)[0x7fe9ab65fbe0]
/usr/sbin/apache2(ap_invoke_handler+0x69)[0x7fe9ab660129]
/usr/sbin/apache2(ap_process_async_request+0x20a)[0x7fe9ab6756ca]
/usr/sbin/apache2(+0x69500)[0x7fe9ab672500]
/usr/sbin/apache2(ap_run_process_connection+0x40)[0x7fe9ab669220]
/usr/lib/apache2/modules/mod_mpm_event.so(+0x681b)[0x7fe9a783981b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7fe9aab38184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe9aa86537d]
======= Memory map: ========
7fe688000000-7fe68806a000 rw-p 00000000 00:00 0
7fe68806a000-7fe68c000000 ---p 00000000 00:00 0
.......
7fffa6c27000-7fffa6c48000 rw-p 00000000 00:00 0 [stack]
7fffa6c86000-7fffa6c88000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
[Wed Jun 29 05:01:50.052325 2016] [core:notice] [pid 1747:tid 140641581987712] AH00051: child pid 17018 exit signal Aborted (6), possible coredump in /etc/apache2



I was able to trace it to the function nb_connect in jk_connect.c. In version 1.2.41 the problem is at line 291:

280>   do {
281>        rc = connect(sd, (const struct sockaddr *)&addr->sa.sin, addr->salen);
282>    } while (rc == -1 && errno == EINTR);
283>
284>    if ((rc == -1) && (errno == EINPROGRESS || errno == EALREADY)
285>                   && (timeout > 0)) {
286>        fd_set wfdset;
287>        struct timeval tv;
288>        socklen_t rclen = (socklen_t)sizeof(rc);
289>
290>        FD_ZERO(&wfdset);
*291>        FD_SET(sd, &wfdset);*
292>        tv.tv_sec = timeout / 1000;
293>        tv.tv_usec = (timeout % 1000) * 1000;
294>        rc = select(sd + 1, NULL, &wfdset, NULL, &tv);


From what I understand, a buffer overflow in FD_SET would only happen if a
descriptor numbered 1024 or higher is added to the fd_set. I made sure that my
ulimit for open files is set large enough and is applied, so that's not it.
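To make the limit concrete: with glibc, FD_SET indexes the fd_set by the descriptor's numeric value, so what matters is whether sd itself is below FD_SETSIZE (1024), not how many descriptors are in the set. A fortified build aborts with "buffer overflow detected" when that bound is exceeded, and a large open-files ulimit is what lets a process reach such descriptor numbers in the first place. The following is a minimal sketch of a range check; the helper names fd_fits_in_fd_set and checked_fd_set are hypothetical and do not exist in mod_jk.

```c
#include <assert.h>
#include <sys/select.h>

/* Hypothetical helper: returns 1 if fd may safely be passed to
 * FD_SET/FD_ISSET, 0 if it lies outside the fixed-size fd_set. */
static int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical wrapper: adds fd to the set only when it is in range.
 * Returns 0 on success, -1 if fd cannot be represented in an fd_set. */
static int checked_fd_set(int fd, fd_set *set)
{
    if (!fd_fits_in_fd_set(fd))
        return -1;              /* caller should use poll(2) instead */
    FD_SET(fd, set);
    assert(FD_ISSET(fd, set));  /* sanity check: the bit is now set */
    return 0;
}
```

A caller that gets -1 back from such a check cannot use select(2) for that descriptor at all and has to switch to another interface such as poll(2).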



I replaced the FD_SET/select code with poll, and it now also seems to work for
sd values above 1023:

struct pollfd pfd_read;
pfd_read.fd = sd;
pfd_read.events = POLLOUT;
rc = poll(&pfd_read, 1, timeout);



This would be a possible fix for the problem - at least it works fine in my setup.
Also, poll() already seems to be used elsewhere in this source file, so no additional #include is necessary.
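For readers who want to try the idea in isolation, here is a self-contained sketch of a non-blocking connect that waits with poll(2) and then reads SO_ERROR to learn the outcome. It is not the mod_jk code or the attached patch: the names connect_with_timeout and demo_loopback_connect are hypothetical, and the error handling (ETIMEDOUT on timeout, restarting the full timeout after EINTR) is an assumption made for brevity.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect a non-blocking socket, waiting at most
 * timeout_ms for completion. Returns 0 on success, -1 with errno set. */
static int connect_with_timeout(int sd, const struct sockaddr *sa,
                                socklen_t salen, int timeout_ms)
{
    int rc;

    assert(sd >= 0);
    do {
        rc = connect(sd, sa, salen);
    } while (rc == -1 && errno == EINTR);

    if (rc == -1 && (errno == EINPROGRESS || errno == EALREADY)
                 && timeout_ms > 0) {
        struct pollfd pfd;
        int err = 0;
        socklen_t errlen = (socklen_t)sizeof(err);

        pfd.fd = sd;            /* any descriptor number works with poll */
        pfd.events = POLLOUT;   /* writable means the connect has finished */
        pfd.revents = 0;
        do {                    /* simplification: EINTR restarts the timeout */
            rc = poll(&pfd, 1, timeout_ms);
        } while (rc == -1 && errno == EINTR);

        if (rc == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (rc < 0)
            return -1;
        /* poll only says the attempt finished; SO_ERROR says how. */
        if (getsockopt(sd, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1)
            return -1;
        if (err != 0) {
            errno = err;
            return -1;
        }
        return 0;
    }
    return rc;
}

/* Hypothetical demo: connect to a listener on an ephemeral loopback port.
 * If the open-files limit allows it, the client socket is first moved to a
 * descriptor number >= FD_SETSIZE, the case that breaks FD_SET. */
static int demo_loopback_connect(void)
{
    struct sockaddr_in addr;
    socklen_t alen = (socklen_t)sizeof(addr);
    int ls, sd, hi, rc;

    ls = socket(AF_INET, SOCK_STREAM, 0);
    if (ls < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;          /* let the kernel pick a free port */
    if (bind(ls, (struct sockaddr *)&addr, alen) < 0 || listen(ls, 1) < 0 ||
        getsockname(ls, (struct sockaddr *)&addr, &alen) < 0) {
        close(ls);
        return -1;
    }

    sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0) {
        close(ls);
        return -1;
    }
    hi = fcntl(sd, F_DUPFD, FD_SETSIZE);  /* fails if the limit is too low */
    if (hi >= 0) {
        close(sd);
        sd = hi;
    }
    fcntl(sd, F_SETFL, fcntl(sd, F_GETFL, 0) | O_NONBLOCK);

    rc = connect_with_timeout(sd, (struct sockaddr *)&addr, alen, 1000);
    close(sd);
    close(ls);
    return rc;
}
```

Because struct pollfd carries the descriptor as a plain int, there is no fixed-size bitmap to overrun, which is why this approach is not affected by the FD_SETSIZE limit.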



Here are the relevant configuration files:

/etc/libapache2-mod-jk/httpd-jk.conf

<IfModule jk_module>

        JkWorkersFile /etc/libapache2-mod-jk/workers.properties
        JkLogFile /var/log/apache2/mod_jk.log
        JkLogLevel warn
        JkShmFile /var/log/apache2/jk-runtime-status

</IfModule>




/etc/libapache2-mod-jk/workers.properties

workers.tomcat_home=/usr/share/tomcat6
workers.java_home=/usr/lib/jvm/java-6-sun
ps=/

worker.list=loadbalancer

worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=ajp13_worker,ajp13_worker2
worker.loadbalancer.sticky_session=0

worker.ajp13_worker.port=xxx
worker.ajp13_worker.host=localhost
worker.ajp13_worker.type=ajp13
worker.ajp13_worker.ping_mode=A
worker.ajp13_worker.secret=xxx
worker.ajp13_worker.fail_on_status=503
worker.ajp13_worker.connection_pool_size=32768
worker.ajp13_worker.redirect=ajp13_worker2

worker.ajp13_worker2.port=xxx
worker.ajp13_worker2.host=otherhost
worker.ajp13_worker2.type=ajp13
worker.ajp13_worker2.ping_mode=A
worker.ajp13_worker2.secret=xxx
worker.ajp13_worker2.fail_on_status=503
worker.ajp13_worker2.connection_pool_size=32768
worker.ajp13_worker2.activation=disabled



/etc/tomcat6/server.xml

    <Connector
        port="xxx" protocol="AJP/1.3" redirectPort="8443"
        enableLookups="false" maxThreads="65536" minSpareThreads="25" maxSpareThreads="75"
        connectionTimeout="300000" packetSize="65536" request.secret="xxx"
    />



Apache mpm_event:

        StartServers             2
        ServerLimit              16

        MinSpareThreads          256
        MaxSpareThreads          1280

        ThreadLimit              1024
        ThreadsPerChild          1024

        MaxRequestWorkers        16384
        MaxConnectionsPerChild   0



Please also see my question about this on the tomcat-users mailing list here (continued in July):
https://mail-archives.apache.org/mod_mbox/tomcat-users/201606.mbox/%3CCABVo0f+stYj9=Cxrb-t+bhJaf_a9hX2wdvHsBYmE-bge_vwxTg@mail.gmail.com%3E
Comment 1 Michael Diener 2016-07-25 12:56:19 UTC
One more thing to add: although Apache mpm_event is used, most connections are via SSL, so AFAIK it should behave like mpm_worker.
Comment 2 Koen Wilde 2016-11-03 13:59:59 UTC
Created attachment 34417
[PATCH] Use poll(2) in posix nb_connect

This issue is caused by limitations of the select(2) system call. From the Linux man page:

> POSIX allows an implementation to define an upper limit, advertised via the
> constant FD_SETSIZE, on the range of file descriptors that can be specified
> in a file descriptor set.  The Linux kernel imposes no fixed limit, but the
> glibc implementation makes fd_set a fixed-size type, with FD_SETSIZE defined
> as 1024, and the FD_*() macros operating according to that limit.  To
> monitor file descriptors greater than 1023, use poll(2) instead.

As Michael already noted, poll(2) is already used in jk_connect.c, so using poll(2) doesn't add any new dependencies.

I've attached a patch that uses poll(2) if it is available at compile time; otherwise it falls back to the current select(2) implementation.
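The general shape of such a compile-time switch can be sketched as follows. This is an illustration only, not the contents of attachment 34417: the macro name HAVE_POLL, the helper names wait_writable and demo_pipe_is_writable, and the FD_SETSIZE guard in the select branch are assumptions added for this example.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#ifdef HAVE_POLL            /* assumed to be defined by the build system */
#include <poll.h>
#endif

/* Hypothetical helper: wait up to timeout_ms for fd to become writable.
 * Returns >0 if an event was reported, 0 on timeout, -1 on error. */
static int wait_writable(int fd, int timeout_ms)
{
#ifdef HAVE_POLL
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    return poll(&pfd, 1, timeout_ms);
#else
    fd_set wfdset;
    struct timeval tv;

    /* Added guard (assumption): refuse descriptors FD_SET cannot hold
     * instead of overrunning the fixed-size fd_set. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    FD_ZERO(&wfdset);
    FD_SET(fd, &wfdset);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(fd + 1, NULL, &wfdset, NULL, &tv);
#endif
}

/* Hypothetical demo: the write end of a fresh pipe is writable at once,
 * so wait_writable should report one ready descriptor. */
static int demo_pipe_is_writable(void)
{
    int p[2];
    int rc;

    if (pipe(p) != 0)
        return -1;
    rc = wait_writable(p[1], 100);
    close(p[0]);
    close(p[1]);
    return rc;
}
```

Keeping both branches behind one small function means callers do not need to know which kernel interface was selected at build time.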

In the long run, it would probably be preferable to use an event library such as libuv or libevent that abstracts over the kernel interface and automatically uses the best one available (e.g. epoll on Linux and kqueue on FreeBSD). This would improve both portability and performance, and possibly code simplicity.
Comment 3 Christopher Schultz 2017-09-01 14:47:31 UTC
I think this patch is worth serious consideration and testing.

(I feel like we had this conversation elsewhere, too.)
Comment 4 Mark Thomas 2018-08-22 12:13:25 UTC
Many thanks for the patch. Applied to 1.2.x for 1.2.44 onwards.