Bug 48410

Summary: mod_jk + 1100 virtual hosts + mod_rewrite with enabled log == segmentation fault
Product: Tomcat Connectors Reporter: Dmitry Goncharov <dmitry.goncharov>
Component: mod_jkAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Dmitry Goncharov 2009-12-18 05:03:17 UTC
Problem
We began receiving apache segmentation fault, when our configuration reach the 1068 virtual hosts. 

We checked this bug with mod_jk versions from 1.2.25 to 1.2.28, apache 2.2.8, 2.2.12, 2.2.14, Linux RedHat and Ubuntu. On the Windows all works fine.


Core dumps contains stack trace depends on virtual hosts count. For example with 1068 virtual hosts we had following stack:

Core was generated by `/usr/sbin/httpd'.
Program terminated with signal 11, Segmentation fault.
[New process 14838]
#0  0x008578b4 in ajp_connection_tcp_get_message@plt () from /etc/httpd/modules/mod_jk.so
(gdb) bt full
#0  0x008578b4 in ajp_connection_tcp_get_message@plt () from /etc/httpd/modules/mod_jk.so
No symbol table info available.
#1  0x00882940 in ajp_handle_cping_cpong (ae=0x9b4e9b0, timeout=16000, l=0xb231d48) at jk_ajp_common.c:875
        msg = (jk_msg_buf_t *) 0x9b4ea14
        __FUNCTION__ = "ajp_handle_cping_cpong"
#2  0x008834fa in ajp_connect_to_endpoint (ae=0x9b4e9b0, l=0xb231d48) at jk_ajp_common.c:955
        buf = "192.168.201.16:8011\000\v2\206\000\030226\000\000\000"
        rc = 1
        __FUNCTION__ = "ajp_connect_to_endpoint"
#3  0x008843aa in ajp_service (e=0x9b509d8, s=0xbff9b4dc, l=0xb231d48, is_error=0xbff9b5e0) at jk_ajp_common.c:1506
        i = 0
        err = <value optimized out>
        oper = {request = 0x9b4e9cc, reply = 0x9b4e9e4, post = 0x9b4e9fc, uploadfd = -1, recoverable = 1}
        p = <value optimized out>
        aw = (ajp_worker_t *) 0xc791210
        rc = 209407808
        msg = <value optimized out>
        retry_interval = 100
        __FUNCTION__ = "ajp_service"
...




With 2600 virtual host we received stack smashing error instead of segmentation fault. Here is strace output:
...
write(2615, "[Fri Dec 18 13:25:08.047 2009] ["..., 103) = 103
semop(4620344, 0x8fe712, 1)             = 0
select(2621, [1028 1090 1091 1092 1093 1094 1097 1099 1101 1103 1105 1106 1107 1109 1110 1122 1124 1129 1131 1135 1141 1144 1147 1156 1157 1158 1159 1161 1162 1163 1167 1169 1170 1171 1175 1178 1179 1187 1189 1190 1191 1192 1193 1196 1197 1199 1200 1202 1203 1207 1208 1209 1210 1211 1212 1213 1215 1216 1217 1220 1221 1222 1227 1228 1229 1230 1231 1232 1233 1235 1237 1238 1250 1251 1252 1253 1257 1259 1287 1289 1290 1291 1292 1293 1316 1317 1318 1319 1321 1322 1323 1327 1329 1330 1331 1335 1338 1339 1344 1348 1349 1351 1352 1354 1361 1362 1363 1365 1366 1411 1412 1414 1416 1417 1419 1420 1421 1422 1424 1426 1427 1429 1430 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1476 1477 1478 1479 1482 1483 1484 1487 1488 1490 1491 1493 1494 1510 1511 1539 1541 1543 1568 1571 1574 1575 1604 1632 1633 1635 1638 1644 1645 1648 1649 1652 1653 1656 1659 1660 1661 1671 1673 1674 1675 1676 1677 1700 1701 1703 1704 1707 1711 1717 1720 1723 1730 1731 1732 1733 1734 1737 1739 1741 1743 1745 1746 1747 1749 1750 1764 1765 1766 1767 1769 1770 1771 1775 1777 1778 1779 1783 1786 1787 1796 1797 1799 1800 1803 1807 1813 1816 1819 1827 1828 1830 1834 1836 1837 1839 1840 1842 1843 1847 1848 1849 1850 1851 1852 1853 1855 1857 1859 1860 1861 1862 1863 1866 1874 1875 1877 1878 1920 1921 1922 1923 1928 1929 1931 1933 1934 1935 1936 1938 1939 1941 1942 1952 1953 1957 1959 1960 1961 1987 1988 1990 1992 1993 1994 2001 2002 2003 2005 2006 2016 2048 2050 2051 2053 2055 2056 2057 2058 2059 2061 2062 2063 2064 2066 2067 2069 2070 2082 2083 2084 2085 2089 2091 2115 2117 2122 2124 2125 2127 2128 2130 2131 2135 2136 2137 2138 2139 2140 2141 2143 2145 2147 2148 2149 2152 2154 2157 2160 2161 2164 2166 2169 2170 2171 2173 2176 2208 2210 2213 2216 2219 2220 2222 2224 2226 2227 2228 2230 2237 2273 2304 2305 2307 2336 2340 2341 2344 2347 2348 2349 2353 2356 2357 2361 2362 2363 2365 2368 2372 2373 2377 2378 2380 2381 2387 2388 2389 2393 2394 2395 2397 2401 2404 2405 2412 2413 2416 2420 2421 2425 2426 2427 2429 2432 2436 2437 2441 2442 2444 2445 2449 2451 2452 2453 2459 2460 2461 2468 2469 2472 2476 2477 2480 2484 2485 2496 2497 2499 2505 2513 2515 2517 2518 2531 2532 2533 2534 2535 2540 2541 2545 2548 2549 2554 2556 2557 2601 2602 2605 2606 2607 2610 2611 2612 2613 2615 2617 2618 2620], NULL, NULL, {16, 0}*** buffer overflow detected ***: strace terminated
======= Backtrace: =========
/lib/libc.so.6(__chk_fail+0x41)[0x1f4f21]
/lib/libc.so.6[0x1f43e7]
strace[0x804ff52]
strace[0x804c81e]
strace[0x804a9ad]
strace[0x804bd4b]
/lib/libc.so.6(__libc_start_main+0xdc)[0x125dec]
strace[0x8049571]
======= Memory map: ========
00110000-0024d000 r-xp 00000000 fd:00 229401     /lib/libc-2.5.so
0024d000-0024f000 r-xp 0013c000 fd:00 229401     /lib/libc-2.5.so
0024f000-00250000 rwxp 0013e000 fd:00 229401     /lib/libc-2.5.so
00250000-00253000 rwxp 00250000 00:00 0
005a2000-005bc000 r-xp 00000000 fd:00 229394     /lib/ld-2.5.so
005bc000-005bd000 r-xp 00019000 fd:00 229394     /lib/ld-2.5.so
005bd000-005be000 rwxp 0001a000 fd:00 229394     /lib/ld-2.5.so
00668000-00669000 r-xp 00668000 00:00 0          [vdso]
00c1e000-00c29000 r-xp 00000000 fd:00 229414     /lib/libgcc_s-4.1.2-20080102.so.1
00c29000-00c2a000 rwxp 0000a000 fd:00 229414     /lib/libgcc_s-4.1.2-20080102.so.1
08048000-0807c000 r-xp 00000000 fd:02 67892      /usr/bin/strace
0807c000-0807d000 rw-p 00034000 fd:02 67892      /usr/bin/strace
0807d000-08089000 rw-p 0807d000 00:00 0
081a4000-081c5000 rw-p 081a4000 00:00 0
b7fdd000-b7fdf000 rw-p b7fdd000 00:00 0
bfb37000-bfb4c000 rw-p bfb37000 00:00 0          [stack]
*** stack smashing detected ***: /usr/sbin/httpd terminated
Aborted


We can see that last logged function was `select`. Looks like it was called from mod_jk function jk_is_input_event.

int jk_is_input_event(jk_sock_t sd, int timeout, jk_logger_t *l)
{
    fd_set rset;
    struct timeval tv;
    int rc;
    int save_errno;

    JK_TRACE_ENTER(l);

    errno = 0;
    FD_ZERO(&rset);
    FD_SET(sd, &rset);
    tv.tv_sec = timeout / 1000;
    tv.tv_usec = (timeout % 1000) * 1000;

    do {
        rc = select((int)sd + 1, &rset, NULL, NULL, &tv);
    } while (rc < 0 && errno == EINTR);
    …


fd_set is fixed size structure it can accept sd in range from [0,1023] (in our environment). When mod_rewrite has enabled logging for virtual host, it opens log file for every virtual host (does not matter that rewrite use same log file for all virtual hosts). Therefore sd becomes greater than 1023 and mod_jk writes `1` to wrong memory address. 

May be mod_jk can use `poll` instead of `select`? 


Workaround
Turn off mod_rewrite log.
Comment 1 Mladen Turk 2009-12-18 05:36:46 UTC
I'm already working on the poll implementation, so yes.
Should be part of next mod_jk release
Comment 2 Mladen Turk 2009-12-20 23:15:07 UTC
Fixed in the SVN by using poll instead select
You might check the sources from the trunk.
Comment 3 Dmitry Goncharov 2009-12-21 02:40:59 UTC
Thank you!
Comment 4 Rainer Jung 2010-02-23 02:51:53 UTC
Will be part of 1.2.29.