On SGI IRIX, mod_jk 1.2.14.1 crashes on every request in jk_lb_worker.c line 605. Going back to 1.2.11 works fine. Stack trace below:
> 0 service(e = 0x1035afe8, s = 0x7fff2b28, l = 0x10281f68, is_error = 0x7fff1af4) ["/tmp/jakarta-tomcat-connectors-1.2.14.1-src/jk/native/common/jk_lb_worker.c":605, 0x4315080]
  1 jk_handler(r = 0x103719a0) ["/tmp/jakarta-tomcat-connectors-1.2.14.1-src/jk/native/apache-2.0/mod_jk.c":1889, 0x4305648]
  2 ap_run_handler(r = 0x103719a0) ["/tmp/httpd-2.0.54/server/config.c":152, 0x1008b624]
  3 ap_invoke_handler(r = 0x103719a0) ["/tmp/httpd-2.0.54/server/config.c":364, 0x1008c390]
  4 ap_process_request(r = 0x103719a0) ["/tmp/httpd-2.0.54/modules/http/http_request.c":249, 0x10071990]
  5 ap_process_http_connection(c = 0x10365460) ["/tmp/httpd-2.0.54/modules/http/http_core.c":251, 0x10070f78]
  6 ap_run_process_connection(c = 0x10365460) ["/tmp/httpd-2.0.54/server/connection.c":43, 0x100a3e94]
  7 ap_process_connection(c = 0x10365460, csd = 0x10365378) ["/tmp/httpd-2.0.54/server/connection.c":176, 0x100a4568]
  8 child_main(child_num_arg = 0) ["/tmp/httpd-2.0.54/server/mpm/prefork/prefork.c":610, 0x1007c444]
  9 make_child(s = 0x102941c0, slot = 0) ["/tmp/httpd-2.0.54/server/mpm/prefork/prefork.c":704, 0x1007c6ac]
  10 startup_children(number_to_start = 5) ["/tmp/httpd-2.0.54/server/mpm/prefork/prefork.c":722, 0x1007c75c]
  11 ap_mpm_run(_pconf = 0x1025aac0, plog = 0x1028cb88, s = 0x102941c0) ["/tmp/httpd-2.0.54/server/mpm/prefork/prefork.c":941, 0x1007cda0]
  12 main(argc = 3, argv = 0x7fff2f44) ["/tmp/httpd-2.0.54/server/main.c":618, 0x100b1c48]
  13 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/csu/crt1text.s":177, 0x1004b9e8]
First email sent to tomcat-dev: http://marc.theaimsgroup.com/?l=tomcat-dev&m=112501659012202&w=2 Another user has reported what appears to be the exact same crash on Solaris: http://marc.theaimsgroup.com/?l=tomcat-user&m=112569118927613&w=2 I am guessing that this is some sort of 64-bit or big-endian bug that does not show up on i386 (where mod_jk 1.2.14.1 works fine for me).
Hi, Can you comment out lines 605 and 606 in jk_lb_worker.c and see if it still core dumps? Also, did you try stopping the previous version and deleting the .shm file? Not sure, but even a reboot might be required if the OS cached the shared memory. The shared memory slot was enlarged in version 1.2.14, so the structures are different, and if the old ones are cached then the new version will core dump.
I can confirm that I fully shut down Apache before installing the new mod_jk and starting Apache back up. Removing the mod_jk.shm after shutting down, installing the new module and starting back up doesn't make a difference. Using ipcs to view shared memory after Apache shutdown shows that Apache has correctly released the shared memory that was in use. Hope to try commenting out the lines in the next day or 2.
Fixed in CVS. This was really strange. It seems that the shared memory gets corrupted when 64-bit access is used. Can you try the current HEAD?
Checked out from CVS today, and can confirm that the new build appears to work properly. Thanks!
Created attachment 16424: Correct misalignment
There is an alignment problem in the shared memory. The bug only shows up when gcc is used with "-O" or "-O2".
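To illustrate the kind of alignment problem described above, here is a minimal sketch of carving records out of a shared segment so that each one starts on an aligned offset. The names DEMO_ALIGNMENT, DEMO_ALIGN, demo_alloc and demo_record_offset are hypothetical, and the 8-byte boundary is an assumption; this is not the attached patch and not mod_jk's actual JK_SHM_ALIGN definition, which is not shown in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed boundary for this sketch: large enough for a 64-bit counter. */
#define DEMO_ALIGNMENT 8u

/* Round n up to the next multiple of DEMO_ALIGNMENT (a power of two). */
#define DEMO_ALIGN(n) \
    (((size_t)(n) + (DEMO_ALIGNMENT - 1u)) & ~((size_t)DEMO_ALIGNMENT - 1u))

/* Reserve `size` bytes at the running offset *pos inside a segment.
 * The start of every reservation is rounded up to the boundary, so an
 * odd-sized predecessor cannot push the next record off alignment. */
size_t demo_alloc(size_t *pos, size_t size)
{
    size_t start = DEMO_ALIGN(*pos);
    assert(start % DEMO_ALIGNMENT == 0);
    *pos = start + size;
    return start;
}

/* Offset at which a 64-bit counter would land when it is placed
 * directly after a header of `header_size` bytes. */
size_t demo_record_offset(size_t header_size)
{
    size_t pos = 0;
    (void)demo_alloc(&pos, header_size);
    return demo_alloc(&pos, sizeof(uint64_t));
}
```

With a 13-byte header, the counter lands at offset 16 rather than 13. Strict-alignment CPUs such as MIPS and SPARC can fault on a 64-bit access to an unaligned address, while x86 tolerates it, which would be consistent with the crash not showing up on i386.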
I am reopening the bug because:
- I think the above patch will resolve the problem and still allow the use of 64-bit counters.
- I think that without the patch, further core dumps might result when the members of the structs in the shared memory are changed in the future (even without 64-bit members).
Mladen Turk applied the patch. Thanks!
I just tried compiling on IRIX using latest CVS, but the modified jk_shm.c does not compile using the SGI CC compiler:
"jk_shm.c", line 50: warning(1040): expected an identifier
Apparently the compiler doesn't like the union inside of the struct. Removing the union makes it compile and seems to function OK as well (sorry, even though my C is a bit rusty my change sure looks like a hack!):
diff -u -r1.20 jk_shm.c
--- common/jk_shm.c 16 Sep 2005 05:52:26 -0000 1.20
+++ common/jk_shm.c 17 Sep 2005 09:43:10 -0000
@@ -43,10 +43,8 @@
 /** jk shm header record structure */
 struct jk_shm_header {
-    union {
-        jk_shm_header_data_t data;
-        char alignbuf[JK_SHM_ALIGN(sizeof(jk_shm_header_data_t))];
-    };
+    jk_shm_header_data_t data;
+    char alignbuf[JK_SHM_ALIGN(sizeof(jk_shm_header_data_t))];
     char buf[1];
 };
The compiler probably does not like unnamed unions, rather than unions themselves. Can you try something like:
union { ... ... } h;
Of course you will need to change each 'hdr->data.XXX' in the jk_shm.c code to 'hdr->h.data.XXX'. Tell me if that helps.
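The suggestion above can be sketched as follows. This is only an illustration: demo_header_data_t and its fields, DEMO_PAD, the 64-byte boundary and the two helper functions are assumptions made for this example, not the real jk_shm.c definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed padding boundary for this sketch. */
#define DEMO_PAD_TO 64u
#define DEMO_PAD(n) \
    (((size_t)(n) + (DEMO_PAD_TO - 1u)) & ~((size_t)DEMO_PAD_TO - 1u))

/* Hypothetical header fields; the real jk_shm_header_data_t differs. */
typedef struct {
    unsigned int size;
    unsigned int pos;
    unsigned int workers;
} demo_header_data_t;

struct demo_header {
    /* The union overlays the data with a padding buffer, so the header
     * occupies a whole padded block. Giving the member a name ("h")
     * avoids relying on anonymous unions, which older compilers may
     * reject. */
    union {
        demo_header_data_t data;
        char alignbuf[DEMO_PAD(sizeof(demo_header_data_t))];
    } h;
    char buf[1];
};

/* Offset of the payload that follows the padded header. */
size_t demo_payload_offset(void)
{
    return offsetof(struct demo_header, buf);
}

/* Field access now goes through the named member: hdr->h.data.XXX */
unsigned int demo_get_size(const struct demo_header *hdr)
{
    return hdr->h.data.size;
}
```

Anonymous unions inside structs were only standardized in C11 and were a compiler extension before that, which may be why gcc accepted the original code. Note that dropping the union entirely, as in the earlier diff, places data and alignbuf one after the other instead of overlaying them, so the header grows rather than being padded.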
Created attachment 16437: unnamed-union.patch
Yes, that appears to compile and run OK. Here is the attached diff I used against jk_shm.c.
Hi, I have committed the patch. Can you double check with the current HEAD and close the issue if all is working? Thanks.
CVS head from earlier today works for me on SGI IRIX and Fedora Core 4. Thanks!
The user in http://marc.theaimsgroup.com/?l=tomcat-user&m=112569118927613&w=2 reported that the CVS version fixes the bug for him too.