Bug 46529 - Child httpd processes crash with Segmentation fault
Summary: Child httpd processes crash with Segmentation fault
Status: ASSIGNED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_log_config (show other bugs)
Version: 2.0.59
Hardware: Sun Solaris
: P2 normal (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-14 07:46 UTC by Roman Belous
Modified: 2009-09-14 00:52 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Roman Belous 2009-01-14 07:46:38 UTC
error.log of our production system shows "child pid xxx exit signal Segmentation fault (11)" messages from time to time. Unfortunately, it is unclear what exact conditions causes this to happen.

If you need any additional information, I'll provide it.

Logging settings from httpd.conf:
LogLevel warn
LogFormat "%h %l %u %t \"%r\" %>s %b %D \"%{Cookie}i\"" common 
CustomLog "/home/rbelous/logs/apache_access.log" common
ErrorLog "/home/rbelous/logs/error.log"

Stack trace:
0007e9f8 ap_escape_logitem (230e98, b0ca8, ff2e5c98, 2377c4, 22b838, 0) + 84
00034ed4 config_log_transaction (231788, de650, 2, 3cf, dd1a0, de638) + 16c
00035004 multi_log_transaction (231788, 0, ffbff458, 0, cdd54, 19d818) + 4c
00085388 ap_run_log_transaction (231788, 0, 6, 231788, ce210, 227838) + 3c
0005af00 ap_process_request (231788, 230da8, 4, 231788, cdce0, 227890) + d4
00055e94 ap_process_http_connection (227838, 227760, 227760, 25, cdce0, 19da50) + 100
00080ff8 ap_run_process_connection (227838, 227760, 227760, 25, 225798, 22b738) + 3c
00073d54 child_main (cd000, 22b738, cd000, 4e2e, cd000, cdc00) + 46c
00073efc make_child (73800, 25, 0, febe2dd0, 225314, 94) + f0
00073fcc startup_children (28, fedb9fec, 0, 10, 1cf4, a) + 68
00074754 ap_mpm_run (d8ba8, cd000, 4d, cd000, cd000, cd000) + 76c
0007a588 main     (d6c20, d8ba8, ffbff9e0, cd000, cd000, 0) + 678
0002b218 _start   (0, 0, 0, 0, 0, 0) + 5c
Comment 1 Nick Kew 2009-01-14 17:54:57 UTC
Your stack trace doesn't make much sense, and without it the report is too vague to do anything with.  At the top, ap_escape_logitem takes just two args, and I infer the report you have is affected by compiler optimisation.

Can you:
(1) Tell us if you still get a problem with an uptodate apache (2.2.11)?
(2) Try and get some diagnostics from mod_backtrace and possibly mod_whatkilledus, from http://people.apache.org/~trawick/ ?
Comment 2 Roman Belous 2009-01-15 04:56:19 UTC
We cannot reproduce issue in test environment (even with 2.0.59). It only appears in production system. And we are unable to use latest Apache release for production right now.
I'll try to get permission to set up mod_backtrace and mod_whatkilledus there.

Provided stack trace obtained with mdb (Sun Modular Debugger), probably that is the reason of its strange view. Apache was built with the next string:

./configure --enable-ssl --enable-expires --enable-rewrite --enable-so --enable-vhost-alias --enable-deflate --enable-headers --prefix=/export/home/nam_ta/apache/release --enable-ext-filter --enable-proxy

So I'll try to get requested information for now. Will core files help you? I can upload them somewhere.
Comment 3 Roman Belous 2009-01-16 05:24:38 UTC
I've studied core files. Segmentation fault appears on write of escaped string as if there would be not enough memory allocated for it. Starting at the beginning of allocated memory there is the next string:
00010BA6-091E-F0F0F0F0-000010BA6-091E-F0F0F0F0-0.... it goes repeating endlessly right to the end of segment.

I don't know exactly what is this stuff, but it corresponds to %u or %l in our LogFormat. E.g. see the next string from apache_access.log:

10.1.10.226 00010BA6-091E-F0F0F0F0-00000001-01 - [17/Dec/2008:18:19:17 +0100] "GET /ovaServices/services/GetUser.jsp?usr_recid=75800 HTTP/1.1" 401 543 599 "JSESSIONID=abcKlZLZsSY8TeJFVZa5r; ova_platform=adb; ova_retrycount=0; ova_client_id=00:03:91:B2:4F:7A; OVAAuth=YzM1NjViZjJmYjBiNDI2MzNhOTg0YjFiMjg2MDc2ODM0OTQ4ZTMwODc1ODAwIWFyZmxlYmFyZmxlZ2xvb3A="

Any ideas on how this could happen? Looks like strlen returns some certain value for some string and (4 multiplied on that value) bytes are allocated, but iteration through that string never reaches terminating zero byte. And continues to write this string repeatingly.. I'm have too little expertise in C or Asm to understand this.

Sorry, but I still have no possibility to provide results from mod_backtrace or mod_whatkilledus. 

About provided stack trace: you see 6 arguments because of SPARC architecture (6 registers for input parameters) and mdb stack trace representation.
Comment 4 Jeff Trawick 2009-08-21 13:06:19 UTC
Is this still occurring?  It would be nice to see a dump of the memory pointed to by the second parameter to ap_escape_logitem().
Comment 5 Nick Kew 2009-08-22 18:00:12 UTC
Is this still an issue?

There's a casting that looks potentially suspect to me, and a comment in the code suggesting it's there for gcc's benefit, which might not play well with other compilers.  Can you test whether the following patch fixes it for you?

--- server/util.c	(revision 804541)
+++ server/util.c	(working copy)
@@ -1666,7 +1666,7 @@
     what = apr_xlate_conv_byte(ap_hdrs_to_ascii, (unsigned char)what);
 #endif /*APR_CHARSET_EBCDIC*/
     *where++ = prefix;
-    *where++ = c2x_table[what >> 4];
+    *where++ = c2x_table[(what >> 4) & 0xf];
     *where++ = c2x_table[what & 0xf];
     return where;
 }

(you'll get a small offset applying that to 2.0.x).

If that is indeed the issue, then it hasn't been fixed in the current version.
Comment 6 Roman Belous 2009-09-14 00:52:55 UTC
(In reply to comment #5)
> Is this still an issue?
> 
> There's a casting that looks potentially suspect to me, and a comment in the
> code suggesting it's there for gcc's benefit, which might not play well with
> other compilers.  Can you test whether the following patch fixes it for you?
> 
> --- server/util.c    (revision 804541)
> +++ server/util.c    (working copy)
> @@ -1666,7 +1666,7 @@
>      what = apr_xlate_conv_byte(ap_hdrs_to_ascii, (unsigned char)what);
>  #endif /*APR_CHARSET_EBCDIC*/
>      *where++ = prefix;
> -    *where++ = c2x_table[what >> 4];
> +    *where++ = c2x_table[(what >> 4) & 0xf];
>      *where++ = c2x_table[what & 0xf];
>      return where;
>  }
> 
> (you'll get a small offset applying that to 2.0.x).
> 
> If that is indeed the issue, then it hasn't been fixed in the current version.

We removed %l %u from LogFormat on our production system. Issue stopped to reproduce after that. I'm not sure that we will have a chance to test the patch but I'll notify responsible person about it.