Bug 49929 - AJP APR connector to mod_jk can result in SEND_BODY_CHUNK message after END_RESPONSE message
Summary: AJP APR connector to mod_jk can result in SEND_BODY_CHUNK message after END_R...
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 6
Classification: Unclassified
Component: Connectors (show other bugs)
Version: 6.0.29
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: default
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-14 11:52 UTC by brett.dellegrazie
Modified: 2011-08-15 10:42 UTC (History)
1 user (show)



Attachments
testcase: Servlet for writing directly to output stream (1.32 KB, text/plain)
2010-09-20 09:07 UTC, brett.dellegrazie
Details

Note You need to log in before you can comment on or make changes to this bug.
Description brett.dellegrazie 2010-09-14 11:52:20 UTC
Hi,

Environment:
OS: RHEL 5.5 (fully patched)
Apache HTTPD: 2.2.3 (from OS vendor)
mod_jk: 1.2.30 (downloaded from Tomcat site and compiled locally)
Tomcat: 6.0.29 (binary distribution from apache.org)
JVM: 1.6.0_21 (Sun, 64-bit).

When running httpd/mod_jk in trace mode we are seeing one or two
SEND_BODY_CHUNK messages of length 4 (referred to as a 'flush' message
from now on) just before the END_RESPONSE message.  I see this even when
querying the Tomcat Manager (manager/html) home page so this is not our
application specifically.

Our problem is that in a servlet that retrieves a file that is then sent
to the client we are receiving a 'flush' message _after_ the
END_RESPONSE message. On the next request, mod_jk sees this a breach in
the protocol, closes the socket and results in poor performance and
missed requests.  The files are significantly bigger than the standard
8k buffers (~500kb).

The relevant servlet is a Spring Controller (some details omitted) that
does the following:

public ModelAndView handle(HttpServletRequest request,
HttpServletResponse response, Object command, BindException errors)
throws Exception {
        ...
        ... image is a byte buffer containing an image ...
        response.setContentType(image.getMimeType());
        response.setContentLength(image.getImage().length);
        FileCopyUtils.copy( image.getImage(), response.getOutputStream();

        return null;
}

We were using a manual explicit flush() and close() on the output stream
instead of using FileCopyUtils. When we did that, we got two 'flush'
messages but one was still after the END_RESPONSE.

FileCopyUtils closes the OutputStream automatically but doesn't flush it
as we were doing previously doing manually.

For example:
Full trace level logs available at: http://pastebin.com/KsHeXT58

home page requests (/), result in a 301 to /home.action.  This produces
a double 'flush' before the END_REQUEST message (only debug level
shown):

[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_unmarshal_response::jk_ajp_common.c (660): status = 301
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_unmarshal_response::jk_ajp_common.c (667): Number of headers is = 5
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_unmarshal_response::jk_ajp_common.c (723): Header[0] [Set-Cookie] =
[JSESSIONID=FA47238C8C9976E37793701F10A2D923.jvm1; Path=/]
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_unmarshal_response::jk_ajp_common.c (723): Header[1] [Location] =
[home.action]
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_unmarshal_response::jk_ajp_common.c (723): Header[2] [Connection] =
[close]
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_unmarshal_response::jk_ajp_common.c (723): Header[3] [Content-Type]
= [text/html]
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_unmarshal_response::jk_ajp_common.c (723): Header[4]
[Content-Length] = [4]

Then comes the body content:
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from
ajp13 pos=0 len=8 max=8192
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000    03 00 04
0D 0A 0D 0A 00 00 00 00 00 00 00 00 00  - ................
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ws_write::mod_jk.c (507): written 4 out of 4

1st flush message

[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from
ajp13 pos=0 len=4 max=8192
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000    03 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00  - ................

2nd flush message

[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from
ajp13 pos=0 len=4 max=8192
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000    03 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00  - ................

Normal END_MESSAGE

[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from
ajp13 pos=0 len=2 max=8192
[Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000    05 01 00
00 00 00 00 00 00 00 00 00 00 00 00 00  - ................

The home page retrieval proceeds in exactly the same manner (including
the two 'flush' messages.

However when we try this against the file retrieval servlet above
(remember files are ~500k) we get the following behaviour instead:

Headers received by mod_jk from the request:
[Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug]
ajp_unmarshal_response::jk_ajp_common.c (660): status = 200
[Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug]
ajp_unmarshal_response::jk_ajp_common.c (667): Number of headers is = 2
[Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug]
ajp_unmarshal_response::jk_ajp_common.c (723): Header[0] [Content-Type]
= [image/jpeg]
[Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug]
ajp_unmarshal_response::jk_ajp_common.c (723): Header[1]
[Content-Length] = [532874]

Now the data ...
[Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from
ajp13 pos=0 len=8188 max=8192
[Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000    03 1F F8
FF D8 FF E0 00 10 4A 46 49 46 00 01 01  - .........JFIF...

... lots of messages like that above ...

Finally once complete (last data write shown):

[Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0390    A5 29 4A
FF D9 00 00 00 00 00 00 00 00 00 00 00  - .)J.............
[Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug]
ws_write::mod_jk.c (507): written 914 out of 914

The end of the request (note - no flush message at all)

[Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from
ajp13 pos=0 len=2 max=8192
[Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000    05 01 00
00 00 00 00 00 00 00 00 00 00 00 00 00  - ................

Connection now supposedly reset for reuse:

[Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug]
ajp_process_callback::jk_ajp_common.c (1940): AJP13 protocol: Reuse is
OK
[Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug]
ajp_reset_endpoint::jk_ajp_common.c (757): (jvm1) resetting endpoint
with sd = 17 
[Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug]
ajp_done::jk_ajp_common.c (3010): recycling connection pool slot=0 for
worker jvm1
[Mon Sep 13 11:47:38.534 2010] [31728:1147218240] [debug]
jk_handler::mod_jk.c (2602): Service finished with status=200 for
worker=balancer

The problem occurs the next time that connection is used:

[Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug]
map_uri_to_worker_ext::jk_uri_worker_map.c (1036): Attempting to map URI
'/scripts/css/jquery-ui-1.8.2.custom.css' from 4 maps
[Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug]
find_match::jk_uri_worker_map.c (850): Attempting to map context URI
'/*=balancer' source 'JkMount'
[Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug]
find_match::jk_uri_worker_map.c (863): Found a wildchar match
'/*=balancer'
[Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug]
jk_handler::mod_jk.c (2462): Into handler jakarta-servlet
worker=balancer r->proxyreq=0
[Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug]
wc_get_worker_for_name::jk_worker.c (116): found a worker balancer
[Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug]
wc_get_name_for_type::jk_worker.c (293): Found worker type 'lb'
[Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug]
init_ws_service::mod_jk.c (978): Service protocol=HTTP/1.1 method=GET
ssl=false host=(null) addr=10.16.0.71 name=10.16.0.40 port=80
auth=(null) user=(null) laddr=10.16.0.40 raddr=10.16.0.71
uri=/scripts/css/jquery-ui-1.8.2.custom.css
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
service::jk_lb_worker.c (1118): service sticky_session=1
id='FA47238C8C9976E37793701F10A2D923.jvm1'
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
get_most_suitable_worker::jk_lb_worker.c (946): searching worker for
partial sessionid FA47238C8C9976E37793701F10A2D923.jvm1
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
get_most_suitable_worker::jk_lb_worker.c (954): searching worker for
session route jvm1
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
get_most_suitable_worker::jk_lb_worker.c (968): found worker jvm1 (jvm1)
for route jvm1 and partial sessionid
FA47238C8C9976E37793701F10A2D923.jvm1
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
service::jk_lb_worker.c (1161): service worker=jvm1 route=jvm1
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
ajp_get_endpoint::jk_ajp_common.c (3093): acquired connection pool
slot=0 after 0 retries
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
ajp_marshal_into_msgb::jk_ajp_common.c (605): ajp marshaling done
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
ajp_service::jk_ajp_common.c (2376): processing jvm1 with 2 retries

begin by sending 'cping', we expect 'cpong'

[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
ajp_connection_tcp_send_message::jk_ajp_common.c (1152): sending to
ajp13 pos=4 len=5 max=16
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
ajp_connection_tcp_send_message::jk_ajp_common.c (1152): 0000    12 34
00 01 0A 00 00 00 00 00 00 00 00 00 00 00  - .4..............

But we receive ... SEND_BODY_CHUNK of leght 4, i.e. client flush (!)

[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from
ajp13 pos=0 len=4 max=16
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000    03 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00  - ................

Oops - better close stuff etc. From here things just shutdown

[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [warn]
ajp_handle_cping_cpong::jk_ajp_common.c (906): awaited reply cpong,
received 3 (0 / 5) instead. Closing connection
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 17
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
jk_shutdown_socket::jk_connect.c (803): shutting down the read side of
socket 17
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug]
jk_shutdown_socket::jk_connect.c (813): Shutdown socket 17 and read 5
lingering bytes in 0 sec.
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [info]
ajp_send_request::jk_ajp_common.c (1518): (jvm1) failed sending request,
socket -1 prepost cping/cpong failure (errno=0)
[Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [info]
ajp_send_request::jk_ajp_common.c (1574): (jvm1) all endpoints are
disconnected, detected by connect check (0), cping (1), send (0)

Thus resulting in broken behaviour because left-over flush method was
present in queue.

Has anyone seen this before? Can anyone help?

Thanks,

Additional information:

workers.properties:
worker.list=jk-status
worker.jk-status.type=status
worker.jk-status.read_only=true
worker.list=jk-manager
worker.jk-manager.type=status
worker.list=balancer
worker.balancer.type=lb
worker.balancer.error_escalation_time=0
worker.balancer.max_reply_timeouts=10
worker.balancer.balance_workers=jvm1
worker.balancer.balance_workers=jvm2
worker.jvm2.reference=worker.template
worker.jvm2.host=fmp-dun-tapp2
worker.jvm2.port=10303
worker.jvm2.activation=A

# Used for jkmanager / status
worker.list=jvm1
worker.jvm1.reference=worker.template
worker.jvm1.host=localhost
worker.jvm1.port=10303
worker.jvm1.activation=A

worker.template.type=ajp13
worker.template.socket_connect_timeout=5000
worker.template.socket_keepalive=true
worker.template.ping_mode=A
worker.template.ping_timeout=10000
worker.template.connection_pool_minsize=0
worker.template.connection_pool_timeout=600
worker.template.reply_timeout=300000
worker.template.recovery_options=3

mod_jk.conf:
LoadModule jk_module modules/mod_jk.so

<IfModule jk_module>
    JkWorkersFile components/workers.properties
    JkLogFile "|/usr/sbin/rotatelogs /var/log/httpd/mod_jk_log 86400"
    JkLogLevel info
    JkShmFile logs/mod_jk.shm
    LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\"
\"%{User-Agent}i\" \"%{Cookie}i\" \"%{Set-Cookie}o\" %{pid}P %{tid}P
%{JK_LB_FIRST_NAME}n %{JK_LB_LAST_NAME}n ACC %{JK_LB_LAST_ACCESSED}n ERR
%{JK_LB_LAST_ERRORS}n BSY %{JK_LB_LAST_BUSY}n %{JK_LB_LAST_STATE}n %D"
extended_jk
    JkStripSession On
    JkWatchdogInterval 60
</IfModule>

Relevant httpd.conf global entries:
Timeout 120
KeepAlive On
MaxKeepAliveRequests 0
KeepAliveTimeout 15

#Worker MPM
StartServers         2
MaxClients         150
MinSpareThreads     25
MaxSpareThreads     75 
ThreadsPerChild     25
MaxRequestsPerChild  0

Relevant vhost entry:
<Directory "/var/www/files/" >
        Options FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
</Directory>

<Directory "/var/www/www.example.com/" >
        Options FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
</Directory>

<VirtualHost _default_:80>
        ServerName www.example.com
        DocumentRoot "/var/www/www.example.com"
        
        ErrorDocument 404 /404.jsp
        
        CacheEnable mem /
        CacheIgnoreHeaders Set-Cookie
        # Not available: CacheIgnoreURLSessionIdentifiers jsessionid

        # These files should be made part of the static build structure.
        Alias /files/ "/var/www/files/"
        
        JkMount /|* balancer
        JkUnMount /manager/* balancer
        JkUnMount /files/* balancer
        JkStripSession on
</VirtualHost>

<VirtualHost _default_:443>
        ServerName www.example.com
        DocumentRoot "/var/www/www.example.com"
        
        ErrorDocument 404 /404.jsp

        CacheEnable mem /
        CacheIgnoreHeaders Set-Cookie
        # Not available: CacheIgnoreURLSessionIdentifiers jsessionid

        Alias /files/ "/var/www/files/"
                
        SSLEngine on
        
        SSLProtocol All -SSLv2
        SSLCipherSuite ALL:!aNULL:!ADH:!eNULL:!LOW:!EXP:RC4+RSA:+HIGH:
+MEDIUM
        SSLOptions +OptRenegotiate
        
        SSLCertificateFile /etc/pki/tls/certs/server.crt
        SSLCertificateKeyFile /etc/pki/tls/key/server.key
        SSLCertificateChainFile /etc/pki/tls/certs/server-chain.crt
        SSLCACertificateFile /etc/pki/tls/certs/ca-bundle.crt 
        
        SSLVerifyClient none
        
        SSLOptions +FakeBasicAuth +ExportCertData +StrictRequire
        <Files ~ "\.(cgi|shtml|phtml|php3?)$">
            SSLOptions +StdEnvVars
        </Files>
        <Directory "/var/www/cgi-bin">
            SSLOptions +StdEnvVars
        </Directory>
        
        SetEnvIf User-Agent ".*MSIE.*" \
                 nokeepalive ssl-unclean-shutdown \
                 downgrade-1.0 force-response-1.0

        JkMount /|* balancer
        
        JkUnMount /manager/* balancer
        JkUnMount /files/* balancer
        JkStripSession on
</VirtualHost>

Tomcat server.xml

<?xml version='1.0' encoding='utf-8'?>
<Server port="10300" shutdown="SHUTDOWN">

        <Listener
className="org.apache.catalina.core.AprLifecycleListener"
                SSLEngine="on" />
        <Listener className="org.apache.catalina.core.JasperListener" />
        <Listener
className="org.apache.catalina.core.JreMemoryLeakPreventionListener" />
        <Listener
className="org.apache.catalina.mbeans.ServerLifecycleListener" />
        <Listener
className="org.apache.catalina.mbeans.JmxRemoteLifecycleListener"
                rmiRegistryPortPlatform="10301"
                rmiServerPortPlatform="10302"
                useLocalPorts="true" />
        <Listener
className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />

        <GlobalNamingResources>
                <Resource name="UserDatabase" auth="Container"
                        type="org.apache.catalina.UserDatabase"
description="User database
that can be updated and saved"
                        factory="org.apache.catalina.users.MemoryUserDatabaseFactory"
                        pathname="conf/tomcat-users.xml" />

        </GlobalNamingResources>

        <Service name="Catalina">

                <Executor name="ajpThreadPool" namePrefix="tc-ajp-exec-"
                        maxThreads="200" minSpareThreads="4" />
                <Executor name="httpThreadPool" namePrefix="tc-http-exec-"
                        maxThreads="50" minSpareThreads="4" />
                <Connector executor="ajpThreadPool"
enableLookups="false"
                        port="10303"
protocol="org.apache.coyote.ajp.AjpAprProtocol"
                        redirectPort="443" />

                <Connector executor="httpThreadPool"
enableLookups="false"
                        port="10304"
protocol="org.apache.coyote.http11.Http11NioProtocol"
                        connectionTimeout="20000"
redirectPort="10305" />
                <Connector executor="httpThreadPool"
enableLookups="false"
                        port="10305"
protocol="org.apache.coyote.http11.Http11AprProtocol"
                        SSLEnabled="true" scheme="https" secure="true"
connectionTimeout="20000"
                        SSLCACertificateFile="${tomcat.directconn.SSLCACertificateFile}"

SSLCertificateChainFile="${tomcat.directconn.SSLCertificateChainFile}"
                        SSLCertificateFile="${tomcat.directconn.SSLCertificateFile}"
                        SSLCertificateKeyFile="${tomcat.directconn.SSLCertificateKeyFile}"
                        SSLProtocol="TLSv1" />

                <Engine name="Catalina" defaultHost="localhost"
                        jvmRoute="jvm${tomcat.clusterid}">

                        <Cluster
className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
                                channelSendOptions="6">

                                <Manager
className="org.apache.catalina.ha.session.BackupManager"
                                        expireSessionsOnShutdown="false"
notifyListenersOnReplication="true"
                                        mapSendOptions="6" />
                                <Channel
className="org.apache.catalina.tribes.group.GroupChannel">
                                        <Membership
className="org.apache.catalina.tribes.membership.McastService"
                                                address="228.0.0.4"
port="45564" frequency="500"
dropTime="3000" />
                                        <Receiver
className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                                                address="auto"
port="5000" selectorTimeout="100" maxThreads="6" />

                                        <Sender

className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
                                                <Transport

className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" />
                                        </Sender>
                                        <Interceptor

className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector" />
                                        <Interceptor

className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor" />
                                        <Interceptor

className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor" />
                                </Channel>

                                <Valve
className="org.apache.catalina.ha.tcp.ReplicationValve"
                                        filter=".*\.gif;.*\.js;.*
\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*
\.txt;"
                                        statistics="true" />

                                <Valve
className="org.apache.catalina.ha.session.JvmRouteBinderValve" />

                                <ClusterListener

className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener" />
                                <ClusterListener

className="org.apache.catalina.ha.session.ClusterSessionListener" />
                        </Cluster>

                        <Realm
className="org.apache.catalina.realm.UserDatabaseRealm"
                                resourceName="UserDatabase" />

                        <Host name="localhost" appBase="webapps"
unpackWARs="false"
                                autoDeploy="false" xmlValidation="false"
xmlNamespaceAware="false">
                        </Host>
                        <Host name="aob" appBase="aob"
                                unpackWARs="false" autoDeploy="false"
xmlValidation="false"
                                xmlNamespaceAware="false">
                                <Alias>example2.com</Alias>
                        </Host>
                </Engine>
        </Service>
</Server>
Comment 1 brett.dellegrazie 2010-09-14 11:56:10 UTC
There was a patch supplied by Rainer Jung which we used to get more information out of mod_jk.  It was how we realised there were out-of order messages in the queue on the Tomcat side.

Patch:
--- jk_ajp_common.c     2010-02-23 08:26:02.000000000 +0100
+++ jk_ajp_common.c.debug       2010-09-02 18:24:41.429730000 +0200
@@ -904,9 +904,9 @@
                        cmd < JK_AJP13_SEND_BODY_CHUNK ||
                        cmd > AJP13_CPONG_REPLY) {
                    jk_log(l, JK_LOG_WARNING,
-                       "awaited reply cpong, received %d instead. "
+                       "awaited reply cpong, received %d (%d / %d) 
 instead. "
                        "Closing connection",
-                       cmd);
+                       cmd, i, ae->last_op);
                   /* We can't trust this connection any more. */
                   ajp_abort_endpoint(ae, JK_TRUE, l);
                   JK_TRACE_EXIT(l);
Comment 2 brett.dellegrazie 2010-09-14 11:59:23 UTC
Mladen Turk wrote in email on tomcat-users list [Tue, 14 Sep 2010 07:50:55 +0200]:

For AJP APR connector 'if (actionCode == ActionCode.ACTION_CLIENT_FLUSH)'
doesn't check if the socket was already virtually closed
so I guess that would be a first thing to look at.
We cannot just check for finished flag thought, cause recycle() would reset that.
Instead if there was getOutputSteam() we would need to directly
invalidate that stream. This means we would need to remember that stream
somewhere.

Further more the wrapped socket is 'long' pointing to the
native pointer, so we don't have object notification on
physical socket close (that's another problem that manifests
in JVM core for detached sockets).
Comment 3 Mladen Turk 2010-09-14 12:25:33 UTC
Thanks for filing that. I'll take a look into that.
It would be nice if you could provide a simple jsp or
servlet test case (without third party packages)
to speed up the hacking.
Comment 4 brett.dellegrazie 2010-09-20 09:07:58 UTC
Created attachment 26055 [details]
testcase: Servlet for writing directly to output stream

This is a test servlet for writing simple data directly to the output stream. It should trigger the issue in this bug
Comment 5 Mladen Turk 2010-09-21 13:00:11 UTC
I have tried your test servlet, but I cannot see anything suspicious.
There is no additional packets following the lorem ipsum.
Tested on Tomcat 6.0.29, 6.0.x trunk and Tomcat 7.
Both APR and JIO endpoints behave the same.
Comment 6 brett.dellegrazie 2010-09-21 13:19:54 UTC
Hi,

When using the test servlet the problem appears on the next request when its on the same socket so its a good idea to have a static file around to request immediately after executing the test servlet.  The way we did this was to use Jmeter with only 1 thread but executing the two requests in sequence with no wait time.

With Apache HTTPD you should see mod_jk (in debug mode) detect an explicit flush packet and reset the socket instead of the usual CPONG expected response as reported originally.

Thanks,

Brett
Comment 7 Mladen Turk 2010-09-24 02:51:11 UTC
I'm still not been able to reproduce the stuff.
I don't use jmeter but rather a simple ab which can only hit a
single uri, so this might be the reason why its not manifested there.

BTW, does it happen only with APR connector, or with JIO as well?
Comment 8 Mark Thomas 2010-10-03 16:28:07 UTC
I can't reproduce this with the latest Tomcat 6 code with either the BIO AJP connector or the APR AJP connector. A check of the mod_jk debug log confirms that that cping/cpong works as expected.

I have been using the supplied Servlet and a 46k static file.

Without a test case to reproduce this issue, this bug is heading towards being closed as "works for me" .
Comment 9 brett.dellegrazie 2010-10-04 08:06:20 UTC
(In reply to comment #8)
> I can't reproduce this with the latest Tomcat 6 code with either the BIO AJP
> connector or the APR AJP connector. A check of the mod_jk debug log confirms
> that that cping/cpong works as expected.
> 
> I have been using the supplied Servlet and a 46k static file.
> 
> Without a test case to reproduce this issue, this bug is heading towards being
> closed as "works for me" .

Hi,

I've tried to reproduce this in a test application as well and the problem just doesn't occur.  At present, we can only generate it using our application, sorry.

Don't know how to progress this further now. :(

Regards,

Brett
Comment 10 Mladen Turk 2010-10-04 08:37:44 UTC
The test solution would have to use the same sort of writer
(image in your case I suppose)
We know that image classes can crash the APR connector because
they try to manage the socket by themselves so double close
can occur. This might also cause the SBC to been send after
the EOR since with AJP the physical socket is not closed.

Again without a workable test case it's just a guess.
Comment 11 Mark Thomas 2011-01-05 10:10:23 UTC
Without a reproducible test case this is eventually going to get closed as WONTFIX.
Comment 12 Mladen Turk 2011-02-03 05:25:24 UTC
Can you check if r1066772 fixes the issue?
It's for trunk, but applying it for 6.0.x should be trivial
Comment 13 brett.dellegrazie 2011-02-03 05:32:12 UTC
Hi Mladen,

We switched from using Apache HTTPD to HAproxy late last year because of this issue... therefore it might take me quite a while to verify this.
I'll try, but no guarantees.

Best Regards,

Brett
Comment 14 michael.ballantyne 2011-03-24 22:44:27 UTC
Hey - we just ran into this issue this week, and I finally managed to track it down to this bug after a good two days of work. Gotta get this fixed. I will attempt to both come up with a simple test case and try the provided patch next week.
Comment 15 Mark Thomas 2011-06-29 13:31:48 UTC
I've proposed r1066772 for 6.0.x since it is the right thing to do and it may help fix this issue. Confirmation that it does fix it would still be helpful.
Comment 16 Mark Thomas 2011-08-15 10:42:50 UTC
The fix has been back-ported and absent any reports to the contrary, the assumption is that the issue is fixed.