Bug 60511 - org.apache.coyote.ajp.AjpNio2Protocol sends wrong data to Apache MOD_JK connector causing connection closure and multiple reconnects
Summary: org.apache.coyote.ajp.AjpNio2Protocol sends wrong data to Apache MOD_JK conne...
Status: RESOLVED WORKSFORME
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Connectors (show other bugs)
Version: 8.0.38
Hardware: PC Linux
: P2 major (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-22 13:57 UTC by Areg Vrtanesyan
Modified: 2018-01-17 08:08 UTC (History)
1 user (show)



Attachments
sanitised mod_jk debug log (142.33 KB, text/plain)
2016-12-22 13:57 UTC, Areg Vrtanesyan
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Areg Vrtanesyan 2016-12-22 13:57:56 UTC
Created attachment 34548 [details]
sanitised mod_jk debug log

Hi Team

I have been working with Atlassian and in Jira we discovered that "org.apache.coyote.ajp.AjpNio2Protocol" in Tomcat causing troubles.

-------------------------------------------------------------------------------
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] map_uri_to_worker_ext::jk_uri_worker_map.c (1185): Attempting to map URI 'sanitised_url' from 4 maps
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] find_match::jk_uri_worker_map.c (980): Attempting to map context URI '/*=jira-pro0-lb' source 'JkMount'
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] find_match::jk_uri_worker_map.c (993): Found a wildchar match '/*=jira-pro0-lb'
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] jk_handler::mod_jk.c (2823): Into handler jakarta-servlet worker=jira-pro0-lb r->proxyreq=0
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] wc_get_worker_for_name::jk_worker.c (120): found a worker jira-pro0-lb
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] wc_get_name_for_type::jk_worker.c (304): Found worker type 'lb'
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] init_ws_service::mod_jk.c (1196): Service protocol=HTTP/1.1 method=GET ssl=true host=(null) addr=nn.nn.nn.nn name=jira.company.com port=443 auth=(null) user=(null) laddr=nn.n.nnn.nn raddr=nn.nn.nn.nn uaddr=nn.nn.nn.nn uri=sanitised_url
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] service::jk_lb_worker.c (1270): service sticky_session=1 id='7B8F781E9D9786ABFD7408CFC8E82667.jira-pro0-node2'
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] get_most_suitable_worker::jk_lb_worker.c (1078): searching worker for partial sessionid 7B8F781E9D9786ABFD7408CFC8E82667.jira-pro0-node2
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] get_most_suitable_worker::jk_lb_worker.c (1086): searching worker for session route jira-pro0-node2
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] get_most_suitable_worker::jk_lb_worker.c (1101): found worker jira-pro0-n2 (jira-pro0-node2) for route jira-pro0-node2 and partial sessionid 7B8F781E9D9786ABFD7408CFC8E82667.jira-pro0-node2
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] service::jk_lb_worker.c (1315): service worker=jira-pro0-n2 route=jira-pro0-node2 failover=false
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] ajp_get_endpoint::jk_ajp_common.c (3356): (jira-pro0-n2) acquired connection pool slot=0 after 0 retries
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] ajp_marshal_into_msgb::jk_ajp_common.c (684): (jira-pro0-n2) ajp marshaling done
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] ajp_service::jk_ajp_common.c (2591): processing jira-pro0-n2 with 2 retries
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] jk_shutdown_socket::jk_connect.c (932): About to shutdown socket 38 [nn.n.nnn.nn:47243 -> nn.n.nnn.nn:8600]
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] jk_is_input_event::jk_connect.c (1411): error event during poll on socket 38 [nn.n.nnn.nn:47243 -> nn.n.nnn.nn:8600] (event=16)
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] jk_shutdown_socket::jk_connect.c (1016): Shutdown socket 38 [nn.n.nnn.nn:47243 -> nn.n.nnn.nn:8600] and read 0 lingering bytes in 0 sec.
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] ajp_send_request::jk_ajp_common.c (1639): (jira-pro0-n2) failed sending request, socket 38 is not connected any more (errno=0)
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] ajp_abort_endpoint::jk_ajp_common.c (821): (jira-pro0-n2) aborting endpoint with socket 38
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] ajp_send_request::jk_ajp_common.c (1717): (jira-pro0-n2) no usable connection found, will create a new one, detected by connect check (1), cping (0), send (0).
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] jk_open_socket::jk_connect.c (675): socket TCP_NODELAY set to On
[Wed Dec 21 15:38:05.939 2016] [22617:140683748223040] [debug] jk_open_socket::jk_connect.c (799): trying to connect socket 38 to nn.n.nnn.nn:8600
[Wed Dec 21 15:38:05.940 2016] [22617:140683748223040] [debug] jk_open_socket::jk_connect.c (825): socket 38 [nn.n.nnn.nn:47284 -> nn.n.nnn.nn:8600] connected
[Wed Dec 21 15:38:05.940 2016] [22617:140683748223040] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): sending to ajp13 pos=4 len=2291 max=8192
[Wed Dec 21 15:38:05.940 2016] [22617:140683748223040] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0000    xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx  - xxxxxxxxxxxxxxxx
[Wed Dec 21 15:38:05.940 2016] [22617:140683748223040] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0010    xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx  - xxxxxxxxxxxxxxxx
....
[Wed Dec 21 15:38:05.940 2016] [22617:140683748223040] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 03e0    xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx  - xxxxxxxxxxxxxxxx
[Wed Dec 21 15:38:05.940 2016] [22617:140683748223040] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 03f0    xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx  - xxxxxxxxxxxxxxxx
[Wed Dec 21 15:38:05.940 2016] [22617:140683748223040] [debug] ajp_send_request::jk_ajp_common.c (1782): (jira-pro0-n2) request body to send 0 - request body to resend 0
[Wed Dec 21 15:38:05.941 2016] [22617:140683748223040] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1462): received from ajp13 pos=0 len=2 max=8192
[Wed Dec 21 15:38:05.941 2016] [22617:140683748223040] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1462): 0000    05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  - ................
[Wed Dec 21 15:38:05.941 2016] [22617:140683748223040] [warn] ajp_process_callback::jk_ajp_common.c (2121): (jira-pro0-n2) AJP13 protocol: Reuse is set to false
[Wed Dec 21 15:38:05.941 2016] [22617:140683748223040] [debug] ajp_reset_endpoint::jk_ajp_common.c (851): (jira-pro0-n2) resetting endpoint with socket 38 (socket shutdown)
[Wed Dec 21 15:38:05.941 2016] [22617:140683748223040] [debug] ajp_abort_endpoint::jk_ajp_common.c (821): (jira-pro0-n2) aborting endpoint with socket 38
[Wed Dec 21 15:38:05.941 2016] [22617:140683748223040] [debug] jk_shutdown_socket::jk_connect.c (932): About to shutdown socket 38 [nn.n.nnn.nn:47284 -> nn.n.nnn.nn:8600]
[Wed Dec 21 15:38:05.941 2016] [22617:140683748223040] [debug] jk_is_input_event::jk_connect.c (1411): error event during poll on socket 38 [errno=107] (event=16)
[Wed Dec 21 15:38:05.941 2016] [22617:140683748223040] [debug] jk_shutdown_socket::jk_connect.c (1016): Shutdown socket 38 [nn.n.nnn.nn:47284 -> nn.n.nnn.nn:8600] and read 39 lingering bytes in 0 sec.
[Wed Dec 21 15:38:05.941 2016] [22617:140683748223040] [debug] ajp_done::jk_ajp_common.c (3287): recycling connection pool for worker jira-pro0-n2 and socket -1
[Wed Dec 21 15:38:05.941 2016] [22617:140683748223040] [debug] jk_handler::mod_jk.c (2979): Service finished with status=200 for worker=jira-pro0-lb
-------------------------------------------------------------------------------

The part of sanitised log is attached.

This was really caused big problems to us and we rolled back to previous AJP connector - "org.apache.coyote.ajp.AjpNioProtocol".

This may be related to bug 57674, bug 58970 or bug 42269

Regards,
Areg
Comment 1 Remy Maucherat 2016-12-22 14:20:45 UTC
Ok, the client reports an issue, but it is impossible to look into it with only this. It is unlikely anyone will look at that issue in the Tomcat 8.0 branch, as ongoing support is expected to move to 8.5.
Comment 2 Areg Vrtanesyan 2016-12-22 22:24:56 UTC
(In reply to Remy Maucherat from comment #1)
> Ok, the client reports an issue, but it is impossible to look into it with
> only this. It is unlikely anyone will look at that issue in the Tomcat 8.0
> branch, as ongoing support is expected to move to 8.5.

Hi Remy

Are you suggesting that Atlassian should have a look to uplift Tomcat version in use?

From another point of view could this bug be present in latest 8.5 version too?

What I can do to help? 

I was suggested to use NIO2 connector and results was quite not pleasant. 

Regards, 
Areg
Comment 3 Rainer Jung 2016-12-22 22:54:57 UTC
Just in case this is not clear: it seems sometimes the request is immediately answered with an AJP packet with signature "05", which is an "End Response" packet. No response headers, just "End Response".

Of course it could be, that the request was invalid and the connector immediately closed the connection.

Looking at the AJP connector code, the endAndCloseMessageArray was sent in finish() and getErrorState().isError() was true.

I'd agree, that it would be helpful to reproduce the problem with a current version and to try to provide a self-contained reproduction example. From your full log it seems that sometimes correct responses come back, so we need to understand what properties the failing requests distinguish from the good ones.
Comment 4 Areg Vrtanesyan 2016-12-22 23:35:01 UTC
Not sure if I can do it on my side. 
I will work with Atlassian to see if they can do that.
Comment 5 Remy Maucherat 2016-12-23 09:11:18 UTC
Please leave as NEEDINFO as we need info to do anything about this issue.

I don't see why you need to interact with Atlassian. First thing to do is to test with 8.5.9 to see if the connector refactoring changed something that could then be backported in some way.
Comment 6 Remy Maucherat 2018-01-17 08:08:18 UTC
No further activity. Testing 8.5/9.0 would be a good idea.