Bug 65519 - mod_proxy_http hangs after HTTP upgrade to jboss-remoting
Summary: mod_proxy_http hangs after HTTP upgrade to jboss-remoting
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy_http (show other bugs)
Version: 2.4.48
Hardware: PC Linux
: P2 regression (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-08-25 19:57 UTC by Marius Tantareanu
Modified: 2021-09-20 22:47 UTC (History)
0 users



Attachments
Test harness + logs (25.09 KB, application/x-zip-compressed)
2021-08-25 19:57 UTC, Marius Tantareanu
Details
Update data_in_input_filters in the core and ssl input filters (8.70 KB, patch)
2021-08-30 14:15 UTC, Yann Ylavic
Details | Diff
Issue a nonblocking instead of ap_filter_input_pending() (1.16 KB, patch)
2021-08-30 14:53 UTC, Yann Ylavic
Details | Diff
error_log capturing the pooling timeout after 60 minutes (21.21 KB, text/plain)
2021-08-30 19:38 UTC, Marius Tantareanu
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marius Tantareanu 2021-08-25 19:57:21 UTC
Created attachment 37988 [details]
Test harness + logs

When httpd is used as a reverse proxy to WildFly, it sometimes hangs when a Java client tries to connect to WildFly via jboss-remoting (EJB calls). This works fine in httpd version 2.4.46.
This is an intermittent issue (in my testing, more than 50% of the connections hang).

How to reproduce:

1. Deploy the attached EAR file (simple-ear.ear) in WildFly 13. The EAR contains a single EJB that will be accessed by the client.
2. Configure httpd as a reverse proxy to WildFly (httpd should handle HTTP protocol upgrade to jboss-remoting):

ProxyPass "/"  "ws://localhost:1026/" upgrade=jboss-remoting

A sample configuration file is attached (httpd.conf)

3. Run the attached standalone Java client that attempts to invoke an EJB in WildFly via httpd. Before running the client, you need to copy the WildFly client library (jboss-client.jar) to the lib folder (this is not included in the archive due to its size). 

cd echo-client-wf13
java -cp ./bin:./lib/jboss-client.jar:./lib/simple-ejbClient.jar com.microfocus.echoclient.EchoClient "remote+https" "localhost" "4433"

Port 4433 is SSL enabled. For clear text communication use protocol as "remote+http" and port as "80".

The client requires Java 8 or above (testing was done with Java 8).

When it works, the client displays the following:

Aug 25, 2021 2:54:46 PM org.wildfly.security.Version <clinit>
INFO: ELY00001: WildFly Elytron version 1.3.3.Final
Aug 25, 2021 2:54:46 PM org.wildfly.naming.client.Version <clinit>
INFO: WildFly Naming version 1.0.9.Final
Aug 25, 2021 2:54:47 PM org.xnio.Xnio <clinit>
INFO: XNIO version 3.6.3.Final
Aug 25, 2021 2:54:47 PM org.xnio.nio.NioXnio <clinit>
INFO: XNIO NIO Implementation Version 3.6.3.Final
Aug 25, 2021 2:54:47 PM org.jboss.threads.Version <clinit>
INFO: JBoss Threads version 2.3.2.Final
Aug 25, 2021 2:54:47 PM org.jboss.remoting3.EndpointImpl <clinit>
INFO: JBoss Remoting version 5.0.7.Final
Aug 25, 2021 2:54:47 PM org.jboss.ejb.client.EJBClient <clinit>
INFO: JBoss EJB Client version 4.0.10.Final
Result from EchoService: hello

When it doesn't work, the client hangs and the last line from above is not displayed.

The httpd error logs (LogLevel trace8 + dumpio) for the case when the call hangs are in: error_log_blocked.txt
The last log entry in this case is:

Wed Aug 25 19:25:54.299051 2021] [proxy:trace8] [pid 31599] proxy_util.c(4800): [client 127.0.0.1:58402] proxy: HTTP: polling (client=31, origin=31)

tcpdump on the WildFly port shows that after the protocol upgrade is accepted by WildFly (status code 101), the jboss-remoting conversation begins by WildFly sending another TCP packet to httpd.
This packet is acknowledged by the httpd socket but the data is not visible in httpd (the polling does not return). See tcpdump1026-blocked.txt

The httpd error logs for the case when the call works are in: error_log_ok.txt
The corresponding tcpdump is in tcpdump1026-ok.txt

The issue does not reproduce when the following directive is added to httpd conf: "ProxyWebsocketFallbackToProxyHttp Off". So it seems that when the protocol upgrade is handled by mod_proxy_wstunnel everything is working as expected.

Also, the issue only reproduces with WildFly 13. The same scenario executed against WildFly 20 does not reproduce the issue. However this seems like a race condition so it might reproduce even with WildFly 20 granted that more testing is performed.
Comment 1 Yann Ylavic 2021-08-30 09:52:55 UTC
(In reply to Marius Tantareanu from comment #0)
> Wed Aug 25 19:25:54.299051 2021] [proxy:trace8] [pid 31599]
> proxy_util.c(4800): [client 127.0.0.1:58402] proxy: HTTP: polling
> (client=31, origin=31)

This call never returns? No timeout at some point?
Comment 2 Yann Ylavic 2021-08-30 09:57:39 UTC
Also on which platform is httpd running, Linux?
Comment 3 Marius Tantareanu 2021-08-30 12:14:17 UTC
(In reply to Yann Ylavic from comment #1)
> (In reply to Marius Tantareanu from comment #0)
> > Wed Aug 25 19:25:54.299051 2021] [proxy:trace8] [pid 31599]
> > proxy_util.c(4800): [client 127.0.0.1:58402] proxy: HTTP: polling
> > (client=31, origin=31)
> 
> This call never returns? No timeout at some point?

The call times out after 60 minutes. But I need to check if the timeout is triggered by httpd or by the WildFly client. I will try to reproduce and send the logs for a timed out call.
Comment 4 Marius Tantareanu 2021-08-30 12:20:22 UTC
(In reply to Yann Ylavic from comment #2)
> Also on which platform is httpd running, Linux?

Yes, Linux. I tested on the following platforms:

Red Hat Enterprise Linux Server release 7.5 (Maipo)

SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2
Comment 5 Yann Ylavic 2021-08-30 14:15:09 UTC
Created attachment 38003 [details]
Update data_in_input_filters in the core and ssl input filters

The "101 Switching Protocols" response header and the first following jboss-remoting data may be read at the same time by httpd, leading to the application data being buffered in the filters chain and the tunneling loop not noticing.

Can you try this patch which should address this potential issue?
Comment 6 Yann Ylavic 2021-08-30 14:53:15 UTC
Created attachment 38004 [details]
Issue a nonblocking instead of ap_filter_input_pending()

Actually a simpler/better patch might be this one, please try it instead.
Comment 7 Marius Tantareanu 2021-08-30 19:36:57 UTC
(In reply to Marius Tantareanu from comment #3)
> (In reply to Yann Ylavic from comment #1)
> > (In reply to Marius Tantareanu from comment #0)
> > > Wed Aug 25 19:25:54.299051 2021] [proxy:trace8] [pid 31599]
> > > proxy_util.c(4800): [client 127.0.0.1:58402] proxy: HTTP: polling
> > > (client=31, origin=31)
> > 
> > This call never returns? No timeout at some point?
> 
> The call times out after 60 minutes. But I need to check if the timeout is
> triggered by httpd or by the WildFly client. I will try to reproduce and
> send the logs for a timed out call.

It seems that the 60 minutes timeout is generated by httpd. I am attaching the error_log for this case (error_log_timeout).
Comment 8 Marius Tantareanu 2021-08-30 19:38:44 UTC
Created attachment 38005 [details]
error_log capturing the pooling timeout after 60 minutes
Comment 9 Marius Tantareanu 2021-08-30 19:39:45 UTC
(In reply to Yann Ylavic from comment #6)
> Created attachment 38004 [details]
> Issue a nonblocking instead of ap_filter_input_pending()
> 
> Actually a simpler/better patch might be this one, please try it instead.

Thanks! I will give it a try and get back to you.
Comment 10 Marius Tantareanu 2021-08-31 11:42:11 UTC
With the patch in place, I am not able to reproduce the issue anymore. Thanks for the quick turnaround.
Comment 11 Yann Ylavic 2021-09-20 22:47:51 UTC
Fixed in 2.4.49 (r1892971).