Created attachment 37988 [details] Test harness + logs When httpd is used as a reverse proxy to WildFly, it sometimes hangs when a Java client tries to connect to WildFly via jboss-remoting (EJB calls). This works fine in httpd version 2.4.46. This is an intermittent issue (in my testing, more than 50% of the connections hang). How to reproduce: 1. Deploy the attached EAR file (simple-ear.ear) in WildFly 13. The EAR contains a single EJB that will be accessed by the client. 2. Configure httpd as a reverse proxy to WildFly (httpd should handle HTTP protocol upgrade to jboss-remoting): ProxyPass "/" "ws://localhost:1026/" upgrade=jboss-remoting A sample configuration file is attached (httpd.conf) 3. Run the attached standalone Java client that attempts to invoke an EJB in WildFly via httpd. Before running the client, you need to copy the WildFly client library (jboss-client.jar) to the lib folder (this is not included in the archive due to its size). cd echo-client-wf13 java -cp ./bin:./lib/jboss-client.jar:./lib/simple-ejbClient.jar com.microfocus.echoclient.EchoClient "remote+https" "localhost" "4433" Port 4433 is SSL enabled. For clear text communication use protocol as "remote+http" and port as "80". The client requires Java 8 or above (testing was done with Java 8). When it works, the client displays the following: Aug 25, 2021 2:54:46 PM org.wildfly.security.Version <clinit> INFO: ELY00001: WildFly Elytron version 1.3.3.Final Aug 25, 2021 2:54:46 PM org.wildfly.naming.client.Version <clinit> INFO: WildFly Naming version 1.0.9.Final Aug 25, 2021 2:54:47 PM org.xnio.Xnio <clinit> INFO: XNIO version 3.6.3.Final Aug 25, 2021 2:54:47 PM org.xnio.nio.NioXnio <clinit> INFO: XNIO NIO Implementation Version 3.6.3.Final Aug 25, 2021 2:54:47 PM org.jboss.threads.Version <clinit> INFO: JBoss Threads version 2.3.2.Final Aug 25, 2021 2:54:47 PM org.jboss.remoting3.EndpointImpl <clinit> INFO: JBoss Remoting version 5.0.7.Final Aug 25, 2021 2:54:47 PM org.jboss.ejb.client.EJBClient <clinit> INFO: JBoss EJB Client version 4.0.10.Final Result from EchoService: hello When it doesn't work, the client hangs and the last line from above is not displayed. The httpd error logs (LogLevel trace8 + dumpio) for the case when the call hangs are in: error_log_blocked.txt The last log entry in this case is: Wed Aug 25 19:25:54.299051 2021] [proxy:trace8] [pid 31599] proxy_util.c(4800): [client 127.0.0.1:58402] proxy: HTTP: polling (client=31, origin=31) tcpdump on the WildFly port shows that after the protocol upgrade is accepted by WildFly (status code 101), the jboss-remoting conversation begins by WildFly sending another TCP packet to httpd. This packet is acknowledged by the httpd socket but the data is not visible in httpd (the polling does not return). See tcpdump1026-blocked.txt The httpd error logs for the case when the call works are in: error_log_ok.txt The corresponding tcpdump is in tcpdump1026-ok.txt The issue does not reproduce when the following directive is added to httpd conf: "ProxyWebsocketFallbackToProxyHttp Off". So it seems that when the protocol upgrade is handled by mod_proxy_wstunnel everything is working as expected. Also, the issue only reproduces with WildFly 13. The same scenario executed against WildFly 20 does not reproduce the issue. However this seems like a race condition so it might reproduce even with WildFly 20 granted that more testing is performed.
(In reply to Marius Tantareanu from comment #0) > Wed Aug 25 19:25:54.299051 2021] [proxy:trace8] [pid 31599] > proxy_util.c(4800): [client 127.0.0.1:58402] proxy: HTTP: polling > (client=31, origin=31) This call never returns? No timeout at some point?
Also on which platform is httpd running, Linux?
(In reply to Yann Ylavic from comment #1) > (In reply to Marius Tantareanu from comment #0) > > Wed Aug 25 19:25:54.299051 2021] [proxy:trace8] [pid 31599] > > proxy_util.c(4800): [client 127.0.0.1:58402] proxy: HTTP: polling > > (client=31, origin=31) > > This call never returns? No timeout at some point? The call times out after 60 minutes. But I need to check if the timeout is triggered by httpd or by the WildFly client. I will try to reproduce and send the logs for a timed out call.
(In reply to Yann Ylavic from comment #2) > Also on which platform is httpd running, Linux? Yes, Linux. I tested on the following platforms: Red Hat Enterprise Linux Server release 7.5 (Maipo) SUSE Linux Enterprise Server 11 (x86_64) VERSION = 11 PATCHLEVEL = 2
Created attachment 38003 [details] Update data_in_input_filters in the core and ssl input filters The "101 Switching Protocols" response header and the first following jboss-remoting data may be read at the same time by httpd, leading to the application data being buffered in the filters chain and the tunneling loop not noticing. Can you try this patch which should address this potential issue?
Created attachment 38004 [details] Issue a nonblocking instead of ap_filter_input_pending() Actually a simpler/better patch might be this one, please try it instead.
(In reply to Marius Tantareanu from comment #3) > (In reply to Yann Ylavic from comment #1) > > (In reply to Marius Tantareanu from comment #0) > > > Wed Aug 25 19:25:54.299051 2021] [proxy:trace8] [pid 31599] > > > proxy_util.c(4800): [client 127.0.0.1:58402] proxy: HTTP: polling > > > (client=31, origin=31) > > > > This call never returns? No timeout at some point? > > The call times out after 60 minutes. But I need to check if the timeout is > triggered by httpd or by the WildFly client. I will try to reproduce and > send the logs for a timed out call. It seems that the 60 minutes timeout is generated by httpd. I am attaching the error_log for this case (error_log_timeout).
Created attachment 38005 [details] error_log capturing the pooling timeout after 60 minutes
(In reply to Yann Ylavic from comment #6) > Created attachment 38004 [details] > Issue a nonblocking instead of ap_filter_input_pending() > > Actually a simpler/better patch might be this one, please try it instead. Thanks! I will give it a try and get back to you.
With the patch in place, I am not able to reproduce the issue anymore. Thanks for the quick turnaround.
Fixed in 2.4.49 (r1892971).