Created attachment 37742 [details] java.io.IOException stack example Hello, Investigating a download issue on several copies of our application (the application is deployed on hundreds of tomcats, each application is independent, no clustering, just different client), we found out that we have thousands of errors such as the one attached. We are using tomcat 9.0.41 and 9.0.43 but I think this issue is much older than that (we only keep 3 months of logs). The JDK version used is either OpenJDK 8 or 11 from AdoptOpenJDK (the JDK version doesn't change anything). The installation of tomcat is from the official website and we are compiling APR automatically following the documentation on each tomcat release. The connector configuration is: ==================================================================== <Connector protocol="org.apache.coyote.http11.Http11AprProtocol" port="8080" maxThreads="500" acceptCount="2500" maxPostSize="10485760" maxKeepAliveRequests="-1" useSendfile="false" compression="on" compressibleMimeType="text/html,text/plain,text/xml,text/css,text/csv,text/rtf,text/richtext,text/javascript,application/x-javascript,application/javascript,application/ecmascript,application/rss+xml,application/xml,application/json,application/wasm,application/prs.hal-forms+json" compressionMinSize="250" connectionTimeout="20000" > <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" useSendfile="false" compression="on" compressibleMimeType="text/html,text/plain,text/xml,text/css,text/csv,text/rtf,text/richtext,text/javascript,application/x-javascript,application/javascript,application/ecmascript,application/rss+xml,application/xml,application/json,application/wasm,application/prs.hal-forms+json" compressionMinSize="250" /> </Connector> ==================================================================== The service is behind an haproxy which handle TLS. Java parameters varies for memory but other than that, this is a typical setenv.sh: ==================================================================== #!/bin/sh # Ansible managed export JAVA_HOME="/srv/resources/jvm/jdk-11.0.10+9" export CATALINA_HOME="/applications/tomcat/engine" export compilation="-XX:-TieredCompilation" export logs="-Xlog:gc*:file=${CATALINA_HOME}/logs/gc.log::filecount=10,filesize=20m:time,tid,level,tags" export timeouts="-Dsun.net.client.defaultConnectTimeout=300000 -Dsun.net.client.defaultReadTimeout=7200000" export FlightRecording="-XX:StartFlightRecording=disk=true,dumponexit=true,filename=${CATALINA_HOME}/logs" export JAVA_OPTS="-Djava.awt.headless=true -Dcom.sun.security.enableAIAcaIssuers=true -XX:+UseG1GC -XX:MaxGCPauseMillis=800 -XX:+UseStringDeduplication -Djava.net.preferIPv4Stack=true -Xmx64g -XX:ReservedCodeCacheSize=2g -XX:MetaspaceSize=10g -XX:G1HeapRegionSize=32M ${compilation} ${logs} ${timeouts} ${FlightRecording}" # JMX monitoring export CATALINA_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.password.file=/applications/tomcat/settings/jmxremote/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/applications/tomcat/settings/jmxremote/jmxremote.access -Djava.rmi.server.hostname=REDACTED" # Activate Apache Portable Runtime (APR) based Native library for Tomcat export LD_LIBRARY_PATH=/usr/local/apr/lib ==================================================================== Changing the connector from "org.apache.coyote.http11.Http11AprProtocol" to "org.apache.coyote.http11.Http11Nio2Protocol" completely eliminate the issue, we see 0 error. Reproducing the issue is complicated and we have no automatic way to trigger this yet. The issue https://bz.apache.org/bugzilla/show_bug.cgi?id=62626 looks similar and we are indeed using imageio (when we use JDK 8, we are using jai-imageio-core-1.4.0, when we use JDK 11, we use the included imageio). I'm sorry but I may not be able to help you further since were are switching everthing to nio2 now, but I still wanted to raise the issue so that you know that such problem still exist.
Error code 104 is "Connection reset by peer". Therefore, this is not a Tomcat issue. I recommend NIO rather than NIO2 unless performance testing has indicated a clear benefit of using NIO2. Further assistance, if required, can be obtained from the users mailing list.
We also have "Unexpected error [32]". And how do you explain that the switch from APR does eliminate the issue? Currently I have 2 tomcats with the exact same configuration except one is using APR and the other one is using NIO2 and I see the errors only on the first one. Also the tomcat that did have issues last week with APR doesn't have any issue today with NIO2. Anyway, thank you for pointing us to NIO, we'll check it out. Of course, you can disregard this report, we'll keep future issues to our self if they don't interest you. I thought that it would be interesting to know that such problem existed, but it seems you prefer to consider that it's simply a "Connection reset by peer", OK.
This is the result on one application: # for n in {01..22}; do echo -n "log.2021-02-${n}_09.log "; grep "Unexpected error" log.2021-02-${n}_09.log | wc -l; done log.2021-02-01_09.log 6144 log.2021-02-02_09.log 5469 log.2021-02-03_09.log 5476 log.2021-02-04_09.log 4653 log.2021-02-05_09.log 5471 log.2021-02-06_09.log 4348 log.2021-02-07_09.log 0 (Sunday, almost 0 activity) log.2021-02-08_09.log 2997 log.2021-02-09_09.log 4523 log.2021-02-10_09.log 4709 log.2021-02-11_09.log 4291 log.2021-02-12_09.log 5725 log.2021-02-13_09.log 3398 log.2021-02-14_09.log 7 (Sunday, almost 0 activity) log.2021-02-15_09.log 4216 log.2021-02-16_09.log 4078 log.2021-02-17_09.log 4064 log.2021-02-18_09.log 4112 (switch from APR to NIO2) log.2021-02-19_09.log 0 log.2021-02-20_09.log 0 log.2021-02-21_09.log 0 log.2021-02-22_09.log 0 Of course, you can say that switching to NIO2 did nothing, but I think that it did.
Error code 104 is a client disconnect. That is never a Tomcat bug. Error code 32 is a broken pipe. That error code was never mentioned in the original bug report. The Tomcat Native/APR code does not use pipes as far as I can tell although it is possible I am missing something under the covers in either the APR library or OpenSSL. If you have a test case that reproduces the 32 error on a clean install of the latest Tomcat 9 and Native Connector then please open a new bug report and provide the steps to reproduce the error and any additional information in that new bug report.
If we're comparing the connectors as of right now, I would say I prefer NIO2 over NIO on Java 8, since Oracle was not fixing or improving NIO then. Now with the next LTS coming up, I would say NIO is now better since it has fixes/refactorings and new features [NIO2 does not have the new features]. On clients I would say NIO2 is always better as long as the new features are not needed. So I would have, if things don't change: Java 8: NIO2 > NIO Java 11: tie Java 17: NIO > NIO 2 This is not making things better overall ...