Bug 65154 - java.io.IOException: Unexpected error [104] writing data to the APR/native socket
Summary: java.io.IOException: Unexpected error [104] writing data to the APR/native so...
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Connectors (show other bugs)
Version: 9.0.43
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
Depends on:
Reported: 2021-02-22 11:45 UTC by slash
Modified: 2021-02-22 14:03 UTC (History)
0 users

java.io.IOException stack example (12.65 KB, text/plain)
2021-02-22 11:45 UTC, slash

Note You need to log in before you can comment on or make changes to this bug.
Description slash 2021-02-22 11:45:15 UTC
Created attachment 37742 [details]
java.io.IOException stack example


Investigating a download issue on several copies of our application (the application is deployed on hundreds of tomcats, each application is independent, no clustering, just different client), we found out that we have thousands of errors such as the one attached.

We are using tomcat 9.0.41 and 9.0.43 but I think this issue is much older than that (we only keep 3 months of logs). The JDK version used is either OpenJDK 8 or 11 from AdoptOpenJDK (the JDK version doesn't change anything).

The installation of tomcat is from the official website and we are compiling APR automatically following the documentation on each tomcat release.

The connector configuration is:
        connectionTimeout="20000" >
        <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol"
                compressionMinSize="250" />
The service is behind an haproxy which handle TLS.

Java parameters varies for memory but other than that, this is a typical setenv.sh:
# Ansible managed
export JAVA_HOME="/srv/resources/jvm/jdk-11.0.10+9"
export CATALINA_HOME="/applications/tomcat/engine"

export compilation="-XX:-TieredCompilation"
export logs="-Xlog:gc*:file=${CATALINA_HOME}/logs/gc.log::filecount=10,filesize=20m:time,tid,level,tags"
export timeouts="-Dsun.net.client.defaultConnectTimeout=300000 -Dsun.net.client.defaultReadTimeout=7200000"
export FlightRecording="-XX:StartFlightRecording=disk=true,dumponexit=true,filename=${CATALINA_HOME}/logs"

export JAVA_OPTS="-Djava.awt.headless=true -Dcom.sun.security.enableAIAcaIssuers=true -XX:+UseG1GC -XX:MaxGCPauseMillis=800 -XX:+UseStringDeduplication -Djava.net.preferIPv4Stack=true -Xmx64g -XX:ReservedCodeCacheSize=2g -XX:MetaspaceSize=10g -XX:G1HeapRegionSize=32M ${compilation} ${logs} ${timeouts} ${FlightRecording}"

# JMX monitoring
export CATALINA_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.password.file=/applications/tomcat/settings/jmxremote/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/applications/tomcat/settings/jmxremote/jmxremote.access -Djava.rmi.server.hostname=REDACTED"

# Activate Apache Portable Runtime (APR) based Native library for Tomcat
export LD_LIBRARY_PATH=/usr/local/apr/lib

Changing the connector from "org.apache.coyote.http11.Http11AprProtocol" to "org.apache.coyote.http11.Http11Nio2Protocol" completely eliminate the issue, we see 0 error. Reproducing the issue is complicated and we have no automatic way to trigger this yet.

The issue https://bz.apache.org/bugzilla/show_bug.cgi?id=62626 looks similar and we are indeed using imageio (when we use JDK 8, we are using jai-imageio-core-1.4.0, when we use JDK 11, we use the included imageio).

I'm sorry but I may not be able to help you further since were are switching everthing to nio2 now, but I still wanted to raise the issue so that you know that such problem still exist.
Comment 1 Mark Thomas 2021-02-22 13:06:16 UTC
Error code 104 is "Connection reset by peer". Therefore, this is not a Tomcat issue.

I recommend NIO rather than NIO2 unless performance testing has indicated a clear benefit of using NIO2.

Further assistance, if required, can be obtained from the users mailing list.
Comment 2 slash 2021-02-22 13:28:49 UTC
We also have "Unexpected error [32]".

And how do you explain that the switch from APR does eliminate the issue?

Currently I have 2 tomcats with the exact same configuration except one is using APR and the other one is using NIO2 and I see the errors only on the first one.

Also the tomcat that did have issues last week with APR doesn't have any issue today with NIO2.

Anyway, thank you for pointing us to NIO, we'll check it out.

Of course, you can disregard this report, we'll keep future issues to our self if they don't interest you.

I thought that it would be interesting to know that such problem existed, but it seems you prefer to consider that it's simply a "Connection reset by peer", OK.
Comment 3 slash 2021-02-22 13:35:54 UTC
This is the result on one application:
# for n in {01..22}; do echo -n "log.2021-02-${n}_09.log ";  grep "Unexpected error" log.2021-02-${n}_09.log | wc -l; done
log.2021-02-01_09.log 6144
log.2021-02-02_09.log 5469
log.2021-02-03_09.log 5476
log.2021-02-04_09.log 4653
log.2021-02-05_09.log 5471
log.2021-02-06_09.log 4348
log.2021-02-07_09.log 0 (Sunday, almost 0 activity)
log.2021-02-08_09.log 2997
log.2021-02-09_09.log 4523
log.2021-02-10_09.log 4709
log.2021-02-11_09.log 4291
log.2021-02-12_09.log 5725
log.2021-02-13_09.log 3398
log.2021-02-14_09.log 7 (Sunday, almost 0 activity)
log.2021-02-15_09.log 4216
log.2021-02-16_09.log 4078
log.2021-02-17_09.log 4064
log.2021-02-18_09.log 4112
(switch from APR to NIO2)
log.2021-02-19_09.log 0
log.2021-02-20_09.log 0
log.2021-02-21_09.log 0
log.2021-02-22_09.log 0

Of course, you can say that switching to NIO2 did nothing, but I think that it did.
Comment 4 Mark Thomas 2021-02-22 13:42:11 UTC
Error code 104 is a client disconnect. That is never a Tomcat bug.

Error code 32 is a broken pipe. That error code was never mentioned in the original bug report. The Tomcat Native/APR code does not use pipes as far as I can tell although it is possible I am missing something under the covers in either the APR library or OpenSSL. If you have a test case that reproduces the 32 error on a clean install of the latest Tomcat 9 and Native Connector then please open a new bug report and provide the steps to reproduce the error and any additional information in that new bug report.
Comment 5 Remy Maucherat 2021-02-22 14:03:09 UTC
If we're comparing the connectors as of right now, I would say I prefer NIO2 over NIO on Java 8, since Oracle was not fixing or improving NIO then.

Now with the next LTS coming up, I would say NIO is now better since it has fixes/refactorings and new features [NIO2 does not have the new features]. On clients I would say NIO2 is always better as long as the new features are not needed.

So I would have, if things don't change:
Java 8: NIO2 > NIO
Java 11: tie
Java 17: NIO > NIO 2

This is not making things better overall ...