Created attachment 38325
script to compare filesystem to HTTP-transfer (configure $SRC and wget-URL)

The DefaultServlet modifies files in transit by removing the BOM and mishandling the resulting size for UTF-8 and UTF-16 BOMs, resulting in a transfer timeout. UTF-32 is left intact.

When downloaded with wget, the resulting file has its last bytes appended a second time because of the retry; how many depends on the BOM size. E.g. with the 3-byte UTF-8 BOM, the content "TEST" becomes "TESTEST".

It looks to me as if the Tomcat code at
https://github.com/apache/tomcat/blob/6a667943c5da6b5d61ac6bec1d7c9de061e3217c/java/org/apache/catalina/servlets/DefaultServlet.java#L1051
does not detect conversionRequired for the removal of the BOM, so at
https://github.com/apache/tomcat/blob/6a667943c5da6b5d61ac6bec1d7c9de061e3217c/java/org/apache/catalina/servlets/DefaultServlet.java#L1079
the 'Content-Length' is written before the BOM is stripped, resulting in clients waiting for more bytes that never arrive.

Additionally, why does UTF-32 work? The code lacks the 'skip' that all the other encodings have:

UTF-8 skips and returns:
https://github.com/apache/tomcat/blob/6a667943c5da6b5d61ac6bec1d7c9de061e3217c/java/org/apache/catalina/servlets/DefaultServlet.java#L1275
UTF-32 does not skip, it just returns the encoding name:
https://github.com/apache/tomcat/blob/6a667943c5da6b5d61ac6bec1d7c9de061e3217c/java/org/apache/catalina/servlets/DefaultServlet.java#L1287

See the attached test script for Micro Focus ZENworks, which uses Tomcat. Micro Focus received this bug report as #02286060 "ZCM Webserver 2020.01 is not transparent to BOM and mishandling modified filesize" on -05-05 but refused to report it upstream on -06-21 due to:

> Our engineering team come back with the analyses. Looking from ZENworks
> perspective there is no functionality impact. It seems Tomcat is used for your
> own for purpose, where the issue is happening. For that reason the suggestion
> is that you should report this case/scenario to the tomcat team. In case it
> will fixed from the Tomcat side, with every major ZENworks update a new version
> of Tomcat will be consumed.
Example output of test-encoding.sh showing file transfer with retry resulting in modified file contents:

Contents UTF-8: ...TEST
1. try : TEST (timeout waiting for 3 bytes)
2. try : EST (Content-Range: bytes 4-6/7)
Result : TESTEST

[...]
1c1
< 0000000: efbb bf54 4553 54 ...TEST
---
> 0000000: 5445 5354 4553 54 TESTEST
d42db618f4b78cea995329eb8d60b491 /opt/novell/zenworks/install/downloads/TEST/UTF-8.txt
2961d3c31fbd6d0abc36fa53d3565915 /tmp/UTF-8.txt
1c1
< 0000000: feff 0054 0045 0053 0054 ...T.E.S.T
---
> 0000000: 0054 0045 0053 0054 0054 .T.E.S.T.T
dcb86ac7739a5776eadcc5e5dedf94fa /opt/novell/zenworks/install/downloads/TEST/UTF-16BE.txt
09a69b9d518abf314fb830236d27bdce /tmp/UTF-16BE.txt
1c1
< 0000000: fffe 5400 4500 5300 5400 ..T.E.S.T.
---
> 0000000: 5400 4500 5300 5400 5400 T.E.S.T.T.
64343f295737c917fc57e52431c6f6de /opt/novell/zenworks/install/downloads/TEST/UTF-16LE.txt
55a65357c74490ce68b42bcca6962951 /tmp/UTF-16LE.txt
Files /opt/novell/zenworks/install/downloads/TEST/UTF-32BE.txt and /tmp/UTF-32BE.txt are identical
Files /opt/novell/zenworks/install/downloads/TEST/UTF-32LE.txt and /tmp/UTF-32LE.txt are identical
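
The byte arithmetic behind the reported output can be reproduced with a small simulation. This is only a sketch of the behaviour as described in the report, under these assumptions: the first response delivers the file without its BOM, the advertised length is the on-disk size, and the retry is served from the unmodified file. It uses no Tomcat code, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BomRetrySimulation {

    /**
     * Returns the bytes a resuming client would end up with if the first
     * response drops the BOM but advertises the on-disk length, and the
     * retry fetches the "missing" tail from the unmodified file.
     */
    static byte[] simulateDownload(byte[] onDisk, int bomLength) {
        int advertisedLength = onDisk.length; // assumed Content-Length
        byte[] firstBody = Arrays.copyOfRange(onDisk, bomLength, onDisk.length);

        ByteArrayOutputStream received = new ByteArrayOutputStream();
        received.write(firstBody, 0, firstBody.length);

        // The client resumes at the offset it has reached so far,
        // e.g. "bytes 4-6/7" for the UTF-8 case.
        int resumeFrom = received.size();
        if (resumeFrom < advertisedLength) {
            received.write(onDisk, resumeFrom, advertisedLength - resumeFrom);
        }
        return received.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'T', 'E', 'S', 'T'};
        byte[] result = simulateDownload(utf8, 3);
        System.out.println(new String(result, StandardCharsets.US_ASCII)); // TESTEST
    }
}
```

With a 2-byte UTF-16BE BOM the same steps give 0054 0045 0053 0054 0054, which matches the hex dump above.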
The provided test case passes. The analysis has a couple of flaws.

1. UTF-32 does skip the BOM

Process BOM reads up to 4 bytes from the InputStream. For BOMs less than 4 bytes long, the method has to handle skipping the correct number of bytes for the given BOM. This is what the skip method does. UTF-32 has a 4 byte BOM. Therefore, if a UTF-32 BOM is detected, the BOM has already been fully read (i.e. skipped) and no correction for a shorter BOM is required.

2. The DefaultServlet never both sets the Content-Length and removes the BOM

The BOM is only removed if:
- the content is included; or
- conversion is required

If conversion is required, the Content-Length is not explicitly set. The Content-Length may be explicitly set for an included resource but setContentLengthLong is a NO-OP for included resources.

If you can recreate this issue on a clean install of the latest release of a currently supported Tomcat version (10.1.0-M16, 10.0.22, 9.0.64 or 8.5.81 at the time of writing) then feel free to re-open this issue and provide the steps to recreate.
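
Point 1 above can be illustrated with a minimal sketch. It is not the DefaultServlet code; the class and method names are hypothetical, and it assumes a stream that supports mark/reset. It reads up to 4 bytes, and only rewinds and re-skips when the detected BOM is shorter than what was read, so a 4-byte UTF-32 BOM needs no extra skip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BomSkipSketch {

    /** Length of the BOM at the start of head (0 if none); count = valid bytes. */
    static int bomLength(byte[] head, int count) {
        // UTF-32 is checked first because FF FE is also the UTF-16LE BOM.
        if (count >= 4) {
            if (head[0] == 0 && head[1] == 0
                    && head[2] == (byte) 0xFE && head[3] == (byte) 0xFF) return 4; // UTF-32BE
            if (head[0] == (byte) 0xFF && head[1] == (byte) 0xFE
                    && head[2] == 0 && head[3] == 0) return 4;                     // UTF-32LE
        }
        if (count >= 3 && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF) return 3;     // UTF-8
        if (count >= 2 && head[0] == (byte) 0xFE && head[1] == (byte) 0xFF) return 2; // UTF-16BE
        if (count >= 2 && head[0] == (byte) 0xFF && head[1] == (byte) 0xFE) return 2; // UTF-16LE
        return 0;
    }

    /** Leaves the stream positioned just after the BOM; returns the BOM length. */
    static int consumeBom(InputStream in) throws IOException {
        byte[] head = new byte[4];
        in.mark(head.length);
        int count = 0;
        while (count < head.length) {
            int n = in.read(head, count, head.length - count);
            if (n < 0) break;
            count += n;
        }
        int bom = bomLength(head, count);
        if (bom < count) {
            // Read too far: rewind and skip exactly the BOM bytes.
            in.reset();
            for (int i = 0; i < bom; i++) in.read();
        }
        // If bom == count (e.g. a 4-byte UTF-32 BOM), the BOM is already consumed.
        return bom;
    }

    /** Convenience wrapper: the bytes a reader sees after the BOM is consumed. */
    static byte[] bodyAfterBom(byte[] file) {
        try (ByteArrayInputStream in = new ByteArrayInputStream(file)) {
            consumeBom(in);
            ByteArrayOutputStream rest = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) rest.write(b);
            return rest.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf32be = {0, 0, (byte) 0xFE, (byte) 0xFF, 0, 0, 0, 'T'};
        System.out.println(bodyAfterBom(utf32be).length); // 4 bytes left: one UTF-32 'T'
    }
}
```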