Bug 66512 - File downloads render as empty if access to Tomcat based application is delegated via AJP
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Util
Version: 9.0.73
Hardware: PC All
Importance: P2 major
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
Reported: 2023-03-06 14:53 UTC by Alexander Schüßler
Modified: 2023-03-07 11:27 UTC

Description Alexander Schüßler 2023-03-06 14:53:10 UTC
Hi there,

I would like to report an issue we have been struggling with since we installed the latest security patches for Apache Tomcat and Apache HTTPD on our hosting servers.

We - Plunet GmbH - develop a Java based application and our inbuilt file manager is using a Java Servlet for the file exchange. Our application only supports Java 8. There are plans to support 11 or 17 but we will need time for this unfortunately so we can not just update Java in case this might help.

Since the latest Tomcat updates, users complain that files with non-ASCII characters are downloaded as empty files. Such characters are, for instance, Greek, Cyrillic or Hebrew characters.

We see the following error in the stderror.log of Tomcat
01-Mar-2023 17:59:39.100 SEVERE [ajp-nio-127.0.0.1-9129-exec-3] org.apache.coyote.ajp.AjpProcessor.service Error processing request
java.lang.IllegalArgumentException: The Unicode character [Π] at code point [928] cannot be encoded as it is outside the permitted range of 0 to 255
at org.apache.tomcat.util.buf.MessageBytes.toBytesSimple(MessageBytes.java:292)

We are aware that this issue has been happening since Tomcat 9.0.71, and 9.0.73 did not resolve it as we had hoped. So I am afraid it is definitely related to one of the latest changes in the MessageBytes implementation of the Tomcat Util library, which I could see in the code history of this file.
This issue happens though ONLY if - as it is done on our SaaS servers - the access to our application is delegated via AJP connectors. This is done because all instances on a shared server access it via virtual hosting on ports 80 and 443.

Currently we have two possible fixing options which both work as we can confirm:
1. We replace the tomcat-util.jar in Tomcat's lib directory with one from an older Tomcat 9.x release. (this is not very practical though)
2. We change the delegation via mod_ajp to delegation via mod_proxy (this is huge effort to change it for hundreds of instances)

We therefore wonder whether this is considered a bug or just a planned change, in which case we may have to change something in the config of server.xml or similar. To me it rather feels like a bug, especially since it only happens in this special scenario. I did some research on the web but did not find anyone who reported the same issue, so I am putting it into this bug tracker.

Let me know in case you need more information for addressing my issue.

Thank you very much!

Cheers!

Alex
Comment 1 Mark Thomas 2023-03-06 15:09:30 UTC
That error message would have included a stack trace. Can you provide it please?
Comment 2 Alexander Schüßler 2023-03-06 15:15:14 UTC
Hi Mark,

thanks for getting back to me so fast! Appreciate it.
Please see below a complete sample of a stacktrace:

#----------------------------------------------------------

28-Feb-2023 22:46:47.393 SEVERE [ajp-nio-127.0.0.1-9004-exec-35] org.apache.coyote.ajp.AjpProcessor.service Error processing request
	java.lang.IllegalArgumentException: The Unicode character [Κ] at code point [922] cannot be encoded as it is outside the permitted range of 0 to 255
		at org.apache.tomcat.util.buf.MessageBytes.toBytesSimple(MessageBytes.java:292)
		at org.apache.tomcat.util.buf.MessageBytes.toBytes(MessageBytes.java:261)
		at org.apache.coyote.ajp.AjpMessage.appendBytes(AjpMessage.java:172)
		at org.apache.coyote.ajp.AjpProcessor.prepareResponse(AjpProcessor.java:1011)
		at org.apache.coyote.AbstractProcessor.action(AbstractProcessor.java:374)
		at org.apache.coyote.Response.action(Response.java:209)
		at org.apache.coyote.Response.sendHeaders(Response.java:434)
		at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:292)
		at org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:252)
		at org.apache.catalina.connector.Response.finishResponse(Response.java:445)
		at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:391)
		at org.apache.coyote.ajp.AjpProcessor.service(AjpProcessor.java:433)
		at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
		at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:926)
		at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1791)
		at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
		at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
		at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
		at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
		at java.lang.Thread.run(Thread.java:750)


#----------------------------------------------------------

Let me know if you need something else.

Cheers!

Alex
Comment 3 Alexander Schüßler 2023-03-06 15:32:19 UTC
Hi,

to clarify:
It is about the non-ASCII characters in the FILE NAMES - I guess this was not clear in my initial report!

For instance one of our clients from Greece struggled with a file named
5_ΠΡΟΣ EL.pdf

Cheers!

Alex
Comment 4 Mark Thomas 2023-03-06 15:33:35 UTC
Thanks. It looks like a UTF-8 value is being put into an HTTP header. Prior to the refactoring of MessageBytes, that would have resulted in a corrupted header. Now it triggers an error to make the problem more obvious. That behavioural change was intentional.

The reason you see the issue with AJP but not HTTP is that HTTP is coded more defensively. HTTP logs the problematic header with a warning and carries on. AJP fails the request.

HTTP headers are expected to be US-ASCII. Non US-ASCII values should be encoded as per RFC 8187. To what extent that is an application responsibility vs a container responsibility is not made clear in the Servlet spec.

I can look at making the AJP code more robust since it needs to be consistent with the HTTP code but that doesn't feel like a proper fix. It should also trigger quite a few warning messages in the logs which you would probably prefer not to see.

Given that the application works either when the header is corrupted (using an old tomcat-util.jar) or when the header is missing (current HTTP code) that suggests whatever header is being set isn't necessary. Is not setting whatever header is causing the problem an option? (Assuming Tomcat is not setting it).
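
For illustration, the kind of range check behind the error message above can be sketched in plain Java. This is a minimal sketch and not Tomcat's MessageBytes code; the class and method names (HeaderValueCheck, firstUnencodableIndex) are hypothetical. It only shows why Π (code point 928) cannot be written as a single byte while a character such as ü (code point 252) can.

```java
final class HeaderValueCheck {

    // Returns the index of the first char whose code point is above 255
    // (and therefore does not fit into a single byte), or -1 if every
    // char in the value fits. Hypothetical helper, not a Tomcat API.
    static int firstUnencodableIndex(String headerValue) {
        for (int i = 0; i < headerValue.length(); i++) {
            if (headerValue.charAt(i) > 255) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The file name from this report: 5_ΠΡΟΣ EL.pdf (Greek letters written as escapes)
        String value = "inline; filename=\"5_\u03A0\u03A1\u039F\u03A3 EL.pdf\"";
        int idx = firstUnencodableIndex(value);
        if (idx >= 0) {
            System.out.println("Cannot encode [" + value.charAt(idx)
                    + "] at code point [" + (int) value.charAt(idx) + "]");
        }
    }
}
```

Whether a failed check leads to an exception (AJP in 9.0.71 to 9.0.73) or to a warning and a skipped header (HTTP) is the behavioural difference described above.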
Comment 5 Alexander Schüßler 2023-03-06 16:00:20 UTC
Hi Mark,

I am open for any solution for our issue.

To underline, we are using the given config for AJP + Apache Tomcat for years without changes and never saw issues like this. The only thing that changes
If desired I can send sample context.xml, server.xml and web.xml of our Tomcats and also the virtual host config for Apache HTTPD so you can get an idea of our setup.

Let me know whatever you will need for helping me (and maybe also others) with this matter and I will see what I can do.

Cheers!

Alex
Comment 6 Mark Thomas 2023-03-06 16:06:12 UTC
That there have been corrupted header values for years without issue supports the view that the header is probably not required.

What would be useful for confirmation is the HTTP response headers. For a request that fails with AJP, repeat the same request in a configuration that uses an older version of tomcat-util.jar and provide the response headers. I want to see if I can figure out what is setting the header.
Comment 7 Alexander Schüßler 2023-03-06 16:33:23 UTC
Hey Mark,

I am not 100% sure what you want me to do, but I guess you would like me to track a click on a file (i.e. a download link) in the file manager, once on a system where it works and once on a system where it does not, and show the output from the browser debug tools, right?

There are huge differences.
In a working system we have this:

#----------------------------------------------------------
HTTP/2 200 OK
strict-transport-security: max-age=0
x-frame-options: DENY
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
content-security-policy: default-src 'self'; script-src 'self' 'unsafe-eval' 'unsafe-inline'; connect-src 'self'; img-src 'self' data:; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com https://fonts.gstatic.com;font-src 'self' 'unsafe-inline' https://fonts.googleapis.com https://fonts.gstatic.com;frame-ancestors 'self'; form-action 'self'
referrer-policy: strict-origin-when-cross-origin, no-referrer-when-downgrade
feature-policy: autoplay 'none'; camera 'none'; encrypted-media 'none'; fullscreen 'none'; geolocation 'none'; microphone 'none'; midi 'none'; payment 'none';
cache-control: private, must-revalidate
pragma: private, must-revalidate
content-disposition: inline; filename="5_???? EL.pdf"; filename*=UTF-8''5_%CE%A0%CE%A1%CE%9F%CE%A3%20EL.pdf
content-length: 53923
vary: User-Agent
content-type: application/pdf
date: Mon, 06 Mar 2023 16:25:51 GMT
server: Apache
X-Firefox-Spdy: h2
#----------------------------------------------------------


In a non working system it seems nothing goes really through:
#----------------------------------------------------------
HTTP/2 200 OK
content-length: 0
content-type: application/pdf
date: Mon, 06 Mar 2023 16:25:00 GMT
server: None
X-Firefox-Spdy: h2
#----------------------------------------------------------

I did both tests with Mozilla Firefox and extracted the information from the networking tab by clicking on the triggered GET request. Is it that what you need? Else please guide me a little bit more.

It is also not an issue for us to give you access to a test system on our side (at least from the web perspective). If it needs to be something that grants you server-side access via remote desktop or similar, I would have to check with my team whether we can provide that.

Thanks!

Cheers!

Alex
Comment 8 Alexander Schüßler 2023-03-06 16:37:15 UTC
P.S:

I am also on Skype, so screen sharing there is absolutely no problem if you are willing to, but I definitely do not expect that from you. It is just a kind offer I would like to make in case you think it would be beneficial.

(I just read your profile on Apache Home so I thought I should mention it :) )
Comment 9 Christopher Schultz 2023-03-06 17:32:57 UTC
Alexander, have a look at this StackOverflow question and answer:

https://stackoverflow.com/questions/93551/how-to-encode-the-filename-parameter-of-content-disposition-header-in-http

I suspect your "download servlet" is manually setting the Content-Disposition header in order to transmit the filename of the file back to the client. You are also setting the "filename*" parameter (as seen in your HTTP response headers).

Try setting the contents of the "filename" parameter of the Content-Disposition header to an encoded value in the same way that the "filename*" parameter is being encoded.

The sample code in the highest-voted answer uses C# but you should be able to adapt it to Java code easily.
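
As a rough Java counterpart, the extended value syntax of RFC 8187 (the successor of RFC 5987) can be sketched as follows. This is only a sketch under assumptions: the class and method names (Rfc8187Sketch, encode) are hypothetical and it is not the application's SysFacade helper. It percent-encodes every UTF-8 byte that is not an attr-char as defined by the RFC and prefixes the result with the charset and an empty language tag.

```java
import java.nio.charset.StandardCharsets;

final class Rfc8187Sketch {

    // Symbols allowed unescaped by the attr-char rule (besides letters and digits).
    private static final String ATTR_CHAR_SYMBOLS = "!#$&+-.^_`|~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Builds an ext-value such as UTF-8''5_%CE%A0... for use after "filename*=".
    static String encode(String value) {
        StringBuilder sb = new StringBuilder("UTF-8''");
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (isAttrChar(c)) {
                sb.append((char) c);
            } else {
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return sb.toString();
    }

    private static boolean isAttrChar(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || ATTR_CHAR_SYMBOLS.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        // 5_ΠΡΟΣ EL.pdf, with the Greek letters written as Unicode escapes
        System.out.println(encode("5_\u03A0\u03A1\u039F\u03A3 EL.pdf"));
    }
}
```

For the file name from this report the result is UTF-8''5_%CE%A0%CE%A1%CE%9F%CE%A3%20EL.pdf, which matches the filename* value visible in the response headers of the working system. The output is pure US-ASCII, so it is safe to place in a header value.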
Comment 10 Alexander Schüßler 2023-03-06 17:54:38 UTC
Hi Mark,

I am not a developer at my company but I found a suspicious piece of code in our codebase that would underline your assumption:

https://pastebin.com/m74V3qHx


I suppose what you want us to do is to replace this (please confirm):

#----------------------------------------------------------
                response.setHeader ( "Content-Disposition",
                        forceDownload+"; filename=\"" + originalFileName + "\"; filename*=UTF-8''" +
                                SysFacade.encodeFilename_for_HttpUrlAccess ( originalFileName ) );

#----------------------------------------------------------

with this:

#----------------------------------------------------------
                String encodedFilename = SysFacade.encodeFilename_for_HttpUrlAccess ( originalFileName );
                response.setHeader ( "Content-Disposition",
                        forceDownload+"; filename=\"" + encodedFilename + "\"; filename*=UTF-8''" +
                                encodedFilename );
#----------------------------------------------------------

I have access to our codebase and could build a test build with this change, that is not an issue but I must say that a code change would not be the best solution for us because it will mean that we will have to deploy updates for all affected customers.

Do you think this is the only option we have?

Let me know your thoughts!

Cheers!

Alex
Comment 11 Mark Thomas 2023-03-06 18:06:32 UTC
Alex,

You might want to test an updated header as if you look at the "working" one:

content-disposition: inline; filename="5_???? EL.pdf"; filename*=UTF-8''5_%CE%A0%CE%A1%CE%9F%CE%A3%20EL.pdf

the filename is corrupted (the ???? characters).

Fixing the header is (in my view) the best/correct long term option.

The next Tomcat release round (April) will contain a fix that aligns AJP behaviour with HTTP behaviour (at least I am intending to include such a fix and will link it to this bug when I do). Given you said that mod_proxy_http worked, that should address this issue. I would caution you to test that before relying on it as my expectation is that it will cause the content-disposition header to be dropped.
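
To make the expected effect concrete, here is a minimal sketch of "warn and drop" handling, using a plain map in place of the real response headers. It is an illustration only and not the Tomcat change itself; the class and method names (LenientHeaderSketch, dropUnencodable, fitsInSingleBytes) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LenientHeaderSketch {

    // True if every char of the value has a code point in the range 0 to 255.
    static boolean fitsInSingleBytes(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 255) {
                return false;
            }
        }
        return true;
    }

    // Returns a copy of the headers without the entries whose value cannot
    // be encoded; each dropped header is reported with a warning instead of
    // failing the whole response.
    static Map<String, String> dropUnencodable(Map<String, String> headers) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (fitsInSingleBytes(e.getValue())) {
                result.put(e.getKey(), e.getValue());
            } else {
                System.err.println("WARNING: skipping header [" + e.getKey()
                        + "] because its value cannot be encoded");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "application/pdf");
        headers.put("Content-Disposition",
                "inline; filename=\"5_\u03A0\u03A1\u039F\u03A3 EL.pdf\"");
        // Only Content-Type survives; the download then has no suggested file name.
        System.out.println(dropUnencodable(headers).keySet());
    }
}
```

With handling of this kind the download itself succeeds, but the client no longer receives a suggested file name, which is why testing before relying on it is advisable.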
Comment 12 Alexander Schüßler 2023-03-06 18:17:23 UTC
Hi Mark,

thanks so much for all your help with this matter.
I am impressed that you are more responsive than any paid support agent :)

I will test the code change on our side and see if this also helps.
I am also happy to test your change in the Tomcat source code.

We would need Windows binaries for that, as we normally download them from the Tomcat download site.
Since I can not yet estimate how urgent we need the fix in our header implementation I would love to test your fix asap too. Any chance to get a Windows binary earlier than April? Before I wrote this ticket (i.e. before 9.0.73 was released) I also had a look for pre-built nightly or alpha builds for testing purposes but it seems there are none available in the download archive. But I still dare to ask :)

Thanks and have a nice day!
My day ends now (actually ended hours ago :D ). I will reply to you from tomorrow again.

Enjoy your evening!

Cheers!

Alex
Comment 13 Christopher Schultz 2023-03-06 20:11:50 UTC
(In reply to Alexander Schüßler from comment #10)
> I am not a developer at my company but I found a suspicious piece of code in
> our codebase that would underline your assumption:

That does indeed look "suspicious". I would bet that the original developer was trying to accommodate both clients who do and do not understand RFC 5987-encoded strings simultaneously. I wonder how that would actually behave with each kind of client.

I would indeed change the code, but I would only use a single filename:

    response.setHeader ( "Content-Disposition",
            forceDownload
            + "; filename*=UTF-8''"
            + SysFacade.encodeFilename_for_HttpUrlAccess ( originalFileName ) );

Or, generate a "clean" version of your original filename that is known not to contain any non-ASCII characters. I'll leave that as an exercise for the reader ;)
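
One possible shape for such a "clean" fallback, as a sketch only: every character outside printable US-ASCII, plus the double quote and backslash that would need escaping inside a quoted string, is replaced with an underscore. The class and method names (ContentDispositionSketch, asciiFallback, headerValue) are hypothetical, and the percent-encoded name is assumed to come from whatever encoder the application already uses.

```java
final class ContentDispositionSketch {

    // Replaces everything that is not printable US-ASCII, as well as '"'
    // and '\', with '_' so the result is safe inside a quoted-string.
    static String asciiFallback(String fileName) {
        StringBuilder sb = new StringBuilder(fileName.length());
        for (int i = 0; i < fileName.length(); i++) {
            char c = fileName.charAt(i);
            boolean safe = c >= 0x20 && c < 0x7F && c != '"' && c != '\\';
            sb.append(safe ? c : '_');
        }
        return sb.toString();
    }

    // Combines the ASCII fallback (filename) with an already percent-encoded
    // UTF-8 name (filename*), for clients that do and do not understand the
    // extended syntax.
    static String headerValue(String dispositionType, String fileName,
            String percentEncodedName) {
        return dispositionType
                + "; filename=\"" + asciiFallback(fileName) + "\""
                + "; filename*=UTF-8''" + percentEncodedName;
    }

    public static void main(String[] args) {
        System.out.println(headerValue("inline",
                "5_\u03A0\u03A1\u039F\u03A3 EL.pdf",
                "5_%CE%A0%CE%A1%CE%9F%CE%A3%20EL.pdf"));
    }
}
```

The resulting value contains only US-ASCII characters, so it can be passed to response.setHeader without triggering the encoding error from this report.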
Comment 14 Mark Thomas 2023-03-06 22:01:09 UTC
Dev build for evaluation available from:
https://people.apache.org/~markt/dev/v9.0.74-dev/

Usual caveats apply:
- this is not an official release
- use it at your own risk
Comment 15 Alexander Schüßler 2023-03-07 09:39:32 UTC
Dear Mark and Christopher,

in a nutshell:
Both approaches do resolve the issue: the Tomcat dev build fixes it, and so does a build of our application with the corrected encoding.

We are thinking about working with https://cxf.apache.org/javadoc/latest-3.1.x/org/apache/cxf/attachment/Rfc5987Util.html in the future.

FYI, already in 2016 the developer said that we should be RFC compliant, but we are still using our own implementation. The original reason it was implemented like this was that, apparently, when downloading files with some special characters, the file name was changed after download.

E.g., "35173 J&J SOPs ED.log" was then downloaded as "35173%20j%26j%20sops%20ed.log".

It seems though that currently even with the proposal suggested by Christopher the latter issue does not happen anymore, so I guess we can change it here.

Thanks so much again for all your help. This was very efficient. Appreciate it.

Cheers!

Alex
Comment 16 Mark Thomas 2023-03-07 11:27:28 UTC
Fixed in:
- 11.0.x for 11.0.0-M5 onwards
- 10.1.x for 10.1.8 onwards
-  9.0.x for  9.0.74 onwards
-  8.5.x for  8.5.88 onwards