Bug 66488 - MessageBytes#toBytesSimple overwrites request byte buffer
Summary: MessageBytes#toBytesSimple overwrites request byte buffer
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 9.0.71
Hardware: Macintosh All
: P2 normal (vote)
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-24 15:47 UTC by Fabian Günter
Modified: 2023-02-27 12:16 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Fabian Günter 2023-02-24 15:47:44 UTC
In 9.0.71 the code for MessageBytes#toBytes was changed to call the newly introduced MessageBytes#toSimpleBytes method which incorrectly assumes that byteC.getBuffer's retured array is a copy of the original request string unique to that specific MessageByte/ByteChunk, which is not the case. Every instance of MessageByte created early in the request processing gets passed a reference to the same byte array (which is documented in java.nio.ByteBuffer#array). 
An easy fix for this is changing

byte[] bytes = byteC.getBuffer(); 

in toBytesSimple to

byte[] bytes = new byte[len];

as well as changing

byteC.setEnd(len);

in toBytesSimple to

byteC.setBytes(bytes, 0, len).


I'm not sure though if it is intended that every MessageByte's ByteChunk share the same buffer.

The result of the bug is that a Servlet's request.getQueryString() gets overwritten by part of the Authorization header due to the call of authorization.toBytes() in BasicAuthenticator#doAuthenticate.
I can provide a simple example if necessary although due to some concurrency issues or whatever it only happens if I add a breakpoint before authorization.toBytes in BasicAuthenticator.
We stumbled upon the problem because it always occurs in our production application which I can't share for obvious reasons.
Comment 1 Mark Thomas 2023-02-24 16:26:48 UTC
Thanks for the report. I'll take a look.

The intention is that the original byte[] is shared to reduced memory footprint and unnecessary copying. Given that the original data is in bytes, I want to look into why toBytes() is triggering a conversion. It suggests some byte to TBD conversion has happened beforehand.

I agree that your proposed fix is easy. That said, I want to see if I can find a robust solution that avoids the creation of the new array. I'll note that the non-simple conversion already uses the approach you suggest.
Comment 2 Mark Thomas 2023-02-25 17:54:56 UTC
Note the breakpoint / debug reproduction is likely because the debugger is calling toString() on the MessageBytes instance oin order to display it.
Comment 3 Konstantin Kolinko 2023-02-25 19:52:55 UTC
(In reply to Fabian Günter from comment #0)
> In 9.0.71 the code for MessageBytes#toBytes was changed to call the newly
> introduced MessageBytes#toSimpleBytes method which incorrectly assumes that
> byteC.getBuffer's retured array is a copy of the original request string

Honestly, I do not follow your explanation.


A byteC.getBuffer() call returns ByteChunk.buff. That is a byte[] array that is owned by that ByteChunk. The buff field in ByteChunk is initialized as null (by ByteChunk() constructor) or as a "new byte[initial]" (by other constructor calling allocate(..)).

The only way that it may be changed to an externally created array is via a ByteChunk.setBytes(...) call. What call to ByteChunk.setBytes(...) passes such a shared array?

> Every instance of MessageByte created early in the request processing
> gets passed a reference to the same byte array
> (which is documented in java.nio.ByteBuffer#array)

I do not see such documentation. The javadoc of that method says "Returns the byte array that backs this buffer".

Do we have several ByteBuffer instances that are backed by the same array? Do we have the same ByteBuffer instance reused somewhere?


To be safe, I reviewed documentation for java.nio.charset.Charset.encode(...) method that is called by MessageBytes.toBytes(). It falls through to java.nio.charset.CharsetEncoder.encode​(CharBuffer) which is documented as "Returns: A newly-allocated byte buffer". I know that CharsetEncoder class is not thread-safe, but no ByteBuffer sharing occurs here.
Comment 4 Konstantin Kolinko 2023-02-25 20:35:43 UTC
In case of odd "data mixup" errors between requests (such as one observed here),
it is recommended to set discardFacades="true" on a <Connector>.

https://tomcat.apache.org/tomcat-9.0-doc/config/http.html

In Tomcat 9.0.72 the discardFacades attribute defaults to "false".
In Tomcat 10.1.x and later it defaults to "true".
Comment 5 Fabian Günter 2023-02-25 20:50:50 UTC
(In reply to Konstantin Kolinko from comment #3)
> (In reply to Fabian Günter from comment #0)
> > In 9.0.71 the code for MessageBytes#toBytes was changed to call the newly
> > introduced MessageBytes#toSimpleBytes method which incorrectly assumes that
> > byteC.getBuffer's retured array is a copy of the original request string
> 
> Honestly, I do not follow your explanation.
> 
> 
> A byteC.getBuffer() call returns ByteChunk.buff. That is a byte[] array that
> is owned by that ByteChunk. The buff field in ByteChunk is initialized as
> null (by ByteChunk() constructor) or as a "new byte[initial]" (by other
> constructor calling allocate(..)).
> 
> The only way that it may be changed to an externally created array is via a
> ByteChunk.setBytes(...) call. What call to ByteChunk.setBytes(...) passes
> such a shared array?

Well there are several places in Tomcat's code calling exactly what you said here, ByteChunk.setBytes, especially in Http11InputBuffer which is the class I'm referring to. The class has one single byteBuffer that gets shared to request, method, authentication, etc. via the .array method, as Mark confirmed.

> 
> > Every instance of MessageByte created early in the request processing
> > gets passed a reference to the same byte array
> > (which is documented in java.nio.ByteBuffer#array)
> 
> I do not see such documentation. The javadoc of that method says "Returns
> the byte array that backs this buffer".
> 
> Do we have several ByteBuffer instances that are backed by the same array?
> Do we have the same ByteBuffer instance reused somewhere?
> 

You should read more than the first line of the JavaDoc:
...
Modifications to this buffer's content will cause the returned array's content to be modified, and vice versa.
...

(see https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/ByteBuffer.html#array() ). The byte buffer of the request is shared by all ByteChunks which is quite obvious by debugging the code.

> 
> To be safe, I reviewed documentation for
> java.nio.charset.Charset.encode(...) method that is called by
> MessageBytes.toBytes(). It falls through to
> java.nio.charset.CharsetEncoder.encode​(CharBuffer) which is documented as
> "Returns: A newly-allocated byte buffer". I know that CharsetEncoder class
> is not thread-safe, but no ByteBuffer sharing occurs here.

> In case of odd "data mixup" errors between requests (such as one observed > here),
> it is recommended to set discardFacades="true" on a <Connector>.

> https://tomcat.apache.org/tomcat-9.0-doc/config/http.html

> In Tomcat 9.0.72 the discardFacades attribute defaults to "false".
> In Tomcat 10.1.x and later it defaults to "true".

As stated in my bug report, I'm reproducing this locally with a single request. No 'data mixup' here.
Comment 6 Fabian Günter 2023-02-25 21:08:03 UTC
I forgot to mention again, the problem is toBytesSimple, not toBytes.
Comment 7 Mark Thomas 2023-02-27 12:16:00 UTC
Fixed in:
- 10.1.x for 10.1.7 onwards
-  9.0.x for  9.0.73 onwards
-  8.5.x for  8.5.87 onwards

Note 11.0.x was not affected.