|Summary:||IllegalArgumentException at java.nio.Buffer.position at SocketWrapperBase.transfer()|
|Product:||Tomcat 8||Reporter:||Evgenij Ryazanov <katzyn>|
|Component:||Connectors||Assignee:||Tomcat Developers Mailing List <dev>|
A sample of stack traces
Compressed journalctl output
Unexpected IllegalStateException in HttpServletResponse.sendError()
Description Evgenij Ryazanov 2016-11-24 01:00:32 UTC
An unexpected exception found in log. SEVERE: Error finishing response java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:244) at sun.nio.ch.IOUtil.write(IOUtil.java:68) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:134) at org.apache.tomcat.util.net.NioBlockingSelector.write(NioBlockingSelector.java:101) at org.apache.tomcat.util.net.NioSelectorPool.write(NioSelectorPool.java:157) at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite(NioEndpoint.java:125 at org.apache.tomcat.util.net.SocketWrapperBase.doWrite(SocketWrapperBase.java:670) at org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking(SocketWrapperBase.java:60 at org.apache.tomcat.util.net.SocketWrapperBase.flush(SocketWrapperBase.java:597) at org.apache.coyote.http11.Http11OutputBuffer.flushBuffer(Http11OutputBuffer.java:519) at org.apache.coyote.http11.Http11OutputBuffer.finishResponse(Http11OutputBuffer.java:318) at org.apache.coyote.http11.Http11Processor.endRequest(Http11Processor.java:1458) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:823) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:7 at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1455) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:745) I'm not sure but cause may be the same as in recently fixed bug 60372. SocketWrapperBase also does not guard Buffer.limit() usages. Looks like all usages of this method on reusable buffers should be checked.
Comment 1 Violeta Georgieva 2016-11-24 07:51:29 UTC
Hi, Do you have a reproducible scenario? It will help. Also is it reproducible every time or just under load? Thanks, Violeta
Comment 2 Evgenij Ryazanov 2016-11-24 09:19:22 UTC
Created attachment 34473 [details] A sample of stack traces
Comment 3 Evgenij Ryazanov 2016-11-24 09:23:25 UTC
1-3 errors per day under load. At least it has a stack trace. It's not easy to simulate I/O error at right time to reproduce it. There are too many assorted messages in logs. After some filtering and search I see that this exception usualy happends just a two or three seconds after org.apache.coyote.http11.Http11Processor service INFO: Error parsing HTTP request header Note: further occurrences of HTTP header parsing errors will be logged at DEBUG level. java.lang.IllegalStateException: Unexpected state: headers already parsed. Buffer not recycled? at org.apache.coyote.http11.Http11InputBuffer.parseHeaders(Http11InputBuffer.java:550) org.apache.coyote.http11.Http11Processor endRequest SEVERE: Error finishing response java.lang.NullPointerException at org.apache.coyote.http11.Http11OutputBuffer.commit(Http11OutputBuffer.java:351) See attachment for full traces. So all of them may be related. There are many such errors in service and endRequest, but only a few at Buffer.position(). Sometimes it has a longer stack trace (see sample in the same attachment).
Comment 4 Evgenij Ryazanov 2016-11-26 08:56:51 UTC
Created attachment 34478 [details] Compressed journalctl output I've uploaded a nearly full log (without sensitive data and unrelevant messages from web applications). There are also a lot of org.apache.tomcat.util.net.AbstractEndpoint countDownConnection WARNING: Incorrect connection count, multiple socket.close called on the same socket. after some uptime.
Comment 5 Violeta Georgieva 2016-11-27 17:48:54 UTC
Hi, I succeeded to reproduce the exceptions from this issue . 1. Request a file with such a length so that "send file" is used. - just before serving the file , delete it, thus it will cause the processor to be released and recycled. As a result this processor will be added to the "recycled processors" . - after this while finishing the response the processor will be released and recycled once again . This will add the processor again to the "recycled processors". As a result one and the same processor will be in the "recycled processors". 2. Now request twice. One and the same processor will be provided for the two different requests . I do not know whether you scenario is similar. If you can describe your scenario it will be helpful. So currently we are using org.apache.tomcat.util.collections.SynchronizedStack<T> which does not guarantee uniqueness of the elements. Regards, Violeta  https://github.com/apache/tomcat/commit/74a12f550478b34261b39d2e324b3951e2ef80cc  https://github.com/apache/tomcat/blob/trunk/java/org/apache/tomcat/util/net/NioEndpoint.java#L891  https://github.com/apache/tomcat/blob/trunk/java/org/apache/coyote/AbstractProtocol.java#L927  https://github.com/apache/tomcat/blob/trunk/java/org/apache/coyote/AbstractProtocol.java#L854  https://github.com/apache/tomcat/blob/trunk/java/org/apache/coyote/AbstractProtocol.java#L728
Comment 6 Mark Thomas 2016-11-27 18:38:27 UTC
To date, the design decision has been that the request processing thread is responsible for ensuring that a processor is only recycled once per request. One reason for not making it a cache responsibility is that the cache only tracks unused processors so preventing duplicates isn't guaranteed to fix the problem.
Comment 7 Evgenij Ryazanov 2016-11-28 00:32:47 UTC
Created attachment 34484 [details] Unexpected IllegalStateException in HttpServletResponse.sendError() (In reply to Violeta Georgieva from comment #5) > - just before serving the file , delete it, thus it will cause the > processor to be released and recycled. I don't think that this is my case. Static content is not touched at all in this period. May be more unknown causes exist. I'm still not able to find a sequence to reproduce it. I've uploaded a one more strange exception. Last change in this directory was six month ago. All these requests are definitely should return 404. But one request throws a strange exception and I'm unable too see how it possible in DefaultServlet. Two more exceptions follow it. This also looks like response was incorretly reused.
Comment 8 Evgenij Ryazanov 2016-11-28 07:45:27 UTC
I've appended useSendfile="false" to all HTTP connectors for testing purposes.
Comment 9 Violeta Georgieva 2016-11-28 13:35:24 UTC
Created attachment 34485 [details] Patch proposal Hi Mark, (In reply to Mark Thomas from comment #6) > To date, the design decision has been that the request processing thread is > responsible for ensuring that a processor is only recycled once per request. > > One reason for not making it a cache responsibility is that the cache only > tracks unused processors so preventing duplicates isn't guaranteed to fix > the problem. What do you think about this approach? The patch is made against Tomcat 9 trunk. Thanks, Violeta
Comment 10 Mark Thomas 2016-11-28 14:35:50 UTC
Created attachment 34486 [details] Alternative patch My initial impression looking at that patch was that it was fixing the symptom rather than the cause. The case it handles should never happen. I have attached an alternative patch that I believe addresses the root cause.
Comment 11 Violeta Georgieva 2016-11-28 14:40:56 UTC
(In reply to Mark Thomas from comment #10) > Created attachment 34486 [details] > Alternative patch > > My initial impression looking at that patch was that it was fixing the > symptom rather than the cause. The case it handles should never happen. > > I have attached an alternative patch that I believe addresses the root cause. +1
Comment 12 Remy Maucherat 2016-11-28 14:54:36 UTC
(In reply to Mark Thomas from comment #10) > Created attachment 34486 [details] > Alternative patch > > My initial impression looking at that patch was that it was fixing the > symptom rather than the cause. The case it handles should never happen. > > I have attached an alternative patch that I believe addresses the root cause. I failed to reproduce the exception with the test case, for some reason, but the patch looks good.
Comment 13 Violeta Georgieva 2016-11-28 15:03:32 UTC
Hi Evgenij, (In reply to Evgenij Ryazanov from comment #8) > I've appended useSendfile="false" to all HTTP connectors for testing > purposes. Do you have already some results with "send file" disabled? Thanks a lot, Violeta
Comment 14 Evgenij Ryazanov 2016-11-29 01:29:19 UTC
Hi. I don't see such strange exceptions in journal any more. 16 hours may be not enough to be fully sure, however. But in previous runs (with sendfile) first NPE or IAE appears in a half an hour or so.
Comment 15 Violeta Georgieva 2016-11-29 09:24:33 UTC
Hi, I committed the Mark's patch. The fix will be available in: - 9.0.x for 9.0.0.M14 onwards and - 8.5.x for 8.5.9 onwards I do not see these exceptions in 8.0.x and 7.0.x. Regard, Violeta
Comment 16 Erich 2016-12-15 19:57:02 UTC
Try enabling NIO in 8.0.X and you should be able to recreate same issue there. We did using 8.0.30. It seems to be NIO specific--8.5.4 was where we discovered it because NIO was enforced for the first time, but you can see it in older versions too.
Comment 17 Mark Thomas 2016-12-16 07:15:35 UTC
I've received a report at $work of a customer seeing the same / a similar issue with 8.0.38 + NIO + send file. While our test case doesn't trigger the issue, it does appear that there is still something that needs to be fixed in 8.0.x (and possibly earlier versions). Re-opening to investigate further.
Comment 18 Mark Thomas 2017-01-05 15:09:35 UTC
I have confirmed that the issue observed in 8.0.x is essentially the same as that seen in 8.5.x. Rather than introduce the issue, the refactoring appears simply to have made it easier to observe. It has been fixed in: - 8.0.x for 8.0.40 onwards - 7.0.x for 7.0.74 onwards - 6.0.x for 6.0.49 onwards