Bug 59220 - AsyncListener#onComplete not called after timeout if buffer is flushed
Summary: AsyncListener#onComplete not called after timeout if buffer is flushed
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 8.0.32
Hardware: PC All
: P2 normal (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-23 13:17 UTC by Scott Nicklous
Modified: 2016-04-19 18:53 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Scott Nicklous 2016-03-23 13:17:59 UTC
If an async dispatch cycle is started and a timeout occurs before the response
buffer is flushed using response#flushBuffer(), then the AsyncListener#onTimeout()
method followed by the AsyncListener#onComplete() are correctly called.

However, if the buffer is flushed before the timeout occurs, the 
AsyncListener#onTimeout() method will be called, but the 
AsyncListener#onComplete() will not be.

A sample servlet demonstrating the problem, AsyncDebugListener, is provided 
at the following location:

https://github.com/msnicklous/AsyncDebug

Build the AsyncDebugListener module using 'mvn install' and deploy it on Tomcat.

The service method and each of the AsyncListener methods logs output to the
AsyncListener.log file. You can follow execution by looking at the logs.

Assuming Tomcat is installed locally, you can use the following URLs:

1) No timeout - works correctly:

http://localhost:8080/AsyncDebugListener/ltest?reps=3

2) Timeout, no flush - works correctly:

http://localhost:8080/AsyncDebugListener/ltest?reps=3&timeout

3) Flush before timeout - error case:

http://localhost:8080/AsyncDebugListener/ltest?reps=3&timeout&flush

From the logs, you will see that AsyncListener#onTimeout() was called, but  
AsyncListener#onComplete() was not.

Thanks for having a look at this!
Comment 1 Violeta Georgieva 2016-03-28 10:30:36 UTC
Hi,

Based on the example that you've provided I can say that you are in so called "error dispatch" situation i.e. 


"2.3.3.3 Asynchronous processing
- If none of the listeners called AsyncContext.complete or any of the
AsyncContext.dispatch methods, then perform an error dispatch with a
status code equal to HttpServletResponse.SC_INTERNAL_SERVER_ERROR
and make the Throwable available as the value of the
RequestDispatcher.ERROR_EXCEPTION request attribute.
- If no matching error page is found, or the error page does not call
AsyncContext.complete() or any of the AsyncContext.dispatch methods,
then the container MUST call AsyncContext.complete."


So when you do not perform flush the response is not committed and org.apache.catalina.valves.ErrorReportValve.invoke(...) handles the situation correctly and onComplete is called, but when you perform flush, ErrorReportValve will mark the connection to be closed as the response is committed and onComplete will not be invoked.

I'm thinking about whether we should delay this, but I would like to hear the other committers' opinion.


For your use case my recommendation is to call AsyncContext.complete in your listener and not to wait the container to call it for you.

Regards,
Violeta
Comment 2 Remy Maucherat 2016-03-28 13:37:23 UTC
That sounds like a logical explanation and 59219 sounds a bit similar. This is all edge cases though and I doubt everything is well defined in the specification. So it is best to not try it :)
Comment 3 Scott Nicklous 2016-03-29 16:00:38 UTC
Hi Violeta and Remy,

thank you very much for having a look at this so quickly (and thank you, Remy for fixing 59213 so promptly!).  The example servlets I provided were for the purpose of reproducing the problem rather than for illustrating the use case. I would like to explain my use case more clearly.

I'm working on the Apache Pluto project, which is the reference implementation for JSR 286 Portlet Specification 2.0, and which will be the reference implementation for JSR 362 Portlet Specification 3.0. Pluto is a minimal portal server that implements the required API for portlet applications. 

For version 3.0, we want to provide async support for portlet applications. The portlet specification can make recommendations about how portlet applications should do error handling, but can't really guarantee that the portlet apps will follow the recommendations.

The Pluto server has to allocate resources to support the portlet applications during async processing and needs a reliable way to release those resources even if a timeout or error condition occurs. It seems to me that the natural way to do that would be for the Pluto server to register an AsyncListener on behalf of the portlet application in order to release the resources when onComplete() is called.

However, this bug along with 59219 means that in these edge cases, Pluto will not be able to release the resources, which could potentially result in a memory leak. 

That said, I understand that these really are edge cases, and I can't really say how high the priority should be in the grand scheme of things. But it would be very nice if they could be fixed at some point as time & priority allows. :-)

thanks again,
Scott
Comment 4 Remy Maucherat 2016-03-29 17:28:29 UTC
The question is really about the cases where complete should be called for the application (which didn't call it although it should have).
Comment 5 Scott Nicklous 2016-03-29 20:01:46 UTC
I know what you mean and agree with you. 

From Tomcat's point of view, the Pluto portal is an application. However, Pluto itself hosts portlet applications that may come from various sources. The question is how can Pluto release its resources allocated to a portlet async request if the portlet application misbehaves or has an exception and does not call complete()? 

My idea was that Pluto could register an AsyncListener of its own (in addition to any AsyncListeners possibly registered by the portlet applications) to release its resources during the onComplete call. But that only works if the underlying container (Tomcat, in this case) assures that onComplete is called under all circumstances.
Comment 6 Violeta Georgieva 2016-03-29 20:19:00 UTC
(In reply to Scott Nicklous from comment #5)
> I know what you mean and agree with you. 
> 
> From Tomcat's point of view, the Pluto portal is an application. However,
> Pluto itself hosts portlet applications that may come from various sources.
> The question is how can Pluto release its resources allocated to a portlet
> async request if the portlet application misbehaves or has an exception and
> does not call complete()? 
> 
> My idea was that Pluto could register an AsyncListener of its own (in
> addition to any AsyncListeners possibly registered by the portlet
> applications) to release its resources during the onComplete call. But that
> only works if the underlying container (Tomcat, in this case) assures that
> onComplete is called under all circumstances.

The idea here is that you should call AsyncContext.complete() in onTimeout event that your listener will receive instead of waiting for container to call AsyncContext.complete() and then onComplete().

In your example if I add AsyncContext.complete() in basic.servlet.TestedListener.onTimeout(AsyncEvent) then the third use case is working.

Do you think this will work for your scenario?

Regards,
Violeta
Comment 7 Scott Nicklous 2016-03-30 07:17:46 UTC
Hi Violeta,
That would be a potential workaround for the timeout case. I would have to ensure that the Pluto listener runs after all AsyncListeners registered by the portlet applications. 

It would be a little bit difficult to implement, since the servlet spec requires that the listeners be called in the order in which they are registered, and that's the way Tomcat works, too. So Pluto would not be able to simply call startAsync, add its listener to the AsyncContext, and pass the AsyncContext to the portlet application. I would probably have to wrap the AsyncContext object in order to make sure that the Pluto listener is always the last one added.

But it would be possible.
Comment 8 Violeta Georgieva 2016-03-30 07:22:57 UTC
If you wait for Tomcat to invoke a complete(), then this will be an error situation and as it is per spec Tomcat will set the response code to 500. Isn't that a problem from application point of view?
Comment 9 Scott Nicklous 2016-03-30 09:04:12 UTC
In the specific case addressed by this bug report, both onTimeout() and onComplete() are correctly called as long as the buffer is not flushed. 

The problem occurs only when the buffer was flushed at least once before the timeout occurs. If the buffer was flushed, onTimeout() is called, but onComplete() is not.

If the buffer was  flushed, I don't think Tomcat can set the status code or do an error dispatch anymore. But it would be nice if onComplete() would be called after the onTimeout() anyway.
Comment 10 Violeta Georgieva 2016-03-30 09:22:01 UTC
(In reply to Scott Nicklous from comment #9)
> In the specific case addressed by this bug report, both onTimeout() and
> onComplete() are correctly called as long as the buffer is not flushed. 
> 
> The problem occurs only when the buffer was flushed at least once before the
> timeout occurs. If the buffer was flushed, onTimeout() is called, but
> onComplete() is not.
> 
> If the buffer was  flushed, I don't think Tomcat can set the status code or
> do an error dispatch anymore. But it would be nice if onComplete() would be
> called after the onTimeout() anyway.

Actually it will as it is required by the spec

"
- Invoke the AsyncListener.onTimeout method on all the AsyncListener
instances registered with the ServletRequest on which the asynchronous
operation was initiated.
- If none of the listeners called AsyncContext.complete() or any of the
AsyncContext.dispatch methods, perform an error dispatch with a status
code equal to HttpServletResponse.SC_INTERNAL_SERVER_ERROR."

"GET /AsyncDebugListener/ltest?reps=3&timeout&flush HTTP/1.1" 500 642
"GET /AsyncDebugListener/ltest?reps=3&timeout HTTP/1.1" 500 624

This is what I will receive in the http access code with your example.

Regards,
Violeta
Comment 11 Mark Thomas 2016-03-30 09:29:18 UTC
It is worth taking a look to see if we can get consistent behaviour with and without flush. The client won't see the same result but hopefully the app will be able to see a consistent set of events.
Comment 12 Scott Nicklous 2016-03-30 10:04:18 UTC
I would not see a problem in handling the SC=500 as long as onComplete() is called. It would seem to me to be the specified behavior. 

In the case of Pluto, the SC=500 along with any Tomcat-generated error text would be passed to the client, which would give the portlet application developers a nice clue that they need to fix their bug. :-)
Comment 13 Mark Thomas 2016-04-19 18:53:18 UTC
Fixed in:
- 9.0.x for 9.0.0.M5 onwards
- 8.5.x for 8.5.1 onwards
- 8.0.x for 8.0.34 onwards
- 7.0.x for 7.0.70 onwards