Bug 60623

Summary: When startStopThreads is 1, don't rely on an executor and instead start synchronously
Product: Tomcat 8
Reporter: John D. Ament <johndament>
Component: Catalina
Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED
Severity: normal
Priority: P2
Version: 8.5.x-trunk
Target Milestone: ----
Hardware: Macintosh
OS: Mac OS X 10.1
Attachments: Proposed code changes (based on dev discussion)

Description John D. Ament 2017-01-21 21:07:38 UTC
The current behavior of the embedded module is that even when the startStopThreads count is 1, startup is still executed asynchronously. In an embedded use case you usually have only a single executor, so you can't effectively wait for the container to start before moving on. Other containers (Jetty and Undertow, for instance) block on the main thread while startup occurs (they start on the main thread).

This change would make it so that when the count is 1, startup runs on the main thread instead of being dispatched to an executor service.
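The proposed dispatch can be sketched with plain java.util.concurrent. This is an illustrative sketch, not Tomcat's actual ContainerBase code; the startChildren helper and its argument handling are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal illustration of the proposed behavior change, using plain
// java.util.concurrent rather than Tomcat's actual ContainerBase code.
class StartStopSketch {

    static void startChildren(List<Runnable> tasks, int startStopThreads) {
        if (startStopThreads == 1) {
            // Proposed behavior: serial startup on the calling (main) thread,
            // so embedded callers block until startup completes.
            for (Runnable task : tasks) {
                task.run();
            }
            return;
        }
        // Old behavior: dispatch to a pool, then wait for every task.
        int threads = startStopThreads > 1
                ? startStopThreads
                : Runtime.getRuntime().availableProcessors(); // <= 0 means "auto"
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (Runnable task : tasks) {
                results.add(pool.submit(task));
            }
            for (Future<?> f : results) {
                f.get(); // propagate failures, wait for completion
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With startStopThreads == 1, every task observably runs on the caller's thread, which is the behavior embedded users want.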
Comment 1 Michael Osipov 2017-01-22 10:04:29 UTC
Alternatively, one can provide a #join() method just like Jetty does in Server.class.
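A join()-style method of the kind Jetty's Server exposes can be sketched with a latch. This is an illustrative pattern only; the class and method names are hypothetical, not Jetty or Tomcat code:

```java
import java.util.concurrent.CountDownLatch;

// Illustrative sketch of a Jetty-style join(): the caller blocks until
// the server is stopped from another thread.
class JoinableServer {
    private final CountDownLatch stopped = new CountDownLatch(1);

    public void stop() {
        stopped.countDown(); // releases any thread blocked in join()
    }

    public void join() throws InterruptedException {
        stopped.await(); // block the calling (main) thread until stop()
    }
}
```

The appeal of this alternative is that the asynchronous startup machinery stays unchanged; the embedding application simply calls join() after start() to block the main thread.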
Comment 2 John D. Ament 2017-01-22 14:09:00 UTC
Created attachment 34660 [details]
Proposed code changes (based on dev discussion)
Comment 3 John D. Ament 2017-01-22 14:09:55 UTC
I also created a PR; not sure which would be easier. https://github.com/apache/tomcat85/pull/6
Comment 4 romain.manni-bucau 2017-01-22 14:22:32 UTC
Can 1 keep the current behavior, and 0 or negative values use an executor that executes the task directly? A value of 1 is in use today, and relying on deployer threads avoids leaking data, whereas this change would enable leaks in numerous cases, like the HTML deployer.

Side note: in the PR, I think stop needs to execute the returned runnables if it is kept this way.
Comment 5 John D. Ament 2017-01-22 14:57:39 UTC
Romain, what would be the benefit of the current behavior when threads == 1? This is specifically about fixing a problem with the Weld and Tomcat integration, where a deadlock can occur because two threads try to acquire a lock on the same object.

I think one of the ideas that came up in the discussion was to just introduce a flag. Mark seemed to think it was fine to simply default to the main thread.

Also note: by default, the processor count is used to decide how many executor threads to create. Unless you explicitly override it to 1, you'll get the old behavior.
Comment 6 John D. Ament 2017-01-22 14:59:42 UTC
Just to clarify:

- If you don't specify the setting, or have it set to 0, you get the old behavior.
- If you set it to 1, you get the new behavior.
- If you set it to anything larger than 1, you get the old behavior.

Romain does raise a good point about negative numbers; I suspect you'd get a runtime exception right now.
Comment 7 Mark Thomas 2017-01-23 09:54:22 UTC
(In reply to romain.manni-bucau from comment #4)
> Can 1 keep the current behavior, and 0 or negative values use an executor
> that executes the task directly? A value of 1 is in use today, and relying
> on deployer threads avoids leaking data, whereas this change would enable
> leaks in numerous cases, like the HTML deployer.

?

Please explain. The origins of the startStopThreads feature are not related to memory leak prevention.
Comment 8 romain.manni-bucau 2017-01-23 09:59:19 UTC
The fact is that Tomcat effectively guarantees this code executes in a specific thread pool (separate from any user pool or HTTP pool, to be concrete).

So there is now a lot of code that relies on setting deployment "initialization" state there and not cleaning it up (for good or bad reasons, depending on the listener implementation/hook). This is not an issue - assuming the classes come from the "container" - since you are limited to this pool: the state is initialized each time, and at runtime the code either doesn't use it, or gets null and handles it (those are the two cases I'm aware of).

Add to that that pool=1 is used to avoid concurrent deployments, and reusing this value to change the deployment mode will break existing instances.

Also, purely from a semantic point of view, pool size = 1 sounds like the pool size is 1; wouldn't 0 better correspond to "execute in the calling context"?
Comment 9 Mark Thomas 2017-01-23 20:35:07 UTC
Code that relies on the deployment mechanism for correct clean-up is broken and needs to be fixed. Deployment threads may be re-used before they are stopped and if any component fails to fully clean-up, the data could leak to the next deployment and potentially cause a security issue.

Re-defining a value of 1 for startStopThreads to mean 'serial deployment on the main processing thread' rather than 'serial deployment via a thread pool with a single thread' looks perfectly reasonable to me at this point.
Comment 10 romain.manni-bucau 2017-01-23 20:40:41 UTC
> Code that relies on the deployment mechanism for correct clean-up is broken and needs to be fixed. Deployment threads may be re-used before they are stopped and if any component fails to fully clean-up, the data could leak to the next deployment and potentially cause a security issue.

No, because the state lives only in these threads, so this pattern works well without leaking:

    deploymentStart() { context.set(new DeploymentEn(...)); }  // context is a thread-local
    deploymentStop()  { /* no context.remove() */ }


Don't get me wrong, I don't like that code, but I've seen it often, and given Tomcat's behavior it was perfectly valid.

> Re-defining a value of 1 for startStopThreads to mean 'serial deployment on the main processing thread' rather than 'serial deployment via a thread pool with a single thread' looks perfectly reasonable to me at this point.

I don't think I understood what the issue is with using 0 for that. It keeps compatibility and allows this new feature as well.
Comment 11 Mark Thomas 2017-01-23 20:52:36 UTC
You are going to have to explain that code sample. It means nothing to me.

0 and negative numbers are already defined to have special meaning.
Comment 12 romain.manni-bucau 2017-01-23 21:12:23 UTC
Ok: if you have a listener observing before_init or before_start on the Context (or even org.apache.catalina.core.StandardHost#addChild), you can initialize some context there. If you only deploy in this pool, then you are safe setting the value and never unsetting it.

I've seen that several times in company integrations, and I think it would be great not to break them. Also keep in mind that 1 is the default.

Regarding 0, you are right. For the record, it means an "auto" count derived from the available processors, which is something I had missed.

Side note - unrelated to this ticket but still related to in-context deployment: in the embedded case, using tomcat.getHost().addChild(context) already deploys in the current thread (that's what Apache Meecrowave does, for instance). This also means org.apache.catalina.core.ContainerBase#initInternal could be lazy, to avoid creating these useless threads.

Thinking more about it, I think a clean solution would be to just pass an executor name, as for connectors, and look up the executor from the server.xml definitions. If you don't want pooled deployment, Tomcat could provide an InCallerThreadExecutor implementing the Executor interface.
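The caller-thread executor suggested here would be trivial. A sketch follows; the class name is Romain's proposal, not an existing Tomcat class:

```java
import java.util.concurrent.Executor;

// Sketch of the proposed InCallerThreadExecutor: an Executor that runs
// each task synchronously on the thread that calls execute().
class InCallerThreadExecutor implements Executor {
    @Override
    public void execute(Runnable command) {
        command.run(); // no hand-off: the task runs in the caller's thread
    }
}
```

This is the same pattern as Guava's MoreExecutors.directExecutor(): code written against the Executor interface would not need to know whether deployment is pooled or inline.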

wdyt?
Comment 13 romain.manni-bucau 2017-01-24 08:49:30 UTC
Side note: to name a well-known framework affected by this, log4j2 can hit the mentioned issue when added to the container, since it sets a thread-local in an initializer or context listener and resets it in a servlet (intended to be load-on-startup 1). The goal is to stay initialized for the whole startup duration, but if another filter fails, the thread-local is never reset. While Tomcat uses a pool this is not a big deal, but once it starts using the caller thread, this can fail where it was previously OK. See log4j-web for the details.
Comment 14 Mark Thomas 2017-01-25 20:29:19 UTC
Why isn't the log4j2 issue just as much of a problem for an Executor thread that is used to start multiple web applications?

I remain unconvinced that switching to the main thread when startStopThreads == 1 is any worse than the current behaviour.
Comment 15 Mark Thomas 2017-02-13 15:03:28 UTC
Given the concerns for existing applications - although I'm not convinced those concerns are entirely valid - I propose that this change is implemented for 9.0.x only, with an appropriate entry added to the migration guide.
Comment 16 romain.manni-bucau 2017-02-13 15:16:12 UTC
acceptable for me, thanks Mark
Comment 17 John D. Ament 2017-03-06 03:13:30 UTC
I'm fine with that as well. Do you need me to make any changes to the patch?
Comment 18 Mark Thomas 2017-03-06 08:58:43 UTC
Thanks for the offer.

I spent a little time looking at this over the weekend, and I should have something ready to commit shortly.
Comment 19 Mark Thomas 2017-03-06 11:54:26 UTC
Fixed in 9.0.x for 9.0.0.M18 onwards