Bug 57700

Summary: Exception in a LifecycleListener on Context makes webapp unreachable in spite of subsequent successful startups
Product: Tomcat 7 Reporter: Konstantin Kolinko <knst.kolinko>
Component: CatalinaAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 7.0.59   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: catalina.2015-03-13.log (censored)

Description Konstantin Kolinko 2015-03-13 13:30:08 UTC
Using Tomcat 7.0.59.
I have an implementation of LifecycleListener that is configured in META-INF/context.xml of a web application and performs some configuration work on CONFIGURE_START_EVENT.

Roughly:
[[[
import org.apache.catalina.Context;
import org.apache.catalina.Lifecycle;
import org.apache.catalina.LifecycleEvent;
import org.apache.catalina.LifecycleListener;

public class FooListener implements LifecycleListener {
  @Override
  public void lifecycleEvent(LifecycleEvent event) {
    if (event.getLifecycle() instanceof Context) {
      if (Lifecycle.CONFIGURE_START_EVENT.equals(event.getType())) {
        if (loader == null) {
          Context context = (Context) event.getLifecycle();
          try {
             // <Do some work...>
          } catch (Exception ex) {
            final String msg = "Configuration failed for web application \"" + context.getName()
                + "\".";
            throw new ConfigurationException(msg, ex);
          }
        }
      }
    }
  }
}
]]]

The work of the listener can fail due to some external condition.

Steps to reproduce:
All the following is performed via HTML UI of the Manager web application.
1. Undeploy application foo.
2. Upload foo.war.
3. The external condition is not configured.
The following is observed:
- The application fails to start.
- A org.apache.catalina.LifecycleException happens and is logged into catalina.dddd-yy-mm.log
- The Manager app shows the application as deployed and is in "stopped" state.

This all is as expected.

4. Fix the external condition and start the web application via Manager.
Observed behaviour:
- The application is started. The manager webapp shows that it is started and running.

Expected behaviour:
- Being able to access /foo

Actual behaviour:
- Error 404. The access log shows that the request was served by the ROOT web application.
- No errors or warnings are logged

If I stop/start the foo application, it stops and starts successfully, but nothing changes.
If I stop/start the ROOT application, it stops and starts successfully, but nothing changes.

If I undeploy and re-deploy the foo application, the situation resolves and the application can be accessed successfully.


Thoughts:
----------
1. Maybe the error is in Mapper. Maybe the error is handling of FailedContext. 

The initial error may be a deployment error (resulting in use of FailedContext), but in this case it is odd that starting the application (after fixing the external condition) is performed successfully and it is shown as running.

If the error is in the Mapper, why the error is fixed when the application is undeployed?

2. We already have test cases for failed startups,
org.apache.catalina.core.TestStandardContext
-> testBug46243()
-> testWebappLoaderStartFail()
-> testWebappListenerConfigureFail()

The testWebappListenerConfigureFail() test was designed to reflect this situation. I guess there is a difference in deployment scenario via Manager webapp vs. the one used in the test.

3. I wonder, if the Manager webapp had a "[Redeploy]" button for a webapp, like it has a "[Reload]" one, would pressing it be able to resolve this issue. I will file an enhancement request for that.
Comment 1 Konstantin Kolinko 2015-03-13 13:34:11 UTC
Created attachment 32565 [details]
catalina.2015-03-13.log (censored)

The catalina.$DATE.log file when the LifecycleException occurs during the first (failed) start of the web application. This is for Tomcat 7.0.59. I have censored non-Tomcat names.
Comment 2 Konstantin Kolinko 2015-03-13 13:54:07 UTC
(In reply to Konstantin Kolinko from comment #0)
> 3. I wonder, if the Manager webapp had a "[Redeploy]" button for a webapp,
> like it has a "[Reload]" one, would pressing it be able to resolve this
> issue. I will file an enhancement request for that.

Filed as bug 57701
Comment 3 Mark Thomas 2015-03-18 16:09:57 UTC
I can confirm that this does not affect 8.0.x or trunk.
Comment 4 Violeta Georgieva 2015-06-24 12:05:29 UTC
Hi,

I succeeded to reproduce the scenario on all Tomcat versions.

The issue is the following:
- The implementation of Manager web app invokes HostConfig to deploy the web application.
...
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:186)
	at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:726)
	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:700)
	at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:718)
	at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:941)
...
- In ContainerBase.addChildInternal if the child's start fails then the event ADD_CHILD_EVENT is not sent.
- MapperListener will not be added to the context's listeners because it will not receive the event ADD_CHILD_EVENT
- Later on when one tries to start the context again MapperListener will not be notified.

The test testWebappListenerConfigureFail() is successfull because it explicitly sets 

((ContainerBase) tomcat.getHost()).setStartChildren(false);

with this flag when ContainerBase.addChildInternal is invoked, the child's start will be skipped and the event ADD_CHILD_EVENT will be sent.

What do you think if we move the notification for event ADD_CHILD_EVENT before children start? (see below)

Regards,
Violeta


Index: ContainerBase.java
===================================================================
--- ContainerBase.java	(revision 1686237)
+++ ContainerBase.java	(working copy)
@@ -714,6 +714,8 @@
             children.put(child.getName(), child);
         }
 
+        fireContainerEvent(ADD_CHILD_EVENT, child);
+
         // Start child
         // Don't do this inside sync block - start can be a slow process and
         // locking the children object can cause problems elsewhere
@@ -728,8 +730,6 @@
                     ("ContainerBase.addChild: start: " + e);
             }
         }
-
-        fireContainerEvent(ADD_CHILD_EVENT, child);
     }
Comment 5 Mark Thomas 2015-06-24 12:29:51 UTC
I'm wary of changing the order since it might break something else. How about a finally block?
Comment 6 Remy Maucherat 2015-06-24 12:38:07 UTC
I agree it will have side effects, like maybe mapping requests while the context is being deployed ? A finally seems better but even that would still change the current behavior (previously requests would never be mapped to the failed context), although maybe the right way.
Comment 7 Violeta Georgieva 2015-06-24 12:56:14 UTC
(In reply to Mark Thomas from comment #5)
> I'm wary of changing the order since it might break something else. How about a
> finally block?

+1

(In reply to Remy Maucherat from comment #6)
> I agree it will have side effects, like maybe mapping requests while the
> context is being deployed ? A finally seems better but even that would still
> change the current behavior (previously requests would never be mapped to
> the failed context), although maybe the right way.

The implementation is such that if the application is not started then MapperListener will just be added as Container and Lifecycle Listener but a registration of the context/wrappers will not happen. 
(org.apache.catalina.mapper.MapperListener.containerEvent(ContainerEvent)
row 148 & 151)
Comment 8 Remy Maucherat 2015-06-24 12:59:43 UTC
Would be fine then, but +1 for the finally.
Comment 9 Violeta Georgieva 2015-06-24 17:40:25 UTC
Fix is available in trunk, in 8.0.x for 8.0.24 and in 7.0.x for 7.0.63 onwards.