Bug 58900 - Infinite redeploy loop with undeployOldVersions="true"
Product: Tomcat 8
Classification: Unclassified
Component: Catalina
Hardware: PC Mac OS X 10.1
Importance: P2 normal
Assigned To: Tomcat Developers Mailing List
Reported: 2016-01-20 22:33 UTC by Lauri Lehtinen
Modified: 2016-01-28 08:45 UTC

JConsole screen while observing this behavior (121.87 KB, image/png)
2016-01-20 22:33 UTC, Lauri Lehtinen
catalina.out with HostConfig logging set to finest (69.04 KB, text/plain)
2016-01-20 22:37 UTC, Lauri Lehtinen

Description Lauri Lehtinen 2016-01-20 22:33:34 UTC
Created attachment 33471
JConsole screen while observing this behavior

## Reproduction steps ##

To reproduce this issue:

1. Add undeployOldVersions="true" to the otherwise default Host configuration in server.xml.

2. Deploy two versions of the same WAR in parallel by creating symlinks to the actual WAR file(s):

$ ln -s /path/to/app.war /path/to/webapps/app##001.war
$ ln -s /path/to/app.war /path/to/webapps/app##002.war

Shortly after startup, Tomcat undeploys app##001 (the "old version"). However, the WAR file is not deleted, so it ends up getting redeployed. This happens over and over again until, eventually, the number of loaded classes grows too large and an OutOfMemoryError is thrown.
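For readers unfamiliar with parallel deployment: the `##` in a WAR name separates the application name from its version, so `app##001.war` and `app##002.war` are two versions of the same `/app` context (a single `#` would instead be read as a `/` in the context path). The Java sketch below, built around a hypothetical `OldVersionSketch` class that orders versions as plain Strings (an assumption that happens to work for zero-padded names like `001`/`002`), shows why `app##001` is the one picked for undeployment. It is not Tomcat's code: Tomcat's own name parsing lives in its `ContextName` class, and it only removes old versions that are no longer in use, a check omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OldVersionSketch {

    // Hypothetical helper: splits "app##001.war" into {"app", "001"}.
    // Tomcat's real parsing (ContextName) handles more cases than this.
    static String[] parse(String warName) {
        String name = warName.endsWith(".war")
                ? warName.substring(0, warName.length() - 4)
                : warName;
        int idx = name.indexOf("##");
        if (idx < 0) {
            return new String[] {name, ""}; // unversioned deployment
        }
        return new String[] {name.substring(0, idx), name.substring(idx + 2)};
    }

    // Returns every WAR whose version sorts below the newest version of the
    // same application, using plain String ordering. The "still in use"
    // check that Tomcat also applies is deliberately left out.
    static List<String> oldVersions(List<String> warNames) {
        Map<String, TreeMap<String, String>> byApp = new TreeMap<>();
        for (String war : warNames) {
            String[] parts = parse(war);
            byApp.computeIfAbsent(parts[0], k -> new TreeMap<>()).put(parts[1], war);
        }
        List<String> result = new ArrayList<>();
        for (TreeMap<String, String> versions : byApp.values()) {
            // headMap(lastKey) excludes the newest version itself.
            result.addAll(versions.headMap(versions.lastKey()).values());
        }
        return result;
    }

    public static void main(String[] args) {
        // The two symlinks from the reproduction steps.
        System.out.println(oldVersions(List.of("app##001.war", "app##002.war")));
        // prints [app##001.war]
    }
}
```

Because the symlink named `app##001.war` survives the undeploy, the auto-deployer simply finds it again on its next scan, which is the loop described above.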

## Root cause ##

What prevents the symlinked WAR from being deleted is this line in org.apache.catalina.startup.HostConfig#deleteRedeployResources:

current = current.getCanonicalFile();

The canonical file is the WAR file the symlink points to, and the isDeletableResource method determines that it should not be deleted.
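To see the effect of that line in isolation, here is a small Java sketch (hypothetical class `CanonicalSymlinkDemo`, not Tomcat code) that recreates the reproduction layout in a temporary directory: the real `app.war` outside the appBase and a symlink `webapps/app##001.war` pointing at it. `getCanonicalPath()` resolves the symlink to its target, so a containment check performed after canonicalization concludes the file lives outside the appBase. The `canonicalInsideAppBase` method is an assumed simplification of the decision `isDeletableResource` makes, not its actual implementation. Creating symlinks requires an OS and file system that support them.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CanonicalSymlinkDemo {

    // Builds <tmp>/app.war (the real file, outside appBase) and a symlink
    // <tmp>/webapps/app##001.war pointing at it. Returns {appBase, realWar, link}.
    static Path[] createLayout() {
        try {
            Path root = Files.createTempDirectory("bug58900");
            Path appBase = Files.createDirectory(root.resolve("webapps"));
            Path realWar = Files.createFile(root.resolve("app.war"));
            Path link = Files.createSymbolicLink(appBase.resolve("app##001.war"), realWar);
            return new Path[] {appBase, realWar, link};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simplified containment test (assumption): canonicalize first, then ask
    // whether the result lies inside the canonical appBase.
    static boolean canonicalInsideAppBase(File appBase, File resource) {
        try {
            String base = appBase.getCanonicalPath() + File.separator;
            return resource.getCanonicalPath().startsWith(base);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path[] p = createLayout();
        File link = p[2].toFile();
        System.out.println("symlink:   " + link.getAbsolutePath());
        // The canonical path is the symlink's target, <tmp>/app.war.
        System.out.println("canonical: " + link.getCanonicalPath());
        // false: after canonicalization the symlink "is" the file outside appBase.
        System.out.println("inside appBase? " + canonicalInsideAppBase(p[0].toFile(), link));
    }
}
```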

## Context ##

This issue has been plaguing me on a CentOS 6 server running Tomcat 7.0.55, but it appears to affect 8.x and 9.x as well. I reproduced it locally on OS X. My Struts2/Spring/Hibernate application died within ~10 minutes with -Xmx256m. A JConsole screenshot is attached; the undeploy/redeploy cycle is clearly visible in the CPU usage and the number of loaded classes.
Comment 1 Lauri Lehtinen 2016-01-20 22:37:53 UTC
Created attachment 33472
catalina.out with HostConfig logging set to finest
Comment 2 Lauri Lehtinen 2016-01-20 22:51:02 UTC
An easy fix for this particular problem would be to remove the getCanonicalFile call in HostConfig (https://github.com/apache/tomcat/blob/trunk/java/org/apache/catalina/startup/HostConfig.java#L1416), but I suspect it is there for a good reason.
Comment 3 Christopher Schultz 2016-01-21 02:03:31 UTC
I think the root issue is that your application leaks a ClassLoader for some reason (and you should definitely fix that), but Tomcat should also not be endlessly redeploying the application.

There is a less pathological use case here: downgrading an application after pushing out a broken update. For example, version A is deployed and working well, then version B is deployed. Version B turns out to be flawed, so version A is redeployed as version C, with version C symlinked to version A to make this clear to an admin. If versions A and C are deployed simultaneously (e.g. version A was not decommissioned before version C was deployed), you have the same situation as described in comment #0.
Comment 4 Mark Thomas 2016-01-21 10:12:14 UTC
+1 to Chris's comment regarding the memory leak. 99% of those turn out to be bugs in applications or 3rd party libraries. The redeploy loop is definitely a Tomcat bug.

getCanonicalPath() is used to handle docBases such as "../../someapp.war", which are resolved relative to the appBase but should not be deleted because they are outside the appBase. In this case the context would probably have been configured via a someapp.xml file (which does need to be removed).
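Why canonicalization matters for such docBases can be shown with a minimal Java sketch. The class and method names are hypothetical and the check is a simplified string-prefix test, not Tomcat's implementation. Without canonicalization, `webapps/../../someapp.war` still starts with the appBase prefix as a string; once `getCanonicalPath()` collapses the `..` segments, the resolved file is correctly seen as outside the appBase.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DocBaseCheckSketch {

    // Naive check: the ".." segments survive in the absolute path, so the
    // string still starts with the appBase prefix.
    static boolean naiveInside(File appBase, String docBase) {
        String resolved = new File(appBase, docBase).getAbsolutePath();
        return resolved.startsWith(appBase.getAbsolutePath() + File.separator);
    }

    // Canonical check: ".." segments (and symlinks) are resolved before comparing.
    static boolean canonicalInside(File appBase, String docBase) {
        try {
            String resolved = new File(appBase, docBase).getCanonicalPath();
            return resolved.startsWith(appBase.getCanonicalPath() + File.separator);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical appBase for the demo: <tmp>/a/webapps, created on disk.
    static File tempAppBase() {
        try {
            Path root = Files.createTempDirectory("docbase");
            return Files.createDirectories(root.resolve("a").resolve("webapps")).toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File appBase = tempAppBase();
        String docBase = "../../someapp.war";
        // true: the string prefix test is fooled by the ".." segments
        System.out.println("naive:     " + naiveInside(appBase, docBase));
        // false: the canonical path lies outside appBase
        System.out.println("canonical: " + canonicalInside(appBase, docBase));
    }
}
```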

There are likely to be similar issues with any code that triggers an undeploy if the WAR, DIR or XML files are symlinked into the appBase or configBase. The fix will be to remove the symlinks themselves rather than their targets, which means the checks are going to have to be more sophisticated than the current getCanonicalPath() call.
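One possible shape for a more careful check, sketched in Java under stated assumptions: a symlink sitting directly in the appBase is deleted as a link (`Files.delete` removes the link, not its target), while anything else still goes through a canonical-path containment test so that docBases outside the appBase are left alone. The `SymlinkAwareDeleteSketch` class and its `deleteIfOwned` policy are hypothetical and for illustration only; this is not the change that was committed for this bug, and it handles single files only (directories would need recursive deletion).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class SymlinkAwareDeleteSketch {

    // Hypothetical policy, not Tomcat's committed fix. Single files only.
    // Returns true if something was deleted.
    static boolean deleteIfOwned(Path appBase, Path resource) {
        try {
            Path base = appBase.toAbsolutePath().normalize();
            Path res = resource.toAbsolutePath().normalize();
            if (Files.isSymbolicLink(res)) {
                // A link placed directly in appBase belongs to this deployment:
                // remove the link itself. Files.delete does not follow links.
                if (base.equals(res.getParent())) {
                    Files.delete(res);
                    return true;
                }
                return false;
            }
            // Not a link: keep a canonical containment check so that docBases
            // such as "../../someapp.war" outside appBase are left alone.
            Path realBase = base.toRealPath();
            Path realRes = res.toRealPath();
            if (realRes.startsWith(realBase) && !realRes.equals(realBase)) {
                Files.delete(realRes);
                return true;
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Recreates the reproduction layout: <tmp>/app.war and the symlink
    // <tmp>/webapps/app##001.war -> <tmp>/app.war. Returns {appBase, realWar, link}.
    static Path[] createLayout() {
        try {
            Path root = Files.createTempDirectory("bug58900");
            Path appBase = Files.createDirectory(root.resolve("webapps"));
            Path realWar = Files.createFile(root.resolve("app.war"));
            Path link = Files.createSymbolicLink(appBase.resolve("app##001.war"), realWar);
            return new Path[] {appBase, realWar, link};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path[] p = createLayout();
        System.out.println("deleted:       " + deleteIfOwned(p[0], p[2]));
        System.out.println("link exists:   " + Files.exists(p[2], LinkOption.NOFOLLOW_LINKS));
        System.out.println("target exists: " + Files.exists(p[1]));
    }
}
```

Running `main` should report that the link was deleted while the real `app.war` outside the appBase is untouched, so the next auto-deploy scan has nothing left to redeploy.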

I plan to look at this today.
Comment 5 Lauri Lehtinen 2016-01-21 10:17:33 UTC
You're right, the memory leak was caused by a separate issue. Thanks.
Comment 6 Mark Thomas 2016-01-28 08:45:14 UTC
Fixed for 9.0.0.M2 onwards, 8.0.31 onwards and 7.0.68 onwards.