Bug 33711 - Memory leak (classloader) with Log4J and Single Sign On.
Memory leak (classloader) with Log4J and Single Sign On.
Status: RESOLVED FIXED
Product: Tomcat 5
Classification: Unclassified
Component: Catalina
5.5.9
PC Windows XP
: P2 normal (vote)
: ---
Assigned To: Tomcat Developers Mailing List
: PatchAvailable
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2005-02-23 15:25 UTC by Kev Palfreyman
Modified: 2005-04-06 14:50 UTC (History)
3 users (show)



Attachments
Fix session leak in SingleSignOn (5.11 KB, patch)
2005-03-01 13:33 UTC, Kev Palfreyman
Details | Diff
Call recycle on all sessions (passivated or expired) on stop (1.29 KB, patch)
2005-03-29 16:39 UTC, Remy Maucherat
Details | Diff
Always recycle sessions; add activate/passivate events and use passivate for SSO (5.45 KB, patch)
2005-03-30 15:03 UTC, Remy Maucherat
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kev Palfreyman 2005-02-23 15:25:31 UTC
We have detected a situation where Tomcat will leak memory when web 
applications are undeployed.

Specifically, when the following 3 conditions are met:
a) SingleSignOn valve is used
b) the web app requires the user to be authenticated to get a session
c) the web app contains commons-logging.jar *and* log4j.jar

This has been observed with Tomcat 5.5.7

Simply deploy a web app under the above conditions, visit the web app so
that a session for that app is stored in the SingleSignOn data structures,
then undeploy the web app.  At this point tomcat leaks a reference to the
class loader for the undeployed web app.  Repeat, and eventually this will
cause OutOfMemoryError.

What happens is that an instance of StandardSession is stored as a key 
in the "reverse" HashMap of SingleSignOn. The instance of StandardSession 
also refers to an instance of StandardManager in its "manager" field.
The classes for the SingleSignOn, StandardSession and StandardManager all 
appear to be loaded by the "catalina" StandardClassLoader.

However, (and I think this is where the leak happens), the "log" field
of the StandardManager refers to an instance of:
org.apache.commons.logging.impl.Log4JLogger

That Log4JLogger class is loaded by the WebAppClassLoader for our web
app, hence making a hard reference to that class loader.

When the web application is undeployed, the hard reference to the dead
classloader remains in the data structures of SingleSignOn.  I think it
only gets cleaned up when the session is terminated.

This has lead to exhaustion of the PermGen for our applications, where
it is quite normal to create/deploy/undeploy web applications in 
production (not just during development).

I can supply a very simple web-app to demonstrate this effect if required.

Not sure if this is one bug, or several:
1) Why is the logger loaded from the web app class loader?
2) Why does the SingleSignOn mechanism not remove ALL sessions for ALL users
related to a given web app when the web app is undeployed.


Any ideas for fixes for this?  (Its seriously delaying a product release 
at present.)

Many thanks,

	Kev
Comment 1 Remy Maucherat 2005-02-23 15:41:44 UTC
You might want to look into the code a little bit more, and see that SSO should
get an event for all sessions of a particular webapp when it is stopped. If you
think there are cases where something is not properly cleaned up, then please
submit a patch.
Comment 2 Kev Palfreyman 2005-02-24 17:11:24 UTC
(In reply to comment #1 from Remy)
> You might want to look into the code a little bit more, and see that SSO 
should
> get an event for all sessions of a particular webapp when it is stopped. 

Remy,
OK I'm confused...  I've only just started to look at the internals of Tomcat, 
so its kind of hard to initially get my head around the codebase.
However, I've been instrumenting a local copy of the 5.5.7 source with lots of 
extra logging to try to follow the logic.

From what I can tell, the SingleSignOn is NOT notified in any way when an 
application is undeployed.  The only mechanism I could see would be if it was 
told when the individual sessions were expired.  So I had a look at 
session.StandardManager, which seems to deal with persisting and expiring all 
of the sessions when an application is STOPped. However, looking at 
StandardManager.doUnload(), the look that expires the sessions calls
session.expire(false) - i.e. does not notify listeners, hence the SSO never 
gets told.

Did I miss something obvious? (I am trying to look into the problem and help 
out here!)

In the meantime, a colleague is still fighting with the loggers and class 
loaders, but I think we can accept that is a separate issue.

Thanks,

   Kev
Comment 3 Remy Maucherat 2005-02-24 17:56:16 UTC
Yes, that's a very good point about the expire(false) when a webapp is stopped.
I didn't think about that (to be honest, I don't know the SSO code that well
either, but looking at the code I had the impression that this was working, and
the problems would be caused by something else).

I'll try to see if there is a way to do better cleanup, or to remove the need
for holding references to sessions.
Comment 4 Kev Palfreyman 2005-02-25 12:11:06 UTC
I tried the obvious quick solution of changing expire(false) to expire(true), 
but unfortunately that had a nasty side-effect.  If the user is logged in to a 
SSO session, and is using several web apps, then if one of those web apps is 
undeployed, their *entire* SSO is destroyed. This means they are immediately 
logged out (of all the apps), and their next interaction with any of the other 
web apps will force them to log in again (which is obviously not what was 
intended).

I'll keep digging and trying to follow the code base.  (Out of interest, does 
all of the code have as few comments as this session/authentication stuff?)

We've downgraded our upcoming product release to a beta (and use a single 
shared log instead of per webapp logging) until we figure a way to get the SSO 
to release the ref to the classloader.

Thanks again, and please do let us know if you have any ideas for how to 
change this SSO behaviour.  
BTW - Is there a tomcat test suite I should be using so that I know if I have 
broken anything while I look at this issue? Being new to the tomcat code, I'm 
quite wary of breaking stuff via strange side-effects.

Cheers,

Kev


Comment 5 Kev Palfreyman 2005-02-26 00:28:13 UTC
I had a bit more of a look at this. Again, please be aware that I am new to 
this code base. 

The only mechanism I can see for an instance of SingleSignOn to be told that a 
WebApp has been undeployed is if it were to be told when the sessions related 
to that app were destroyed (i.e. if StandardManager.doUnload() were to be 
changed to call expire(true) instead of expire(false)).  However, the listener 
SingleSignOn.sessionEvent(SessionEvent) assumes that it is only ever called 
for one of two reasons - either the user logged out, or the session expired. 

But what we have is a third case - one of the apps being undeployed.  

Now, it seems that if we can detect the third case then the method to be 
called has already been implemented (if not tested)... The method 
SingleSignOn.deregister(ssoId, session) [note the 2nd param] already exists 
and looks like it is designed for the job, but I can't find anything that 
calls it!

So the question is: How does the listener method detect the third case? I can 
think of hacks involving overloading the expire(notify) method with an extra 
param and using that to put some marker in the data field of the SessionEvent, 
but I can't help but think I'm missing something obvious by not knowing the 
code base.

Ideas?

Kev
Comment 6 Kev Palfreyman 2005-02-28 18:55:12 UTC
> So the question is: How does the listener method detect the third case? I 
can 
> think of hacks involving overloading the expire(notify) method with an extra 
> param and using that to put some marker in the data field of the 
SessionEvent, 
> but I can't help but think I'm missing something obvious by not knowing the 
> code base.

I can't find a cleaner way, but I have this work using the above technique.
Should I provide a patch against the 5.5.7 src (which is what I'm using), or 
do patch have to be against the CVS head?
I've not looked at other managers such as Persistent / clustering etc, as I 
don't know much about those.

Kev
Comment 7 Kev Palfreyman 2005-03-01 13:33:31 UTC
Created attachment 14379 [details]
Fix session leak in SingleSignOn

Since there was no answer to how you wanted the patch, I have a patch against
5.5.7.	Its a unified diff of everything in:
jakarta-tomcat-5.5.7-src\jakarta-tomcat-catalina\catalina\src\share\

Its pretty small - changes to 4 files:
catalina\Session.java
catalina\authenticator\SingleSignOn.java
catalina\session\StandardSession.java
catalina\session\StandardManager.java

Regards,

Kev
Comment 8 Remy Maucherat 2005-03-29 14:35:05 UTC
Now that 5.5.9 is released, I can look into more complex/risky issues.

I see you have done a lot of research, which identifies the problem very
clearly. However, I don't really like the proposed solution. How about simply
setting the manager field to null since this is the cause of the problem ?
Calling recycle on all sessions in StandardManager.stop could do this easily.

Index: StandardManager.java
===================================================================
RCS file:
/home/cvs/jakarta-tomcat-catalina/catalina/src/share/org/apache/catalina/session/StandardManager.java,v
retrieving revision 1.28
diff -u -r1.28 StandardManager.java
--- StandardManager.java	7 Feb 2005 21:56:32 -0000	1.28
+++ StandardManager.java	29 Mar 2005 12:32:32 -0000
@@ -670,12 +670,16 @@
         Session sessions[] = findSessions();
         for (int i = 0; i < sessions.length; i++) {
             StandardSession session = (StandardSession) sessions[i];
-            if (!session.isValid())
-                continue;
             try {
-                session.expire();
+                if (session.isValid()) {
+                    session.expire();
+                }
             } catch (Throwable t) {
                 ;
+            } finally {
+                // Measure against memory leaking if references to the session
+                // object are kept in a shared field somewhere
+                session.recycle();
             }
         }
 
Comment 9 Remy Maucherat 2005-03-29 16:39:42 UTC
Created attachment 14584 [details]
Call recycle on all sessions (passivated or expired) on stop
Comment 10 Kev Palfreyman 2005-03-30 12:25:38 UTC
(In reply to comment #8)
Hi Remy,
Thanks for coming back to this issue - the memory profile of Tomcat during
repeated web application deploy/undeploy/redeploy cycles on PRODUCTION (no 
dev) servers is a big issue for us.  Your efforts are much appreciated.

> Now that 5.5.9 is released, I can look into more complex/risky issues.

Is there a long gap expected then before a 5.5.10?  Currently I think we're 
shipping a modified 5.5.7 (last stable, plus patches - all noted in bugzilla).

> I see you have done a lot of research, which identifies the problem very
> clearly. However, I don't really like the proposed solution. How about simply
> setting the manager field to null since this is the cause of the problem ?
> Calling recycle on all sessions in StandardManager.stop could do this easily.

Maybe BOTH approaches are required?  I don't have my test setup from back when 
I was looking at this now, but doesn't your proposed solution still leave a 
reference to the (now recycled) session object in the reverse map in 
SingleSignOn?

I could be missing something - its been a while since we looked at this 
particular problem.

[off topic 1] At the time I was evaluating a commercial memory profiling tool -
 do you have any recommendations?

[off topic 2] Any chance of eclipse .project and .classpath files shipping in 
the source distribution? 

Thanks,

Kev



Comment 11 Remy Maucherat 2005-03-30 13:41:59 UTC
To be honest, I am not actually interested in fixing your particular problem. I
was thinking first about the generic problem where references to sessions are
kept for some reason by any component. So the feedback I was most interested in
is: does this fix the reference leak to the classloader ?

The design flaw is then in the SSO reverse map if you want to avoid leaking
skeleton objects. One easy solution might be to change it to a WeakHashMap.

I do not have a good memory profiler to recommend (I don't like any at the moment).
Comment 12 Jess Holle 2005-03-30 14:18:46 UTC
You might want to give the memory profiler in NetBeans 4.x a try.

OptimizeIt has some advantages over this, but NetBeans is free and OptimizeIt...
isn't.
Comment 13 Remy Maucherat 2005-03-30 15:03:57 UTC
Created attachment 14593 [details]
Always recycle sessions; add activate/passivate events and use passivate for SSO
Comment 14 Remy Maucherat 2005-03-30 15:08:05 UTC
New cumulative patch now addressing the SSO problem. Please test it (and fix it
if there are problems).
Comment 15 Remy Maucherat 2005-04-05 11:56:23 UTC
I think this has been fixed. Too bad there's no feedback.
Comment 16 Kev Palfreyman 2005-04-05 13:32:16 UTC
(In reply to comment #15)
> I think this has been fixed. Too bad there's no feedback.

I'm looking to verify this this week.  Looks good so far but didn't want to 
say yes until I'm certain.

K.
Comment 17 Kev Palfreyman 2005-04-06 16:43:07 UTC
I had a another look into this, and seems better, but I'm not completely 
convinced that it is entirely fixed.  I can't pin down exactly what is 
happening, but looks like the very first deployment does not fully undeploy 
(release all references), but subsequent cycles appear to work correctly.  The 
reason it is odd is because after the first cycle my profiler does not list 
my "BigServlet" class, but the 20MB it allocates never appears to be released 
(but for each later cycle the graph shows their heap allocations are released).

So, sorry but I can't really give a definite answer.  Yes it is a big 
improvement, but no I can't say it is entirely fixed.

My cycle is:
1) visit some application that does a sign-in for SSO
2) use the manager to list the apps
3) memory profiler snapshot
4) drop a leak.war in the web apps folder (auto deploy)
5) visit leak with a browser
6) undeploy leak using manager web app
7) repeat from 2 several times, and compare snaphots as well as graphing 
various memory stats and heap, etc.

Thanks for all the efforts!

Kev
Comment 18 Remy Maucherat 2005-04-06 17:37:10 UTC
What did you test exactly (CVS version ?) ? Still using log4j for everything ?
Do you have any idea on where the reference to the classloader would be kept ?
Comment 19 Kev Palfreyman 2005-04-06 17:58:11 UTC
(In reply to comment #18)
> What did you test exactly (CVS version ?) ? Still using log4j for 
everything ?
> Do you have any idea on where the reference to the classloader would be 
kept ?

I updated to CVS head just after you committed the changes (31/3 maybe? can't 
remember).

The content of my test WAR is:
>jar -ft leak.war
index.jsp
META-INF/context.xml
META-INF/MANIFEST.MF
META-INF/
WEB-INF/classes/
WEB-INF/classes/BigServlet.class
WEB-INF/lib/
WEB-INF/lib/commons-logging.1.0.3.jar
WEB-INF/lib/log4j.1.2.8.jar
WEB-INF/web.xml
WEB-INF/


Context.xml is simply:
<Context antiResourceLocking="true" antiJARLocking="true" >
</Context>


BigServlet simply has a static byte[] to allocate 20MB, initialised in the init
() method, and also logs a message.
Web.xml simply specifies it is loaded on startup, and that everything in the 
web app required the user to be in a given role.  The JSP simply displays the 
remoteUser and SessionID.

[Note: this configuration above is just for my test web app and we try to work 
around it.  For production we include the Log4J jar only in our web app, and 
then use separate jdk logging for anything that is done through commons-
logging.]

As for ideas as to what is holding the reference, no I don't have ideas.  The 
profiler I'm currently evaluating isn't helping me there :-(  The best I used 
so far (and was using back when I first reported) was YourKit4.0, but my eval 
has expired.

K.