Bug 41939 - NPE in Logging due to classloading
Summary: NPE in Logging due to classloading
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 5.5.20
Hardware: Other All
: P2 normal with 2 votes (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
: 37956 (view as bug list)
Depends on:
Blocks:
 
Reported: 2007-03-23 14:03 UTC by Thomas Strauss
Modified: 2008-07-09 14:01 UTC (History)
2 users (show)



Attachments
Simple War File That Demonstrates NPE Behavior on Reload (459.75 KB, application/octet-stream)
2007-04-19 06:53 UTC, Robert Weber
Details
This WAR Contains a Modified Log4j That Does NOT Exhibit NPE Behavior (475.03 KB, application/octet-stream)
2007-04-19 06:59 UTC, Robert Weber
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Strauss 2007-03-23 14:03:51 UTC
Actually this Bug happens also in 5.5.23. It does not happen in 5.5.9.

Ok, we have a webapplication that contains each and every lib it will ever
require. Due to some prepackaged JAR-files it even contains log4j and
commons-logging twice. One in WEB-INF/lib and one inside a jar in WEB-INF/lib.

There is a commons-logging (1.0.4) in commons/lib of tomcat, but not log4j.

We use Anti-Jar|Resource-Locking on Windows.

When deploying an application using the manager or the ant-tasks, everything is
fine. Also removing and deploying a new version works.

If we reload the application, it will refuse to reload with an NPE. 

I tracked the application source in the debugger and trapped the NPE. I have
never seen this, but this is the fact:

I can see the complete call stack and inside Log4j the function converter.format
is called with a valid reference to a StringBuffer which suddenly becomes "null"
in the static method PatternConverter.spacePad()!

I can only explain this with the hierarchical Classloader mixing classes and
beeing unable to transport the reference to the StringBuffer from the instance
that is loaded in another class to the Classloader carriing the static
PatternConverter class. 

I could "solve" this issue by putting the log4j.jar in the common/lib directory
and restart the server. If I remove it from there or if I remove it completely
from my application (also from jar files in my application) it works. 

Best regards, 

Thomas


Stacktrace follows: Exception on reload in /manager

[23 Mrz 22:01:59:139][INFO ][closing any session, if still open
][de.srs.pen.vwopenday2006.StrutsServlet]
[23 Mrz 22:01:59:139][DEBUG][closeSessionSilent on sid 0
][de.srs.pen.dao.DAOSessionFactory]
[23 Mrz 22:01:59:139][DEBUG][Initializing Thread Local for refcounting on 0
][de.srs.pen.dao.DAOSessionFactory]
[23 Mrz 22:01:59:139][DEBUG][closing session, removing thread locals
][de.srs.pen.dao.DAOSessionFactory]
[23 Mrz 22:01:59:139][DEBUG][Resetting refcounter on session 0
][de.srs.pen.dao.DAOSessionFactory]
[23 Mrz 22:01:59:139][DEBUG][reset counter 0 to 0
][de.srs.pen.dao.DAOSessionFactory]
[23 Mrz 22:01:59:139][DEBUG][checking for leftover transaction
][de.srs.pen.dao.DAOSessionFactory]
[23 Mrz 22:01:59:139][DEBUG][Resetting refcounter on Transaction 0
][de.srs.pen.dao.DAOSessionFactory]
[23 Mrz 22:01:59:139][DEBUG][Initializing Thread Local for refcounting on 0
][de.srs.pen.dao.DAOSessionFactory]
[23 Mrz 22:01:59:139][DEBUG][reset counter 0 to 0
][de.srs.pen.dao.DAOSessionFactory]
[23 Mrz 22:01:59:139][INFO ][Destroy complete
][de.srs.pen.vwopenday2006.StrutsServlet]
[23 Mrz 22:01:59:139][DEBUG][JspServlet.destroy()
][org.apache.jasper.servlet.JspServlet]
[23 Mrz 22:01:59:139][DEBUG][Stopping filters
][org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/SurveyDay]]
[23 Mrz 22:01:59:139][DEBUG][Stopping ][org.apache.catalina.session.ManagerBase]
[23 Mrz 22:01:59:139][DEBUG][Unloading persisted sessions
][org.apache.catalina.session.ManagerBase]
[23 Mrz 22:01:59:139][DEBUG][Saving persisted sessions to SESSIONS.ser
][org.apache.catalina.session.ManagerBase]
[23 Mrz 22:01:59:139][DEBUG][Unloading 0 sessions
][org.apache.catalina.session.ManagerBase]
[23 Mrz 22:01:59:139][DEBUG][Expiring 0 persisted sessions
][org.apache.catalina.session.ManagerBase]
[23 Mrz 22:01:59:139][DEBUG][Unloading complete
][org.apache.catalina.session.ManagerBase]
23.03.2007 22:02:00 org.apache.commons.modeler.Registry registerComponent
SCHWERWIEGEND: Null component
Catalina:type=JspMonitor,name=jsp,WebModule=//localhost/SurveyDay,J2EEApplication=none,J2EEServer=none
23.03.2007 22:02:00 org.apache.catalina.core.ApplicationContext log
SCHWERWIEGEND: HTMLManager: ManagerServlet.reload[/SurveyDay]
java.lang.NullPointerException
 at org.apache.log4j.helpers.PatternConverter.spacePad(PatternConverter.java:92)
 at org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:71)
 at org.apache.log4j.PatternLayout.format(PatternLayout.java:495)
 at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:292)
 at org.apache.log4j.WriterAppender.append(WriterAppender.java:150)
 at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:221)
 at
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:57)
 at org.apache.log4j.Category.callAppenders(Category.java:187)
 at org.apache.log4j.Category.forcedLog(Category.java:372)
 at org.apache.log4j.Category.log(Category.java:864)
 at org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:110)
 at org.apache.catalina.session.ManagerBase.init(ManagerBase.java:723)
 at org.apache.catalina.session.StandardManager.start(StandardManager.java:619)
 at org.apache.catalina.core.StandardContext.start(StandardContext.java:4166)
 at org.apache.catalina.core.StandardContext.reload(StandardContext.java:3025)
 at org.apache.catalina.manager.ManagerServlet.reload(ManagerServlet.java:907)
 at
org.apache.catalina.manager.HTMLManagerServlet.reload(HTMLManagerServlet.java:488)
 at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:99)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
 at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
 at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
 at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
 at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
 at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
 at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
 at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
 at
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
 at
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
 at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
 at java.lang.Thread.run(Unknown Source)
23.03.2007 22:02:00 org.apache.catalina.core.ApplicationContext log
INFO: HTMLManager: list: Listing contexts for virtual host 'localhost'
Comment 1 Yoav Shapira 2007-03-25 13:51:48 UTC
I doublt we should adjust very well-tested classloader because your packaging is
messed up.  It's your responsibility (or the responsibility of whoever packages
your webapp) to not include duplicate and/or conflicting versions of its
dependent libraries.

If you can't do this, at the very least you will want to separate Tomcat's
logging libs from your own, perhaps by moving the commons-logging stuff from
$CATALINA_HOME/common/lib to $CATALINA_HOME/server/lib.
Comment 2 Thomas Strauss 2007-03-26 03:42:08 UTC
Then: Why does the very same package deploy on 5.5.9?

And: Yes there are duplicate classes in the packages, but in the end you will
never be able to avoid it. 

And: There is the, mabe even written law, that it is definitely good practice
that one should package all required classes inside the web-app\lib path,
besides native lib adapters. If this is suddenly not true any more, then how do
you deploy different independent applications with incompatible commons-logging
implementations? 

The answer may be easy for you, but I think that I have done valid packaging and
that the classloader is in fact mixing packages from /common/lib with packages
from my application web-inf/lib on application reload.

It looks like it is preferring the common/lib instance of the jar on reloading,
not recognizing, that there is a jar in WEB-IN/lib carrying the same class.
Following the rules of classloading, the Web-App's jar should be used. 

But you are free to correct me if I am wrong.
Comment 3 Robert Weber 2007-03-26 06:38:42 UTC
This bug should not be dismissed. My own web application includes one instance
each of commons-logging-1.1.jar and log4j-1.2.14.jar. My
$CATALINA_HOME/common/lib does not contain a commons-logging jar (unless the
class files are embedded in one of the other jars???).

Additionally, the NPE is thrown by log4j code. In fact, I have done some
debugging on the log4j code-base, and the behavior that I witnessed defied
standard java convention. That is, the NPE is caused by certain static class
fields (whose values are set at instantiation) arbitrarily becoming null when
one attempts to reload the context that contains the log4j jar. In effect,
certain static members of the log4j classes are not being re-initialized when
the context is reloaded.

For my part, I have worked around the issue by patching my log4j to
re-initialize the problem fields itself when they are detected to be null, but
in all honesty, this hack should be unnecessary.

Now, I would not say that this bug is a number one priority show-stopper (or
even that it should be fixed), but it should, at least, be acknowledged.
Comment 4 Shankar Unni 2007-03-26 17:12:10 UTC
You do remember that when you undeploy an application (while restarting it), the
tomcat classloader reflects on all the classes it's unloading, and resets any
static members to null.

Depending on the order in which it does this, if there are some other threads
still running, they can get NPEs..
Comment 5 Robert Weber 2007-03-27 02:31:56 UTC
Why should static members be reset to null? When static members are reset to
null, do they eventually get re-initialized? For instance if I declare:

static String foo = "bar";

When the context re-load completes, does "bar".equals(foo) evaluate to true?
Comment 6 Thomas Strauss 2007-03-27 02:34:30 UTC
I have no additional threads running in this application. Besides that, in all
applications we manage, the running threads are stopped on destroy(). 

But neverthelesse: I have started to move JARs around and have now the following
situation: 

Like Robert Weber, I have no Log4J and no commons-logging in the commons/lib or
in shared/lib. 

I have a logging.properties in commons/classes.

I have commons-loggings and log4j only ONCE in the webapplication this is tested
with. 

UNloading works fine, as soon as reloading starts, the NPE is thrown. This time
it is not even inside the application, but in Tomcat Manager. I guess, that the
class in log4j is already distorted. 

Could it be that tomcat or commons-logging thinks it should use log4j because
the classes are found inside the application?
Comment 7 Mark Thomas 2007-04-09 16:36:38 UTC
The issue is that something is keeping a reference to a logger that it
shouldn't. This is going to cause a memory leak and what you are seeing are the
side-effects of Tomcat's attempts to prevent memory leaks.

If you can provide the simplest possible test war the exhibits this issue I'll
happily debug my way though the code and figure out what is going wrong. Without
such a test case, there is little that can be done to investigate this issue.
Comment 8 Robert Weber 2007-04-19 06:53:34 UTC
Created attachment 20001 [details]
Simple War File That Demonstrates NPE Behavior on Reload

I deployed this WAR on the 5.5.20 and 5.5.23 binary distributions.

This WAR contains a simple servlet that contains a static reference to a Log
instance. The vanilla binary of log4j 1.2.14 is on the classpath, so
commons-logging is using the log4j log provider.

The servlet runs as expected (it just prints a message to the client and prints
the same message in the log). However, when I click on the reload link for the
application in the manager web application, the context fails to reload with a
NullPointerException.

The NullPointerException traces back to log4j classes. Specifically, I
discovered that there are several instances where static members of the log4j
classes in question are improperly set to null by something outside of log4j
and not re-initialized.
Comment 9 Robert Weber 2007-04-19 06:59:22 UTC
Created attachment 20002 [details]
This WAR Contains a Modified Log4j That Does NOT Exhibit NPE Behavior

This WAR is identical to "20001: Simple War File That Demonstrates NPE Behavior
on Reload" except that it contains a version of log4j that I modified.

Recall that I stated that there are several static members in log4j classes
that are being improperly nullified and not re-initialized. The modifications
that I made to the log4j binary included in this WAR simply amount to checking
if the members in question are in a null state before using them. Then, if they
are null, I re-initialize the members to their starting values.

This WAR does NOT exhibit a NullPointerException when reloaded. Additionally,
the servlet continues to function normally after several reloads.
Comment 10 Remy Maucherat 2007-04-19 07:03:42 UTC
Set the system property
org.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES to false.
Comment 11 Robert Weber 2007-04-19 07:11:09 UTC
I attached two test cases. The first contains a vanilla binary of log4j 1.2.14,
and it does exhibit the NPE behavior. The second contains a modified binary of
log4j 1.2.14, and it does NOT exhibit the NPE behavior. More details regarding
the  test cases can be found by clicking on the attachment links.

Two further notes, though:

1.) The NPE behavior is exhibited in the first test case regardless of whether
the servlet containing the log reference has been previously invoked.

2.) The first test case does NOT exhibit the NPE behavior under tomcat 6.0.10. I
suspect that this has something to do with the standardization of logging under
that tomcat platform.(In reply to comment #10)

> Set the system property
> org.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES to false.

Why?
Comment 12 Robert Weber 2007-04-19 08:00:40 UTC
(In reply to comment #10)
> Set the system property
> org.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES to false.

I set this property in $JAVA_OPTS to false. I additionally verified that the
property was, in fact, being set by querying System.getProperty().

However, setting this property to false has no effect with regards to resolving
this bug.
Comment 13 Remy Maucherat 2007-04-19 08:05:05 UTC
Ok, you did not post the NPE, so I cannot know for sure. Regardless, this issue
is not going to be addressed, and you should probably be packaging your web
application differently.
Comment 14 Robert Weber 2007-04-19 08:29:36 UTC
(In reply to comment #13)
> Ok, you did not post the NPE, so I cannot know for sure. Regardless, this issue
> is not going to be addressed, and you should probably be packaging your web
> application differently.

There is little variation between my NPE and the one that has been posted
previously.

By "packaging your web application differently", what exactly do you mean?
Placing log4j in the container classpath instead of the application classpath?
Not using a separately packaged commons-logging? Not using log4j at all?
Comment 15 Mark Thomas 2007-04-19 09:14:10 UTC
org.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES only has an
effect in the 6.0.x branch. This option hasn't been ported to the 5.5.x branch.
Comment 16 James McIntosh 2007-04-19 16:49:36 UTC
I am hunting for permgen leaks as the moment and have wildly inconsistent
results with context reloading. Most of the time the first loaded webapp will
not garbage but some subsequent reloads are grabaged.

The Tomcat FAQ section should have it spelled out under a title called 'Log4J'
the best way to include log4j and the implications of having the
commons-logging. Currently the 'How should I log in my own webapps?' section
'strongly recommends' using log4j but does not address any of these issues.
There are many different approaches listed on the web some of which go as far as
replacing the bootstrap logging.

e.g.

/commons/lib/commons-logging.jar
/commons/lib/log4j.jar
/commons/classes/log4j.properties

/<APP>/WEB-INF/lib/log4j.jar
/<APP>/WEB-INF/lib/commons-logging.jar
/<APP>/WEB-INF/classes/log4j.properties

It should spell out things like the following:

If you have commons-logging.jar in your webapp then it will not garbage.

If you don't have log4j.jar in your webapp then you will not get per-content
logging of you /<APP>/classes/log4j.properties file

Kind regards
James McIntosh
Comment 17 Mark Thomas 2007-04-19 19:59:47 UTC
I am re-opening this since the bug is the result of the nulling out static and
final fields from classes loaded by the web app classloader. As noted in the
code comments this is a workaround for some apparent GC bugs.

The workaround will not be removed since it is useful for many users and has
been shown to resolve some memory leak issues.

I will make this workaround optional in TC5.5.x, as it is in TC6.0.x.
Comment 18 Mark Thomas 2007-04-19 20:00:53 UTC
*** Bug 37956 has been marked as a duplicate of this bug. ***
Comment 19 Mark Thomas 2007-04-19 20:20:27 UTC
I have added this configuration option to svn. The provided test case no longer
throws an NPE on reload, providing the new configuration option is set to false.

This will be included in 5.5.24 onwards.
Comment 20 Mark Thomas 2007-04-22 17:34:38 UTC
Note that after some further investigation the root cause is actually a logging
related memory leak (bug 42172). This leak has been fixed in svn and the test
case for this issue now runs without an NPE with or without setting the system
property
org.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES to false.
Comment 21 Mark Thomas 2007-04-23 19:43:08 UTC
Having looked at this in more detail there is still a theoretical possibility of
an NPE unless the system
property
org.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES is set to false

Any such occurrence should be rare.
Comment 22 Aaron Digulla 2007-11-02 13:27:46 UTC
I'm wary of the implications of the comments in this bug. Let me summarize to
make sure we're all on the same level. I'm coming from bug 40212, btw.

- To get per-app logging, one has to use the setup of comment #16 (one instance
of log4j and commons-logging per tomcat and per webapp).

- If you do this, you can get NPEs at the next startup because Tomcat clears
static fields during undeploy.

Can you explain to me how it can be possible that the instance N+1 of my webapp
can see anything from instance N? If that would happen, it would either mean 

a) that tomcat is handing me classes which are in the common classpath despite
them being overridden in my webapp
b) that I get references to classes which ought no to exist anymore

a) means that classes from tomcat leak into my webapp and b) means that classes
from instance N can leak into N+1. Neither must be possible.
Comment 23 Dan Armbrust 2008-07-09 13:40:47 UTC
Could someone with knowledge of this bug read through this comment on a bug in Log4j?

https://issues.apache.org/bugzilla/show_bug.cgi?id=43867#c39

I believe that this is an example of tomcat breaking other code due to the way that it clears references, when org.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES is left to its default of "on".

In this case, log4j prevents a null pointer, and simply logs an error.  But other code could very easily take a null pointer for the same reason.

It doesn't seem like the current implementation of clear references is safe to use.

Comment #17 indicates that clearReferences is a workaround to a different bug, and won't be removed.  Perhaps that should be re-evaluated.  

I now have to disable clearReferences on my tomcat installs on most of my webapps which use log4j to prevent users from seeing a scary looking error being logged by log4j.
Comment 24 Mark Thomas 2008-07-09 14:01:42 UTC
Yes, if you and the libraries you use code their use of loggers correctly you should be safe with setting ENABLE_CLEAR_REFERENCES to false.

I can't speak for the other developers but I am happy with the current default for this being true on the basis it fixes more issues than it causes.

That said, I am more than happy to look into any test case (like bug 42172) that could be a Tomcat memory leak.