Bug 50213

Summary: Category callAppenders synchronization causes java.lang.Thread.State: BLOCKED - synchronization can be avoided by using immutable AppenderAttachableImpl
Product: Log4j - Now in Jira Reporter: Dave Boden <dave>
Component: OtherAssignee: log4j-dev <log4j-dev>
Status: NEW ---    
Severity: normal CC: dan.rabe, dcherk71
Priority: P2    
Version: 1.2   
Target Milestone: ---   
Hardware: HP   
OS: Linux   
Attachments: Patch to remove synchronization from Category but still keep thread safety
My thread dump with exact same problem, I think

Description Dave Boden 2010-11-04 10:50:49 UTC
Using log4j 1.2.16 on Linux.

When logging from 10 threads almost simultaneously, I'm seeing intermittent deadlocking. Doing a kill -3 on Linux shows the thread dump and shows java.lang.Thread.State: BLOCKED -  waiting to lock <0xd42e9a78> (a org.apache.log4j.spi.RootLogger)

Looking at the Category code, I can see how threads could potentially deadlock:

Method: callAppenders(LoggingEvent event)
...

    for(Category c = this; c != null; c=c.parent) {
      // Protected against simultaneous call to addAppender, removeAppender,...
      synchronized(c) {
	if(c.aai != null) {
	  writes += c.aai.appendLoopOnAppenders(event);
	}
	if(!c.additive) {
	  break;
	}
      }
...

However, before thinking too hard about exactly the scenario can occur I'd rather just remove the need to synchronize altogether when looping through the appenders attached to a Category.

Appenders are themselves thread safe due to AppenderSkeleton's doAppend() method being synchronized. So, the only reason why Category synchronizes when iterating through appenders is, as the comments say, to:
// Protected against simultaneous call to addAppender, removeAppender,...

This is only required because AppenderAttachableImpl aai is a mutable object which allows appenders to be added or removed at any time. By making it immutable (a copy-on-write list) we can avoid the synchronization at the most contentious point in log4j, the point at which the logging occurs. I'm sure we'll agree that adding and removing appenders is extremely rare compared to writing logs. So, if we can remove the synchronization at this point then we should see a significant benefit with very little performance hit for the copy-on-write of the AppenderAttachableImpl.

I'll produce a patch for review.


Here's an extract of the kill -3 dump:

"message.receiver-1" prio=10 tid=0xcf2d4400 nid=0x6501 waiting for monitor entry [0xcc12e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.log4j.Category.callAppenders(Category.java:205)
        - waiting to lock <0xd42e9a78> (a org.apache.log4j.spi.RootLogger)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.info(Category.java:666)


"org.springframework.jms.listener.DefaultMessageListenerContainer#1-1" prio=10 tid=0xcf0e6800 nid=0x6507 waiting for monitor entry [0xcabc6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.log4j.Category.callAppenders(Category.java:205)
        - waiting to lock <0xd42e9a78> (a org.apache.log4j.spi.RootLogger)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.log(Category.java:856)
        at org.apache.commons.logging.impl.Log4JLogger.trace(Log4JLogger.java:146)


LOTS MORE OF THESE...
Comment 1 Dave Boden 2010-11-04 13:04:42 UTC
Created attachment 26254 [details]
Patch to remove synchronization from Category but still keep thread safety
Comment 2 Curt Arnold 2010-11-06 10:59:27 UTC
The deadlock that you are experiencing isn't likely due to the lock on category.  There are likely two other locks in conflict that result in cascade of blocks.  Could you attach a full dump to see if we can pick out the blocks that start the problem.

I am assuming that you've used you modified version of log4j and you saw an improved experience.  It might possibly isolate the threads where you might still have the initiating deadlock but it doesn't spread to any other thread that logs on the same logger which would be a very beneficial change.

Changing the result type of Category.removeAppender(Appender) and removeAppender(String) breaks compatibility with any compiled app that calls either method as they will now could throw a NoSuchMethodException, see http://java.sun.com/docs/books/jls/second_edition/html/binaryComp.doc.html#45208.  Also, as you saw with NOPLogger, it requires any extended Logger (discouraged, but not prevented) to be modified.

Replacing the lazy construction of AppenderAttachableImpl might have some, but likely small, performance impact.  It likely was a big deal in the JDK 1.1 days when object creation was substantially more expensive.  The member is package visible (no explicit public, private or protected) so the only way that it could become null again is if something in org.apache.log4j set it null and scanning the rest of the source shows no place that that could occur.

Touching synchronization in Category is a minefield.  It could be safer to approach this by introducing a new class extended from Logger that is inherently thread-safe and then allowing the user to configure to use the corresponding logger factory when they want the new behavior.
Comment 3 Dave Cherkassky 2011-03-16 22:40:41 UTC
Created attachment 26778 [details]
My thread dump with exact same problem, I think

I seem to have the exact same deadlock.  Since comment 2 asked for a thread dump, I am attaching mine, perhaps it will help you.
Comment 4 Dave Cherkassky 2011-03-16 22:45:38 UTC
(In reply to comment #3)
> Created an attachment (id=26778) [details]
> My thread dump with exact same problem, I think
> 
> I seem to have the exact same deadlock.  Since comment 2 asked for a thread
> dump, I am attaching mine, perhaps it will help you.

BTW, I am using (from the MANIFEST file):

---------------
Manifest-Version: 1.0
Archiver-Version: Plexus Archiver
Created-By: Apache Maven
Built-By: ubuntu
Build-Jdk: 1.6.0

Name: org.apache.log4j
Implementation-Title: log4j
Implementation-Vendor: "Apache Software Foundation"
Implementation-Version: 1.2.15
---------------
Comment 5 Dave Cherkassky 2011-03-17 00:11:04 UTC
(In reply to comment #3)
> Created an attachment (id=26778) [details]
> My thread dump with exact same problem, I think
> 
> I seem to have the exact same deadlock.  Since comment 2 asked for a thread
> dump, I am attaching mine, perhaps it will help you.

Ok, we've analysed this and realized that the problem lies with the com.djinnsoft.jade.httputil.AuthorizedSMTPAppender class.  

This was a custom-built class that we created a while ago before log4j's SMTP appender supported authentication.  Our client misconfigured the log4j.properties file, and the AuthorizedSMTPAppender started blocking when it could not reach an email server.  Obviously, AuthorizedSMTPAppender is poorly written -- it was meant as a stop-gap.  Also, obviously, it shouldn't even be in use since the latest log4j version supports this functionality.

So, the obvious solution (for us) is to correct the log4j configuration.


Having said that, it might still be worth while for you guys to make the callAppenders() method more bullet-proof.  I guess it's up to you :)


Best of luck,
-- 
Dave Cherkassky
  VP of Software Development
  DJiNN Software Inc.
  416.504.1354
Comment 6 Bartek Kowalewski 2011-04-04 06:28:51 UTC
Hi guys,

When doing performance/load tests of my system, I've been observing plenty of threads waiting on this synchronization - on org.apache.log4j.Category.callAppenders(). For sure a deadlock is not possible and, what is more, this suboptimal synchronization mechanism cannot cause HUGE performance degradation if async appenders are used underneath. However, in a really multithreaded env it'll cause some delays.

It should not be hard to improve this synchronization mechanism without the need to change API. Introduce a read-write lock? 

Log4j 1.2.x is compliant with JDK 1.2, right? If so, using java.util.concurrent.locks classes is not an option. Maybe a simple custom impl of read-write lock would be enough?

Best regards,
  Bartek
Comment 7 Bartek Kowalewski 2011-04-11 05:52:25 UTC
I've just created https://issues.apache.org/bugzilla/attachment.cgi?bugid=51047 (Move org.apache.log4j.Category to reentrant read/write locks). This change should get rid of issues with synchronization and deadlocks.