Bug 44059

Summary: Unsychronized HashMaps being used improperly leads to horrible performance
Product: Lenya Reporter: Richard Frovarp <rfrovarp>
Component: MiscellaneousAssignee: Lenya Developers <dev>
Status: NEW ---    
Severity: major CC: taffy-tyler6464
Priority: P2    
Version: 1.2.5   
Target Milestone: 1.2.7   
Hardware: Other   
OS: other   

Description Richard Frovarp 2007-12-11 12:25:34 UTC
I have a thread that is hung and is taking up 100% of one of my CPUs. I'm
running Java 1.6.0.3. Since it is blocked in HashMap, which is not synchronized,
it doesn't appear to be an issue with our code. jconsole can't detect any
deadlocks. I assume it is this thread as the ThreadUserTime is massive,
359,995,020,000,000, compared to other SocketListener's.

Here is the stack trace:

Name: SocketListener0-12
State: RUNNABLE
Total blocked: 15  Total waited: 127

Stack trace: 
java.util.HashMap.get(HashMap.java:303)
org.apache.lenya.transaction.IdentityMapImpl.get(IdentityMapImpl.java:54)
org.apache.lenya.cms.repository.SharedItemStoreImpl.getRepositoryItem(SharedItemStoreImpl.java:65)
org.apache.lenya.cms.publication.DocumentFactoryImpl.get(DocumentFactoryImpl.java:91)
org.apache.lenya.cms.publication.DocumentFactoryImpl.get(DocumentFactoryImpl.java:73)
org.apache.lenya.cms.site.AbstractLink.getDocument(AbstractLink.java:107)
org.apache.lenya.cms.site.tree2.SiteTreeImpl.getKey(SiteTreeImpl.java:304)
org.apache.lenya.cms.site.tree2.SiteTreeImpl.linkAdded(SiteTreeImpl.java:300)
org.apache.lenya.cms.site.tree2.TreeNodeImpl.addLinkInternal(TreeNodeImpl.java:211)
org.apache.lenya.cms.site.tree2.TreeNodeImpl.addLink(TreeNodeImpl.java:197)
org.apache.lenya.cms.site.tree2.SiteTreeImpl.loadLinks(SiteTreeImpl.java:168)
org.apache.lenya.cms.site.tree2.SiteTreeImpl.loadNodes(SiteTreeImpl.java:158)
org.apache.lenya.cms.site.tree2.SiteTreeImpl.loadNodes(SiteTreeImpl.java:159)
org.apache.lenya.cms.site.tree2.SiteTreeImpl.load(SiteTreeImpl.java:115)
   - locked org.apache.lenya.cms.site.tree2.SiteTreeImpl@ce9197
org.apache.lenya.cms.site.tree2.SiteTreeImpl.contains(SiteTreeImpl.java:334)
org.apache.lenya.cms.site.tree2.DelegatingSiteTree.contains(DelegatingSiteTree.java:79)
org.apache.lenya.cms.publication.DefaultDocumentBuilder.isDocument(DefaultDocumentBuilder.java:111)
sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
org.apache.avalon.excalibur.component.ComponentProxyGenerator$ComponentInvocationHandler.invoke(ComponentProxyGenerator.java:143)
$Proxy33.isDocument(Unknown Source)
org.apache.lenya.cms.publication.DocumentFactoryImpl.isDocument(DocumentFactoryImpl.java:189)
org.apache.lenya.cms.ac.DocumentPolicyManagerWrapper.getPolicyUrlCorrect(DocumentPolicyManagerWrapper.java:119)
org.apache.lenya.cms.ac.DocumentPolicyManagerWrapper.getPolicyURL(DocumentPolicyManagerWrapper.java:80)
org.apache.lenya.cms.ac.DocumentPolicyManagerWrapper.getPolicy(DocumentPolicyManagerWrapper.java:240)
org.apache.lenya.cms.cocoon.acting.SslRedirectAction.act(SslRedirectAction.java:75)
org.apache.cocoon.components.treeprocessor.sitemap.ActTypeNode.invoke(ActTypeNode.java:125)
org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:47)
org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:108)
org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:69)
org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:69)
org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:93)
org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:235)
org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:177)
org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:254)
org.apache.cocoon.Cocoon.process(Cocoon.java:699)
org.apache.cocoon.servlet.CocoonServlet.service(CocoonServlet.java:1154)
javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
org.mortbay.jetty.plus.PlusWebAppContext.handle(PlusWebAppContext.java:158)
org.mortbay.http.HttpServer.service(HttpServer.java:954)
org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
Comment 1 Richard Frovarp 2008-02-25 12:31:58 UTC
I think the problem is due to the unsynchronized nature of the HashMap. I found
this bug for Java 1.4.2 which states that used in an unsynchronized way, bad
things can happen. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6423457
Comment 2 Richard Frovarp 2008-02-25 12:39:05 UTC
The one place where I have seen issues in IdentityMapImpl has been fixed in
r630984. Leaving bug open until I can confirm no more trouble from this issue.
Comment 3 solprovider 2008-02-25 20:03:48 UTC
A similar problem with Lenya-1.2 was crashing our production server.  Server
typically runs less than 1% usage.  Pairs of java processes appear using
(combined) 90% of one processor and only disappear when Lenya is restarted.  The
server slows with 2 pairs and stalls with 3 pairs.  This forced an upgrade last
August, a User ML post on 20071129 (no responses), and was finally patched with
a cron job to restart Lenya when the symptoms appear. Problem appears
intermittently with no identifiable cause.  jconsole is not available for
troubleshooting Java 1.4.

Linux, Apache-httpd-2.1 as proxy, two processors
Java 1.4.2_08
Lenya 1.2.2, Cocoon 2.1.6
Lenya 1.2.5, Cocoon 2.1.10.  Upgrade from 1.2.2 to 1.2.5 did not cure problem.

Could this be fixed by synchronizing all uses of HashMap in the 1.2.x branch? 

Lenya-1.2.x references 127 HashMaps with 19 Collections.synchonizedMaps.
Cocoon-2.1.11 src/java references 313 HashMaps.  The ubiquitous objectModel
declared in Cocoon-2.1's AbstractEnvironment is not synchronized or
unmodifiable.  Is globally replacing "= new HashMap();" with "=
Collections.synchronizedMap(new HashMap());" enough or bad?  (A script would
also need to add "import java.util.Collections;" where missing and needed.)

Is there a better solution?
Comment 4 Richard Frovarp 2008-02-25 20:42:15 UTC
I'm not sure. There are a great many more references to HashMaps in the 2.0 base
than the one I fixed. This was the only one I saw giving me trouble, but only
time will tell. The other alternative would be to change all HashMaps to
Hashtables. I am not sure of the functional differences between the two objects,
except that Hashtables are synchronized and don't accept nulls. I thought for
the moment that synchronizing the HashMap was the best way to go. HashMaps
should be quicker in a read environment, but apparently they don't always work
amongst several threads. 

These problems definitely seem to be the same thing. 2.0 code base:
86 classes with HashMaps
1 class with Collections.synchronizedMap
1 class with Hashtable
Comment 5 Andreas Hartmann 2008-02-26 00:39:36 UTC
I don't think that we have to synchronize all collections. Synchronization
creates a performance overhead, that's why the new collection classes
(ArrayList, HashMap) aren't synchronized in contrast to the "old" ones (Vector,
Hashtable).

The IdentityMapImpl is a special case because the global identity map is
attached to the session and can therefore be accessed by multiple threads.
Comment 6 Richard Frovarp 2008-02-26 07:28:52 UTC
Okay. Do you know what components in 1.2 would be causing trouble? 

I can watch my usage through jconsole for the next few weeks. I had been
restarting Lenya every morning due to this problem. I put the fix in place this
morning, so hopefully I'll know for 2.0 by the end of this week.
Comment 7 Richard Frovarp 2008-03-12 16:53:27 UTC
This fixed the issue I was having. It looks like 1.2.5 still has a problem, so I've assigned it to that version now.