I have a thread that is hung and is taking up 100% of one of my CPUs. I'm running Java 1.6.0.3. Since it is blocked in HashMap, which is not synchronized, it doesn't appear to be an issue with our code. jconsole can't detect any deadlocks. I assume it is this thread as the ThreadUserTime is massive, 359,995,020,000,000, compared to other SocketListener's. Here is the stack trace: Name: SocketListener0-12 State: RUNNABLE Total blocked: 15 Total waited: 127 Stack trace: java.util.HashMap.get(HashMap.java:303) org.apache.lenya.transaction.IdentityMapImpl.get(IdentityMapImpl.java:54) org.apache.lenya.cms.repository.SharedItemStoreImpl.getRepositoryItem(SharedItemStoreImpl.java:65) org.apache.lenya.cms.publication.DocumentFactoryImpl.get(DocumentFactoryImpl.java:91) org.apache.lenya.cms.publication.DocumentFactoryImpl.get(DocumentFactoryImpl.java:73) org.apache.lenya.cms.site.AbstractLink.getDocument(AbstractLink.java:107) org.apache.lenya.cms.site.tree2.SiteTreeImpl.getKey(SiteTreeImpl.java:304) org.apache.lenya.cms.site.tree2.SiteTreeImpl.linkAdded(SiteTreeImpl.java:300) org.apache.lenya.cms.site.tree2.TreeNodeImpl.addLinkInternal(TreeNodeImpl.java:211) org.apache.lenya.cms.site.tree2.TreeNodeImpl.addLink(TreeNodeImpl.java:197) org.apache.lenya.cms.site.tree2.SiteTreeImpl.loadLinks(SiteTreeImpl.java:168) org.apache.lenya.cms.site.tree2.SiteTreeImpl.loadNodes(SiteTreeImpl.java:158) org.apache.lenya.cms.site.tree2.SiteTreeImpl.loadNodes(SiteTreeImpl.java:159) org.apache.lenya.cms.site.tree2.SiteTreeImpl.load(SiteTreeImpl.java:115) - locked org.apache.lenya.cms.site.tree2.SiteTreeImpl@ce9197 org.apache.lenya.cms.site.tree2.SiteTreeImpl.contains(SiteTreeImpl.java:334) org.apache.lenya.cms.site.tree2.DelegatingSiteTree.contains(DelegatingSiteTree.java:79) org.apache.lenya.cms.publication.DefaultDocumentBuilder.isDocument(DefaultDocumentBuilder.java:111) sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke(Method.java:597) org.apache.avalon.excalibur.component.ComponentProxyGenerator$ComponentInvocationHandler.invoke(ComponentProxyGenerator.java:143) $Proxy33.isDocument(Unknown Source) org.apache.lenya.cms.publication.DocumentFactoryImpl.isDocument(DocumentFactoryImpl.java:189) org.apache.lenya.cms.ac.DocumentPolicyManagerWrapper.getPolicyUrlCorrect(DocumentPolicyManagerWrapper.java:119) org.apache.lenya.cms.ac.DocumentPolicyManagerWrapper.getPolicyURL(DocumentPolicyManagerWrapper.java:80) org.apache.lenya.cms.ac.DocumentPolicyManagerWrapper.getPolicy(DocumentPolicyManagerWrapper.java:240) org.apache.lenya.cms.cocoon.acting.SslRedirectAction.act(SslRedirectAction.java:75) org.apache.cocoon.components.treeprocessor.sitemap.ActTypeNode.invoke(ActTypeNode.java:125) org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:47) org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:108) org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:69) org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143) org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:69) org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:93) org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:235) org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:177) org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:254) org.apache.cocoon.Cocoon.process(Cocoon.java:699) org.apache.cocoon.servlet.CocoonServlet.service(CocoonServlet.java:1154) javax.servlet.http.HttpServlet.service(HttpServlet.java:689) org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) org.mortbay.http.HttpContext.handle(HttpContext.java:1565) org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) org.mortbay.http.HttpContext.handle(HttpContext.java:1517) org.mortbay.jetty.plus.PlusWebAppContext.handle(PlusWebAppContext.java:158) org.mortbay.http.HttpServer.service(HttpServer.java:954) org.mortbay.http.HttpConnection.service(HttpConnection.java:814) org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
I think the problem is due to the unsynchronized nature of the HashMap. I found this bug for Java 1.4.2 which states that used in an unsynchronized way, bad things can happen. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6423457
The one place where I have seen issues in IdentityMapImpl has been fixed in r630984. Leaving bug open until I can confirm no more trouble from this issue.
A similar problem with Lenya-1.2 was crashing our production server. Server typically runs less than 1% usage. Pairs of java processes appear using (combined) 90% of one processor and only disappear when Lenya is restarted. The server slows with 2 pairs and stalls with 3 pairs. This forced an upgrade last August, a User ML post on 20071129 (no responses), and was finally patched with a cron job to restart Lenya when the symptoms appear. Problem appears intermittently with no identifiable cause. jconsole is not available for troubleshooting Java 1.4. Linux, Apache-httpd-2.1 as proxy, two processors Java 1.4.2_08 Lenya 1.2.2, Cocoon 2.1.6 Lenya 1.2.5, Cocoon 2.1.10. Upgrade from 1.2.2 to 1.2.5 did not cure problem. Could this be fixed by synchronizing all uses of HashMap in the 1.2.x branch? Lenya-1.2.x references 127 HashMaps with 19 Collections.synchonizedMaps. Cocoon-2.1.11 src/java references 313 HashMaps. The ubiquitous objectModel declared in Cocoon-2.1's AbstractEnvironment is not synchronized or unmodifiable. Is globally replacing "= new HashMap();" with "= Collections.synchronizedMap(new HashMap());" enough or bad? (A script would also need to add "import java.util.Collections;" where missing and needed.) Is there a better solution?
I'm not sure. There are a great many more references to HashMaps in the 2.0 base than the one I fixed. This was the only one I saw giving me trouble, but only time will tell. The other alternative would be to change all HashMaps to Hashtables. I am not sure of the functional differences between the two objects, except that Hashtables are synchronized and don't accept nulls. I thought for the moment that synchronizing the HashMap was the best way to go. HashMaps should be quicker in a read environment, but apparently they don't always work amongst several threads. These problems definitely seem to be the same thing. 2.0 code base: 86 classes with HashMaps 1 class with Collections.synchronizedMap 1 class with Hashtable
I don't think that we have to synchronize all collections. Synchronization creates a performance overhead, that's why the new collection classes (ArrayList, HashMap) aren't synchronized in contrast to the "old" ones (Vector, Hashtable). The IdentityMapImpl is a special case because the global identity map is attached to the session and can therefore be accessed by multiple threads.
Okay. Do you know what components in 1.2 would be causing trouble? I can watch my usage through jconsole for the next few weeks. I had been restarting Lenya every morning due to this problem. I put the fix in place this morning, so hopefully I'll know for 2.0 by the end of this week.
This fixed the issue I was having. It looks like 1.2.5 still has a problem, so I've assigned it to that version now.