This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
Summary: | GlassFish V3 uses all available CPU during Startup and Deployment | ||
---|---|---|---|
Product: | serverplugins | Reporter: | bht <bht> |
Component: | GlassFish | Assignee: | Vince Kraemer <vkraemer> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | dkonecny, jpleed3, pjiricka, TheJuice |
Priority: | P2 | Keywords: | RELNOTE |
Version: | 6.x | ||
Hardware: | PC | ||
OS: | Windows ME/2000 | ||
Issue Type: | DEFECT | Exception Reporter: | |
Attachments: |
IDE thread dump after 5 minutes of 100% CPU before killing server
Reproducing testcase, NetBeans project in zip file
a stand alone version of wicketexamples
Server log. I shut down the server after initial deployment.
Standalone reproducing testcase
Glassfish V3 log from Netbeans 6.8 |
Description
bht
2010-02-24 16:07:46 UTC
where is the IDE installed? where is the server installer? also... where is the project at... what type of project

one quick 'test' might be to do this... instead of doing project deploy, deploy, deploy... does this issue become less severe if you do project run, run, run...

The server is installed by the IDE - my installation is 100% default, the full download, all patched etc. The server is at C:\Program Files\sges-v3.

The project is Java 6 Web. It was first a Java EE with an EJB and web module. I thought that complexity was the reason for the crashes. So I simplified and combined the modules. It uses Apache Wicket and has 226 Java classes.

I just did this test:
Clean server start, module is not deployed.
Clean, build, run.
CPU usage is terrible, but runs, no errors.
Change one java class out of 226
Run
Compiling 1 source file to C:\_dt\app\gt1\jprj\G1\G1_W\build\web\WEB-INF\classes
compile:
compile-jsps:
Incrementally deploying <module name>
Completed incremental distribution of <module name>
Incrementally redeploying <module name>
Fine.
Change same java file again.
Run
Compiling 1 source file to C:\_dt\app\gt1\jprj\G1\G1_W\build\web\WEB-INF\classes
compile:
compile-jsps:
Incrementally deploying <module name>
Completed incremental distribution of <module name>
Incrementally redeploying <module name>
Server uses 100% CPU until it runs out of heap space. I can't wait so long so I kill it after 5 minutes.

After killing GF3, when trying to re-start and re-run the app, I get this error:
SEVERE: Exception while loading the app
org.glassfish.deployment.common.DeploymentException: Error in linking security policy for <module name> -- Inconsistent Module State

I have had a lot of patience during the last couple of weeks, thinking that I would find some reason or workaround but no luck.
What works best for me is this:
- Undeploy module
- Stop server
- Delete server log files
- Start server
- Clean and build module
- Debug module

That way, I can at least hot swap Java classes in debug mode, but no HTML files.

So if this is not a known defect already, then I would be ready to pin this down with a testcase as this is really 100% reproducible. It would be good if you could help me with a strategy, because sometimes these testcases slip away.

Also these are possibly two distinct problems. Perhaps I should file another issue for the complete lockup which I described in this comment. THIS issue, from my limited perspective, is only for the high CPU usage as a percentage of the available power of a development computer, ignoring the more severe case for a while.

Created attachment 94501 [details]
IDE thread dump after 5 minutes of 100% CPU before killing server
IDE thread dump after 5 minutes of 100% CPU before killing server.
Using jconsole.exe "Java Monitoring and Management Console". GF3 Heap memory consumption of the server jumps from idle 40MB to 500MB on first deployment, then to 1GB on second deployment. GF2 with the same application stays under 60MB in all scenarios. GF2 also never consumes more than 88% CPU peak, 50% average during startup and deployment. The CPU usage, which has a severe impact on NetBeans, is the subject of this bug. The memory consumption is another issue that should get its own bug, but I report it here just in case. I am confident that you can reproduce the 100% CPU issue relatively easily.

hmm. I have not been able to reproduce the issue so far... Do you have a stand-alone project that you could attach to this issue that demonstrates the issue? It sounds like your project is using wicket... and the project probably has some jpa in there, too. What do you see in the server log file when you trigger a run-away run action?

I can't follow the server log during deployment because the IDE totally freezes, the screen does not refresh and so on. I cannot even switch to the task manager. But I dismantled the whole application step by step, in many generations, and the pattern is as follows: Heap memory usage goes up exponentially with the number of source files involved. The memory cannot be garbage collected. I replaced EclipseLink 2.0 with OpenJPA 2.0 beta, but that did not change anything. Removing wicket or the ejb part of the application did not have any more effect than otherwise reducing the number of source files.

I wanted to provide you with a clean testcase so I tried a 3rd party project with a large number of files: The Wicket examples project. No problems there. I guess because this is a Maven project, while my project is a basic NetBeans ant project.

Then I ran GF3 with my project still deployed from the command line and watched heap memory usage. Only 3.5MB vs 500-1000MB under the IDE.
So this looks like a problem that is closer to the NetBeans infrastructure than to the GF3 server itself. At least I don't have to worry about production deployment, only about development with the IDE.

So how can we find the bug? I am quite patient with making test cases but I cannot see how the methods of elimination that I am typically using can take me any further in this case. I would certainly need more instrumentation and more knowledge of the IDE internals which I am unlikely to acquire in the short term. Maybe there are some switches in the IDE. Maybe we can eliminate server logging to the IDE console or whatever you think could lock it up. Do you think I should convert this project to a Maven project? I will think a bit more, maybe I can find a better way to spot the bug.

Created attachment 94545 [details]
Reproducing testcase, NetBeans project in zip file
The testcase includes a text file with comments how I created it.
(In reply to comment #8)
> I can't follow the server log during deployment because the IDE totally
> freezes, the screen does not refresh and so on. I cannot even switch to the
> task manager.
>
>

The server log is an actual file in <domain-dir>/logs. You can attach that log file to the issue. If you could prune it before you do a test run, that will make it a bit easier to read.

Vince, you may want to look at my testcase. Please don't wait for me to provide a server log. You will be able to analyse the server log when running the testcase.

I am raising the priority of this because this situation can be reproduced at any time with a Wicket example project as attached. NetBeans becomes unusable with it.

bht: I am already trying to create a stand-alone test case from the wicket example that you sent. (it was pretty far from being stand-alone). I will update the status on this issue as I proceed. Please take the time to attach the log file, as requested... please.

Created attachment 94571 [details]
a stand alone version of wicketexamples
please try to replicate your issue with this sample.
To do that...
unzip the file someplace.
open the project %someplace%/stand alone/WicketExamples
run
make a change to one of the files (like SignInSession.java)
run
to see if you can replicate the issue, using this sample.
Created attachment 94572 [details]
Server log. I shut down the server after initial deployment.
I cannot see anything special in the log. But the server uses all 1GByte available heap space during this deployment. The server was not running when I ran the app.
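The earlier request to prune the server log before a test run could be scripted. The following is only a minimal sketch and not something posted in this thread: it assumes the log file is named `server.log` under `<domain-dir>/logs` and that the server is stopped first (a running server may keep the file locked on Windows). The function name `prune_server_log` and the backup naming scheme are hypothetical.

```python
from pathlib import Path
import shutil
import time


def prune_server_log(domain_dir, keep_backup=True):
    """Empty <domain_dir>/logs/server.log, optionally keeping a timestamped copy.

    Returns the backup path, or None when no backup was made or no log exists.
    Assumption: the log file is called server.log and the server is stopped.
    """
    log_file = Path(domain_dir) / "logs" / "server.log"
    if not log_file.exists():
        return None
    backup = None
    if keep_backup:
        stamp = time.strftime("%Y%m%d-%H%M%S")
        backup = log_file.with_name(f"server.log.{stamp}.bak")
        shutil.copy2(log_file, backup)
    # Truncate in place so the next test run starts with an empty log.
    log_file.write_text("")
    return backup
```

Calling it with the path of the domain directory before each run leaves a log that contains only the entries from that run, which is easier to attach and to read.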
The attached stand alone version 2010-02-26 13:44 of the testcase does not reproduce the problem. I will iteratively change the reproducing testcase 2010-02-26 04:05 until it is standalone and no longer reproduces. In that way the bug can hopefully be identified.

The biggest difference between the two test cases is the use of libraries vs the use of jar files 'directly'... But that should not come into play with the server... Do you see a significant difference in the size of the deploy directory hierarchy between the two test cases... (%project-root%\build\web)

Created attachment 94576 [details]
Standalone reproducing testcase
The attached testcase has only minor differences:
- joda-time is added
- velocity has version 1.4 instead of 1.6.3 (the testcase cannot run properly with 1.6.3)
I don't know why this one reproduces and the other one 94571 doesn't. I can only hope that this one will help identify the bug. I am using JDK 1.6.0_16.
> Do you see a significant difference in the size of the deploy directory
> hierarchy between the two test cases... (%project-root%\build\web)
Vince, I can't see it. Not easy to see for me. I don't have a directory wide diff utility handy at the moment. I have no idea whatsoever. I am just following some steps mechanically. There was not a single instance where my testcase started working. The next thing I could do is to modify your testcase so that it breaks at my end.
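Where no directory-wide diff utility is at hand, the comparison of the two deploy directory hierarchies asked about above can be approximated with a short script. This is a sketch under assumptions rather than a tool used in this thread: the function names `tree_sizes` and `compare_trees` are hypothetical, and only file presence and file size are compared, not file contents.

```python
import os


def tree_sizes(root):
    """Map each file path (relative to root, '/'-separated) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            sizes[rel] = os.path.getsize(full)
    return sizes


def compare_trees(left_root, right_root):
    """Report total sizes, files present on one side only, and files whose sizes differ."""
    left, right = tree_sizes(left_root), tree_sizes(right_root)
    common = set(left) & set(right)
    return {
        "left_total": sum(left.values()),
        "right_total": sum(right.values()),
        "only_left": sorted(set(left) - set(right)),
        "only_right": sorted(set(right) - set(left)),
        "size_differs": sorted(p for p in common if left[p] != right[p]),
    }
```

Running `compare_trees` on the `build\web` directories of the two test cases lists the files that exist in only one of them. The per-file listing is included alongside the totals because a small extra file barely changes the overall size of the hierarchy.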
It appears that beans.xml is the cause of the trouble. My own project runs fine too if I remove that file. Vince, thanks very much for your great support and professionalism.

OK. That is a huge find. This info is what I need to really start to make progress on an issue like this. I had 'commented out' beans.xml because of some warnings that I saw during deployment. I should have more info in a couple days.

I am glad to read this was the one. I appreciate that this type of work is very stressful for the NetBeans team with so little actionable information available until now. In light of this, the availability of the Wicket examples testcase made the breakthrough possible. I must admit that I had problems running these examples initially due to lack of knowledge, but the value of this 3rd party application for the purpose of communication is enormous.

Speaking of testcases, Vince, would you know where to find a sample application that demonstrates the new GlassFish V3 saved sessions across redeployments? I have so many things to do that I should ask this question before spending time on re-inventing the wheel.

I was able to reproduce the behavior with NB out of the picture after the war file is built. I opened https://glassfish.dev.java.net/issues/show_bug.cgi?id=11631 to track the problem in the GF side. I have to wonder if there is some interference between CDI (beans.xml) and some other component in the war file that triggers the issue.

Finally, I find an issue that sounds like what I'm having! When GF v3 came out, I liked CDI so much I refactored all my apps to use it. However, when I started smoke testing them on v3 I noticed that in Windows task manager, the mem size would grow about 30 MB whenever I redeployed an app. Eventually when the mem size reached about 650-700MB, deployments would stall and I would eventually get an out of heap space error. When I would restart the server, mem usage would still be high. Only after deleting the osgi-cache and generated folders out of the domain folder would mem usage return to a manageable level.

Then I started running into problems with the CPU running at full throttle during deployment and locking up the entire OS. I worked around this by setting the JVM's process to only use one processor core. I've been working this way for weeks, waiting until I got everything ready to go out on my staging server before looking for answers. I have that machine set to the server JVM with a larger heap size (1400 MB). I figured something might change with not deploying as often, but when I have all my apps on there and let it run for a few hours, pages start serving sluggishly then not at all. The same apps with mostly similar J5EE code ran fine on GF v2.1 with a MUCH smaller memory footprint.

Last week I switched over to Weld 1.0.1-final by patching the GF Weld integration module to use the SPI extensions instead of the implementation. There's still no change. I'm going to guess that if this problem is Weld related, then the problem lies in the integration module. It seems most of the work and testing they do is on JBoss 6.0.0-M2. One thing's for certain for me: there's no way I can take this into production.

jpleed3, you may want to add this info into the GF issue.

Already done.

I've been trying to get a bead on this one all day! the app triggers a stack overflow on my solaris box.

(In reply to comment #26)
> the app triggers a stack overflow on my solaris box.

more correctly... deploying the app triggers a....

Hello, I confirm this behavior. I managed to reproduce it on Windows XP 32bit with 4GB of RAM and on Windows 7 64bit with 6GB of RAM. After a few deploys, java.exe has grown to 600MB, and CPU is 100% all the time. My Netbeans log is attached.

Created attachment 94887 [details]
Glassfish V3 log from Netbeans 6.8
v3u1 2010-03-10 appears to have resolved https://glassfish.dev.java.net/issues/show_bug.cgi?id=11631 BUT it looks like the server is still going to cause usage problems because of https://glassfish.dev.java.net/issues/show_bug.cgi?id=11668 I hope the cause can be found in time for 6.9...

This is still the same trap as on day one. If the problem can't be fixed then still there needs to be a contingency solution. I am proposing that in the Web New Project Wizard, the check box "Enable Context and Dependency Injection" (CDI) is grayed out or removed, or at least that the default is changed to disabled. Also there must be a check in the build process that detects the beans.xml and prints a warning. As we know, this is highly toxic stuff, and it has cost man weeks to get to the current state that only lets us avoid the problem by not using CDI. Without any such precautions, such efforts will be repeated unnecessarily as people upgrade their existing systems whether they are using CDI or not.

So will GlassFish bug 11668 be resolved by 3.0.1 final? I hope so, but if not, then disabling CDI by default may make sense.

(In reply to comment #33)
> So will GlassFish bug 11668 be resolved by 3.0.1 final? I hope so, but if not,
> then disabling CDI by default may make sense.

I have grave doubts about the GF issue getting fully resolved... The folks that work for redhat may have little interest in resolving this, since it works for their server.

Removed the simplefix keyword... It was not being used correctly.

*** Bug 188454 has been marked as a duplicate of this bug. ***

I cannot replicate this with NB 7.0 dev builds and GF 3.1 build 24, using the testcase http://netbeans.org/bugzilla/attachment.cgi?id=94576 on my Windows 7 box. Obviously, if you are still running into this with an updated environment, please re-open and provide details.
see previous comment

GlassFish bug https://glassfish.dev.java.net/issues/show_bug.cgi?id=11668 is now fixed; whoever ran into this may want to try the latest GlassFish 3.1 daily build.

Thank you very much for all of this. I have been using this as a showcase of the commitment to quality of the NetBeans team. |
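The build-time check proposed earlier in this thread (detect beans.xml and print a warning) might look roughly like the following. It is a hypothetical illustration, not code from NetBeans or GlassFish: the function names `find_beans_xml` and `warn_if_cdi_enabled` and the warning text are assumptions, and the script simply searches an exploded web application directory such as `build\web` for any file named `beans.xml`.

```python
from pathlib import Path


def find_beans_xml(web_root):
    """Return the '/'-separated relative paths of all beans.xml files under web_root."""
    root = Path(web_root)
    return sorted(p.relative_to(root).as_posix()
                  for p in root.rglob("beans.xml") if p.is_file())


def warn_if_cdi_enabled(web_root):
    """Print one warning per beans.xml found; return True if any was found."""
    found = find_beans_xml(web_root)
    for rel in found:
        print(f"WARNING: {rel} found; CDI is enabled for this module "
              "(see GlassFish issues 11631 and 11668).")
    return bool(found)
```

A build could run `warn_if_cdi_enabled` on the deploy directory after compilation, so the presence of the file is visible in the build output without blocking the build.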