Hi Mladen, We migrate from JK2 to JK1 in the moment. We encountered some problems on bigger Webfarms with the JK1 / IIS6 in full Application Isolation mode. The main blocker is, when a Application pool is run in multithreading configuration (App-Pool-> Properties->Performance-> Web-Garden-> Number of Worker Processes), the JK does not start right: (Loglevel Info) [Thu Nov 02 16:35:53 2006] [5872:5924] [error] validate::jk_lb_worker.c (1104): allocating worker record from shared memory [Thu Nov 02 16:35:53 2006] [5872:5924] [error] wc_create_worker::jk_worker.c (158): validate failed for P-NG-TC3Hx [Thu Nov 02 16:35:53 2006] [5872:5924] [error] build_worker_map::jk_worker.c (256): failed to create worker P-NG-TC3Hx [Thu Nov 02 16:35:54 2006] [4012:6612] [error] wc_create_worker::jk_worker.c (146): factory for lb failed for P-NG-TC3Hx [Thu Nov 02 16:35:54 2006] [4012:6612] [error] build_worker_map::jk_worker.c (256): failed to create worker P-NG-TC3Hx [Thu Nov 02 16:35:57 2006] [1672:6912] [error] wc_create_worker::jk_worker.c (146): factory for lb failed for P-NG-TC3Hx [Thu Nov 02 16:35:57 2006] [1672:6912] [error] build_worker_map::jk_worker.c (256): failed to create worker P-NG-TC3Hx [Thu Nov 02 16:36:27 2006] [6100:3092] [error] wc_create_worker::jk_worker.c (146): factory for lb failed for P-NG-TC3Hx [Thu Nov 02 16:36:27 2006] [6100:3092] [error] build_worker_map::jk_worker.c (256): failed to create worker P-NG-TC3Hx It runs fine with a singler worker process. I also tried to reduce "connection_pool_size" to 1 (from 64) because there are known probs with Apache prefork, but it didn't help. This is a real production blocker. I also realized that in the jkstatus, the live-config and stati apply only to the application pool this web is in. If i reconfigure a jk in a web in a different pool, it is totally independent. As I understand it, all jk- instances on one server should share this information through a shared memory area. This doesn't seem to work either.With that bug, one can't use the nice feature to reconfigure the jk online on IIS6 with multiple application pools. If you need config files or debug logs, please ask. These are from production environment, so I'd prefere to send them personally.
You are correct. Will you be willing to test the patched binary? Also, what platfom it is. WIN32 or WIN64?
Yes, we can test the binaries. Thank you for the offer. It happens on both, the 32 an 64 bit plattforms. in the moment, a 64 bit webfarm produces the errors. But we also got 32 bit systems ready to test.
Hi again, hete to say it, but we found another severe bug in the full Application Isolation mode of IIS6 with jk1.2.x. Application pools recycle themself after some time or number of requests. When this happens: Event Type: Information Event Source: W3SVC Event Category: None Event ID: 1074 Date: 05.11.2006 Time: 02:41:15 User: N/A Computer: xxxxxxxxxxxxx Description: A worker process with process id of '1380' serving application pool 'www.premiere.de' has requested a recycle because the worker process reached its allowed processing time limit. ...the JK exits with an error: [Fri Nov 03 22:28:09 2006] [1380:6848] [info] service::jk_lb_worker.c (873): unrecoverable error 413, request failed. Client failed in the middle of request, we can't recover to another instance. [Sun Nov 05 02:41:20 2006] [2044:3792] [error] wc_create_worker::jk_worker.c (146): factory for lb failed for P-NG-TC3Hx [Sun Nov 05 02:41:20 2006] [2044:3792] [error] build_worker_map::jk_worker.c (256): failed to create worker P-NG-TC3Hx [Sun Nov 05 02:41:20 2006] [6872:4756] [error] wc_create_worker::jk_worker.c (146): factory for lb failed for P-NG-TC3Hx [Sun Nov 05 02:41:20 2006] [6872:4756] [error] build_worker_map::jk_worker.c (256): failed to create worker P-NG-TC3Hx [Sun Nov 05 02:41:32 2006] [6568:3420] [error] wc_create_worker::jk_worker.c (146): factory for lb failed for P-NG-TC3Hx [Sun Nov 05 02:41:32 2006] [6568:3420] [error] build_worker_map::jk_worker.c (256): failed to create worker P-NG-TC3Hx [Sun Nov 05 02:42:04 2006] [1360:4448] [error] wc_create_worker::jk_worker.c (146): factory for lb failed for P-NG-TC3Hx [Sun Nov 05 02:42:04 2006] [1360:4448] [error] build_worker_map::jk_worker.c (256): failed to create worker P-NG-TC3Hx JK2 had no problem with this. Maybe this had also to do with the shared mem problem, because the "main" jk1 ISAPI-extension is configured globaly in the IIS, and the called JK1-instances are run in the Application Pool environment. Can you tell me roughly when a testversion will be available? It matters for our Testsystems to be held in this state or, if it takes longer, to be re- setup later.
Hi, Yes the problem is with shared memory. It is initialized as global while it should be initialized on per virtual server basis. I'll take a look at that this week.
Created attachment 19133 [details] WIN64/AMD64 ISAPI Redirect isapi_redirect 1.2.20 with shared memory patch
Thank you for the fix. I did a quick test on the AMD64 server and it looks really good. Multithread in an applicationpool and app pool recyling seems to work fine. A few tests later, only a few small flaws appeared. First, you get an (one!) error when die app-pool recycles. This is easy reproduceable. Recycle every 100 requests, you'll get one error like these every 100 requests. Sometimes 2 per cycle at the same time. [Wed Nov 15 15:32:34 2006] [5536:4756] [error] HttpExtensionProc::jk_isapi_plugin.c (1102): could not get a worker for name P- NG-TC3Hx [Wed Nov 15 15:32:54 2006] [5568:6800] [error] HttpExtensionProc::jk_isapi_plugin.c (1102): could not get a worker for name P- NG-TC3Hx Also, rarely, a "WriteClient failed" appears. This didn't happen before on the JK2. But I'm not totally sure if this applies to this jk-version as origin. [Wed Nov 15 17:16:48 2006] [0540:6520] [error] write::jk_isapi_plugin.c (656): WriteClient failed with 00002746 [Wed Nov 15 17:16:48 2006] [0540:6520] [info] ajp_process_callback::jk_ajp_common.c (1421): Writing to client aborted or client network problems [Wed Nov 15 17:16:48 2006] [0540:6520] [info] ajp_service::jk_ajp_common.c (1813): (P-NG-TC3H1) request failed, because of client write error without recovery in send loop attempt=0 [Wed Nov 15 17:16:48 2006] [0540:6520] [info] service::jk_lb_worker.c (874): unrecoverable error 200, request failed. Client failed in the middle of request, we can't recover to another instance. And still in the jkstatus, the config and stati apply only to this app pool or even web / virtual server. I guess this is by design, because you mentioned to change the shared mem to a per-vhost basis. But it is still not very handy. Say you want to drain a tomcat to do maintainance, on our farms you'd need to do it not only on 11 frontent webservers but also on 41 webs per server. That's not practicable. Hopefully in the future, there is a per-vhost an global section in jkstatus. But this would need globally initialized shared mem again :) Also tested was the "reference" directive, which didn't work in the 1.2.19 on IIS. That also worked fine. All in all, great work, thanks. If you can fix the last on-recycle error I'd say test passed for amd64. Any idea when the final 1.2.20 will be released?
(In reply to comment #5) > Created an attachment (id=19133) [edit] > WIN64/AMD64 ISAPI Redirect > > isapi_redirect 1.2.20 with shared memory patch Hi Mladen, We are having similar sort of issue on Win32 Platform. I will explain in detail. Environment We are running MS Application Center Load Balanced Web Server Farm with 3 nodes running IIS 6 in Worker Process Isolation mode. The Tomcat/JBoss runs on two separate Servers (again Load Balanced via Application Center). The Web Server Farm is running 5 Websites. The JK ISAPI redirector version is 1.2.19 and is configured on per site basis. Problem Everything works fine except when the application pool undergoes recycling, the website using that application pool wouldn't work. Recycling application pool again, restarting website wouldn't help. The request to that website gets 404.2 and 404.3 in IIS logs. The 404.2 indicates that JK ISAPI is locked or access is denied to the JK ISAPI. The only way to get it working again is to do an IISRESET which affects all other websites and disconnects all the users connected to those websites. Its very frustrating as the whole purpose of IIS 6 running in worker process isolation mode is defeated. The JK Logs shows the following [Wed Nov 15 17:25:41 2006] [1228:4924] [error] jk_worker.c (256): failed to create worker lbworker [Wed Nov 15 17:49:24 2006] [2332:2232] [error] jk_worker.c (146): factory for lb failed for lbworker [Wed Nov 15 17:49:24 2006] [2332:2232] [error] jk_worker.c (256): failed to create worker lbworker [Wed Nov 15 18:31:12 2006] [4756:4804] [error] jk_worker.c (146): factory for lb failed for lbworker [Wed Nov 15 18:31:12 2006] [4756:4804] [error] jk_worker.c (256): failed to create worker lbworker [Wed Nov 15 18:57:02 2006] [4044:6564] [error] jk_worker.c (146): factory for lb failed for lbworker [Wed Nov 15 18:57:02 2006] [4044:6564] [error] jk_worker.c (256): failed to create worker lbworker [Wed Nov 15 18:57:24 2006] [0324:6956] [error] jk_worker.c (146): factory for lb failed for lbworker [Wed Nov 15 18:57:24 2006] [0324:6956] [error] jk_worker.c (256): failed to create worker lbworker [Wed Nov 15 18:57:32 2006] [3660:5948] [error] jk_worker.c (146): factory for lb failed for lbworker [Wed Nov 15 18:57:32 2006] [3660:5948] [error] jk_worker.c (256): failed to create worker lbworker The windows eventlog would show the following: Event Type: Information Event Source: W3SVC Event Category:None Event ID: 1074 Date: 15.11.2006 Time: 06:56:36 User: N/A Computer: xxxxxxxxxxxxx Description: A worker process with process id of '6308' serving application pool 'xxxxx' has requested a recycle because the worker process reached its allowed processing time limit. We didn't have any issues with the earlier version of JK which was 1.2.08. I have a strong feeling that this has to do with the shared memory problem. Please let me know if you need any further information, config files or log files.
Hi Om Bhakar. Yes, the problem you describe has been acknowledged by Mladen, he in working on it and already released a patched version for 64bit systems, as you can read above. The test looked good, so I guess it's only a matter of days until a patched version for you 32 bit environment is supplied. So we should better leave Mladen to do his good work. Hi Mladen, I'm pretty sure now that the "WriteClient failed" problem has nothing to do with your patch. After a few reconfigurations and tomcat restarts i managed a 1h loadtest with several 100000 requests where only the expected ammount of "could not get a worker for name ..." recycle-errors occoured.
Hi Bjoern Andersen, The problem is very intermittent and we can't replicate it when we want to. It would happen may be once a day, not necessarily but most probably around the time when the application pool recycles itself. One website using JK stops working and no matter what we do (except IISRESET), it wouldn't come up throwing 404.2 and 404.3 errors. I don't have any idea as when the patch would be released. I am more than willing to test it as soon as it is released. In the mean time while Mladen is doing the good work to release patch for JK 1.2.19 for win32, What I am thinking is to roll back to JK 1.2.15 since this bug is affecting our production environment. Do you know by any chance if the version JK 1.2.15 is more stable and doesn't have the shared memory bug. What version of JK would you suggest to roll back to?
Hi Om Bhakar, Sorry, no idea. We are switching from jk2.0.4, and tested only 1.2.18 & 19. Both had the recycle problem. Hi Mladen, Can you give a quick insight into the release plans? Is there going to be a 32bit patch version or is 1.2.20 so close that you want to wait for that? When can we expect the 32bit version?
I don't know, if Mladen still want to do something for the Win world, but the changes on my side will be finished latest next weekend. So I will suggest today on the list to cut the release sometime very close to the next weekend, and if we did our job good, it will take at most another week before it's final. After tagging we usually wait a few days for community testing and then we call a three day vote on stability.
Created attachment 19156 [details] WIN32 ISAPI Redirect WIN32 ISAPI Redirect 1.2.20 dev version
Hi Mladen & Team, The newer version of JK Redirector 1.2.20 is performing very well and doing a fantastic job as we didn't have any issues so far after deploying the newer version 2 weeks back. I would like to thank you and the whole Tomcat Jakarta Redirector Development/Test Team for doing such an excellent job. Thanks.