Bug 42849 - If mod_jk cannot resolve host name of single worker, all workers are destroyed
Alias: None
Product: Tomcat Connectors
Classification: Unclassified
Component: Common
Version: unspecified
Hardware: Other Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
Depends on:
Reported: 2007-07-10 17:13 UTC by Jerrold Poh
Modified: 2008-10-05 03:10 UTC
Description Jerrold Poh 2007-07-10 17:13:28 UTC
In the workers definition file (specified by the JkWorkersFile directive in
httpd.conf) there is a list of workers, each mapped to the host it forwards
to. For example:

worker.list=foo, bar, baz




If one of the specified hosts cannot be resolved (let's say in this case
it is baz_host), then the following is printed in the mod_jk.log file:

ajp_validate::jk_ajp_common.c (2001): worker baz contact is 'baz_host:23013'
ajp_validate::jk_ajp_common.c (2010): can't resolve tomcat address baz_host
ajp_validate::jk_ajp_common.c (2013): invalid host and port baz_host 23013
ajp_destroy::jk_ajp_common.c (2215): up to 0 endpoints to close
wc_create_worker::jk_worker.c (161): validate failed for baz
build_worker_map::jk_worker.c (259): failed to create worker baz
close_workers::jk_worker.c (215): close_workers will destroy worker foo
ajp_destroy::jk_ajp_common.c (2215): up to 1 endpoints to close
close_workers::jk_worker.c (215): close_workers will destroy worker bar
ajp_destroy::jk_ajp_common.c (2215): up to 1 endpoints to close

This means that all workers are destroyed instead of just that single
worker, which seems like overkill.
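
The log above is consistent with an all-or-nothing build of the worker map:
workers are created in list order, and the first validation failure tears down
every worker created so far. A minimal C sketch of that pattern follows. It is
not the mod_jk source. The worker struct, can_resolve(), the function name
build_worker_map_sketch(), and the hosts foo_host and bar_host with port 8009
are assumptions made for illustration (only baz_host:23013 comes from the log),
and the resolver is a stub rather than a real DNS lookup.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical worker record; not the real mod_jk worker type. */
typedef struct {
    const char *name;
    const char *host;
    int port;
    int alive; /* 1 once validated, reset to 0 when destroyed */
} worker;

/* Stub resolver (assumption): every host resolves except "baz_host".
 * A real implementation would perform a DNS lookup here. */
static int can_resolve(const char *host)
{
    return strcmp(host, "baz_host") != 0;
}

/* All-or-nothing build: returns the number of live workers,
 * or 0 if any single worker fails validation. */
static int build_worker_map_sketch(worker *w, int n)
{
    int i, j;

    for (i = 0; i < n; i++) {
        if (!can_resolve(w[i].host)) {
            fprintf(stderr, "can't resolve address %s for worker %s\n",
                    w[i].host, w[i].name);
            /* Tear down every worker that was already created. */
            for (j = 0; j < i; j++) {
                fprintf(stderr, "destroying worker %s\n", w[j].name);
                w[j].alive = 0;
            }
            return 0;
        }
        w[i].alive = 1;
    }
    return n;
}
```

Calling build_worker_map_sketch() on foo, bar and baz returns 0 and leaves all
three workers with alive == 0; calling it on foo and bar alone returns 2.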
Comment 1 Jerrold Poh 2007-07-10 17:14:54 UTC
Forgot to add, version of mod_jk is 1.2.23
Comment 2 Rainer Jung 2007-07-12 07:22:34 UTC
This behaviour is intended. If the name of a target worker is not resolvable
during startup, we prefer not to start up at all. So the destruction of the
workers is just a side effect of aborting the Apache startup.

If we can resolve the names once (and finish startup), we will not resolve
them again during normal operation.

In our experience, logging a failed resolution for a worker during startup and
then continuing startup will in many cases leave administrators unaware of the
problem.
Comment 3 Tim Whittington 2007-07-15 02:26:27 UTC
I think the point here is that startup isn't aborted (the log excerpt doesn't
show this).
What happens is that after the workers are shut down, Apache startup continues,
leaving mod_jk in an inconsistent state.
A request for a URI that should map to a worker then enters mod_jk, goes through
the URI -> Worker mapping process, and then fails to find the worker identified.

So what is happening is precisely the situation that we're trying to avoid -
starting Apache with a broken configuration.

This is with Apache 2.0.52 on CentOS 4, so it's a pretty standard setup.
Comment 4 Mladen Turk 2007-07-15 02:57:30 UTC
Yes, we should treat that the same way as a misconfiguration or an
invalid directive, because that is what it actually is, and refuse
to load mod_jk in that case.
BTW, httpd itself won't start if you provide an invalid IP address
or a port that is already in use, for example, so disabling mod_jk or even
the entire httpd is a legitimate thing to do.

Comment 5 Mladen Turk 2007-07-15 03:12:25 UTC
On second thought, mod_jk still needs to load.
The error.log entry and probably console output should be enough.
It should behave the same way as, for example, configuring
the wrong path for a Directory or VirtualHost root.
Log that and continue. Httpd will return 404 if that resource is requested.

So I think we are fine with what we have right now.
The worker (the entire balancer) is disabled, as well as its mappings.
Comment 6 Rainer Jung 2007-07-16 06:28:07 UTC
On which web server was the behaviour observed?

Actually we implemented both variants:

Apache 2.x: any validation failure will make wc_open() return JK_FALSE and
Apache will log an error and *not* start.

Apache 1.3: only logs an error, but does start up. Worker initialization even
for the good ones might not be done!

IIS: Not sure.

Netscape: The code looks like we only log a line, so it seems to be the same
as for Apache 1.3.

I would prefer not to start up. It is a problem that can be detected during
startup. As such it differs from typos in context URLs. Since we can detect the
problem during startup, we should tell people what's wrong and not start up.

I think that Apache 2.x does it OK. We don't have it in Apache 1.3 primarily
for historic reasons, because the init hook does not allow return values. We can
use our jk_error_exit() function nevertheless.

I'll patch it for Apache 1.3 if no one objects. Mladen? Anyone else?
Comment 7 Mladen Turk 2007-07-16 10:20:46 UTC
No objections.

Not sure whether for IIS we should or could force the entire service
to shut down. However, we can return 404 from the filter if init failed.
Comment 8 Rainer Jung 2007-07-17 05:59:36 UTC
Apache 1.3: fixed in r556916 (don't start up if mod_jk has an initialization error)

IIS: fixed in r556836 (return HTTP status 500 if mod_jk has an initialization error)

Checked for Netscape and Apache httpd 2.x: both already do not start up in case
of an initialization error.

Changes will be released as part of 1.2.24.
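
The IIS change described above amounts to remembering the initialization
result and checking it on every request. A rough C sketch of that gating
follows. The names module_init(), handle_request() and init_ok are
hypothetical and are not taken from the mod_jk or ISAPI redirector source,
and the 200 returned on success is only a placeholder for normal processing.

```c
#include <stdio.h>

/* Hypothetical module state: 1 if initialization succeeded, else 0. */
static int init_ok = 0;

/* Stand-in for module initialization. 'workers_valid' is an assumed
 * flag saying whether every configured worker passed validation. */
static void module_init(int workers_valid)
{
    init_ok = workers_valid;
    if (!init_ok)
        fprintf(stderr, "initialization failed, requests will get 500\n");
}

/* Returns the HTTP status for a request that maps to a worker. */
static int handle_request(const char *uri)
{
    (void)uri; /* the URI is not inspected in this sketch */
    if (!init_ok)
        return 500; /* refuse to forward when initialization failed */
    return 200;     /* placeholder for forwarding to the worker */
}
```

With module_init(0), every call to handle_request() yields 500; after
module_init(1), the same call yields the placeholder 200.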
Comment 9 Rainer Jung 2008-01-01 16:32:29 UTC
Move a couple of fixed JK issues from resolved to closed.