Bug 18500 - Host aliases to match by regular expression
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 7
Classification: Unclassified
Component: Catalina
Version: unspecified
Hardware: Other other
Importance: P3 enhancement with 8 votes
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Duplicates: 7676 51269
Depends on:
Blocks:
 
Reported: 2003-03-30 17:20 UTC by Aaron Hamid
Modified: 2016-08-01 09:54 UTC (History)
Description Aaron Hamid 2003-03-30 17:20:09 UTC
I would like hosts to match aliases by regular expression instead of exact 
match.  This can be done trivially with the following patch under 1.4 (or under 
previous VMs by using Jakarta Regexp):

--- StandardEngineMapper.java.bak       2003-03-19 10:18:38.000000000 -0500
+++ StandardEngineMapper.java   2003-03-30 12:13:13.000000000 -0500
@@ -205,7 +205,7 @@
             for (int i = 0; i < children.length; i++) {
                 String aliases[] = ((Host) children[i]).findAliases();
                 for (int j = 0; j < aliases.length; j++) {
-                    if (server.equals(aliases[j])) {
+                    if (server.matches(aliases[j])) {
                         host = (Host) children[i];
                         break;
                     }
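
A side effect of the one-line patch above is worth illustrating: String.matches treats each alias as a regular expression that must match the entire server name, so existing plain aliases change meaning (an unescaped dot matches any character). The sketch below is only an illustration; the class name, helper method and host names are hypothetical and not part of the patch.

```java
import java.util.regex.Pattern;

class AliasRegexDemo {
    // Mirrors the patched comparison: the alias is interpreted as a regex
    // that has to match the whole server name.
    static boolean aliasMatches(String server, String aliasRegex) {
        return server.matches(aliasRegex);
    }

    public static void main(String[] args) {
        // A plain alias now behaves as a regex: '.' matches any character.
        System.out.println(aliasMatches("wwwXexample.com", "www.example.com"));

        // Pattern.quote restores literal matching for a plain alias.
        System.out.println(aliasMatches("wwwXexample.com", Pattern.quote("www.example.com")));

        // Full-match semantics: trailing text after the pattern is rejected.
        System.out.println(aliasMatches("a.example.com", ".*\\.example\\.com"));
        System.out.println(aliasMatches("a.example.com.evil.org", ".*\\.example\\.com"));
    }
}
```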
Comment 1 Mark Thomas 2010-02-04 12:26:39 UTC
*** Bug 7676 has been marked as a duplicate of this bug. ***
Comment 2 Mark Thomas 2011-03-06 17:35:47 UTC
Whilst this would have been trivial in 4.1.x, the Mapper was re-written and adding regular expression support for aliases is non-trivial in all supported Tomcat versions.

If this is implemented, it will be in 7.0.x or later so updating version.
Comment 3 Mark Thomas 2011-05-26 13:03:57 UTC
*** Bug 51269 has been marked as a duplicate of this bug. ***
Comment 4 Mark Thomas 2011-08-10 18:20:55 UTC
With the re-factoring of the mapper since this was requested, it is no longer trivial, so I have edited the title.
Comment 5 Christopher Schultz 2011-09-20 21:33:24 UTC
Unfortunately for this enhancement, the Mapper is written to use a relatively high-performance binary search algorithm to find a matching <Host> given all the <Host> and <Alias> definitions (where <Alias>es are basically treated as if they were additional <Host> definitions with shared configuration information).

Such an algorithm can't work with regular expressions because they can't be sorted in any sane way. Thus, a linear search would have to be performed which will degrade performance (along with the execution of all those regular expression matches).

If we're going to do this, I would recommend that we add another layer of abstraction near the insertMap/find/findIgnoreCase mechanism. Something along these lines:

1. The Mapper works exactly as it currently does under "normal" operations.
2. When a Host (or Alias) is added with special markings (say, leading and trailing "/" marks which are easily recognizable and otherwise illegal in a hostname), we switch into "regex mode".
3. Entering regex mode creates another List/Map/whatever of metadata that will need to be maintained: existing <Host>s are processed into this data structure.
4. Hosts (and Aliases) added after going into regex mode will of course have this metadata updated as appropriate.
5. In internalMap, modify the call to findIgnoreCase depending upon the current mode: if we're in "normal" mode, just call findIgnoreCase and move on; if we're in "regex" mode, make a different call that uses the aforementioned metadata to find the proper index into the <Host> array.

This could be done by re-factoring the Mapper to use a pair of new private inner classes: one that behaves like the current implementation, and one that works with regular expressions. Switching from one mode to the other could be done by replacing the currently-effective Mapper with one of the other type. You could even go back if you really wanted to.

We will need a policy of conflict resolution: more than one <Host> could match. We can't use a policy like "longest match wins" because it's tough to tell which regex is "longer" due to their complexities. I would favor a linear search where the first-defined <Host> wins. The Digester should give us all <Host> elements in XML-order, so the user can specify the order of preference just by ordering server.xml to their liking.
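
The "first-defined <Host> wins" policy could look roughly like the sketch below. It is not Mapper code: the class, its methods and the host names are made up for illustration, and it assumes hosts are registered in the order they appear in server.xml.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FirstMatchWins {
    // Parallel lists keep the registration order (assumed to be XML order).
    private final List<String> hostNames = new ArrayList<>();
    private final List<Pattern> patterns = new ArrayList<>();

    // Registers a host together with the regex that selects it.
    void addHost(String hostName, String regex) {
        hostNames.add(hostName);
        patterns.add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
    }

    // Linear search: the first registered host whose regex matches wins.
    // Returns null when nothing matches.
    String find(String serverName) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(serverName).matches()) {
                return hostNames.get(i);
            }
        }
        return null;
    }
}
```

With this policy, a more specific pattern only takes effect if it is declared before a broader one that also matches.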

Thoughts?
Comment 6 Konstantin Kolinko 2011-09-21 00:04:47 UTC
Since I last took part in a discussion of such a feature [1], my thinking has been:

- start processing in the Mapper with the currently implemented binary search
- iff the binary search has not found a match, proceed with alternative match algorithms (regexps, wildcards, etc.)
- fall back to using the default host

[1] "Several hosts within one tomcat / catch-all problem"
http://marc.info/?t=129018176200002&r=1&w=2
http://markmail.org/thread/si3msz4fstgbfenq
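
The three-step lookup proposed in this comment might be sketched as follows. This is a simplified stand-in rather than the real Mapper: the class name, constructor, and host names are hypothetical, exact names are kept in a sorted array for binary search, and pattern-based hosts are tried in insertion order only when the exact lookup fails.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

class FallbackMapper {
    private final String[] exactNames;                 // sorted, lower-case
    private final Map<String, Pattern> patternHosts = new LinkedHashMap<>();
    private final String defaultHost;

    FallbackMapper(String defaultHost, String... exactNames) {
        this.defaultHost = defaultHost;
        this.exactNames = new String[exactNames.length];
        for (int i = 0; i < exactNames.length; i++) {
            this.exactNames[i] = exactNames[i].toLowerCase(Locale.ROOT);
        }
        Arrays.sort(this.exactNames);
    }

    // Registers a host selected by a regex; insertion order is preserved.
    void addPatternHost(String hostName, String regex) {
        patternHosts.put(hostName, Pattern.compile(regex));
    }

    String map(String serverName) {
        String key = serverName.toLowerCase(Locale.ROOT);
        // Step 1: binary search over the exact names.
        if (Arrays.binarySearch(exactNames, key) >= 0) {
            return key;
        }
        // Step 2: linear scan over the pattern-based hosts.
        for (Map.Entry<String, Pattern> e : patternHosts.entrySet()) {
            if (e.getValue().matcher(key).matches()) {
                return e.getKey();
            }
        }
        // Step 3: fall back to the default host.
        return defaultHost;
    }
}
```

Requests for exactly named hosts never touch the pattern list, so configurations without patterns keep the cost of the binary search alone.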
Comment 7 Christopher Schultz 2011-09-22 16:01:00 UTC
Konstantin, in that case we would just maintain a separate list of regex-based <Host> matchers and they would all be consulted if no exact match was found, right? Fall-back is the same.

Your suggestion is certainly simpler than mine, and will perform better when there are mixed regex/non-regex <Host> declarations.
Comment 8 Konstantin Kolinko 2011-09-25 00:13:23 UTC
(In reply to comment #7)
> Konstantin, in that case we would just maintain a separate list of regex-based
> <Host> matchers and they would all be consulted if no exact match was found,
> right? Fall-back is the same.
> 
> Your suggestion is certainly simpler than mine, and will perform better when
> there are mixed regex/non-regex <Host> declarations.

Yes. That is the idea.

The thread I linked in comment 6 proposes yet another kind of matching:
*.domainname

That is suffix matching, like the one you can use when configuring your DNS, or when purchasing an SSL certificate for your site.
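
A minimal sketch of such suffix matching is shown below. The class and method are hypothetical, and it assumes that "*.domainname" matches any name ending in ".domainname", including names with several extra labels, but not the bare domain itself; an actual implementation may choose different rules.

```java
class SuffixAlias {
    // Returns true if serverName matches the alias. Aliases of the form
    // "*.domainname" are suffix-matched; anything else is compared exactly
    // (ignoring case).
    static boolean matches(String alias, String serverName) {
        if (!alias.startsWith("*.")) {
            return alias.equalsIgnoreCase(serverName);
        }
        String suffix = alias.substring(1); // ".domainname", dot included
        int offset = serverName.length() - suffix.length();
        // Require at least one character in front of the suffix.
        return offset > 0
                && serverName.regionMatches(true, offset, suffix, 0, suffix.length());
    }
}
```

Keeping the leading dot in the suffix prevents "badexample.com" from matching "*.example.com".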
Comment 9 Christopher Schultz 2011-09-26 19:10:58 UTC
Suffix-matching would certainly be faster than regular expression matching, but it has the same problems as regexes: suffix patterns are not sortable, etc.

Suffix-matching seems (to me) to be much more practically useful than complete regular-expression support, but I always just use a single default host, so what do I know?

I just wouldn't want to mix-and-match too many name-matching styles or things are going to get seriously ridiculous (exact-match, then suffix-match, then regex-match, then what next?).

I do agree that keeping the existing code in there is a good thing: most folks don't need regex-matching and so the binary search is a good implementation. The question is how many other strategies are worth implementing /after/ an exact-match is not found?
Comment 10 Konstantin Kolinko 2011-09-27 01:12:47 UTC
(In reply to comment #9)
> I just wouldn't want to mix-and-match too many name-matching styles or things
> are going to get seriously ridiculous (exact-match, then suffix-match, then
> regex-match, then what next?).
> 
> The question is how many other strategies are worth implementing /after/ an
> exact-match is not found?

I think it is just a single strategy: iterate linearly over the list, checking whether each candidate matches.

Each candidate implements its own matching algorithm. For an <Alias> value written as *.foo it can be suffix matching, and for /foo/ it can be regexp matching. Alternatively, a separate <AliasRegexp> element could be implemented.

I do not see any alternative to linear search here.

The search order probably depends on the order in server.xml (and will change if some hosts are stopped and started again through JMX or host-manager).
Comment 11 Christopher Schultz 2011-09-27 18:06:26 UTC
(In reply to comment #10)
> I think it is just a single strategy: iterate linearly over the list, checking
> whether each candidate matches.

Oh, I meant "do we support suffix matches only, or do we also support regular expression matches", not "how do you iterate over those items"? :)

> The search order probably depends on the order in server.xml (and will change if
> some hosts are stopped and started again through JMX or host-manager).

Agreed. I suppose if there's a base class or interface that represents a matchable item (and probably a factory that knows how to convert, say "/host/" or "*.host" into the appropriate subtype), then there's no limit to the complexity that can be introduced.
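
One possible shape for that interface and factory is sketched here. All names (HostMatchers, HostMatcher, create) are hypothetical; the "/regex/" and "*.suffix" notations are the ones floated in this thread, and any other value is treated as an exact, case-insensitive name.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class HostMatchers {
    // A matchable item: each candidate decides for itself whether it matches.
    interface HostMatcher {
        boolean matches(String serverName);
    }

    // Factory that picks the matcher subtype from the notation used.
    static HostMatcher create(String spec) {
        if (spec.length() > 2 && spec.startsWith("/") && spec.endsWith("/")) {
            // "/regex/": strip the delimiters and require a full match.
            Pattern p = Pattern.compile(
                    spec.substring(1, spec.length() - 1), Pattern.CASE_INSENSITIVE);
            return name -> p.matcher(name).matches();
        }
        if (spec.startsWith("*.")) {
            // "*.host": suffix match, keeping the leading dot as a boundary.
            String suffix = spec.substring(1).toLowerCase(Locale.ROOT);
            return name -> {
                String n = name.toLowerCase(Locale.ROOT);
                return n.length() > suffix.length() && n.endsWith(suffix);
            };
        }
        // Anything else is an exact host name.
        return name -> spec.equalsIgnoreCase(name);
    }
}
```

A caller would build one matcher per <Host> or <Alias> value and iterate over them linearly, as discussed in comment 10.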

I'll take a stab at this.
Comment 12 Alessandro Polverini 2013-08-09 08:50:32 UTC
This feature would be very much appreciated, since Apache supports wildcard aliases and is usually used as a frontend for Tomcat.

I think that suffix matching would be more than enough for a first implementation, since it covers most real-world use cases.
Comment 13 Christopher Schultz 2013-08-09 13:41:16 UTC
My stab has obviously languished, here. I'll take another look at this because I do in fact think it's quite useful.

Alessandro, I'm interested in what your use-case is where you have a fronting httpd but still need to multiplex between different <Hosts> in Tomcat. Do you have a complex configuration where different webapps are (exclusively) available to different hosts?
Comment 14 Alessandro Polverini 2013-08-09 14:00:02 UTC
Hello Christopher,
I've some setups where multiple (and different) web apps use domains of the form locale.mydomain.com to automatically select the language of the site.

Of course, in the config I can list all the supported languages, but there may be dozens of them and a wildcard would be much simpler.

In another use case, some webapps use subdomains like x.mydomain.com to generate some kind of (dynamic) custom blog or report, and in that situation the wildcard is essential because the name of the subdomain can't be known in advance.

For this situation I usually use multiple Tomcat instances, each with a default webapp.
Comment 15 Christopher Schultz 2013-08-09 15:42:27 UTC
Which of those cases can't be solved trivially with a catch-all default host? You are not required to explicitly specify every hostname that a client might use.
Comment 16 Alessandro Polverini 2013-08-09 16:17:52 UTC
As I explained, I'm currently using a default host, but if you have more than one host with wildcards you end up using multiple Tomcat instances, and that's far from ideal.
Comment 17 Remy Maucherat 2013-08-12 15:03:10 UTC
The rewrite valve that is now in Tomcat 8 supports vhost rewriting.
Comment 18 Alessandro Polverini 2013-08-12 15:38:45 UTC
How could the rewrite valve solve this problem?
Comment 19 Mark Thomas 2016-07-03 20:26:11 UTC
I've added support for wildcard hosts in 9.0.x, which will be in 9.0.0.M9. Provided feedback is positive, I'll back-port it to at least 8.5.x.
Comment 20 Mark Thomas 2016-08-01 09:54:55 UTC
It has been back-ported to 8.5.x for 8.5.5 onwards.

I don't propose back-porting it further.