Bug 64996 - clarifications about vhost matching
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Documentation
Version: 2.5-HEAD
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assignee: HTTP Server Documentation List
Reported: 2020-12-17 05:14 UTC by Christoph Anton Mitterer
Modified: 2020-12-18 14:27 UTC

Description Christoph Anton Mitterer 2020-12-17 05:14:34 UTC
Hey.

I think there are several ambiguities about vhost matching in the documentation:


1) In vhosts/name-based.html
From the overall text there, it doesn't really become clear that a <VirtualHost> where either the IP or the port is the wildcard ("*") will _never_ be used (not even as a fallback) as soon as *any* vhost matches the IP:port explicitly, even if none of those vhosts' ServerName/ServerAlias matches at all.

<VirtualHost *:*>
   #vhostA
</VirtualHost>
<VirtualHost 10.10.10.10:80>
   #vhostC
   ServerName example.org
</VirtualHost>
<VirtualHost 10.10.10.10:80>
   #vhostD
   ServerName example.com
</VirtualHost>

A request to 10.10.10.10:80 with a "Host: foobar.invalid" would still go to vhostC, even though neither ServerName/ServerAlias matches.
I think this is kinda more generally important, as people will often want to keep requests with a "wrong" Host header from ending up on a "normal" vhost and send them to some catch-all vhost instead, which they might think the *:* could be.
Instead they'd need another catch-all vhost covering each and every IP:port combination, like:
<VirtualHost 10.10.10.10:80 20.20.20.20:8080 ...[every other used IP:port combination] >
   #vhostB
   ServerName example.org
   DocumentRoot /some/invalid/host
</VirtualHost>
coming in between vhostA and vhostC (or better said, before C and D).
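
Keeping that argument list in sync by hand is easy to get wrong. A minimal sketch, in Python, of collecting every literal addr:port used in the <VirtualHost> arguments so they can be listed on one catch-all block. The function name catch_all_args and the input format (one string of arguments per <VirtualHost>) are assumptions made for this illustration and are not part of httpd.

```python
def catch_all_args(vhost_args):
    """Collect every literal addr:port from <VirtualHost> argument strings.

    Each item is the text between "<VirtualHost" and ">", for example
    "10.10.10.10:80 *:8080". Entries with a wildcard address or port, and
    entries without a port, are skipped. Order of first appearance is kept
    and duplicates are dropped.
    """
    seen = []
    for args in vhost_args:
        for spec in args.split():
            addr, sep, port = spec.rpartition(":")
            # No ":" at all, or a bare bracketed IPv6 address such as "[::1]".
            if not sep or spec.endswith("]"):
                continue
            if addr in ("*", "_default_") or port == "*":
                continue
            if spec not in seen:
                seen.append(spec)
    return seen


used = ["*:*", "10.10.10.10:80", "10.10.10.10:80", "20.20.20.20:8080"]
print("<VirtualHost " + " ".join(catch_all_args(used)) + ">")
```

The printed line would then be the opening tag of the catch-all vhost, which still has to be placed before the other vhosts for those addresses.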


2) in vhosts/details.html
it's a bit better explained, because it says

> If there are no exact matches for the address and port, then wildcard (*) matches are considered.

but it's still not 100% clear because it says "then wildcard..." and not "then AND ONLY THEN wildcard..."

Perhaps it should be even clearer than that and say: as soon as an explicit/literal IP:port match is found, only those vhosts will be used, EVEN if none of them contains a matching ServerName/ServerAlias (in which case the first matching vhost will be used), and EVEN if another vhost with wildcards would contain a matching ServerName/ServerAlias.


Cheers,
Chris.
Comment 1 Eric Covener 2020-12-17 13:49:49 UTC
name-based.html already has these three paragraphs. Unless there are specific changes to suggest, I'm not sure more text or repetition will help:

Name-based virtual hosting builds off of the IP-based virtual host selection algorithm, meaning that searches for the proper server name occur only between virtual hosts that have the best IP-based address.

When a request arrives, the server will find the best (most specific) matching <VirtualHost> argument based on the IP address and port used by the request. If there is more than one virtual host containing this best-match address and port combination, Apache will further compare the ServerName and ServerAlias directives to the server name present in the request.

The default name-based vhost for an IP and port combination
If no matching ServerName or ServerAlias is found in the set of virtual hosts containing the most specific matching IP address and port combination, then the first listed virtual host that matches that will be used.
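
The rule in the quoted paragraphs can be sketched roughly as follows. This is a simplified model in Python, not httpd's implementation: pick_by_name and the tuple layout are made-up names, the Host value is assumed to arrive without a port, and fnmatchcase only approximates ServerAlias wildcard handling.

```python
from fnmatch import fnmatchcase


def pick_by_name(candidates, host):
    """Choose among vhosts that already share the selected address and port.

    candidates: list of (label, server_name, aliases) tuples in config order.
    host: the requested server name, or None if the request carries none.
    """
    if host:
        wanted = host.lower()
        for label, server_name, aliases in candidates:
            if server_name and server_name.lower() == wanted:
                return label
            # ServerAlias values may contain wildcards such as *.example.org.
            if any(fnmatchcase(wanted, alias.lower()) for alias in aliases):
                return label
    # No name matched (or none was sent): the first listed vhost is the default.
    return candidates[0][0]


candidates = [
    ("vhostC", "example.org", ["*.example.org"]),
    ("vhostD", "example.com", []),
]
print(pick_by_name(candidates, "example.com"))     # vhostD
print(pick_by_name(candidates, "foobar.invalid"))  # vhostC
```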
Comment 2 Christoph Anton Mitterer 2020-12-18 04:19:18 UTC
<TL;DR>

As indicated in the other ticket, my ultimate suggestion would probably be to rewrite the documentation and merge it all into one simpler document, with roughly the following outline:

- explain the concept of a vhost (i.e. one httpd serving (typically different) content, reachable via different domain names/addresses/ports).

- explain what IP-based hosting is/was, namely that the vhost is selected purely based on the addr/port at which the connection arrives... perhaps adding that this is typically not necessary anymore (especially in times of tight v4 space) and that even past reasons (non-SNI TLS) are gone, instead of just vaguely indicating this, which just confuses users.

- explain what name-based hosting is, namely that the client sends an HTTP Host header giving the domain name of the host it wants (and that this is usually identical to the domain name in the client's URL),
and the crucial point: selection is done based on that header, rather than the IP/port;
perhaps also that with TLS, the SNI name is taken for the Host.

- explain how it actually works in Apache, which basically does both at the same time, i.e. how the vhost is selected:
1) first take any vhosts whose addr:port match literally (i.e. no wildcard)
2) if there are none, also take those with wildcards
3) if there is an HTTP Host header, match that against the remaining vhosts and fall back to the first one if none matches or there is no header.
4) if no vhost matched at all, give the request to the main server

- the remainder are special things, like where ServerName gets its default from, or how the implicitly added ServerAlias (from the <VirtualHost> name) works... or that a literal hostname in a vhost (instead of an addr) will be replaced by ***all*** addresses that name resolves to at server startup... and other special stuff like: what happens if more than one <VirtualHost> has a literal addr:port match.
</TL;DR>


---------------------


But given that the above would be a huge effort, a solution for this ticket would perhaps be to really clarify what's meant by phrases like (from your quotations):
- "the proper server", "the best IP-based address"
- "the best (most specific) matching", "best-match address and port"
- "most specific matching IP address and port combination"

I think these phrases are all pretty vague... I mean, they're clear to us, since we know what's meant, but for a beginner: What's a "best address match"? Is IPv4 better than IPv6? Or is a match on 10.10.10.10:8080 better than one on 10.10.10.10:*?

One should simply use a clear description like:
It first considers only vhosts with a literally matching address:port pair (i.e. where neither of the two is a wildcard *), and only if there are none does it also consider those with wildcards.
Comment 3 Christoph Anton Mitterer 2020-12-18 05:45:49 UTC
After playing around a bit more I'd even say two more points are missing from vhosts/name-based.html AND from vhosts/details.html


I) Maybe I'm blind, but after skimming through both several times I couldn't find what happens if there are multiple <VirtualHost> blocks which "match" a request equally well (i.e. no ServerName/Alias match at all, or all the names are the same).

It actually *is* indicated in mod/core.html#virtualhost, where it says:
"If multiple virtual hosts contain the best matching IP address and port, the server selects from these virtual hosts the best match based on the requested hostname. If no matching name-based virtual host is found, then the first listed virtual host that matched the IP address will be used."

but it says this only for name based matching,... while it seems to be generally the case, i.e. if I have:
<VirtualHost *:*>
  #vhost A
</VirtualHost>
<VirtualHost *:*>
  #vhost B
</VirtualHost>

then A, and only A, will be used, right?!



II) I thought a bit more about the "simple" selection algorithm I wrote down above and the term "best match" and noticed that mine is also kinda flawed, but the docs don't seem to mention the real truth anywhere either:

I wondered, what happens if I have:
<VirtualHost 10.10.10.10:*>
  #vhost A
</VirtualHost>
<VirtualHost *:8080>
  #vhost B
</VirtualHost>

and a request comes in on 10.10.10.10:8080 (regardless of whether name-based or not).

And in fact instead of:
- first, look at vhosts with a literal addr:port match (regardless whether name-based or not)
- second, if none were found, look at those with wildcards

it seems the following is the case:

- first, look at vhosts with a literal addr:port match (regardless whether name-based or not)
- second, if none were found, look at those with addr:* matches (where addr is again literal)
- third, if none were found, look at those with *:port matches (where port is again literal)
- fourth, if none were found, look at those with *:* matches.
- last but not least, use the main server

And in each[0] group:
- if a Host header is given and matches, use that vhost
- if there were multiple vhosts with a matching Host header, use the first one (and only the first one) in the config file
- if there was only one vhost, and/or no match for the Host header, or no Host header was given, use the first vhost from the group (which explains both why name-based hosts default to the first vhost... and why e.g. two vhosts that both have *:*, or both *:80, or both 10.10.10.10 use the first)
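
To make the order above concrete, here is a small Python model of it. It only encodes the behaviour as described in this comment and has not been checked against the httpd source. The names tier and select, the dict keys, and the exact-match-only name comparison (no ServerAlias wildcards, names stored in lower case) are assumptions for the sketch.

```python
def tier(vhost_addr, vhost_port, req_addr, req_port):
    """Return 0..3 for the four match groups listed above, or None if no match."""
    addr_literal = vhost_addr == req_addr
    port_literal = vhost_port == str(req_port)
    addr_ok = addr_literal or vhost_addr == "*"
    port_ok = port_literal or vhost_port == "*"
    if not (addr_ok and port_ok):
        return None
    if addr_literal and port_literal:
        return 0  # literal addr:port
    if addr_literal:
        return 1  # addr:*
    if port_literal:
        return 2  # *:port
    return 3      # *:*


def select(vhosts, req_addr, req_port, host=None):
    """vhosts: list of dicts with keys label, addr, port, names (config order)."""
    ranked = [(tier(v["addr"], v["port"], req_addr, req_port), v) for v in vhosts]
    tiers = [t for t, _ in ranked if t is not None]
    if not tiers:
        return "main server"
    best = min(tiers)
    group = [v for t, v in ranked if t == best]
    # Only the most specific group is searched for a name match.
    for v in group:
        if host is not None and host.lower() in v["names"]:
            return v["label"]
    return group[0]["label"]


vhosts = [
    {"label": "A", "addr": "10.10.10.10", "port": "*", "names": []},
    {"label": "B", "addr": "*", "port": "8080", "names": []},
    {"label": "C", "addr": "*", "port": "*", "names": ["example.org"]},
    {"label": "D", "addr": "*", "port": "*", "names": ["example.com"]},
]
print(select(vhosts, "10.10.10.10", 8080))                 # A
print(select(vhosts, "20.20.20.20", 8080))                 # B
print(select(vhosts, "20.20.20.20", 80, "example.com"))    # D
print(select(vhosts, "20.20.20.20", 80, "foobar.invalid")) # C
```

In this model a request to 10.10.10.10:8080 lands on A because the addr:* group is tried before the *:port group, and name matching never looks outside the chosen group.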


Cheers,
Chris.
Comment 4 Christoph Anton Mitterer 2020-12-18 05:50:29 UTC
[0] Well, obviously there wouldn't be any further matching for Host-Header in the main server... but what if one had:

<VirtualHost *:*>
  #vhost A
  ServerName example.org
</VirtualHost>
<VirtualHost *:*>
  #vhost B
  ServerName example.com
</VirtualHost>

would that still select example.com if the request had that as its Host header?
Comment 5 Eric Covener 2020-12-18 14:27:10 UTC
Made some improvements around unqualified uses of "best" in http://svn.apache.org/viewvc?rev=1884608&view=rev

I will leave the bug open as an enhancement in case anyone wants to contribute more concise detail explaining how IP-based matching works, or how that narrowing affects how name-based selection can proceed.