Bug 57584

Summary: AddHandler etc. documentation "extension" unintuitive
Product: Apache httpd-2 Reporter: Chris Pennello <apache>
Component: Documentation Assignee: HTTP Server Documentation List <docs>
Status: RESOLVED FIXED    
Severity: normal CC: apache
Priority: P2    
Version: 2.5-HEAD   
Target Milestone: ---   
Hardware: All   
OS: All   

Description Chris Pennello 2015-02-13 17:52:14 UTC
The following issue requested that the extension functionality of AddHandler be changed.
https://issues.apache.org/bugzilla/show_bug.cgi?id=10768
It was closed as invalid due to the definition of "extension" in httpd.  A filename is interpreted as being able to have _multiple_ extensions.
See
http://httpd.apache.org/docs/2.2/mod/mod_mime.html#multipleext
and
http://httpd.apache.org/docs/current/mod/directive-dict.html
under "extension".
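
As a rough illustration of the definition referenced above, the following Python sketch models a filename as a base name followed by any number of dot-separated extensions, and treats a mapping such as "AddHandler cgi-script .cgi" as applying when the mapped extension appears anywhere in that list. The function names split_extensions and handler_applies are hypothetical, and this is a simplified model of the documented behavior, not httpd's actual implementation.

```python
def split_extensions(filename):
    """Split a filename into (base, extensions).

    Simplified model: every dot-separated component after the first
    one is treated as an extension.
    """
    base, *extensions = filename.split(".")
    return base, extensions


def handler_applies(filename, mapped_extension):
    """Return True if mapped_extension is any one of the file's extensions.

    The leading dot is optional and the comparison ignores case.
    """
    wanted = mapped_extension.lstrip(".").lower()
    _, extensions = split_extensions(filename)
    return wanted in (ext.lower() for ext in extensions)


if __name__ == "__main__":
    for name in ("script.cgi", "script.cgi.txt", "archive.tar.gz"):
        print(name, split_extensions(name), handler_applies(name, ".cgi"))
```

Under this model, script.cgi.txt is handled as a CGI script even though ".cgi" is not the final extension, which is the surprise this report is about.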

This is a frequent cause of problems when the AddHandler etc. directives are used.  I would like to suggest a couple of possible remedies to help users understand httpd's behavior more quickly.

1. Don't use the word "extension" in the AddHandler etc. documentation.  Instead, use a word like "substring", which is more accurate, and also more intuitively in line with readers' expectations.
2. Alternatively, add a brief note in the directive documentation mentioning the unusual definition of extension used in httpd, and link to further details.

I would also like to suggest that the section describing files with multiple extensions, at
http://httpd.apache.org/docs/2.2/mod/mod_mime.html#multipleext
include a note about the suggested solution with FilesMatch.  It could be helpful to note that that example will match (potentially unexpectedly) the hidden file ".cgi".
Comment 1 Rich Bowen 2015-04-15 20:57:34 UTC
Fixed the FilesMatch so that it no longer matches the file '.cgi' in r1673954
Comment 2 Rich Bowen 2015-04-15 21:02:41 UTC
I am tempted to close this "wontfix", but I'll seek input from other folks before doing so.

The use of 'extension' is consistent with how it is used in the computing world at large, and, in particular, in the *nix world. What we're talking about is called a file extension, and files can have multiple file extensions.

whatever.tar.gz files, for example, have been around forever, and tar and gz are talked about as the two file extensions.

Using 'substring' would be using a new term for an old concept.

This is not an "unusual definition of extension". It's standard.
Comment 3 Tim B 2015-04-15 21:10:46 UTC
http://httpd.apache.org/docs/2.4/mod/mod_mime.html#multipleext seems clear to me.

The httpd documentation is aimed at technically minded people rather than end users of websites.

In >6 years of commercial technical support I helped some webmasters who didn't realise that files could have more than one extension. When discussing it, that was the phrasing: “more than one extension”.

Filenames certainly can have more than one *substring*. I feel that “string”, and hence “substring”, is actually less clear than the term “extension” – the latter at least suggests that it's something that goes onto the end.
Comment 4 Rodent of Unusual Size 2015-04-15 21:12:29 UTC
An alternative also widely in use is "suffix".  Does Guillaume d'Occam like that?
Comment 5 sebastian 2015-04-15 21:28:00 UTC
(In reply to Rich Bowen from comment #2)
> whatever.tar.gz files, for example, have been around forever, and tar and gz
> are talked about as the two file extensions.

One could also say that "(.)tar.gz" or "(.)gz" is the extension of that file.  It's a matter of viewpoint and of past exposure.
Most software cares about a single file extension only (file browsers, image viewers, ..), and unlike Linux users, who are familiar with .tar.gz, Windows users may never have seen a file with multiple extensions.  Other than .en.html and .tar.gz, is there anything common with multiple extensions that users could have run into before meeting Apache?
Comment 6 sebastian 2015-04-15 21:37:05 UTC
(In reply to Rich Bowen from comment #1)
> Fixed the FilesMatch so that it no longer matches the file '.cgi' in r1673954

Is that change meant to be about hidden files in general?  I would expect something like "^[^.].*\.cgi$" (or "^[^/.].*\.cgi$"?) rather than ".+\.cgi$" in that case.  Right now, would '...cgi' still be hidden and executed?
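
The difference between the two patterns in this comment can be checked with a short Python sketch. It assumes that Python's re module behaves like httpd's regex engine for patterns this simple, and that the pattern is applied as an unanchored search against the bare filename. The names FIRST_FIX, SUGGESTED and matches are hypothetical.

```python
import re

# Patterns quoted in the comment above.
FIRST_FIX = r".+\.cgi$"       # pattern after r1673954, as quoted
SUGGESTED = r"^[^.].*\.cgi$"  # alternative proposed in the comment


def matches(pattern, filename):
    """Unanchored search, assumed to mirror how the directive applies a regex."""
    return re.search(pattern, filename) is not None


if __name__ == "__main__":
    for name in (".cgi", "...cgi", ".hidden.cgi", "script.cgi"):
        print(f"{name!r}: first_fix={matches(FIRST_FIX, name)}, "
              f"suggested={matches(SUGGESTED, name)}")
```

In this sketch both patterns reject ".cgi" and accept "script.cgi", but only the suggested pattern rejects names that begin with a dot, such as "...cgi" or ".hidden.cgi".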
Comment 7 Rich Bowen 2015-04-16 15:38:11 UTC
Well, yes, that's true, if ... improbable.

I'm always reluctant to turn simple docs into regex soup for the sake of a bizarre edge case like "...cgi". Do you feel that's really a common enough scenario that we should care? Or is this perceived as a security consideration?
Comment 8 Rich Bowen 2015-04-16 15:47:18 UTC
I've extended this to [^.]+\.cgi in r1674099

Does that address the concern?
Comment 9 sebastian 2015-04-19 20:22:01 UTC
(In reply to Rich Bowen from comment #8)
> I've extended this to [^.]+\.cgi in r1674099
> 
> Does that address the concern?

It should protect against execution of 
* hidden files and
* files with other extensions before the final ".cgi" extension.

Doesn't sound too bad.
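
Whether both protections hold depends on how the pattern is anchored, which this thread does not state. The Python sketch below takes the pattern exactly as quoted in comment 8 and compares two readings: anchored at both ends (modelled with re.fullmatch) and unanchored (modelled with re.search). Both readings are assumptions for illustration, and the names QUOTED, anchored and unanchored are hypothetical.

```python
import re

# Pattern as quoted in comment 8; any anchors in the committed
# version are not shown in this thread.
QUOTED = r"[^.]+\.cgi"


def anchored(filename):
    """Reading 1: the whole filename must match, as with ^...$."""
    return re.fullmatch(QUOTED, filename) is not None


def unanchored(filename):
    """Reading 2: the pattern may match anywhere in the filename."""
    return re.search(QUOTED, filename) is not None


if __name__ == "__main__":
    names = (".cgi", "...cgi", "script.cgi", "script.txt.cgi", "script.cgi.txt")
    for name in names:
        print(f"{name!r}: anchored={anchored(name)}, unanchored={unanchored(name)}")
```

Under the anchored reading only "script.cgi" matches. Under the unanchored reading the hidden-file cases are still rejected, but "script.txt.cgi" and "script.cgi.txt" both match, so the second protection listed above relies on the pattern being anchored.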
Comment 10 Rich Bowen 2015-04-20 12:51:52 UTC
Thanks. Closing.
Comment 11 sebastian 2015-04-20 12:59:34 UTC
(In reply to Rich Bowen from comment #10)
> Thanks. Closing.

I'm actually not sure if the core of this bug is fixed yet.
The core as shown by the title is that the documentation on "extensions" could be considered unintuitive to larger groups of the user base.
Looking at <https://httpd.apache.org/docs/current/mod/directive-dict.html> for instance, it still reads

  "extension: In general, this is the part of the filename which follows
  the last dot. [..]"

before it goes on to (re)define what an extension is to Apache.
With that first sentence and previous background, it would make perfect sense to stop reading after that first sentence.  My vote for more changes in that area.

Chris, where are you? :)
Comment 12 Chris Pennello 2015-04-20 17:03:13 UTC
(In reply to sebastian from comment #11)
> Chris, where are you? :)
Hello.  :]

> (In reply to Rich Bowen from comment #10)
> > Thanks. Closing.
> 
> I'm actually not sure if the core of this bug is fixed yet.
> The core as shown by the title is that the documentation on "extensions"
> could be considered unintuitive to larger groups of the user base.
> Looking at <https://httpd.apache.org/docs/current/mod/directive-dict.html>
> for instance, it still reads
> 
>   "extension: In general, this is the part of the filename which follows
>   the last dot. [..]"
> 
> before it goes on to (re)define what an extension is to Apache.
> With that first sentence and previous background, it would make perfect
> sense to stop reading after that first sentence.  My vote for more changes
> in that area.
I generally agree.  I tried to provide two actionable recommendations, given that it doesn't sound like the behavior itself will change any time soon.  (See the first link I mentioned in the OP.)

If we're intent on continuing to use the word "extension" in the AddHandler, etc. documentation, then I suggest adding a note to that directive documentation about the unusual definition used in Apache, and linking to further details.  Furthermore, it might be helpful to provide an example that doesn't lead to as many unexpected consequences when followed naively (as is frequently the case).