Bug 53483 - dangerous PCRE patterns in *Match directives
Summary: dangerous PCRE patterns in *Match directives
Status: RESOLVED LATER
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Documentation (show other bugs)
Version: 2.2.22
Hardware: All All
: P2 enhancement (vote)
Target Milestone: ---
Assignee: HTTP Server Documentation List
URL:
Keywords: MassUpdate
Depends on:
Blocks:
 
Reported: 2012-06-29 01:31 UTC by Christoph Anton Mitterer
Modified: 2018-11-07 21:09 UTC (History)
2 users (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Christoph Anton Mitterer 2012-06-29 01:31:52 UTC
Hi.

This is:

a) something that should definitely go to the release notes and announcement mailing list


b) complicated to the complex handling and differences of whether request URIs and especially directories have trailing / or not

c) I tried this so far only with Apache 2.2 and don't have an 2.4 right now available.
I'm well aware of the avialability of $ in DirectoryMatch now, and the documentation on it.

d) The documentation of the respective directives should be changed.

e) marked as major, as it easily grants access to things, that are not expected to be accessible by the average users.




Now to the issue:


For all places where PCRE patterns are used, people will most usually want them:

a) to start with "^"
=> otherwise the patterns match at any place

b) and end them with depending on the case (typically whether regular files or directories should be matched) with "$" or "(?:$|/)"
=> other wise the pattern dont't end
=> my proposal, "(?:$|/)", works only (probably) well with Apache 2.4 ... more on that later.


Before I go on all the directives:
The "?:" in "(?:$|/)" above means, that the subpattern (the "(...)" thingy) is non capturing, i.e. you cannot use the match via e.g. \1 . This makes things faster and one usually doesn't want to use $|/ later on.


The directives I cover below are likely not all affected; basically everything which may have PCRE regexps as parameter might be.
Further I do not list the e.g. <Directory ~ "regexp"> forms, which are of course affected, too.
I focus mainly on the directives which may have some file/dir access controlling effect.


I) With respect to the "to start with '^'" thingy
AliasMatch:
Here you already clearly show the difference between patterns starting with ^ and those not.

<DirectoryMatch>:
The "^/www" example is safe, but please add another warning box, that notifies about the differences between patterns starting with ^ and those not.

<FilesMatch>:
The "\.(gif|jpe?g|png)$" example is safe (or better said: what one expects), though I'd suggest to change it to ".+\.(gif|jpe?g|png)$" as one usually don't want to match the files ".gif", ".png", ".jpeg" or ".jpg" themselves.
But anyway please add another warning box, that notifies about the differences between patterns starting with ^ and those not.

<LocationMatch>:
The example is "wrong" in a sense that it's not what people usually want (matching the absolute server path). The text below says that already, but perhaps mark the "contains" bold.
But anyway please add another warning box, that notifies about the differences between patterns starting with ^ and those not.

RedirectMatch:
The "(.*)\.gif$" example is safe (or better said: what one expects), though I'd suggest to change it to "/(.+)\.gif$" as one usually don't want to match the file ".gif" themselves (the leading "/" is required, otherwise it would still match ".gif" itself.
But anyway please add another warning box, that notifies about the differences between patterns starting with ^ and those not.

ScriptAliasMatch:
The examples are safe (or better said: what one expects).
But anyway please add another warning box, that notifies about the differences between patterns starting with ^ and those not.



Now so far things are not too bad, but now it gets problematic:
I) With respect to the "to end with '$' or '(?:$|/)'" thingy
AliasMatch:
The text says: "For example, to activate the /icons directory".
But actually the patterns also match any directories like /iconsFOOBAR.
So the first example is dangerous and tricky, it should be changed to:
"^/icons(/.*)(?:$|/)"
which matches:
/icons
/icons/
/icons/foobar
/icons/foobar/
but not:
/iconsFOOBAR
Analogously the case-insensitive example below.
The other examples are safe, as far as I can say:
But anyway please add another warning box, that notifies about the differences between patterns ending with (?:$|/) and those not.

<DirectoryMatch>:
Here things are really dangerous:
The example "^/www/(.+/)?[0-9]{3}" is explained as "would match directories in /www/ that consisted of three numbers".
Due to the (.+/)? this os wrong anyway as it must read "would match directories in __OR SOMEWHERE BELOW__ /www/ that consisted of three numbers".
But it's even worse: as there is no ending of the pattern, it also matches directories like:
"^/www/foobar/123BAZ"
Or try even a more simple example, e.g.
<DirectoryMatch "^/s">
Order Allow,Deny
Allow From All
</DirectoryMatch>
The directory "/secret" is matched too, and not just the directory "/s/.
The example must be changed to: "^/www/(.+/)?[0-9]{3}(?:$|/)"

Why is my "safe" end pattern (?:$|/) and not just /. Because we also want to match directories that have no trailing / set.
As mentioned above, this should AFAIU work with apache 2.4, and needs the "$" part.

Please add a red box, where you note people that this is different from how <Directory> worked,... when the want to match full directory names (e.g. /123 and not also /123*) they need to close the pattern somehow, and take examples like
/foo(?:$|/) means:
/foo
/foo/
/foo/bar
but not:
/fooBAZ
and
/foo(?:$|/$) or /foo/?$ means:
/foo
/foo/
but not:
/foo/bar
/fooBAZ

<FilesMatch>:
The example are safe, as far as I can say:
But anyway please add another warning box, that notifies about the differences between patterns ending with $ and those not.

<LocationMatch>:
Analogous to <DirectoryMatch>...the example is dangerous, as not only substrings /special/data but also /special/dataFOO would be matched,.. which is correct from the wording, but people probably think that the later wouldn't be matched.
So please add analogous info as above with <DirectoryMatch>.

RedirectMatch:
The exampl is safe, but anyway please add analogous info as above with <DirectoryMatch>.

ScriptAliasMatch:
Analogous to <DirectoryMatch>...the example is dangerous, as not only substrings /cgi-bin but also /cgi-binFOO would be matched,.. which is correct from the wording, but people probably think that the later wouldn't be matched.
So please add analogous info as above with <DirectoryMatch>.



You see there is a lot of trickery in it. I don't doubt that you developers know all these, but I guess for the average user, it's difficult to see from the current documentation.


Actually there is even more trickery:
My "solutions" above, did not recongise that many slashes, are typically collapsed to one, i.e. /foo////bar// is the same as /foo/bar/.
I'm not sure whether we should cover this or not, the above patterns would need to be adapted with some quantifiers.


If you make changes, please post the commit IDs, so I can follow and verify :-)


Hope that helps and loads of thanks in advance,
Chris.
Comment 1 Christoph Anton Mitterer 2012-06-29 01:54:50 UTC
Just noted, for DirectoryMatch (but not for the others, e.g. LocationMatch, or AliasMatch).... it makes obviously no sense to handle the case of multiple trailing "/" e.g. /var////www/public///

These are likely anyway collapsed by either Apache internally or the OS.

Or no?
Comment 2 Christoph Anton Mitterer 2012-06-29 02:13:56 UTC
I just noted that for the same reasons as in comment #1, and again also just for Directories, but not for locations/aliases etc.

"(?:$|/)"
should be replaceable by:
"/"
(also matching subdirectories

and
"(?:$|/$)"
by:
"(/$)"
(not matching subdirs).



However, even the first one doesn't work with apache2.2
When I give e.g. the pattern
"^/foo/"

i can open the URI
/foo/
/foo/x
but not:
/foo

why??
Comment 3 Eric Covener 2012-06-29 02:32:26 UTC
Can't follow.  How about providing an update in patch form?
Comment 4 Eric Covener 2012-06-29 02:33:09 UTC
Reclassifying as 2.2 based on comments.
Comment 5 Christoph Anton Mitterer 2012-06-29 10:23:24 UTC
Hi Eric.

Where exactly do you get stuck?

Generally, I see two "problems":
a) Documentation should be improved, to better educate users what they're doing there. I could write patches for this. 
b) I don't understand why e.g.
<DirectoryMatch ^/some/path/a/>  #note the trailing slash
Order Allow,Deny
Allow From All
Options +indexes
</DirectoryMatch>

let's me access via the URL:
http://somehost.org/a/
but NOT
http://somehost.org/a

http://somehost.org/a/foo => works, as expected
http://somehost.org/aFOO => doesn't work, as expected



Regarding the rebasing:
(a) applies to 2.4, too (I was working on that's documentation)
(b) I'll set up some 2.4 installation to check whether that happens there, too
Comment 6 Aurelio Jargas 2012-08-24 17:33:36 UTC
I agree with Christoph about the documentation problems for these directives. They're not wrong, but some additional warning must be made to avoid "over-matching".

The problem is that regex is *always* a partial match, but the non-regex counterparts do a full match. This makes a huge difference.

    <Files "image.png">
    Matches image.png, but not myimage.png nor image.png.zip.

    <FilesMatch "image\.png">
    Matches image.png, myimage.png, image.png.zip, image.png/foo, ...

This partial match is not expected by the user, since the non-regex directive does not work that way. It's important to make this distinction very clear in the docs, in all the path-related directives.

And also encourage the use of anchors and slashes to avoid the undesired partial matches.

    <FilesMatch "/image\.png$">
    Matches image.png only, in any folder.

    <FilesMatch "^/image\.png$">
    Matches image.png only, in root folder.

Slashes and $ are tricky in folder-related directives, such as in <DirectoryMatch>.

    <Directory "foo">
    Matches folder foo, in any folder

    <DirectoryMatch "foo">
    Matches folders foo, foobar, myfoo, … in any folder

    <DirectoryMatch "/foo">
    Matches folders foo, foobar, … in any folder

    <DirectoryMatch "/foo/">
    Matches folder foo, in any folder

    <DirectoryMatch "^/foo/">
    Matches root folder foo, *and all its subfolders*, because of the partial match.

    <DirectoryMatch "^/foo/$">
    Matches root folder foo
    (only works in v2.4, see Bug 49809)

    <DirectoryMatch "/foo/$">
    Matches folder foo, as the last path component, in any folder
    (only works in v2.4, see Bug 49809)

For the user, it's difficult to understand all these subtle differences without examples and proper explanation.

MY SUGGESTION

Since the partial match is the great culprit for the confusion, my suggestion for the docs is to update all the examples to use full anchored regexes, with ^ and $, and encourage the user to *always* do it this way, to avoid unexpected results. Even if all she wants is a partial match:

    <FilesMatch "^.+\.(gif|png|jpg)$">
    <DirectoryMatch "^.*/secret/.*$">

Then all the mentioned problems are reduced to only one problem: make your full regex right. No Apache inner workings knowledge necessary. And if you don't know regex, don't mess with it :)
Comment 7 Christoph Anton Mitterer 2012-08-24 17:54:24 UTC
Aurelio, that's basically what I mean :)

It's correct that the current examples are not strictly wrong, they are just not that what the end-user probably wants.


And we should really teach them how to do it proper and safe.


My original contained already many suggestions and places on what and where I'd change.
It's just that right now, my time is highly limited.


Cheers,
Chris.
Comment 8 Aurelio Jargas 2012-08-24 18:29:31 UTC
Hi Christoph, my intention was to explain the problem in a different way, hopefully easier to understand by those who are not fully aware of it. And maybe my simpler suggestion is easier to implement. I hope that it helps the doc maintainers to get this issue fixed.
Comment 9 Rich Bowen 2015-04-14 18:21:25 UTC
r1673487 clarifies a DirectoryMatch example.
Comment 10 Rich Bowen 2015-04-14 18:30:59 UTC
FWIW, I disagree with using the ?: syntax in the regular expressions. This isn't a regex tutorial, and using advanced regex techniques clouds the issue for people that are trying to learn httpd syntax. I certainly wouldn't -1 a patch, but I think that it makes the document more about regex syntax than about *Match syntax, which feels out of scope.
Comment 11 Christoph Anton Mitterer 2015-04-14 18:41:28 UTC
Well I think one could add a one liner, describing what this is about, but apart from that I'd use the better version.

The reason is simply, that people will take these example patterns from the official documentation and extend them to their own needs.

So why teaching millions of users something worse, when they could easily use something better?
Comment 12 Rich Bowen 2015-04-14 18:57:42 UTC
Yeah, it's a good point.

I'm torn between "simplest possible example that works" and "best practice", when best practice requires significant additional explanation of regular expressions.

I'll see if I can find a way to phrase it so that it doesn't simply confuse. Thanks for your remarks.
Comment 13 Rich Bowen 2015-04-14 19:11:08 UTC
I've patched a bunch of different places, but I haven't yet done the (?:$|/) stuff.

So, I'm going to leave this ticket open for now, because I lack time to complete it thoroughly.

To summarize:

There are some regexes in *Match directive examples where the trailing slash is left off, so that it could match unintended extra characters. In these cases, we (might) want to put either (?:$|/) - a non-capturing "/ or end of string" pattern - or simply a [/$] on the end of it, to ensure that it matches *only* what we intend.
Comment 14 Christoph Anton Mitterer 2015-04-14 21:00:54 UTC
(In reply to Rich Bowen from comment #12)
> I'll see if I can find a way to phrase it so that it doesn't simply confuse.
> Thanks for your remarks.

Maybe one should provide at each directive only the most minimal version of a regexp example and then link to a more detailed part of the documentation.

This documentation could perhaps still work with the "simple version", but have a section called "In production use" or so,.. where it's explained that every usage of (...) is typically better with the complex form.
Comment 15 Rich Bowen 2015-04-15 13:15:56 UTC
Yes, I think I'd like to do that.
Comment 16 William A. Rowe Jr. 2018-11-07 21:09:17 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.