Bug 57448

Summary: SSI <!--#set --> cannot capture backreferences from regex match in <!--#if -->
Product: Apache httpd-2
Reporter: Anders Kaseorg <andersk>
Component: mod_include
Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---
Severity: regression
CC: eveslage, hk, richard-apache, sam
Priority: P2
Version: 2.4-HEAD
Target Milestone: ---
Hardware: PC
OS: Linux

Description Anders Kaseorg 2015-01-16 00:27:33 UTC
In Apache 2.2, one could set an SSI variable based on backreferences from a regex match in the previous <!--#if -->:

<!--#if expr='$REQUEST_URI = /(.*)/' -->
<!--#set var="foo" value="$1" -->
Found <!--#echo var="foo" -->
<!--#endif -->

However, in Apache 2.4, the equivalent code doesn’t work:

<!--#if expr='v("REQUEST_URI") =~ /(.*)/' -->
<!--#set var="foo" value="$1" -->
Found <!--#echo var="foo" -->
<!--#endif -->

It sets the variable to the empty string and yields this error:

[Thu Jan 15 19:23:20.763133 2015] [include:warn] [pid 6768:tid 140695587436288] [client 127.0.0.1:59575] AH01330: regex capture $1 is out of range (last regex was: '(null)') in /var/www/html/test.shtml

I can still use the Apache 2.2 code if I set SSILegacyExprParser on, but obviously there should be a non-deprecated way to do this.
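
For reference, a minimal sketch of that fallback. It assumes the test page lives under /var/www/html (the directory from the log line above) and that SSI is enabled per directory; the path and the .shtml mapping are assumptions, so adjust them to the actual setup:

<Directory "/var/www/html">
    # Enable server-side includes for .shtml files
    Options +Includes
    AddOutputFilter INCLUDES .shtml
    # Fall back to the 2.2-style expression parser
    SSILegacyExprParser on
</Directory>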
Comment 1 aa 2015-04-09 14:20:28 UTC
I can reproduce this bug. <!--#set var="foo" value="$0" --> does work: it contains the whole string. But $1 fails exactly as described above.
Comment 2 Pete 2015-09-29 22:13:46 UTC
This bug is still present in 2.4.16 where I have encountered it today.
Comment 3 Helge 2015-09-30 13:39:18 UTC
Example: Show 198. from 198.19.81.98
<!--#if expr="v('REMOTE_ADDR') =~ /(\d+\.)\d+/ && $1 =~ /(\d\.)/" -->
<!--#set var="foo" value="$0" -->
Found <!--#echo var="foo" -->
<!--#endif -->
You *must* use backreferences in the same expression!
Comment 4 Ingmar Eveslage 2016-02-26 08:39:54 UTC
(In reply to Helge from comment #3)
> Example: Show 198. from 198.19.81.98
> <!--#if expr="v('REMOTE_ADDR') =~ /(\d+\.)\d+/ && $1 =~ /(\d\.)/" -->
> <!--#set var="foo" value="$0" -->
> Found <!--#echo var="foo" -->
> <!--#endif -->
> You *must* use backreferences in the same expression!

That's correct: in the same expression you can use $1 and backreference matches. BUT $0 always references the whole string, not the last matched string. In Helge's example
foo == "198.19.81.98"
and not, as expected, "198."

If you use a nested if, you can use the backreference:

<!--#if expr="v('REMOTE_ADDR') =~ /(\d+\.)\d+/ && $1 =~ /(\d\.)/" -->
  <!--#if expr='$1 == "198."' -->
    Found 198.
  <!--#endif -->
<!--#endif -->

It matches. This means backreferences are available in a nested ap_expr, but not in any "<!--#set var" or "<!--#echo" operation.
Comment 5 Helge 2016-02-26 10:17:15 UTC
(In reply to Ingmar Eveslage from comment #4)
> (In reply to Helge from comment #3)
> That's correct: in the same expression you can use $1 and backreference
> matches. BUT $0 always references the whole string, not the last matched
> string. In Helge's example
> foo == "198.19.81.98"
> and not, as expected, "198."

I have tested my example many times on Apache 2.4.12 and it works as expected: foo returns "198.".
$1 from expression /^(\d\.)/ is $0 for 'set var=foo'.
Comment 6 Ingmar Eveslage 2016-02-26 10:27:34 UTC
I boiled it down a little.

First: there has to be an indirection through "<!--#set var". Echoing "$0" directly doesn't work.

My example shows a workaround:

<!--#set var="test_var" value="1_2_3_4" --> 
<!--#if expr='v("test_var") =~ /(1_)(.*)/ && $1 =~ /(.*)/' --><!--#endif -->
<!--#set var="first" value="$0" -->
<!--#echo encoding='none' var='first' -->

OUTPUT: 1_

Changing the second match in the if statement to use $2:

<!--#set var="test_var" value="1_2_3_4" --> 
<!--#if expr='v("test_var") =~ /(1_)(.*)/ && $2 =~ /(.*)/' --><!--#endif -->
<!--#set var="first" value="$0" -->
<!--#echo encoding='none' var='first' -->

OUTPUT: 2_3_4

So the simple "$n =~ /(.*)/" acts like an exporter for matched parts.
BUT BE AWARE: don't simplify it, because "$n =~ /.*/" doesn't work.

I think the bug report stands. Something doesn't add up.
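
If the export behaves the same way on every evaluation, the trick could presumably be repeated to pull out more than one group, with one #if per group. An untested sketch reusing test_var from above (the names "first" and "rest" are only illustrative):

<!--#set var="test_var" value="1_2_3_4" -->
<!--#if expr='v("test_var") =~ /(1_)(.*)/ && $1 =~ /(.*)/' --><!--#endif -->
<!--#set var="first" value="$0" -->
<!--#if expr='v("test_var") =~ /(1_)(.*)/ && $2 =~ /(.*)/' --><!--#endif -->
<!--#set var="rest" value="$0" -->
<!--#echo encoding='none' var='first' --> / <!--#echo encoding='none' var='rest' -->

Going by the two outputs above, "first" should hold 1_ and "rest" should hold 2_3_4, but the combined form has not been verified.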
Comment 7 Helge 2016-02-26 11:04:31 UTC
And why does my regex example work correctly on my server?
(Last Test on Apache/2.4.12: Fri 2016-02-26 10:50 GMT)

-----------------------
SSILegacyExprParser Off
-----------------------

You CAN use for echoing

#1: #set var="FooBar" value="$0"
    + #echo var="FooBar"

    -OR-

#2: #echo var="0" (var="$0" doesn't work!)


CORRECTED (full) EXAMPLE: Show first ^(\d+\.) from IPv4 Address

<!--#if expr="v('REMOTE_ADDR') =~ /^(\d+\.)/ && $1 =~ /^(\d+\.)/"-->
<!--#set var="FooBar" value="$0" -->
FooBar #1: <!--#echo var="FooBar" --><br>
FooBar #2: <!--#echo var="0" -->
<!--#endif -->

It works!
Comment 8 Ingmar Eveslage 2016-02-26 11:16:32 UTC
Thanks for the explanation of the echo part. <!--#echo var="0" --> works.

Your example works for me, too, and it does the same as my example: match a group, then match it again in the same expression, so it gets exported as $0.

I think we can agree on that. But I think it's still a workaround. $1...$n should be exported directly, as they are with the legacy parser, don't you think?

And if the developers do not agree on that, then at least the fact that:

<!--#if expr='v("test_var") =~ /(1_)(.*)/ && $1 =~ /(.*)/' --><!--#endif -->
<!--#echo var="0" -->
works and 

<!--#if expr='v("test_var") =~ /(1_)(.*)/ && $1 =~ /.*/' --><!--#endif -->
<!--#echo var="0" -->
doesn't is still a bug, right?
Comment 9 Helge 2016-02-26 11:34:20 UTC
(In reply to Ingmar Eveslage from comment #8)
> Thanks for the explanation of the echo part. <!--#echo var="0" --> works.

I'll tell you a secret:
I'm personally using 'SSILegacyExprParser On'.
For me it has always worked so far. ;-)

Greetings from Helge
Comment 10 Ingmar Eveslage 2016-02-26 11:36:18 UTC
good to know. but what are the plans for "SSILegacyExprParser". will it be removed in future versions?
Comment 11 Helge 2016-02-26 11:44:03 UTC
(In reply to Ingmar Eveslage from comment #10)
> good to know. but what are the plans for "SSILegacyExprParser". will it be
> removed in future versions?

I think it could be removed in Apache/2.5 (?)
Comment 12 apache 2016-09-08 02:04:43 UTC
The code shows that $1 is available in the #if, but not in the #set, whereas $0 is available in the #set.

<!--#set var="a" value="abc" -->
<!--#if expr='v("a") =~ /a(b)c/' -->
<!--#if expr='$1 == "b"' -->
Got a match.
<!--#set var="match" value="a$1" -->
<!--#echo var="match" -->
<!--#set var="match" value="a$0" -->
<!--#echo var="match" -->
<!--#endif -->
<!--#endif -->

===============
Got a match.
a
aabc
Comment 13 Richard Birkett 2017-10-31 14:01:18 UTC
I finally got round to migrating my SSI expressions to the "new" ap_expr syntax, and hit this bug.  And it is clearly a bug.

Because $0 is sometimes (though not always) exported from the "if" to a subsequent "set", you can hack it with extra matchers.  Congratulations to the folks who discovered that, as it's a viable workaround!  But it's clearly a hack, and doesn't work if you want to capture more than one substring.

The documentation for ap_expr suggests that modules can, if they want to, allow the  backref variables to survive between expressions.  It's *partially* happening with SSI (but only with $0, and only if there are capturing parentheses, which in themselves shouldn't affect whether $0 is set), so please can it be fixed to work properly?

I've taken a look at util_expr_eval.c and mod_include.c, and my guess is it's somewhere in the code in parse_ap_expr that decides whether to (re)allocate a backref_t struct within the persistent include_ctx_t.  Hopefully somebody more familiar with this area of code will spot it!
Comment 14 Sam Liddicott 2018-02-15 11:54:09 UTC
I'm marking this as regression because as initially reported it breaks sites that were working on prior versions.

With the current state of this bug, this is the rigmarole I have to go through simply to impersonate the current directory index header, before I get to customise it with new content:

<!-- Strip the Query-string -->
<!--#if expr='v("REQUEST_URI") =~ /^([^?]*)/ && $1 =~ /(.*)/' -->
<!--#set var="request" value="$0" -->
<!--#else -->
<!--#set var="request" value="${REQUEST_URI}" -->
<!--#endif -->
<!-- strip the final / unless it is the first / -->
<!--#if expr='v("request") =~ /(\x2F.*)\x2F/ && $1 =~ /(.*)/' -->
<!--#set var="request" value="$0" -->
<!--#endif -->
<h1>Index of <!--#echo encoding="entity" var="request" --></h1>

Thanks to Helge & Ingmar for showing this work-around.