Bug 53805 - ?> should only be interpreted as ENDTAG after a command terminator
Summary: ?> should only be interpreted as ENDTAG after a command terminator
Status: NEW
Alias: None
Product: Rivet
Classification: Unclassified
Component: mod_rivet
Version: unspecified
Hardware: PC FreeBSD
Importance: P2 normal
Target Milestone: mod_rivet
Assignee: Apache Rivet Mailing list account
Depends on:
Reported: 2012-08-31 13:15 UTC by Pietro Cerutti
Modified: 2016-10-04 22:07 UTC
CC List: 2 users

Only interpret ENDTAGs after a full command, step 1 (1009 bytes, patch)
2012-08-31 13:15 UTC, Pietro Cerutti

Description Pietro Cerutti 2012-08-31 13:15:44 UTC
Created attachment 29313
Only interpret ENDTAGs after a full command, step 1

Currently, the Rivet parser terminates a block of Tcl code as soon as ?> is encountered. However, there are places where this should not happen, e.g.,


puts ?>  ; # ?> is a valid string in Tcl

# Inside a comment like this ?> it causes problems, too 

# Need a real world example?

puts {<?xml version="1.0" encoding="utf-8"?>}  ; # This raises an error..


I propose the attached patch as a first step in correcting the current handling of ENDTAGs: it interprets the sequence ?> as an ENDTAG only after a newline or a semicolon (plus optional horizontal whitespace). Basically, it permits an ENDTAG only where a Tcl command is expected.

I understand that this change causes incompatibility, e.g.,

    puts hello?>

this won't work anymore.

    puts hello;?>

this will.
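
The proposed rule and the incompatibility above can be sketched as follows. This is a minimal Python illustration, not the attached patch and not Rivet's actual parser; the helper name `find_endtag` and the regular expression are assumptions made for the example, and "start of the block" is assumed to count as a command position.

```python
import re

# Hypothetical helper (not part of Rivet): "?>" counts as an ENDTAG only at
# the start of the block or after a newline or semicolon, with optional
# spaces/tabs in between.
ENDTAG_IN_COMMAND_POSITION = re.compile(r"(?:^|[\n;])[ \t]*(\?>)")


def find_endtag(tcl_block: str) -> int:
    """Return the index of the first "?>" in command position, or -1."""
    match = ENDTAG_IN_COMMAND_POSITION.search(tcl_block)
    return match.start(1) if match else -1


print(find_endtag("puts hello?>"))   # -1: "?>" is not in command position
print(find_endtag("puts hello;?>"))  # 11: "?>" follows a semicolon
print(find_endtag('puts {<?xml version="1.0" encoding="utf-8"?>}'))  # -1
```

Under this rule the XML declaration no longer ends the block, at the cost of requiring a terminator before every ENDTAG.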

Perhaps we want to enable this functionality only when some httpd.conf directive is set. I'm open to suggestions.

If the general idea is accepted, there would be a second step where we take quoting into consideration. The following won't work with this patch (actually, it doesn't work now either):

    puts {
          I'd like to say 
          many things; ?> but
          an error arises...
    }
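
One way such a second step might look is sketched below in Python. It is only an illustration under stated assumptions: the function name `find_endtag_outside_braces` is hypothetical, only braces and backslash escapes are tracked, and double quotes, comments and other Tcl quoting rules are deliberately ignored.

```python
def find_endtag_outside_braces(tcl_block: str) -> int:
    """Return the index of the first "?>" that is in command position
    (start of block, or after a newline or semicolon plus optional
    spaces/tabs) and not inside {...}; return -1 if there is none."""
    depth = 0                # current brace nesting level
    command_position = True  # True while only spaces/tabs follow a terminator
    i = 0
    while i < len(tcl_block):
        ch = tcl_block[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes.
            i += 2
            command_position = False
            continue
        if depth == 0 and command_position and tcl_block.startswith("?>", i):
            return i
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        if depth == 0 and ch in "\n;":
            command_position = True   # a command terminator outside braces
        elif ch not in " \t":
            command_position = False  # any other non-blank character
        i += 1
    return -1


block = "puts {\n  I'd like to say\n  many things; ?> but\n  an error arises...\n}\n?>"
# The "?>" inside the braces is skipped; the one on the last line is found.
print(find_endtag_outside_braces(block) == block.rindex("?>"))  # True
```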
Comment 1 Massimo Manghi 2012-08-31 21:31:18 UTC
Thanks for filing this feature request. It's difficult to deal with all the possible subtleties implied by mixing 2 totally unrelated languages, as in the case of embedding Tcl within HTML. As you have certainly seen, Rivet's parser is simple and works in the most straightforward way. There is no Tcl parsing during template parsing, therefore the parser is not aware of Tcl commands and comments. We should probably make that explicit. The clever yet simple example you made

# Inside a comment like this ?> it causes problems, too 

can be seen in a different way. How do you interpret this line?

# Inside a comment like this ?><b>This is real HTML!</b>

Is the real HTML element meant to slip into the Tcl code? How can you tell the parser where it should stop parsing Tcl and start an HTML block?

 -- Massimo
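
The behavior described in this comment (no Tcl parsing, the block ends at the first ?>) can be illustrated with a small Python sketch. It is not Rivet's actual implementation; the function name `split_at_first_endtag` is hypothetical and the input is assumed to be the text that follows a STARTTAG.

```python
from typing import Tuple


def split_at_first_endtag(text: str) -> Tuple[str, str]:
    """Split the text after a STARTTAG at the first "?>", with no knowledge
    of Tcl syntax. Returns (tcl_code, following_html)."""
    tcl_code, _endtag, html = text.partition("?>")
    return tcl_code, html


# The comment line from the discussion: everything after "?>" becomes HTML.
print(split_at_first_endtag("# Inside a comment like this ?><b>This is real HTML!</b>"))
# ('# Inside a comment like this ', '<b>This is real HTML!</b>')

# The XML declaration: the Tcl part is left with an unbalanced open brace.
print(split_at_first_endtag('puts {<?xml version="1.0" encoding="utf-8"?>}'))
# ('puts {<?xml version="1.0" encoding="utf-8"', '}')
```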
Comment 2 Pietro Cerutti 2012-08-31 21:52:25 UTC
Ciao Massimo,

I see your point, yet IMHO everything between STARTTAG and ENDTAG should be treated as Tcl, and the rules of Tcl should apply, including the "command terminates at \n or ;" rule.

# Inside a comment like this ?><b>This is real HTML!</b>

I personally see this as a Tcl comment line, i.e., nothing should be printed.

But I agree that this is debatable, so if you like the current behavior better I'll stick with it and print my XML header like this:

puts {<}
puts {?xml version="1.0" encoding="utf-8"?}
puts {>}

Comment 3 Massimo Manghi 2012-09-22 23:09:11 UTC
Changing this bug so that it is not tied to a specific version of Rivet.
Comment 4 Massimo Manghi 2016-10-04 22:07:33 UTC
Reviewing this bug (4 years after it was filed), I realized Pietro had proposed a patch. I usually don't fail to recognize and encourage contributions, so I feel terribly guilty (why didn't you insist, Pietro?). I will test this patch soon (if need be, I could create a branch with the patched parser for anyone to try). I promise not to keep this patch on hold for ages before considering applying it to the current parser.