Bug 54269

Summary: mod_proxy_html documentation misses several things
Product: Apache httpd-2 Reporter: Christoph Anton Mitterer <calestyo>
Component: Documentation Assignee: HTTP Server Documentation List <docs>
Status: NEW ---    
Severity: major    
Priority: P2    
Version: 2.4-HEAD   
Target Milestone: ---   
Hardware: All   
OS: All   

Description Christoph Anton Mitterer 2012-12-09 02:02:38 UTC
Hi.

The documentation of mod_proxy_html is missing several things:
1) In most cases the defaults are missing, e.g. it's not really clear whether ProxyHTMLMeta is now on or off by default.


2) ProxyHTMLMeta
I've always thought (from the original documentation) that this would also cause links in meta elements to be translated... but this is mentioned nowhere... is this the case? Either way, it should be documented.

3) ProxyHTMLExtended
In the original documentation there were many words about how this can easily be dangerous, as stylesheets/scripts are not really parsed; only simple regexp matching/replacing is performed, which can easily lead to corruption.
I guess this is still the case, so some bold warnings should be added about what is happening, that this can be dangerous, and that the admin has to set up his own sane regexps that work for his content.


Cheers,
Chris
Comment 1 Christoph Anton Mitterer 2012-12-09 02:15:42 UTC
(1) applies also at least to ProxyHTMLStripComments where it's IMHO pretty unclear... and perhaps also to ProxyHTMLInterp


And another thing:
4) The original docs listed a ProxyHTMLLogVerbose option, which is not in the Apache docs... was that dropped?
Comment 2 Christoph Anton Mitterer 2012-12-09 02:22:13 UTC
5) The sample config and the original docs imply that one needs to set ProxyHTMLExtended On in order for ProxyHTMLEvents to work... this seems to be missing from the documentation of the latter.
Comment 3 Christoph Anton Mitterer 2012-12-09 03:17:11 UTC
Another thing...
6) ProxyHTMLLinks (at least this, but maybe others like ProxyHTMLEvents, too)

When I define several ProxyHTMLLinks at e.g. server scope, and then define "more" at directory scope, all those from server scope are lost.


Now I think it's already quite bad that you dropped the usual style known from other directives like Options, where one has + and - ... but at least that behaviour needs to be documented.
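
To illustrate the behaviour described in (6), here is a minimal sketch. The "/app/" location is purely hypothetical, and the comments describe the behaviour as reported in this comment, not as documented:

```apache
# Server scope: two link definitions.
ProxyHTMLLinks a href
ProxyHTMLLinks img src longdesc usemap

<Location "/app/">
    ProxyHTMLEnable On
    # As reported above, defining any ProxyHTMLLinks in this scope drops
    # the server-scope definitions instead of adding to them, so the ones
    # that are still wanted are repeated here next to the new one.
    ProxyHTMLLinks a href
    ProxyHTMLLinks img src longdesc usemap
    ProxyHTMLLinks form action
</Location>
```

Repeating the full set in the inner scope is only a workaround for the missing +/- style merging; it does not change how the directive merges.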


Given that the whole proxy_html documentation seems to be in bad shape, I raised the severity to major... and I've reassigned it to "Documentation".
Comment 4 Christoph Anton Mitterer 2012-12-10 01:51:22 UTC
7) The documentation of ProxyHTMLBufSize refers to (the old upstream documentation's) [nnnn] which is however not used in the Apache docs.
Comment 5 Christoph Anton Mitterer 2012-12-10 02:38:54 UTC
8) This is only a "would be nice"...
AFAIU, proxy_html will really only ever process HTML files, even when ProxyHTMLExtended is on.
The latter just means that CSS/scripts/etc. embedded inline in an HTML document will be processed, too.

I think this should go to the Summary section of the module docs and it should be more emphasised in the ProxyHTMLExtended doc.
Comment 6 Nick Kew 2012-12-11 12:15:14 UTC
Just made some updates based on your comments, in r54269 .
Can we close this bug based on those changes, or would you like to offer another patch?
Comment 7 Nick Kew 2012-12-11 12:16:39 UTC
(In reply to comment #6)
> Just made some updates based on your comments, in r54269 .
> Can we close this bug based on those changes, or would you like to offer
> another patch?

Ugh, 54269 was of course this PR.  Should read r1420120 .
Comment 8 Christoph Anton Mitterer 2012-12-15 02:28:51 UTC
Hi.

Let me see...



to (1):
- Are you sure the default is Off? Because when using the (still external) mod_proxy_html from the EPEL repo, it seems that On is the default.

- There is not yet a default given for ProxyHTMLBufSize, well at least not in the "header"... in the description text it is already named as 8192.
The same applies to ProxyHTMLCharsetOut, with the default being UTF-8.

- There is still no default given for ProxyHTMLFixups, which may be because there is no default for it... the closest match would be "reset", but I guess this is not the same as a default.
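
Until the defaults are documented unambiguously, one way to avoid depending on them is to set the directives explicitly. This is only a sketch: the values are illustrative choices (8192 is the buffer size named in the description text), not confirmed defaults:

```apache
# Illustrative values only; setting them explicitly avoids relying on
# defaults that differ between (or are undocumented in) module versions.
ProxyHTMLMeta          Off
ProxyHTMLStripComments Off
ProxyHTMLBufSize       8192
```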


to (2):
That seems to be not yet answered.

So the question is: when there is a link in the content attribute of a meta element, is it adapted? Or does this need an additional ProxyHTMLLinks or even ProxyHTMLExtended to work?
I think this should somehow be described.

Or does nothing happen at all in this respect, and ProxyHTMLMeta really just controls whether meta information is used for charset detection and for conversion to real HTTP headers?


to (3):
Your text is already good, but perhaps one should add somewhere that, when using extended mode, one should use the h, e and c flags of ProxyHTMLURLMap... depending on which class the respective rule belongs to.
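
A rough sketch of what such per-class rules could look like. The host names are hypothetical, and the flag meanings in the comments (h = leave HTML links alone, e = leave scripting events alone, c = leave embedded script/style sections alone) are my reading of the ProxyHTMLURLMap flags and should be checked against the docs:

```apache
ProxyHTMLExtended On

# Rule meant for HTML links only: events (e) and embedded
# script/style sections (c) are passed through untouched.
ProxyHTMLURLMap http://internal.example.com/ / ec

# Rule meant for embedded script/style sections only: HTML links (h)
# and events (e) are passed through untouched.
ProxyHTMLURLMap internal.example.com public.example.com he
```

Keeping each rule restricted to one class makes it less likely that a pattern written for links corrupts inline scripts or stylesheets.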


to (4):
That seems to be not yet answered.


to (5):
There is now a hint ("Set to On, all scripting events (as determined by ProxyHTMLEvents)") in the documentation of ProxyHTMLExtended... but I think it should also be added to the ProxyHTMLEvents description, that these will only be considered in extended mode.


to (6):
Done.


to (7):
Done.


to (8):
I think that's still missing... and actually I’d like to see this in the summary section .... so about this:
a) How does proxy_html decide to process a file? E.g. does it do so by the HTTP content type of a served file? Or does it try to detect by itself whether the file is an HTML/XHTML file?
b) Which files does it process (closely related to (a))? Only HTML/XHTML? Or does it also process text/css files? Or separate script files?
c) I think the information already given at ProxyHTMLEnable, namely that only proxied content is processed, but that there is a way around this (PROXY_HTML_FORCE), should be given there, too.
d) And last but not least... that even in the extended mode of proxy_html (ProxyHTMLExtended) only _inline_ (inside an HTML/XHTML file) scripts/styles will be handled... but not standalone CSS or script files.


Cheers,
Chris.
Comment 9 Christoph Anton Mitterer 2012-12-15 02:37:01 UTC
May I further add:

9) The note that it is "Available as a third-party module for earlier 2.x versions" is IMHO not necessary...
Anyone who ended up in the Apache >=2.4 docs will have it anyway as an "official" module... anyone else... will use the pre-2.4 docs and not see it.


10) Looking at https://httpd.apache.org/docs/trunk/mod/mod_proxy_html.html, there seem to be some content encoding problems with the generated files, e.g. "Web�ing" or "Source�File:".


11) I think quite a lot of information from the original upstream documentation got lost... including (but not limited to):
- much of the technical material from http://apache.webthing.com/mod_proxy_html/guide.html
- the trick with "RequestHeader unset Accept-Encoding"
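
For reference, the Accept-Encoding trick looks roughly like this. The backend host and the "/app/" path are hypothetical, and mod_headers must be loaded for RequestHeader to work:

```apache
<Location "/app/">
    ProxyPass        http://internal.example.com/
    ProxyPassReverse http://internal.example.com/

    # Drop Accept-Encoding from the request sent to the backend, so that
    # the backend answers uncompressed and the filter gets plain HTML.
    RequestHeader unset Accept-Encoding

    ProxyHTMLEnable On
    ProxyHTMLURLMap http://internal.example.com/ /app/
</Location>
```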
Comment 10 Christoph Anton Mitterer 2012-12-15 23:21:40 UTC
12) The documentation of ProxyHTMLMeta is missing what was contained in the original upstream documentation (http://apache.webthing.com/mod_proxy_html/guide.html#meta), namely that:
"it does not strip them from the HTML (except for Content-Type, which is removed in case it contains conflicting charset information)"

Note though, that this seems to be currently not fully working, see bug #54310.