Bug 56287

Summary: mod_proxy_html deletes wrong data from HTML when a meta http-equiv tag specifying Content-Type comes after another meta http-equiv tag
Product: Apache httpd-2 Reporter: Micha Lenk <micha>
Component: mod_proxy_html Assignee: Apache HTTPD Bugs Mailing List <bugs>
Severity: normal Keywords: FixedInTrunk, PatchAvailable
Priority: P2    
Version: 2.5-HEAD   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: Fix offset in metafix code
Sample HTML code that breaks when processed through mod_proxy_html

Description Micha Lenk 2014-03-19 21:01:59 UTC
Created attachment 31410 [details]
Fix offset in metafix code

mod_proxy_html deletes the wrong data from HTML code when an "http-equiv" meta tag specifying a Content-Type comes after any other "http-equiv" meta tag. To illustrate the issue, consider the following HTML code (also attached as file metafix-breaker.html) as processed by mod_proxy_html:

  <meta http-equiv="X-Dummy-Header" content="dummy value">
  <style type="text/css">div.ok { color: green; }       </style>
  <meta http-equiv="Content-Type" content="text/html; charset=utf8"      >
  <div class="ok">If the metafix is not broken, this text should get rendered in green color.</div>

Without the attached patch, mod_proxy_html will remove the <style> tag inside the <head> tag as soon as it parses the meta tag with the http-equiv="Content-Type" attribute. With the attached patch applied, mod_proxy_html removes the meta tag with the http-equiv="Content-Type" attribute instead. I guess this is what the code intended to do.
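
To illustrate the kind of offset mistake described above, here is a minimal standalone sketch. It is not the mod_proxy_html code or the attached patch: find_meta(), span_contains() and strip_content_type_meta() are hypothetical helpers, and the sketch assumes that each match is reported relative to the position where the search started (as POSIX regexec() does with rm_so/rm_eo), so the running offset has to be added before the span is deleted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first "<meta" tag at or after buf+offs and
 * report its span RELATIVE to buf+offs. Returns 1 on a match, 0 otherwise. */
static int find_meta(const char *buf, size_t offs, size_t *rel_so, size_t *rel_eo)
{
    const char *start = strstr(buf + offs, "<meta");
    const char *end;

    if (start == NULL)
        return 0;
    end = strchr(start, '>');
    if (end == NULL)
        return 0;
    *rel_so = (size_t)(start - (buf + offs));
    *rel_eo = (size_t)(end + 1 - (buf + offs));
    return 1;
}

/* Bounded substring test over s[0..len). */
static int span_contains(const char *s, size_t len, const char *needle)
{
    size_t n = strlen(needle), i;

    for (i = 0; i + n <= len; i++)
        if (strncmp(s + i, needle, n) == 0)
            return 1;
    return 0;
}

/* Delete the first <meta> tag that mentions "Content-Type" from buf, in place.
 * add_offs == 1: relative offsets are converted to absolute ones (intended).
 * add_offs == 0: relative offsets are used as if they were absolute, which
 *                deletes unrelated bytes near the start of the buffer. */
static int strip_content_type_meta(char *buf, int add_offs)
{
    size_t offs = 0, so, eo;

    while (find_meta(buf, offs, &so, &eo)) {
        size_t abs_so = offs + so, abs_eo = offs + eo;

        if (span_contains(buf + abs_so, abs_eo - abs_so, "Content-Type")) {
            size_t del_so = add_offs ? abs_so : so;
            size_t del_eo = add_offs ? abs_eo : eo;

            /* Move the tail (including the terminating NUL) over the span. */
            memmove(buf + del_so, buf + del_eo, strlen(buf + del_eo) + 1);
            return 1;
        }
        offs = abs_eo;  /* continue searching after this tag */
    }
    return 0;
}
```

With a Content-Type meta tag that follows another meta tag, calling the sketch with add_offs set to 0 removes bytes that belong to the earlier markup and leaves the Content-Type tag partly in place, which mirrors the symptom reported here.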

The attached patch is based on httpd trunk, rev. 1579365.
Comment 1 Micha Lenk 2014-03-19 21:03:12 UTC
Created attachment 31411 [details]
Sample HTML code that breaks when processed through mod_proxy_html
Comment 2 Christophe JAILLET 2014-04-04 20:00:10 UTC
Thanks for the report.

I confirm your point and have committed the patch as r1584878.
It can only be triggered if ProxyHTMLMeta is set, which is not the default.

However a few other issues puzzle me:

   - What if several "Content-Type" meta tags are found?
     With the current code, only the last one is taken into account.

   - line 675:
          if (*p != '=')
     Really? Shouldn't we break instead?
     A continue goes back to line 671, where we perform p += 7, which could go past the end of the buffer.

   - line 680:
     Shouldn't we also check *q, in case only one delimiter ('\'' or '"') is present? Without that check, we could scan past the end of the buffer.
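
The last two points can be sketched as a small standalone parser. This is only an illustration under stated assumptions, not the code at lines 671-680 and not the later commits: content_value() is a hypothetical helper that expects p to point at the word "content" inside a NUL-terminated tag, bails out when no '=' follows instead of skipping ahead again, and treats a missing closing delimiter as malformed input.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the value of a content=... attribute
 * and store its length in *len, or return NULL if the input is malformed.
 * Every loop stops at the terminating NUL, so the scan never runs past the
 * end of the buffer. */
static const char *content_value(const char *p, size_t *len)
{
    const char *q;

    if (strncmp(p, "content", 7) != 0)
        return NULL;
    p += 7;
    while (isspace((unsigned char)*p))
        ++p;
    if (*p != '=')
        return NULL;            /* give up rather than advance by 7 again */
    ++p;
    while (isspace((unsigned char)*p))
        ++p;
    if (*p == '\'' || *p == '"') {
        char delim = *p++;

        for (q = p; *q != '\0' && *q != delim; ++q)
            ;
        if (*q != delim)
            return NULL;        /* only one delimiter present */
    } else {
        /* Unquoted value: ends at whitespace, '>' or the end of the string. */
        for (q = p; *q != '\0' && !isspace((unsigned char)*q) && *q != '>'; ++q)
            ;
    }
    *len = (size_t)(q - p);
    return p;
}
```

Returning NULL for an unterminated quote is a design choice of this sketch; a caller could equally decide to accept everything up to the end of the tag.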
Comment 3 Christophe JAILLET 2014-04-04 20:31:08 UTC
The 2nd and 3rd points are addressed in r1584884 and r1584896.
Comment 4 Christophe JAILLET 2014-04-15 20:42:00 UTC
r1584878, r1584884 and r1584896 have been backported and will be part of 2.4.10.