Bug 35100 - URL-parsing does not work for www.altavista.com
Status: RESOLVED DUPLICATE of bug 42592
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy
Version: 2.0.54
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-05-27 10:59 UTC by Bjoern Voigt
Modified: 2007-10-08 16:23 UTC



Description Bjoern Voigt 2005-05-27 10:59:20 UTC
It's not possible to use the relatively popular search engine

   http://www.altavista.com/

with apache2's mod_proxy* modules.

You can easily see the problem if you

  a) type a search word into the search field in 
     http://www.altavista.com/

  b) click on one of the links in this page

The main problem is that mod_proxy re-encodes the URL. After this re-encoding,
the URL path component that is forwarded differs from the original one.

For example, here is a link from http://de.altavista.com/ (I changed it
slightly, because I do not know whether the URL contains private
information):
  
http://av.rds.yahoo.com/_ylt=A9ibyDZZCEq4AklmSLaMX;_ylu=X3oDBvNjNnZmYzBHBndANhdl93ZWJfaG9tZQRzZWMDdGFicw--/SIG=11nr22kc/EXP=111216420/**http%3a//de.altavista.com/dir/default

mod_proxy transforms it into the following request (captured with Ethereal):

   GET /_ylt=A9ibyDZZCEq4AklmSLaMX;_ylu=X3oDBvNjNnZmYzBHBndANhdl93ZWJfaG9tZQRzZWMDdGFicw--/SIG=11nr22kc/EXP=111216420/**http://de.altavista.com/dir/default HTTP/1.1

Do you see the difference? "http%3a//" is transformed to "http://". 
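
The effect can be reproduced outside Apache. The following is only a sketch in Python, not mod_proxy's actual code: it assumes the proxy percent-decodes the path and then re-encodes it with a set of characters it considers safe. The `SAFE` set and the function name `decode_then_reencode` are assumptions made for illustration.

```python
from urllib.parse import quote, unquote

# Path component of the example link above (as sent by the browser).
original_path = (
    "/_ylt=A9ibyDZZCEq4AklmSLaMX;"
    "_ylu=X3oDBvNjNnZmYzBHBndANhdl93ZWJfaG9tZQRzZWMDdGFicw--"
    "/SIG=11nr22kc/EXP=111216420/**http%3a//de.altavista.com/dir/default"
)

# Assumption: characters the re-encoder leaves unescaped in a path.
SAFE = "/:;=*"


def decode_then_reencode(path: str) -> str:
    """Decode every %XX escape, then escape only characters outside SAFE."""
    return quote(unquote(path), safe=SAFE)


forwarded_path = decode_then_reencode(original_path)

# ":" counts as safe, so the "%3a" from the original is not restored.
print(forwarded_path == original_path)
print(forwarded_path)
```

Because the colon is allowed to appear unescaped in a path, the round trip has no way of knowing that it was escaped in the original request, so the forwarded path no longer matches what the origin server expects.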

The offline browser wwwoffle has the same problem. I wrote a patch for wwwoffle
that preserves "%3a" in URL paths instead of rewriting it to a colon (":").
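
The wwwoffle patch is not reproduced here. As a rough illustration of the idea only, the Python sketch below re-encodes a path while leaving every existing %XX escape untouched. It is not the wwwoffle patch and not Apache code; the function name `reencode_preserving_escapes` and the `SAFE` character set are hypothetical.

```python
import re
from urllib.parse import quote

# Matches one existing percent-escape such as "%3a" or "%2F".
_ESCAPE = re.compile(r"(%[0-9A-Fa-f]{2})")

# Assumption: characters that may stay unescaped in a path.
SAFE = "/:;=*"


def reencode_preserving_escapes(path: str) -> str:
    """Escape unsafe literal characters, but copy existing %XX escapes as-is."""
    pieces = _ESCAPE.split(path)
    result = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            # Odd indices hold the captured escapes: keep them verbatim.
            result.append(piece)
        else:
            # Even indices hold literal text: escape what is not safe.
            result.append(quote(piece, safe=SAFE))
    return "".join(result)


example = "/EXP=111216420/**http%3a//de.altavista.com/dir/default"
print(reencode_preserving_escapes(example) == example)
```

With this approach a path that is already validly encoded passes through unchanged, while a literal unsafe character (for example a space) is still escaped.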

I'm not familiar with apache2's mod_proxy* code, but the idea of preserving
"%3a" would probably also fix the problem in apache2.
Comment 1 Bjoern Voigt 2005-05-27 13:42:28 UTC
Bug #29554 reports a similar problem for Apache 1.3. #29554 also includes a
patch for Apache 1.3.
Comment 2 Paul Querna 2005-12-06 07:45:36 UTC
Can anyone test this with the new proxy code in 2.2.0?
Comment 3 Bjoern Voigt 2005-12-07 11:56:47 UTC
Unfortunately the problem still persists in Apache 2.2.0. I tested the proxy
code with Apache 2.2.0 on Linux and with a search on http://www.altavista.com/.
Comment 4 Nick Kew 2007-09-09 15:38:08 UTC
This is a manifestation of PR 41798.  Working on it (at last).

*** This bug has been marked as a duplicate of 41798 ***
Comment 5 Nick Kew 2007-10-08 16:23:06 UTC
This is not in fact a duplicate of PR 41798.  Although the symptoms are the
same, the code path and therefore the fix are entirely different.
Comment 6 Nick Kew 2007-10-08 16:23:32 UTC

*** This bug has been marked as a duplicate of 42592 ***