Bug 62411

Summary: rewrite overruns PATH_INFO with Proxy, unixsocket uwsgi
Product: Apache httpd-2 Reporter: Jaxon <currentj>
Component: mod_rewrite Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: normal    
Priority: P2    
Version: 2.4.33   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: simple demo Python uwsgi app and .htaccess

Description Jaxon 2018-05-26 19:35:44 UTC
Created attachment 35949
simple demo Python uwsgi app and .htaccess

Possibly related to 62339 in regards to the "URL-escaping" possibly being the culprit?

If the domain name in the proxy address of a userdir .htaccess RewriteRule is anything other than 4 characters long, the path_info submitted to the underlying proxy is either truncated at the front or has extra characters prepended. (I haven't tested the behavior outside of a userdir.)

Steps to reproduce
Started with a default fresh build

Enable additional mods:
proxy_module
proxy_uwsgi_module
userdir_module
rewrite_module

Enable additional configs:
Include conf/extra/httpd-userdir.conf
Include conf/extra/httpd-default.conf

Make a folder in a user's public_html
Create a .htaccess file containing something like the following:
RewriteEngine on
RewriteRule "^(.*)$" "/unix:/path/to/folder/wsgiapp.socket|uwsgi://longappname/$1" [P,QSA,NE,DPI]

The problem is in Apache: I confirmed with strace of the reads from the unix socket that the PATH_INFO environment variable is already garbled before it reaches the uwsgi application.

Dropping "NE" from the rewrite causes a 500 for the unix socket (I think the "|" gets escaped, so the flag is required), and dropping "DPI" causes other issues.

With "DPI" flag:
If the domain name is longer than 4 characters, the domain name characters after the 4th are consistently added to the beginning of path_info.
 In the example rewrite from above: /~user/project/test -> appname/test
  script_name appears to lose 7 characters
If the domain name is shorter than 4 characters, the missing number of characters is removed from the beginning of path_info (confirmed memory leak if domain+path is less than 4).
 Example:
  Rewrite destination "/unix:socket|uwsgi://1/$1"
  URL request 1 /~user/project/012 -> path_info "2"
   script_name appears to get the extra 3 characters added to it "/01"
  URL request 2 /~user/project/0 -> path_info "" or random leaked string (like wsgi.version data)
   script_name appears to lose some number of characters off the end
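
The DPI observations above all fit one simple model: whatever builds PATH_INFO appears to skip a fixed 4 characters of the "domain/path" string, no matter how long the domain really is. The sketch below is only a Python model of the reported symptoms, not Apache's code; the function name and the ASSUMED_SKIP constant are assumptions made for illustration.

```python
# Model of the symptom only: a fixed 4-character skip is ASSUMED because it
# reproduces every example in this report. It is not taken from httpd source.
ASSUMED_SKIP = 4


def observed_path_info(domain: str, path: str) -> str:
    """Return the PATH_INFO the backend appears to receive."""
    target = domain + "/" + path  # the part after "uwsgi://" in the rule
    # Python slicing past the end yields "", whereas C would keep reading
    # adjacent memory, which would match the leaked strings seen above.
    return target[ASSUMED_SKIP:]


if __name__ == "__main__":
    for domain, path in [("abcd", "test"), ("longappname", "test"),
                         ("1", "012"), ("1", "0")]:
        print(domain, path, "->", repr(observed_path_info(domain, path)))
```

Only the 4-character domain ("abcd" is a made-up name) yields the expected "/test"; the other cases reproduce the "appname/test", "2" and "" results listed above.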

Without the "DPI" flag:
If hitting /~user/project/, the path will be appname/index.html and script_name will have its last 7 characters dropped (the amount by which the domain length exceeds 4). Behavior is unknown if script_name is shorter than the domain length minus 4.
If hitting /~user/project/anything/*, the path will be "/*" and script_name will contain the "anything" part (basically, it swallows the first path segment).

Attached is a simple example python uwsgi app and .htaccess file to see what's going on.
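
For readers without the attachment, a minimal WSGI app along these lines is enough to see the values the backend receives. This is a sketch, not the attached file: the choice of variables to print is an assumption, and only the standard WSGI callable interface is used.

```python
# Minimal WSGI app that echoes the request variables relevant to this bug.
# NOT the attached demo; the list of keys below is an illustrative choice.
KEYS = ("SCRIPT_NAME", "PATH_INFO", "REQUEST_URI", "QUERY_STRING")


def application(environ, start_response):
    # Show each value with repr() so empty or garbled strings are obvious.
    lines = ["%s=%r" % (key, environ.get(key, "")) for key in KEYS]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Any WSGI server can load the module-level callable named application; under uwsgi it would be referenced from the ini file used to start the app.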

Instructions on use:
Extract into an empty folder under a user's public_html
cd into it
Create a .venv virtual environment with python ex: "python3 -m venv .venv"
Activate it ". .venv/bin/activate"
Install reqs "pip install -r requirements.txt"
Start wsgiapp "uwsgi wsgiapp.ini" (log will be in wsgiapp.log, to shut it down "echo q >wsgiapp.control")
Correct the .htaccess "/unix:" path before the "|" to match your absolute path.
Hit /~user/folder/works/... to see correct behavior (with 4 character domain name)
Hit /~user/folder/toolong/... to see when domain name is longer
Hit /~user/folder/tooshort/... to see when domain name is short

If you require confirmation that it's *not* python/uwsgi etc. just reading or displaying the environ variables incorrectly, strace the pid that uwsgi is running and look at the reads from wsgiapp.socket. The protocol is reasonably readable during requests, and you can see that PATH_INFO is sent wrong by Apache.
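
If reading the raw strace output by eye is tedious, the request block can be decoded with a few lines of Python. The sketch below assumes the standard uwsgi packet layout (a 4-byte header of modifier1, a 16-bit little-endian data size and modifier2, followed by key/value pairs that each carry a 16-bit little-endian length prefix). The function names are hypothetical, and the bytes from strace would first have to be turned from its escaped string form into a bytes object.

```python
import struct


def parse_uwsgi_packet(data: bytes) -> dict:
    """Decode one uwsgi request packet into a dict of its variables."""
    # Header: modifier1 (1 byte), datasize (2 bytes LE), modifier2 (1 byte).
    _modifier1, size, _modifier2 = struct.unpack_from("<BHB", data, 0)
    body = data[4:4 + size]
    env = {}
    pos = 0
    while pos < len(body):
        (key_len,) = struct.unpack_from("<H", body, pos)
        pos += 2
        key = body[pos:pos + key_len]
        pos += key_len
        (val_len,) = struct.unpack_from("<H", body, pos)
        pos += 2
        val = body[pos:pos + val_len]
        pos += val_len
        env[key.decode("latin-1")] = val.decode("latin-1")
    return env


def build_uwsgi_packet(env: dict) -> bytes:
    """Encode a dict the same way, only to exercise the parser above."""
    body = b""
    for key, val in env.items():
        k, v = key.encode("latin-1"), val.encode("latin-1")
        body += struct.pack("<H", len(k)) + k + struct.pack("<H", len(v)) + v
    return struct.pack("<BHB", 0, len(body), 0) + body


if __name__ == "__main__":
    packet = build_uwsgi_packet({"PATH_INFO": "appname/test"})
    print(parse_uwsgi_packet(packet))
```

Comparing the decoded PATH_INFO and SCRIPT_NAME with what the application prints shows whether the values were already wrong on the wire.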

Allegedly, when using a unix socket proxy the domain name shouldn't matter except maybe for proxy workers... except it does, because it HAS to be 4 characters, in this situation at least.