Bug 55981 - mod_mbox hang on some e-mails on user@openoffice.a.o - see INFRA-7171
Status: REOPENED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_mbox
Version: 2.5-HEAD
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
Reported: 2014-01-09 01:08 UTC by Sebb
Modified: 2014-01-12 16:30 UTC
Description Sebb 2014-01-09 01:08:54 UTC
See https://issues.apache.org/jira/browse/INFRA-7171 for full details.

A Message-ID is allowed to contain the character "&".
For example, MS Outlook 12.0 can generate ids of the form:

Message-ID: <!&!A...>

The ajax processor stores the id in the response without encoding the ampersand.

For example:

<?xml version="1.0" encoding="UTF-8"?>
<mail id="%3c!&!A...%3e"> 

The XML processor then chokes on the "&!A" because it is not a valid entity reference.

It should be sufficient to encode "&" as "&amp;"
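A minimal sketch of the proposed escaping, written in JavaScript for illustration only (the real fix would live in the module's C code). The helper name escapeXmlAttr is hypothetical, and the "..." in the sample id is kept exactly as elided in the report.

```javascript
// Hypothetical helper: escape a string for use inside a double-quoted
// XML attribute value. "&" must be replaced first so that the entities
// produced by the later replacements are not escaped a second time.
function escapeXmlAttr(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// The id as it appears in the report ("..." is the report's own elision).
const rawId = '%3c!&!A...%3e';
const xml = '<mail id="' + escapeXmlAttr(rawId) + '">';
console.log(xml); // <mail id="%3c!&amp;!A...%3e">
```

With the ampersand written as "&amp;", an XML parser sees a valid entity reference and recovers the original "&" when it reads the attribute.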
Comment 1 Sebb 2014-01-09 02:18:00 UTC
This is the code which fails when the XML is malformed:

http://svn.apache.org/viewvc/httpd/mod_mbox/trunk/data/archives.js?view=markup#l408

And here I think is the code that needs to escape ampersands:

http://svn.apache.org/viewvc/httpd/mod_mbox/trunk/module-2.0/mod_mbox_out.c?revision=1484915&view=markup#l1267

I could not find any docs on the APR method ap_escape_uri - which is what the macro URI_ESCAPE_OR_BLANK uses - so I don't know if that is supposed to escape ampersands or not.
Comment 2 Rainer Jung 2014-01-09 20:18:01 UTC
Sebb: there must be something more than just the missing "&" encoding. I added encoding for "&" in message IDs to the version running on aurora (mail-archives.eu.apache.org), but the request to retrieve the mail hangs just as it does against the unchanged version on mail-archives.us. Unfortunately Firebug doesn't show the hanging request. On the server side I can see a new connection being opened, but then no HTTP request being sent.

Can you try again with your client against eu and see whether you find anything else that inhibits the processing?

Thanks!

Rainer
Comment 3 Rainer Jung 2014-01-09 20:26:02 UTC
Ampersand encoding in message IDs committed as r1556939.
Problem persists.
Comment 4 Sebb 2014-01-09 20:35:02 UTC
The thread starts at [1].

Now when I click Thread >>, I get


Not Found

The requested URL /mod_mbox/openoffice-users/201401.mbox/<!&;!AAAAAAAAAAAuAAAAAAAAAK00G0t+umtIpgYl2WFMbwIBAGAfOzqGVtxCl8C3TALrx8YAAAAABWoAABAAAAAw8iQIDe+DSKAwYSmFc9yuAQAAAAA=@gmail.com> was not found on this server.

However on the US box I get the message OK at [2]

Looks like there is an extra ";" in the URL.

Fixing that might fix the other problem.


[1] http://mail-archives.eu.apache.org/mod_mbox/openoffice-users/201401.mbox/%3CCAH5WxYMV0Z4+V_D8HjkMGQSQwNYHnp=nc7CwWq_ZSJkDciosWA@mail.gmail.com%3E

[2] http://mail-archives.us.apache.org/mod_mbox/openoffice-users/201401.mbox/%3C!&!AAAAAAAAAAAuAAAAAAAAAK00G0t+umtIpgYl2WFMbwIBAGAfOzqGVtxCl8C3TALrx8YAAAAABWoAABAAAAAw8iQIDe+DSKAwYSmFc9yuAQAAAAA=@gmail.com%3E
Comment 5 Sebb 2014-01-09 20:38:00 UTC
Note, when I use JMeter to download the ajax link, I see:


<mail id="%3c!&amp%3b!AAAAAAAA

That does not look correct.
Comment 6 Sebb 2014-01-09 20:44:01 UTC
AFAIK, "&" is valid in quoted HTML attribute values, so it should only be necessary to replace it with "&amp;" in XML output.
Comment 7 Sebb 2014-01-09 20:46:28 UTC
(In reply to Sebb from comment #6)
> AFAIK, "&" is valid in quoted HTML attribute values, so it should only be
> necessary to replace it with "&amp;" in XML output.

Further, does it need to be done in XML CDATA sections?
Comment 8 Sebb 2014-01-09 20:55:56 UTC
(In reply to Sebb from comment #5)
> Note, when I use JMeter to download the ajax link, I see:
> 
> 
> <mail id="%3c!&amp%3b!AAAAAAAA
> 
> That does not look correct.

The updated code seems to replace "&" with "&amp;" and then subject the result to uri encoding (whatever that is).

However, not all of the locations where the msgId is used are URIs.
The ajax response which was causing the problem is a quoted value in an XML document. Maybe it would be sufficient to just fix the value so it does not contain a bare ampersand? Are there any other values that need to be encoded?

Alternatively, fix the & after the URI encoding.

Also, I have just thought: URIs are allowed to contain &, so encoding those first may cause problems.

I think the encoding probably needs to be context-specific.

May I suggest starting with just fixing the ajax output in the msgID value?
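The effect of the encoding order can be sketched as follows. This is an illustration in JavaScript, not the module's code: escapePath is a hypothetical stand-in for the server-side URI escaper, and its set of unescaped characters (which includes "&" but not ";") is an assumption chosen to match the output quoted in this thread. The message id is made up.

```javascript
// Hypothetical stand-in for the server-side path escaper: letters, digits
// and the punctuation below (including "&") pass through unchanged, every
// other byte becomes %xx with lowercase hex. The safe set is an assumption.
const SAFE = /[A-Za-z0-9$\-_.+!*'(),:@&=\/~]/;

function escapePath(s) {
  let out = '';
  for (const byte of new TextEncoder().encode(s)) {
    const ch = String.fromCharCode(byte);
    out += byte < 0x80 && SAFE.test(ch)
      ? ch
      : '%' + byte.toString(16).padStart(2, '0');
  }
  return out;
}

// Replace bare ampersands with the XML entity.
function escapeAmp(s) {
  return s.replace(/&/g, '&amp;');
}

const msgId = '<!&!A@example.com>'; // made-up id for illustration

// Entity first, then URI escaping: the ";" of "&amp;" is percent-encoded.
const ampThenUri = escapePath(escapeAmp(msgId));
// URI escaping first, then entity: the entity stays intact.
const uriThenAmp = escapeAmp(escapePath(msgId));

console.log(ampThenUri); // %3c!&amp%3b!A@example.com%3e
console.log(uriThenAmp); // %3c!&amp;!A@example.com%3e
```

The first result has the same shape as the "&amp%3b" seen via JMeter: it is neither a valid entity reference nor the original id. The second is well-formed inside an XML attribute, but would not be the right value for a URL, which is why the escaping has to depend on where the id is written.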
Comment 9 Rainer Jung 2014-01-09 21:18:25 UTC
(In reply to Sebb from comment #8)
> (In reply to Sebb from comment #5)
> > Note, when I use JMeter to download the ajax link, I see:
> > 
> > 
> > <mail id="%3c!&amp%3b!AAAAAAAA
> > 
> > That does not look correct.

I changed the ampersand encoding to happen after uri encoding. But for me the links look different than for you. Using curl:

Retrieving the URL

http://mail-archives.eu.apache.org/mod_mbox/openoffice-users/201401.mbox/ajax/thread?0

the broken message shows up as

 <message linked="1" depth="1" id="&lt;!&amp;!AAAAAAAAAAAuAAAAAAAAAK00G0t+umtIpgYl2WFMbwIBAGAfOzqGVtxCl8C3TALrx8YAAAAABWoAABAAAAAw8iQIDe+DSKAwYSmFc9yuAQAAAAA=@gmail.com&gt;">
  <from><![CDATA[Think]]></from>
  <date><![CDATA[Mon, 06 Jan, 04:47]]></date>
  <subject><![CDATA[RE: Folks seeking help]]></subject>
 </message>

and the one working directly before is

 <message linked="1" depth="0" id="&lt;CAH5WxYMV0Z4+V_D8HjkMGQSQwNYHnp=nc7CwWq_ZSJkDciosWA@mail.gmail.com&gt;">
  <from><![CDATA[Timothy Wulf]]></from>
  <date><![CDATA[Mon, 06 Jan, 03:43]]></date>
  <subject><![CDATA[Folks seeking help]]></subject>
 </message>


> The updated code seems to replace "&" with "&amp;" and then subject the
> result to uri encoding (whatever that is).

I switched to first uri encoding, then ampersand.

> However, not all of the locations where the msgId is used are URIs.
> The ajax response which was causing the problem is a quoted value in an XML
> document. Maybe it would be sufficient to just fix the value so it does not
> contain a bare ampersand? Are there any other values that need to be encoded?

I could try, but the only other encoded chars in the example above are "<" and ">" which are also encoded in the working message ids.

> Alternatively, fix the & after the URI encoding.

Done.

> May I suggest starting with just fixing the ajax output in the msgID value?

What do you mean by this "ajax output in the msgID value"?

Regards,

Rainer
Comment 10 Sebb 2014-01-09 23:45:11 UTC
> What do you mean by this "ajax output in the msgID value"?

The ajax/xxx URL generates XML containing

<mail id="%3c!&!A...%3e">

on the original software. That is the only bit that does not seem to work, so I suggest just fixing that piece of code, i.e. as originally quoted:

http://svn.apache.org/viewvc/httpd/mod_mbox/trunk/module-2.0/mod_mbox_out.c?revision=1484915&view=markup#l1267

The currently updated code has got lots of additional changes; it's not clear to me that they are necessary - and some may be incorrect. Even if they do turn out to be required, it makes sense to try and fix one problem at a time.

The EU server now works OK for me using Chrome (limited testing), but still hangs in Firefox.

I get output starting

<mail id="%3c!%26!AAAAAAAAAA

when I access the URL

http://mail-archives.eu.apache.org/mod_mbox/openoffice-users/201401.mbox/ajax/%3C!&!AAAAAAAAAAAuAAAAAAAAAK00G0t+umtIpgYl2WFMbwIBAGAfOzqGVtxCl8C3TALrx8YAAAAABWoAABAAAAAw8iQIDe+DSKAwYSmFc9yuAQAAAAA=@gmail.com%3E

This is the one generated from Thread >> as explained previously


For the thread?0 URL you used, I get the same result on both EU and US, so clearly the patch has not changed that. But AFAICT that is not the problem area.
The original problem output is as stated in comment 1
Comment 11 Sebb 2014-01-11 22:46:13 UTC
Not sure if you have made a further change to the EU server, but it seems to be working OK for me now in both Firefox and Chrome. Or perhaps there was a caching problem previously with my browser?
Comment 12 Rainer Jung 2014-01-12 13:50:32 UTC
Hi Sebb, thanks for your feedback and tests.

The last change I made to the eu mail-archives was on January 9th, 10:51 p.m. UTC. I have just committed that final change to svn as r1557528.

I tried to debug the root cause as well, but also ran into problems with the browser cache. The JavaScript engine complained about non-well-formed ajax responses (for the same URL that you provided in Comment #10) because of an ampersand in the id attribute. But when I retrieved the same content with a simple command-line client, it passed the tests for well-formedness and contained %26 instead of the ampersand.

I copied the latest variant of mod_mbox to the us mail-archives now as well (and restarted it).

I will close this now, feel free to reopen if you stumble over the same problem again.
Comment 13 Sebb 2014-01-12 13:58:57 UTC
One thought did occur to me last night - the existing encoding uses hex (%xx), so it would have been better to use the same for the "&", rather than using "&amp;". My bad there.

I'm still unsure that all the changes were necessary, but I guess that will be discovered in due course.
Comment 14 Sebb 2014-01-12 15:45:05 UTC
Unfortunately just discovered another hang in the same mailbox.

Start with:

http://mail-archives.apache.org/mod_mbox/openoffice-users/201401.mbox/browser

Find the first email from Carmen Putrino, and click on that.

Then click Next - works OK

Click Next again - hangs.

It looks as though the e-mail being requested cannot be found [1]

However in the thread list one can click on the message from "Think" [2]

It looks like the failing URL has been encoded too much (or is not being decoded correctly).

I don't have time to look at this further just now.

[Perhaps we could discuss on infra-dev how to set up a local test server? There are probably other encoding issues.]

[1] http://mail-archives.eu.apache.org/mod_mbox/openoffice-users/201401.mbox/ajax/%3C!%26!AAAAAAAAAAAuAAAAAAAAAK00G0t%2BumtIpgYl2WFMbwIBAGAfOzqGVtxCl8C3TALrx8YAAAAABWoAABAAAAAMeR3X77CdSIA6a%2FHOYp40AQAAAAA%3D%40gmail.com%3E


[2] http://mail-archives.eu.apache.org/mod_mbox/openoffice-users/201401.mbox/%3c!%26!AAAAAAAAAAAuAAAAAAAAAK00G0t+umtIpgYl2WFMbwIBAGAfOzqGVtxCl8C3TALrx8YAAAAABWoAABAAAAAMeR3X77CdSIA6a/HOYp40AQAAAAA=@gmail.com%3e
Comment 15 Rainer Jung 2014-01-12 16:30:46 UTC
Yes, there are probably more.

This one is due to the client sending a %2F for a slash. By default %2F is disabled for security reasons. To make it work, we would either need to allow %2F in the mail-archives vhost on the proxy in front of mail-archives - which I have just done for eu - or change the distributed JavaScript file so that it does not turn "/" into %2F when encoding message ids. There are two calls to encodeURIComponent() in archives.js, one of which is likely responsible. At least I didn't see the %2F in the message ids contained in the responses the server itself sends.
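A minimal sketch of the client-side option, assuming the id is passed through encodeURIComponent() as described above. The helper name encodeMsgIdKeepSlash and the message id are hypothetical and do not come from archives.js.

```javascript
// encodeURIComponent() percent-encodes "/" (as well as "+", "=", "@"),
// which is where the %2F in the failing URL would come from.
// Hypothetical helper: encode the id, then turn %2F back into "/" so the
// request does not depend on the server accepting encoded slashes.
function encodeMsgIdKeepSlash(id) {
  return encodeURIComponent(id).replace(/%2F/g, '/');
}

const id = '<!&!AB+c/d=@example.com>'; // made-up id for illustration

console.log(encodeURIComponent(id));   // %3C!%26!AB%2Bc%2Fd%3D%40example.com%3E
console.log(encodeMsgIdKeepSlash(id)); // %3C!%26!AB%2Bc/d%3D%40example.com%3E
```

The first line of output has the same pattern as the failing ajax URL in comment 14 (%2B, %2F, %3D, %40), while the second leaves the slash literal. Whether a literal slash is then routed correctly by the ajax handler would still need to be checked against a test server.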