SA Bugzilla – Bug 7253
X-Spam-Report incorrectly mime-encodes multiline report in header, violating RFC 2047
Last modified: 2016-06-23 16:39:55 UTC
With report_safe 0 and a rule with non-ASCII description, e.g.: header L_TEST_REPORT_ENCODING From =~ /./ score L_TEST_REPORT_ENCODING 0.01 describe L_TEST_REPORT_ENCODING En-tête contient caractères the resulting X-Spam-Report multiline header field as inserted into spam messages is incorrectly encoded into encoded-words: the whole multiline header field is encoded into a single encoded-words, whitespace is not encoded, the result contains whitespace within encoded-word, and the encoded-word spans across lines: X-Spam-Report: =?UTF-8?Q? * 100 USER_IN_BLACKLIST From: address is in the user's black-list * 0.0 L_TEST_REPORT_ENCODING En-t=c3=aate contient caract=c3=a8res * -0.3 BAYES_05 BODY: Bayes spam probability is 1 to 5% * [score: 0.0137] * 0.3 TXREP TXREP: Score normalizing based on sender's reputation?= This is wrong on multiple accounts. The RFC 2047 is explicit: An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. [...] IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's by an RFC 822 parser. As a consequence, unencoded white space characters (such as SPACE and HTAB) are FORBIDDEN within an 'encoded-word'. For example, the character sequence =?iso-8859-1?q?this is some text?= would be parsed as four 'atom's, rather than as a single 'atom' (by an RFC 822 parser) or 'encoded-word' (by a parser which understands 'encoded-words'). The correct way to encode the string "this is some text" is to encode the SPACE characters as well, e.g. =?iso-8859-1?q?this=20is=20some=20text?= [...] Only a subset of the printable ASCII characters may be used in 'encoded-text'. Space and tab characters are not allowed, so that the beginning and end of an 'encoded-word' are obvious. The culprit is MS::PerMsgStatus::qp_encode_header(). It should encode (when necessary) each line individually, and should encode whitespace within encoded-word(s).
> The culprit is MS::PerMsgStatus::qp_encode_header(). > It should encode (when necessary) each line individually, > and should encode whitespace within encoded-word(s). Tried to use Encode (latest version: 2.72) in qp_encode_header() to do the encoding: $text = Encode::encode('MIME-Header', $text); Although the Encode::MIME::Header is much more careful than our code, it is still wrong, resulting in an empty line inserted in the header!!! X-Spam-Report: * 100 USER_IN_BLACKLIST From: address is in the user's black-list =?UTF-8?B?CSogIDAuMCBMX1RFU1RfUkVQT1JUX0VOQ09ESU5HIEVuLXTDg8KqdGUgY29udGllbg==?= =?UTF-8?B?dCBjYXJhY3TDg8KocmVz?= * -0.3 BAYES_05 BODY: Bayes spam probability is 1 to 5% * [score: 0.0137] * 0.2 TXREP TXREP: Score normalizing based on sender's reputation Bad, bad, bad, (and unsightly).
> Tried to use Encode (latest version: 2.72) [...] > Although the Encode::MIME::Header is much more careful than > our code, it is still wrong, resulting in an empty line > inserted in the header!!! Tried Encode 2.78 (the latest on CPAN), it is still broken on multiple accounts: - still inserts an empty line in a header on encoding: $ perl -e 'use Encode; use utf8; printf("Subject: %s\n", encode("MIME-B", "report\n\tL_TEST_REPORT_ENCODING En-tête contient caractères\n\tend-report"))' <quote> Subject: report =?UTF-8?B?CUxfVEVTVF9SRVBPUlRfRU5DT0RJTkcgRW4tdMOqdGUgY29udGllbnQgY2FyYWM=?= =?UTF-8?B?dMOocmVz?= end-report </quote> - produces a continuation line with no leading whitespace on encoding: $ perl -e 'use Encode; use utf8; printf("Subject: %s\n", encode("MIME-Q", "report\n\tL_TEST_REPORT_ENCODING En-tête contient caractères\n\tend-report"))' <quote> Subject: report =?UTF-8?Q?=09L=5FTEST=5FREPORT=5FENCODI?= =?UTF-8?Q?NG=20En=2Dt=C3=AAte=20contient=20?= =?UTF-8?Q?caract=C3=A8res?= end-report </quote> - loses whitespace on folding when decodin https://rt.cpan.org/Public/Bug/Display.html?id=40027 $ perl -le 'use Encode; print decode("MIME-Header", "a: b\r\n c")' a: bc
For reference: > - Encode::MIME::Header inserts an empty line in a header on encoding reported: https://rt.cpan.org/Ticket/Display.html?id=107775 > - produces a continuation line with no leading whitespace on encoding reported: https://rt.cpan.org/Ticket/Display.html?id=107776 > - loses whitespace on folding when decoding > https://rt.cpan.org/Public/Bug/Display.html?id=40027
It's funny (if not disheartening) that we have two tests which are there specifically to test encoding of an X-Spam-Report in a header, and they both expect an incorrectly encoded result. When the bug is fixed these tests fail :) t/reportheader_8bit.t t/prefs_include.t
Created attachment 5332 [details] proposed patch - encoded-words must not cross line boundaries - encoded-words must not contain non-encoded whitespace - choose between Q and B encoding based on shorter encoded length (and rename sub qp_encode_header to mime_encode_header) - change a call to MIME::Base64::encode_base64 in oder to prevent it from inserting a newline in the middle of a longish encoded-word, add a test for this - adjust t/prefs_include.t and t/reportheader_8bit.t to cope with this fix - enhanced the test t/header_utf8.t trunk: Sending lib/Mail/SpamAssassin/PerMsgStatus.pm Sending lib/Mail/SpamAssassin/Util.pm Sending t/header_utf8.t Sending t/prefs_include.t Sending t/reportheader_8bit.t Committed revision 1708863. Btw, the mime_encode_header() is still not perfect, as it is incapable of breaking long encoded-words if line length exceeds 75 characters. Still, it is a major improvement over what we had and over what Encode::MIME::Header would offer.
Should do for now, it's a significant improvement anyway. Closing.