Bug 7657 - Certain messages get mangled by double UTF-8 encoding
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin
Version: 3.4.2
Hardware: PC Linux
Importance: P2 blocker
Target Milestone: 3.4.3
Assignee: SpamAssassin Developer Mailing List
Duplicates: 7664
Reported: 2018-11-20 20:05 UTC by Ondřej Caletka
Modified: 2019-06-26 07:03 UTC

Attachment | Type | Status | Submitter/CLA Status
Minimum working example | message/rfc822 | None | Ondřej Caletka [NoCLA]
Mangled minimum working example | message/rfc822 | None | Ondřej Caletka [NoCLA]
mwe.eml after passing through spamassassin | text/plain | None | Ondřej Caletka [NoCLA]

Description Ondřej Caletka 2018-11-20 20:05:56 UTC
Created attachment 5628 [details]
Minimum working example

I use Debian 9 (Stretch) with the spamassassin script run by procmail for the user's mailbox. Last week, spamassassin was updated from 3.4.1 to 3.4.2, and since then some messages are mangled: a message body that was received UTF-8 encoded and transferred in 8bit mode is treated as ISO-8859-1 and re-encoded to UTF-8 again, resulting in totally garbled accented characters.
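
The double encoding described above can be reproduced outside SpamAssassin. The following is a minimal Python sketch, not SpamAssassin code: the helper name `double_encode` and the sample text are assumptions made for illustration. It takes UTF-8 octets, misreads them as ISO-8859-1, and encodes the result as UTF-8 a second time.

```python
def double_encode(utf8_octets: bytes) -> bytes:
    """Misread UTF-8 octets as ISO-8859-1, then encode the result as UTF-8 again."""
    return utf8_octets.decode("iso-8859-1").encode("utf-8")


# Hypothetical sample text; any non-ASCII UTF-8 input shows the same effect.
original = "kůň".encode("utf-8")
mangled = double_encode(original)

print(original.hex(" "))  # octets as received
print(mangled.hex(" "))   # every non-ASCII octet has become two octets
```

Pure ASCII input passes through unchanged, which would be consistent with only the accented characters being affected.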

I'm attaching a minimum working example. To reproduce the issue, one has to put 

add_header all Report _REPORT_

into ~/.spamassassin/user_prefs and call spamassassin < mwe.eml >mwe-mangled.eml

In the output file, the e-mail body is double-encoded, showing garbage instead of accented characters.

Disabling Report header insertion works around the issue.
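
One way to check whether a chunk of output shows this symptom is to try removing one layer of encoding and see whether valid UTF-8 remains. This is a heuristic Python sketch under that assumption; `repair_double_utf8` is a hypothetical helper, not part of SpamAssassin, and it should be given body octets only.

```python
from typing import Optional


def repair_double_utf8(data: bytes) -> Optional[bytes]:
    """Return data with one encoding layer removed, or None if it does not look double-encoded."""
    try:
        # Undo the second encoding: UTF-8 -> characters -> ISO-8859-1 octets.
        repaired = data.decode("utf-8").encode("iso-8859-1")
        # The result must itself be valid UTF-8 to count as a repair.
        repaired.decode("utf-8")
    except UnicodeError:
        return None
    # Pure ASCII survives the round trip unchanged; that is not a repair.
    return repaired if repaired != data else None


good = "kůň".encode("utf-8")                        # hypothetical sample text
bad = good.decode("iso-8859-1").encode("utf-8")     # simulated mangling
print(repair_double_utf8(bad) == good)
```

Correctly encoded text fails either the ISO-8859-1 step or the final UTF-8 check, so the function returns None for it.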
Comment 1 Ondřej Caletka 2018-11-20 20:07:01 UTC
Created attachment 5629 [details]
Mangled minimum working example
Comment 2 Henrik Krohns 2019-06-13 14:12:28 UTC
Can't reproduce here. Even your "mangled" example attachment body is identical to your first attachment!?
Comment 3 Ondřej Caletka 2019-06-13 14:39:03 UTC
Created attachment 5660 [details]
mwe.eml after passing through spamassassin
Comment 4 Ondřej Caletka 2019-06-13 14:42:09 UTC
(In reply to Henrik Krohns from comment #2)
> Can't reproduce here. Even your "mangled" example attachment body is
> identical to your first attachment!?

Sorry, it seems that I uploaded the wrong file. The problem is still reproducible for me. I'm using spamassassin 3.4.2-1~deb9u1 on Debian 9 stretch.
Comment 5 Henrik Krohns 2019-06-14 06:43:26 UTC
(In reply to Ondřej Caletka from comment #4)
>
> Sorry, it seems that I uploaded the wrong file. The problem is still
> reproducible for me. I'm using spamassassin 3.4.2-1~deb9u1 on Debian 9
> stretch.

I have now tried on the exact same fresh Debian 9, 3.4.2-1~deb9u1 installation, but no luck. The only change is the added headers.

$ diff mwe.eml mwe2.eml
1a2,8
> X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on xxx
> X-Spam-Level: **
> X-Spam-Status: No, score=2.0 required=5.0 tests=FOO_1,FOO_2 autolearn=no
>       autolearn_force=no version=3.4.2
> X-Spam-Report:
>       *  1.0 FOO_1 BODY: No description available.
>       *  1.0 FOO_2 No description available.


Could you please list all settings you have changed from default?
Comment 6 Henrik Krohns 2019-06-14 06:48:39 UTC
Ah ok, I did manage to reproduce it, but I had to run network tests. Apparently the extra content in this report makes it change the body encoding. Investigating...

$ diff mwe.eml mwe6.eml
1a2,18
> X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on xxx
> X-Spam-Level:
> X-Spam-Status: No, score=0.6 required=5.0 tests=FOO_1,FOO_2,RCVD_IN_DNSWL_MED,
>       SPF_FAIL,SPF_HELO_NONE,TO_EQ_FM_DOM_SPF_FAIL autolearn=no
>       autolearn_force=no version=3.4.2
> X-Spam-Report:
>       * -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/,
>       *       medium trust
>       *      [2001:718:1:1:0:0:144:199 listed in]
>       [list.dnswl.org]
>       *  0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
>       *  0.9 SPF_FAIL SPF: sender does not match SPF record (fail)
>       *      [SPF failed: Please see http://www.openspf.org/Why?s=mfrom;id=usera%40example.com;ip=2001%3A718%3A1%3A1%3A%3A144%3A199;r=super.palvel.in]
>       *  1.0 FOO_1 BODY: No description available.
>       *  1.0 FOO_2 No description available.
>       *  0.0 TO_EQ_FM_DOM_SPF_FAIL To domain == From domain and external SPF
>       *       failed
16c33
< Příšerně žluťoučký kůň úpěl ďábelské ódy.
---
> [the same line with its accented characters double-encoded]
Comment 7 Henrik Krohns 2019-06-14 07:07:52 UTC
Bug 7305 is the culprit, specifically this part in Revision 1831073 spamassassin.raw:

   # OK, do checks and put out the message.
   my $status = $spamtest->check($mail);
-  print $status->rewrite_mail()  or die "error writing: $!";
+  { my $report = $status->rewrite_mail();
+    # encode Unicode characters to UTF-8 octets
+    utf8::encode($report) if utf8::is_utf8($report);
+    print $report  or die "error writing: $!";
+  }

Maybe Giovanni can chime in on what this bit is intended to do. The original bug only talked about forcing the C locale, but these utf8 encodings are a different thing?
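
The effect of encoding the whole rewritten message can be modelled in a few lines. This is a Python sketch, not SpamAssassin's Perl code. The function names are hypothetical, and it assumes (the thread does not confirm this) that the body octets ended up inside a Perl character string, where each octet counts as one ISO-8859-1 character.

```python
def emit_whole_message(header: str, body_octets: bytes) -> bytes:
    """Model of encoding the complete message after the body joined a character string."""
    # Assumption: the raw body octets are treated as ISO-8859-1 characters.
    message = header + body_octets.decode("iso-8859-1")
    # Encoding everything to UTF-8 then encodes the body a second time.
    return message.encode("utf-8")


def emit_body_untouched(header: str, body_octets: bytes) -> bytes:
    """Model of encoding only the generated header and passing the body through."""
    return header.encode("utf-8") + body_octets


header = "X-Spam-Report: example\n\n"   # hypothetical header text
body = "ódy".encode("utf-8")            # hypothetical UTF-8 body
print(emit_whole_message(header, body).hex(" "))
print(emit_body_untouched(header, body).hex(" "))
```

In this model only the second function returns the body octets exactly as they were received.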
Comment 8 Henrik Krohns 2019-06-14 07:14:32 UTC
I commented out the utf8::encode and everything works fine. I get no perl warnings etc., so it seems all the utf8::encode calls should be reverted from that patch?
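
A check along these lines could confirm that a pass-through leaves the body octets untouched while headers are added. It is a minimal Python sketch with hypothetical helper names and made-up sample messages; it only compares everything after the first blank line and does not parse MIME.

```python
from typing import Tuple


def split_message(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw message into (header, body) at the first blank line."""
    normalized = raw.replace(b"\r\n", b"\n")
    header, _, body = normalized.partition(b"\n\n")
    return header, body


def body_unchanged(before: bytes, after: bytes) -> bool:
    """True if both messages carry identical body octets, whatever their headers."""
    return split_message(before)[1] == split_message(after)[1]


# Made-up sample messages; the body is UTF-8 for "kůň".
before = b"Subject: test\n\nk\xc5\xaf\xc5\x88\n"
after_ok = b"X-Spam-Level: **\nSubject: test\n\nk\xc5\xaf\xc5\x88\n"
after_bad = b"X-Spam-Level: **\nSubject: test\n\nk\xc3\x85\xc2\xaf\xc3\x85\xc2\x88\n"

print(body_unchanged(before, after_ok))
print(body_unchanged(before, after_bad))
```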
Comment 9 Giovanni Bechis 2019-06-14 07:58:33 UTC
That was it, fixed in r1861317.
 Thanks
Comment 10 Ondřej Caletka 2019-06-14 13:25:06 UTC
(In reply to Giovanni Bechis from comment #9)
> That was it, fixed in r1861317.
>  Thanks

Thank you very much!
Comment 11 Henrik Krohns 2019-06-26 07:03:09 UTC
*** Bug 7664 has been marked as a duplicate of this bug. ***