SA Bugzilla – Bug 7657
Certain messages get mangled by double UTF-8 encoding
Last modified: 2019-06-26 07:03:09 UTC
Created attachment 5628
Minimum working example

I use Debian 9 (Stretch) with the spamassassin script run by procmail in the user's mailbox. Last week, spamassassin got updated from 3.4.1 to 3.4.2 and since then, some messages are mangled: the message body that was received UTF-8 encoded and transferred in 8bit mode is treated as ISO-8859-1 and re-encoded to UTF-8 again, resulting in totally garbled accented characters.

I'm attaching a minimum working example. To reproduce the issue, one has to put

    add_header all Report _REPORT_

into ~/.spamassassin/user_prefs and call

    spamassassin < mwe.eml >mwe-mangled.eml

In the output file, the e-mail body will get double encoded, showing garbage instead of accented characters. Disabling Report header insertion works around the issue.
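The mangling described above can be reproduced in isolation. What follows is a minimal sketch in Python, not the Perl code that SpamAssassin runs; the helper name `double_encode` and the sample word are illustrative assumptions and are not taken from the attachment.

```python
def double_encode(utf8_octets: bytes) -> bytes:
    """Misread UTF-8 octets as ISO-8859-1, then encode the result as UTF-8 again."""
    return utf8_octets.decode("iso-8859-1").encode("utf-8")


# Hypothetical sample; any accented text behaves the same way.
original = "ódy".encode("utf-8")   # b'\xc3\xb3dy'
mangled = double_encode(original)  # b'\xc3\x83\xc2\xb3dy'

print(original.decode("utf-8"))  # ódy
print(mangled.decode("utf-8"))   # Ã³dy
```

Each non-ASCII character, which is two octets in the original, comes out as two separate characters of two octets each. That is the kind of garbage a mail client then shows in place of the accented letter.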
Created attachment 5629
Mangled minimum working example
Can't reproduce here. Even your "mangled" example attachment body is identical to your first attachment!?
Created attachment 5660
mwe.eml after passing through spamassassin
(In reply to Henrik Krohns from comment #2)
> Can't reproduce here. Even your "mangled" example attachment body is
> identical to your first attachment!?

Sorry, it seems that I uploaded the wrong file. The problem is still reproducible for me. I'm using spamassassin 3.4.2-1~deb9u1 on Debian 9 stretch.
(In reply to Ondřej Caletka from comment #4)
> Sorry, it seems that I uploaded the wrong file. The problem is still
> reproducible for me. I'm using spamassassin 3.4.2-1~deb9u1 on Debian 9
> stretch.

I have now tried on the exact same fresh Debian 9, 3.4.2-1~deb9u1 installation, but no luck. The only change is the added headers.

$ diff mwe.eml mwe2.eml
1a2,8
> X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on xxx
> X-Spam-Level: **
> X-Spam-Status: No, score=2.0 required=5.0 tests=FOO_1,FOO_2 autolearn=no
>     autolearn_force=no version=3.4.2
> X-Spam-Report:
>     * 1.0 FOO_1 BODY: No description available.
>     * 1.0 FOO_2 No description available.

Could you please list all settings you have changed from default?
Ah ok, I did manage to reproduce it. But I had to run network tests. Apparently the extra stuff in this report makes it change the body encoding, investigating..

$ diff mwe.eml mwe6.eml
1a2,18
> X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on xxx
> X-Spam-Level:
> X-Spam-Status: No, score=0.6 required=5.0 tests=FOO_1,FOO_2,RCVD_IN_DNSWL_MED,
>     SPF_FAIL,SPF_HELO_NONE,TO_EQ_FM_DOM_SPF_FAIL autolearn=no
>     autolearn_force=no version=3.4.2
> X-Spam-Report:
>     * -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/,
>     * medium trust
>     * [2001:718:1:1:0:0:144:199 listed in]
>     [list.dnswl.org]
>     * 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
>     * 0.9 SPF_FAIL SPF: sender does not match SPF record (fail)
>     * [SPF failed: Please see http://www.openspf.org/Why?s=mfrom;id=usera%40example.com;ip=2001%3A718%3A1%3A1%3A%3A144%3A199;r=super.palvel.in]
>     * 1.0 FOO_1 BODY: No description available.
>     * 1.0 FOO_2 No description available.
>     * 0.0 TO_EQ_FM_DOM_SPF_FAIL To domain == From domain and external SPF
>     * failed
16c33
kÜ kůŠúpÄ Äábelské ódy. ---ÅÃÅ¡ernÄÅŸluÅ¥ouÄ
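To confirm that a body line really was double encoded, rather than corrupted in some other way, one option is to try reversing the step. The sketch below is an assumption-laden illustration in Python, not part of SpamAssassin; `undo_double_encoding` is a hypothetical helper name. It can give false positives on text that legitimately contains sequences such as "Ã³".

```python
from typing import Optional


def undo_double_encoding(octets: bytes) -> Optional[bytes]:
    """Return the original UTF-8 octets if `octets` look double encoded, else None."""
    try:
        # Reverse the second encoding: every character must fit in ISO-8859-1.
        candidate = octets.decode("utf-8").encode("iso-8859-1")
        # The recovered octets must themselves be valid UTF-8.
        candidate.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None
    # Pure ASCII round-trips unchanged, which is no evidence of double encoding.
    return candidate if candidate != octets else None


# Hypothetical sample text, mangled the same way as in the first report.
clean = "ďábelské ódy".encode("utf-8")
mangled = clean.decode("iso-8859-1").encode("utf-8")

print(undo_double_encoding(mangled) == clean)  # True
print(undo_double_encoding(clean))             # None
```

Running such a check over the changed line of the diff would distinguish a genuinely mangled output file from one that is identical to the input.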
Bug 7305 is the culprit, specifically this part in Revision 1831073:

spamassassin.raw:

   # OK, do checks and put out the message.
   my $status = $spamtest->check($mail);
-  print $status->rewrite_mail() or die "error writing: $!";
+  { my $report = $status->rewrite_mail();
+    # encode Unicode characters to UTF-8 octets
+    utf8::encode($report) if utf8::is_utf8($report);
+    print $report or die "error writing: $!";
+  }

Maybe Giovanni can chime in on what this bit is intended to do. The original bug only talked about forcing the C locale, but these utf8 encodings are a different thing?
I commented out the utf8::encode and everything works fine. I get no Perl warnings etc., so it seems all the utf8::encode calls should be reverted from that patch?
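The thread does not spell out why the encode step mangles only messages with the larger report. One plausible model, which is an assumption and not confirmed here, is that the header text is a character string, the 8bit body is a byte string, and joining the two turns each body octet into the character with the same code point. The Python sketch below imitates that model with hypothetical sample data; it is not the Perl code from the patch.

```python
# Hypothetical stand-ins for the report header text and the 8bit message body.
report_text = "X-Spam-Report: sample\n\n"
body_octets = "ódy\n".encode("utf-8")

# Assumed model of joining a character string with a byte string:
# each body octet becomes the character with the same code point.
message_text = report_text + body_octets.decode("iso-8859-1")

# Models the added utf8::encode step: the body octets get encoded a second time.
with_encode = message_text.encode("utf-8")

# Models writing the characters out as single octets, as before the patch.
without_encode = message_text.encode("iso-8859-1")

print(with_encode.endswith(body_octets))     # False
print(without_encode.endswith(body_octets))  # True
```

Under this model, dropping the encode step leaves the body octets untouched, which matches the observation that commenting it out makes the problem go away.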
That was it, fixed in r1861317. Thanks
(In reply to Giovanni Bechis from comment #9)
> That was it, fixed in r1861317.
> Thanks

Thank you very much!
*** Bug 7664 has been marked as a duplicate of this bug. ***