Bug 4373 - sa-learn error parsing UTF-8 mail
Summary: sa-learn error parsing UTF-8 mail
Status: RESOLVED DUPLICATE of bug 4046
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner
Version: 3.0.3
Hardware: PC Linux
Importance: P5 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Depends on:
Reported: 2005-05-30 05:58 UTC by John Madden
Modified: 2005-06-01 02:10 UTC
CC List: 0 users

Attachment Type Modified Status Actions Submitter/CLA Status
spam mail which causes error (mbox format) application/octet-stream None John Madden [HasCLA]

Description John Madden 2005-05-30 05:58:08 UTC
sa-learn in the 3.0 branch (3.0.3 and svn branches/3.0) throws the following error
when run on the mail (mbox format) attached to this bug.

john@headache:~/SA3.0$ ./sa-learn --mbox --spam ../mail/spam
Parsing of undecoded UTF-8 will give garbage when decoding entities at
lib/Mail/SpamAssassin/HTML.pm line 182.

This is fixed in trunk as far as I can see (i.e. the error above isn't thrown with current trunk).
Comment 1 John Madden 2005-05-30 05:58:45 UTC
Created attachment 2907 [details]
spam mail which causes error (mbox format)
Comment 2 Daryl C. W. O'Shea 2005-05-30 12:59:20 UTC
If I remember correctly, the warnings were just silenced in trunk as it 
was found that it made no difference in the results.

If you set $LANG to en_US or some other non-UTF-8 value, the warnings will
probably disappear.
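
A minimal sketch of that workaround, assuming a POSIX shell. The helper name run_with_lang is hypothetical and not part of SpamAssassin; it only overrides LANG for a single command, so the login environment stays untouched. Whether this actually silences the warning is not confirmed in this report.

```shell
# Hypothetical helper (not part of SpamAssassin): run one command with
# LANG overridden, leaving the calling shell's environment unchanged.
run_with_lang() {
    lang_value=$1
    shift
    # env sets LANG only in the environment of the child process.
    # Caveat: if LC_ALL or LC_CTYPE is set, it takes precedence over LANG.
    env LANG="$lang_value" "$@"
}

# Intended use, with the paths from the report:
#   run_with_lang en_US ./sa-learn --mbox --spam ../mail/spam
```

If the warning persists, checking the output of `locale` for an LC_ALL or LC_CTYPE setting that still names a UTF-8 locale would be a reasonable next step.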

Comment 3 Matt Kettler 2005-06-01 10:05:43 UTC
Dupe of bug 4046? 
Comment 4 John Madden 2005-06-01 10:10:32 UTC

*** This bug has been marked as a duplicate of 4046 ***