Bug 1470 - This message gets through unscathed, related to HTML::Parser call
Summary: This message gets through unscathed, related to HTML::Parser call
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: SpamAssassin Developer Mailing List
Depends on: 1417
Reported: 2003-02-10 18:46 UTC by Theo Van Dinter
Modified: 2003-02-11 00:01 UTC (History)

Attachment Type Modified Status Actions Submitter/CLA Status
Message causing the problems ... text/plain None Theo Van Dinter [HasCLA]

Description Theo Van Dinter 2003-02-10 18:46:13 UTC
Just received 3 spams to spamtraps, all of which were basically the same.
But the neat thing is that the message gets 0 points with set0 and only
gets hits in set1 with net tests.

The main thing is that they use obfuscating comments, multiple per word,
and the obfuscation uses the email address so (for me at least) gets
around the 12 char limit in our OBFUSCATING_COMMENT rule.

Perhaps we ought to up the length to 30 chars?
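
To illustrate the length limit, here is a rough sketch. The pattern below is NOT the real OBFUSCATING_COMMENT rule, just a simplified stand-in, and the sample text and the count_obfuscating_comments() helper are made up for the example. The point is only that a comment carrying an email address is longer than 12 chars, so a 12-char cap misses it while a 30-char cap catches it:

```perl
use strict;
use warnings;

# Hypothetical sample: each word is split by a comment holding an address.
my $html = 'Cl<!-- someone@example.com -->ick be<!-- someone@example.com -->low';

# Hypothetical helper: count comments of at most $max_len chars that sit
# between two word characters.  A stand-in pattern, not the shipped rule.
sub count_obfuscating_comments {
    my ($text, $max_len) = @_;
    my $count = () = $text =~ /\w<!--[^>]{0,$max_len}-->\w/g;
    return $count;
}

printf "limit 12: %d hit(s)\n", count_obfuscating_comments($html, 12);
printf "limit 30: %d hit(s)\n", count_obfuscating_comments($html, 30);
```

Whether 30 is the right number for the real rule would still need checking against a corpus; this only shows why 12 is too short for this message.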

After rendering through lynx and getting text, set0 gets 3 points via:

CLICK_BELOW,LIMITED_TIME_ONLY,MONEY_BACK

After some debugging, all of the text is removed by the HTML::Parser
call in PerMsgStatus::get_decoded_stripped_body_text_array().

I think it's related to how we call HTML::Parser, not the external module itself (I did a quick run to remove HTML comments that worked fine).  The spamtrap-address-modified message will be attached shortly. :)
Comment 1 Theo Van Dinter 2003-02-10 18:47:08 UTC
Created attachment 645 [details]
Message causing the problems ...
Comment 2 Theo Van Dinter 2003-02-11 08:50:37 UTC
Some more info:

Generic 2.50 run:

.  0 1.0 HTML_90_100,MIME_HTML_ONLY,__CT,__HAS_MSGID,__MIME_HTML_ONLY,__MIME_VERSION,__SANE_MSGID time=1044927766

If I take the following line from PerMsgStatus:

    $hp->parse(pack ('C0', $text));

and remove the pack part (so we just pass in $text), I get:

.  2 1.0 CLICK_BELOW,HTML_80_90,HTML_COMMENT_EMAIL,HTML_MESSAGE,HTML_TAG_BALANCE_A,HTML_TAG_BALANCE_BODY,HTML_TAG_BALANCE_HTML,LIMITED_TIME_ONLY,MIME_HTML_ONLY,MONEY_BACK,__CLICK_BELOW,__CT,__HAS_MSGID,__MIME_HTML_ONLY,__MIME_VERSION,__SANE_MSGID time=1044927766

If we need the pack (via bug 1417), we should use 'C0A*'.  Using just 'C0' passes no data to the parser ...
Comment 3 Theo Van Dinter 2003-02-11 08:56:34 UTC
Subject: Re: [SAdev]  This message gets through unscathed, related to HTML::Parser call

On Tue, Feb 11, 2003 at 08:50:37AM -0800, bugzilla-daemon@hughes-family.org wrote:
> Some more info:Generic 2.50 run:.  0 1.0 HTML_90_100,MIME_HTML_ONLY,__CT,__HAS_MSGID,__MIME_HTML_ONLY,__MIME_VERSION,__SANE_MSGID time=1044927766If I take the following line from PerMsgStatus:    $hp->parse(pack ('C0', $text));and remove the pack part (so we just pass in $text), I get:.  2 1.0 CLICK_BELOW,HTML_80_90,HTML_COMMENT_EMAIL,HTML_MESSAGE,HTML_TAG_BALANCE_A,HTML_TAG_BALANCE_BODY,HTML_TAG_BALANCE_HTML,LIMITED_TIME_ONLY,MIME_HTML_ONLY,MONEY_BACK,__CLICK_BELOW,__CT,__HAS_MSGID,__MIME_HTML_ONLY,__MIME_VERSION,__SANE_MSGID time=1044927766If we need the pack (via bug 1417), we should use 'C0A*'.  Using just 'C0' passes no data to the parser ...

Wow, WTF happened there?  1 big long line.

Anyway, the piece of the message that's important:

There's a line in PerMsgStatus for bug 1417:

$hp->parse(pack ('C0', $text));

Which is supposed to say "just use chars, not unicode", except that
by itself it nullifies the input string (nothing else matches in the
pattern...)  If we need to keep the pack, we should change it to 'C0A*'.
So I'm going to commit the change.  The bigger question is: ok, how many
messages does this affect, and does this affect the mass-check runs?
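
For reference, a minimal standalone sketch of the difference between the two templates. The sample string and the variable names are made up for the example; the real code passes the decoded message body to the parser:

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the decoded message body.
my $text = '<html><body>Click below</body></html>';

# 'C0' alone contains no item that consumes an argument, so $text is
# ignored and the packed result is the empty string.
my $broken = pack('C0', $text);

# 'A*' consumes one string argument in full, so the text survives.
my $fixed = pack('C0A*', $text);

printf "C0:   %d chars\n", length $broken;
printf "C0A*: %d chars\n", length $fixed;
```

So with plain 'C0' the parser is handed an empty string, which matches the "all of the text is removed" symptom.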

Comment 4 Theo Van Dinter 2003-02-11 09:01:21 UTC
Ok, this bug is now fixed.  I committed the change to 'C0A*' ...
Comment 5 Theo Van Dinter 2003-02-11 09:10:42 UTC
Subject: Re: [SAdev]  This message gets through unscathed, related to HTML::Parser call

On Tue, Feb 11, 2003 at 08:56:35AM -0800, bugzilla-daemon@hughes-family.org wrote:
> So I'm going to commit the change.  The bigger question is: ok, how many
> messages does this affect, and does this affect the mass-check runs?

I was worried this was in 2.50 for all the mass-check runs, but it turns
out we're lucky...  The broken version went in (rev 1.272) right after the
code got tagged for the second set of mass-check runs (rev 1.271).   Phew.

Comment 6 Antony Mawer 2003-02-11 09:17:45 UTC
Subject: Re: [SAdev]  This message gets through unscathed, related to HTML::Parser call 


bugzilla-daemon@hughes-family.org said:

> If we need the pack (via bug 1417), we should use 'C0A*'.  Using just
> 'C0' passes no data to the parser ...

argh, that was me.  Sorry about that.   Insufficient looking back through
CVS. :(

That went in *after* the GA run tagging, so that's not affected.

--j.