Bug 5226 - FP from 80_additional on OE embedded images
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.1.7
Hardware: Other other
Importance: P5 normal
Target Milestone: 3.1.11
Assignee: SpamAssassin Developer Mailing List
Reported: 2006-12-07 07:39 UTC by Richard Frovarp
Modified: 2011-05-31 22:48 UTC
Attachment Type Modified Status Actions Submitter/CLA Status
Example of FP message text/plain None Richard Frovarp [NoCLA]
FP example text/plain None Juan Pablo Salazar Bert [NoCLA]

Description Richard Frovarp 2006-12-07 07:39:39 UTC
I am seeing FPs because several very high scoring rules fire on messages sent
from Outlook Express with embedded images, usually GIFs. Reducing the scores of
these rules locally has helped eliminate the issue. Below is the report of
rules hit for the message. The INFO_TLD hit comes from the fact that the message
has a signature that contains www.mailscaner.info in it.

score = 8.98
-1.44 ALL_TRUSTED
0.81 EXTRA_MPART_TYPE
0.00 HTML_MESSAGE
0.81 INFO_TLD
2.00 PART_CID_STOCK 
2.00 PART_CID_STOCK_LESS 
2.80 TVD_FW_GRAPHIC_ID1 
2.00 TVD_FW_MESG1
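
As a quick check of the arithmetic in the report above, the per-rule scores can be summed to see how much of the total comes from the rules scoring 2.00 or more. This is only an illustrative Python sketch, not part of SpamAssassin; the 5.0 threshold is an assumption (the usual default required score), and the variable names are made up for the example.

```python
# Rule hits copied verbatim from the report above.
hits = {
    "ALL_TRUSTED": -1.44,
    "EXTRA_MPART_TYPE": 0.81,
    "HTML_MESSAGE": 0.00,
    "INFO_TLD": 0.81,
    "PART_CID_STOCK": 2.00,
    "PART_CID_STOCK_LESS": 2.00,
    "TVD_FW_GRAPHIC_ID1": 2.80,
    "TVD_FW_MESG1": 2.00,
}

# Assumption: the site uses the default spam threshold of 5.0.
REQUIRED_SCORE = 5.0

# Total score of the message, rounded to two decimals like the report.
total = round(sum(hits.values()), 2)

# Contribution of the high scoring rules (2.00 points or more each).
high_scoring = {rule: pts for rule, pts in hits.items() if pts >= 2.0}
high_total = round(sum(high_scoring.values()), 2)

# What the message would score if those rules had not fired at all.
remainder = round(total - high_total, 2)

print(f"total={total} high={high_total} remainder={remainder}")
print("flagged as spam:", total >= REQUIRED_SCORE)
print("flagged without the high scoring rules:", remainder >= REQUIRED_SCORE)
```

Under these assumptions the four high scoring rules account for almost the entire score, which is why lowering them is enough to stop the false positive.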
Comment 1 Richard Frovarp 2006-12-07 07:40:47 UTC
Created attachment 3771
Example of FP message
Comment 2 Juan Pablo Salazar Bert 2006-12-19 09:42:15 UTC
Same here. Other people are also reporting this in the mailing list (reference:
http://www.nabble.com/HTML-Source-Rule-tf2728267.html#a7705726)
Comment 3 Justin Mason 2006-12-19 09:53:21 UTC
please provide as many FP samples as possible -- ensure they are valid RFC-2822
messages though (full headers, not munged by Outlook)
Comment 4 Theo Van Dinter 2006-12-19 10:43:17 UTC
fwiw, I'm not surprised by this so much.  the stock image spams were sent using
outlook, so rules targeting them tend to hit outlook mails w/ images in them.

-1.44 ALL_TRUSTED

not relevant.

0.81 EXTRA_MPART_TYPE

there's been discussion about this during rule development.  basically it hits
on any multipart/related mails, because the type= parameter is required.  it has
a score < 1 though, which I would read as "people writing rules don't get a
lot of multipart/related mails, but it does FP, so the score was set low-ish".

0.00 HTML_MESSAGE

info rule.

0.81 INFO_TLD

this rule was killed.  .info is no longer a useful spam-sign.

2.00 PART_CID_STOCK 
2.00 PART_CID_STOCK_LESS 

These are in Justin's sandbox.  They look similar to my rules, so I'll just talk
about those below...

2.80 TVD_FW_GRAPHIC_ID1 
2.00 TVD_FW_MESG1

These are mine.  ID1 just looks at attachment content-ids, and has a great hit
rate in the nightly run:

 11.858  14.3436   0.0055    1.000   0.99    0.00  T_TVD_FW_GRAPHIC_ID1

but if, as noted above, these spams are made using outlook, then any other
similar mails will be hit too.  I'll tone down the score.

MESG1 targeted some old spam, which according to the nightly results doesn't
occur anymore:

  0.005   0.0058   0.0000    1.000   0.49    0.00  T_TVD_FW_MESG1

so we should trash that rule.  Basically it looks for something that says it was
forwarded, and also has a GIF image attached.
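
For readers unfamiliar with the two nightly result lines quoted in this comment, here is a small Python sketch that splits them into named fields and recomputes the spam ratio. It assumes the columns follow the usual hit-frequencies layout (OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME) and that S/O is SPAM% / (SPAM% + HAM%); the `FreqLine` class and helper names are hypothetical, not part of any SpamAssassin tool.

```python
from dataclasses import dataclass


@dataclass
class FreqLine:
    """One result line; the field meanings are an assumption (see above)."""
    overall_pct: float
    spam_pct: float
    ham_pct: float
    s_o: float
    rank: float
    score: float
    name: str


def parse_freq_line(line: str) -> FreqLine:
    """Split a whitespace separated result line into its seven fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return FreqLine(*map(float, fields[:6]), fields[6])


def spam_ratio(entry: FreqLine) -> float:
    """Recompute S/O as spam% / (spam% + ham%); 0.0 if the rule never hit."""
    hits = entry.spam_pct + entry.ham_pct
    return entry.spam_pct / hits if hits else 0.0


# The two lines quoted in the comment above.
id1 = parse_freq_line(
    " 11.858  14.3436   0.0055    1.000   0.99    0.00  T_TVD_FW_GRAPHIC_ID1")
mesg1 = parse_freq_line(
    "  0.005   0.0058   0.0000    1.000   0.49    0.00  T_TVD_FW_MESG1")

for entry in (id1, mesg1):
    print(f"{entry.name}: spam {entry.spam_pct}% ham {entry.ham_pct}% "
          f"S/O {spam_ratio(entry):.3f}")
```

Read this way, ID1 hits a large share of spam with a small but non-zero ham hit rate, while MESG1 hits almost nothing at all, which matches the conclusions drawn in the comment.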
Comment 5 Juan Pablo Salazar Bert 2006-12-19 12:33:42 UTC
Created attachment 3795
FP example

This is a false positive. Critical rules in spam report are:

2.00	PART_CID_STOCK
2.00	PART_CID_STOCK_LESS
2.80	TVD_FW_GRAPHIC_ID1
Comment 6 Justin Mason 2006-12-19 13:47:01 UTC
ah, looks like I forgot to push an update (the sooner we get off the
manually-run 3.1 updates, the better!)

PART_CID_STOCK and PART_CID_STOCK_LESS are additive.  fixed in the new updates...
Comment 7 Theo Van Dinter 2007-01-31 20:00:35 UTC
I lost track of this -- is everything done wrt the updates?  either way, punting
to 3.1.9 since it's more update than release.
Comment 8 peter gervai 2007-04-11 08:06:53 UTC
I have bumped into this several times: people were sending pics from outlook and
got bounced. It's like:

        Content analysis details:   (6.7 points, 5.0 required)
         pts rule name              description
        ---- ---------------------- --------------------------------------------------
         1.1 EXTRA_MPART_TYPE       Header has extraneous Content-type:...type= entry
        -1.8 ALL_TRUSTED            Passed through trusted hosts only via SMTP
         1.2 HTML_IMAGE_ONLY_20     BODY: HTML: images with 1600-2000 bytes of words
         2.1 TVD_FW_GRAPHIC_ID1     BODY: TVD_FW_GRAPHIC_ID1
         0.3 HTML_COMMENT_SAVED_URL BODY: HTML message is a saved web page
        -2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                                    [score: 0.0000]
         1.5 HTML_MESSAGE           BODY: HTML included in message
         1.0 STOCK_IMG_OUTLOOK      Stock spam image part, with Outlook-like features
         1.0 PART_CID_STOCK         Has a spammy image attachment (by Content-ID)
         1.0 PART_CID_STOCK_LESS    Has a spammy image attachment (by Content-ID,
                                    more specific)
         1.0 STOCK_IMG_HTML         Stock spam image part, with distinctive HTML
         1.0 STOCK_IMG_HDR_FROM     Stock spam image part, with distinctive From line


I raised the HTML_MESSAGE score myself (it catches close to zero spam
otherwise), but the others were caused by the outlook crap and the image. 8.3
points just for sending the picture seems to be a bit high.

[18298] dbg: dns: 7.1.3.updates.spamassassin.org => 507739, parsed as 507739
[18298] dbg: channel: current version is 507739, new version is 507739, skipping channel
Comment 9 Theo Van Dinter 2007-07-08 00:34:59 UTC
It appears this is still an issue, but it's fixable via update and so doesn't
need to hold up 3.1.10.
Comment 10 peter gervai 2007-10-26 12:55:07 UTC
This MAY have been fixed. I believe TVD_FW_GRAPHIC_ID1 was phased out (into
T_TVD_FW_GRAPHIC_ID1 with a completely different scoring) and most of the tests
mentioned seem to have been changed. Hard to verify though.
Comment 11 Darxus 2011-05-31 22:41:27 UTC
I think this can be safely closed.


INFO_TLD is gone.

Reduced scores:
score PART_CID_STOCK 0.001 0.001 0.001 0.000 # n=2
score PART_CID_STOCK_LESS 0.000 0.036 0.745 0.894 # n=2
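
Each `score` line above carries four values, one per SpamAssassin score set (Bayes and network tests off or on). A rough Python sketch of how such a line can be read follows; the `parse_score_line` helper and the score set labels are hypothetical names for this example, and the parser deliberately ignores other forms the directive may take (such as relative scores in parentheses).

```python
# Labels for score sets 0..3, in the order the four values are listed.
SCORE_SETS = (
    "no bayes, no net",
    "no bayes, net",
    "bayes, no net",
    "bayes, net",
)


def parse_score_line(line: str):
    """Return (rule_name, {score_set_label: score}) for a 'score' line."""
    # Drop any trailing comment such as '# n=2' before splitting.
    fields = line.split("#", 1)[0].split()
    if len(fields) < 3 or fields[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    name = fields[1]
    values = [float(v) for v in fields[2:]]
    if len(values) == 1:
        # A single value applies to every score set.
        values = values * 4
    elif len(values) != 4:
        raise ValueError("expected one or four score values")
    return name, dict(zip(SCORE_SETS, values))


# The two lines quoted above.
lines = [
    "score PART_CID_STOCK 0.001 0.001 0.001 0.000 # n=2",
    "score PART_CID_STOCK_LESS 0.000 0.036 0.745 0.894 # n=2",
]
scores = dict(parse_score_line(line) for line in lines)

# Worst case combined contribution of both rules in any single score set.
worst = max(
    sum(per_set[label] for per_set in scores.values())
    for label in SCORE_SETS
)
print(f"largest combined score: {worst:.3f}")
```

With the values quoted here, the two rules together stay below one point in every score set, compared with 4.00 points in the original report.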

TVD_FW_GRAPHIC_ID1 became __TVD_FW_GRAPHIC_ID1.

TVD_FW_MESG1 is gone.

Perhaps most importantly, no comments in 3.6 years, on a false-positives bug.
Comment 12 Warren Togami 2011-05-31 22:48:08 UTC
closing