SA Bugzilla – Bug 5226
FP from 80_additional on OE embedded images
Last modified: 2011-05-31 22:48:08 UTC
I am seeing FPs caused by several very high-scoring rules firing on messages sent from Outlook Express with embedded images, usually GIFs. I have reduced the scores of these rules, which has helped eliminate the issue. Below is the report of rules hit for the message. INFO_TLD fires because the message has a signature that contains www.mailscaner.info.

score = 8.98
 -1.44 ALL_TRUSTED
  0.81 EXTRA_MPART_TYPE
  0.00 HTML_MESSAGE
  0.81 INFO_TLD
  2.00 PART_CID_STOCK
  2.00 PART_CID_STOCK_LESS
  2.80 TVD_FW_GRAPHIC_ID1
  2.00 TVD_FW_MESG1
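The per-rule scores in the report above can be checked by adding them up. This is a minimal Python sketch. Grouping the PART_CID_* and TVD_FW_* rules as "image rules" is our own reading, not something the rules declare, and REQUIRED_SCORE assumes SpamAssassin's default threshold of 5.0.

```python
# Rule hits copied from the report above: rule name -> score.
hits = {
    "ALL_TRUSTED": -1.44,
    "EXTRA_MPART_TYPE": 0.81,
    "HTML_MESSAGE": 0.00,
    "INFO_TLD": 0.81,
    "PART_CID_STOCK": 2.00,
    "PART_CID_STOCK_LESS": 2.00,
    "TVD_FW_GRAPHIC_ID1": 2.80,
    "TVD_FW_MESG1": 2.00,
}

# Rules keyed on the embedded image / Outlook structure
# (our own grouping, for illustration only).
IMAGE_RULES = {"PART_CID_STOCK", "PART_CID_STOCK_LESS",
               "TVD_FW_GRAPHIC_ID1", "TVD_FW_MESG1"}

REQUIRED_SCORE = 5.0  # assumed: SpamAssassin's default required_score


def total(scores):
    """Sum rule scores, rounded to 2 decimal places as in the report."""
    return round(sum(scores), 2)


all_points = total(hits.values())
image_points = total(v for k, v in hits.items() if k in IMAGE_RULES)
rest_points = total(v for k, v in hits.items() if k not in IMAGE_RULES)

print(f"total={all_points} image={image_points} rest={rest_points}")
print("spam" if all_points >= REQUIRED_SCORE else "ham")
```

The image rules alone contribute 8.8 of the 8.98 points. Without them the message would score 0.18, which fits the observation that lowering those scores removed the FP.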
Created attachment 3771: Example of FP message
Same here. Other people are also reporting this on the mailing list (reference: http://www.nabble.com/HTML-Source-Rule-tf2728267.html#a7705726)
please provide as many FP samples as possible -- ensure they are valid RFC-2822 messages though (full headers, not munged by Outlook)
fwiw, I'm not surprised by this so much. The stock image spams were sent using Outlook, so rules targeting them tend to hit Outlook mails with images in them.

 -1.44 ALL_TRUSTED
   Not relevant.

  0.81 EXTRA_MPART_TYPE
   There's been discussion about this during rule development. Basically it hits on any multipart/related mail, because the type= parameter is required for multipart/related. It has a score < 1 though, which I read as "people writing rules don't get a lot of multipart/related mails, but it does FP, so the score was set low-ish".

  0.00 HTML_MESSAGE
   Info rule.

  0.81 INFO_TLD
   This rule was killed. .info is no longer a useful spam sign.

  2.00 PART_CID_STOCK
  2.00 PART_CID_STOCK_LESS
   These are in Justin's sandbox. They look similar to my rules, so I'll just talk about those below...

  2.80 TVD_FW_GRAPHIC_ID1
  2.00 TVD_FW_MESG1
   These are mine. ID1 just looks at attachment Content-IDs, and has a great hit rate in the nightly run:

   OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
     11.858  14.3436   0.0055    1.000   0.99   0.00  T_TVD_FW_GRAPHIC_ID1

   But if, as above, these spams are made using Outlook, then any other similar mails will be hit as well. I'll tone down the score.

   MESG1 targeted some old spam which, according to the nightly results, no longer occurs:

      0.005   0.0058   0.0000    1.000   0.49   0.00  T_TVD_FW_MESG1

   so we should trash that rule. Basically it looks for something that says it was forwarded and also has a GIF image attached.
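To illustrate why ordinary Outlook mails with pictures trip these checks, here is a rough Python sketch using the standard `email` package. The function names are hypothetical and these are approximations, not the actual SpamAssassin rule definitions: the real EXTRA_MPART_TYPE is a header rule whose exact pattern is not shown here, and treating a "Fw:"/"Fwd:" Subject as "says it was forwarded" is our assumption for MESG1.

```python
import email
from email.message import Message


def extra_mpart_type_like(msg: Message) -> bool:
    """Hypothetical approximation of EXTRA_MPART_TYPE: the top-level
    Content-Type carries a type= parameter. multipart/related requires
    that parameter, so every well-formed multipart/related mail matches."""
    return msg.get_param("type") is not None


def fw_mesg1_like(msg: Message) -> bool:
    """Hypothetical approximation of TVD_FW_MESG1: the Subject looks
    forwarded (assumed prefixes) and some MIME part is a GIF image."""
    subject = (msg.get("Subject") or "").lower()
    forwarded = subject.startswith(("fw:", "fwd:"))
    has_gif = any(p.get_content_type() == "image/gif" for p in msg.walk())
    return forwarded and has_gif


def image_content_ids(msg: Message) -> list:
    """Collect Content-IDs of image parts, the header GRAPHIC_ID1 inspects."""
    return [p.get("Content-ID") for p in msg.walk()
            if p.get_content_maintype() == "image" and p.get("Content-ID")]


# A legitimate-looking forwarded picture: HTML body plus an inline GIF.
raw = """\
From: a@example.com
To: b@example.com
Subject: Fw: holiday pictures
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="XX"

--XX
Content-Type: text/html

<html><body><img src="cid:img1"></body></html>
--XX
Content-Type: image/gif
Content-ID: <img1>
Content-Transfer-Encoding: base64

R0lGODlhAQABAAAAACw=
--XX--
"""

msg = email.message_from_string(raw)
print(extra_mpart_type_like(msg), fw_mesg1_like(msg), image_content_ids(msg))
```

Nothing in this sample is spammy, yet both approximations fire, which is the shape of the FP described in this bug.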
Created attachment 3795: FP example

This is a false positive. The critical rules in the spam report are:

  2.00 PART_CID_STOCK
  2.00 PART_CID_STOCK_LESS
  2.80 TVD_FW_GRAPHIC_ID1
ah, looks like I forgot to push an update (the sooner we get off the manually-run 3.1 updates, the better!) PART_CID_STOCK and PART_CID_STOCK_LESS are additive. fixed in the new updates...
I lost track of this -- is everything done wrt the updates? Either way, punting to 3.1.9 since it's more update than release.
I have bumped into these several times: people were sending pictures from Outlook and got bounced. It looks like this:

Content analysis details:   (6.7 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 EXTRA_MPART_TYPE       Header has extraneous Content-type:...type= entry
-1.8 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 1.2 HTML_IMAGE_ONLY_20     BODY: HTML: images with 1600-2000 bytes of words
 2.1 TVD_FW_GRAPHIC_ID1     BODY: TVD_FW_GRAPHIC_ID1
 0.3 HTML_COMMENT_SAVED_URL BODY: HTML message is a saved web page
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1% [score: 0.0000]
 1.5 HTML_MESSAGE           BODY: HTML included in message
 1.0 STOCK_IMG_OUTLOOK      Stock spam image part, with Outlook-like features
 1.0 PART_CID_STOCK         Has a spammy image attachment (by Content-ID)
 1.0 PART_CID_STOCK_LESS    Has a spammy image attachment (by Content-ID, more specific)
 1.0 STOCK_IMG_HTML         Stock spam image part, with distinctive HTML
 1.0 STOCK_IMG_HDR_FROM     Stock spam image part, with distinctive From line

I raised HTML_MESSAGE myself (otherwise it catches close to zero spam), but the others were caused by the Outlook crap and the image. 8.3 points just for sending a picture seems a bit high.

[18298] dbg: dns: 7.1.3.updates.spamassassin.org => 507739, parsed as 507739
[18298] dbg: channel: current version is 507739, new version is 507739, skipping channel
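Summing the displayed scores shows where the points come from. This is a minimal Python sketch. Which rules make up "8.3 points just for sending the picture" is our assumption: the seven image-only, graphic-ID and stock-image rules below happen to add up to exactly 8.3.

```python
# (score, rule) pairs copied from the "Content analysis details" report above.
report = [
    (1.1, "EXTRA_MPART_TYPE"),
    (-1.8, "ALL_TRUSTED"),
    (1.2, "HTML_IMAGE_ONLY_20"),
    (2.1, "TVD_FW_GRAPHIC_ID1"),
    (0.3, "HTML_COMMENT_SAVED_URL"),
    (-2.6, "BAYES_00"),
    (1.5, "HTML_MESSAGE"),
    (1.0, "STOCK_IMG_OUTLOOK"),
    (1.0, "PART_CID_STOCK"),
    (1.0, "PART_CID_STOCK_LESS"),
    (1.0, "STOCK_IMG_HTML"),
    (1.0, "STOCK_IMG_HDR_FROM"),
]

# Assumed reading of "8.3 points just for sending the picture".
PICTURE_RULES = {
    "HTML_IMAGE_ONLY_20", "TVD_FW_GRAPHIC_ID1", "STOCK_IMG_OUTLOOK",
    "PART_CID_STOCK", "PART_CID_STOCK_LESS", "STOCK_IMG_HTML",
    "STOCK_IMG_HDR_FROM",
}

# Round to one decimal place, matching how the report displays scores.
total = round(sum(score for score, _ in report), 1)
picture = round(sum(score for score, rule in report if rule in PICTURE_RULES), 1)

print(total, picture)
```

The displayed scores sum to 6.8 rather than the reported 6.7, most likely because each per-rule score is shown rounded to one decimal place.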
It appears this is still an issue, but it's fixable via update and so doesn't need to hold up 3.1.10.
This MAY have been fixed. I believe TVD_FW_GRAPHIC_ID1 was phased out (into T_TVD_FW_GRAPHIC_ID1, with completely different scoring) and most of the tests mentioned seem to have been changed. Hard to verify though.
I think this can be safely closed.

INFO_TLD is gone.

Reduced scores:

  score PART_CID_STOCK 0.001 0.001 0.001 0.000 # n=2
  score PART_CID_STOCK_LESS 0.000 0.036 0.745 0.894 # n=2

TVD_FW_GRAPHIC_ID1 became __TVD_FW_GRAPHIC_ID1.

TVD_FW_MESG1 is gone.

Perhaps most importantly, there have been no comments in 3.6 years on a false-positives bug.
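In SpamAssassin's configuration, a `score` line with four values gives one score per score set, in this order: Bayes off and network tests off, Bayes off and network on, Bayes on and network off, Bayes on and network on. Rules whose names begin with `__` are sub-rules and are never scored or listed themselves, so renaming TVD_FW_GRAPHIC_ID1 to __TVD_FW_GRAPHIC_ID1 removes its direct contribution. The Python sketch below reads the two lines above and picks the score a given setup would use. The helper names are our own, not a SpamAssassin API.

```python
from __future__ import annotations


def parse_score_line(line: str) -> tuple[str, list[float]]:
    """Parse 'score NAME s0 [s1 s2 s3]  # comment' into (NAME, scores)."""
    fields = line.split("#", 1)[0].split()  # drop the trailing comment
    if len(fields) < 3 or fields[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    name, values = fields[1], [float(v) for v in fields[2:]]
    if len(values) not in (1, 4):
        raise ValueError("expected either one score or four")
    return name, values


def effective_score(values: list[float], use_bayes: bool, use_net: bool) -> float:
    """Pick the score for the active score set (0-3)."""
    if len(values) == 1:
        return values[0]  # a single score applies to every score set
    score_set = (2 if use_bayes else 0) + (1 if use_net else 0)
    return values[score_set]


def is_scored(rule_name: str) -> bool:
    """Rules named __FOO are sub-rules and contribute no score directly."""
    return not rule_name.startswith("__")


for line in ("score PART_CID_STOCK 0.001 0.001 0.001 0.000 # n=2",
             "score PART_CID_STOCK_LESS 0.000 0.036 0.745 0.894 # n=2"):
    name, values = parse_score_line(line)
    print(name, effective_score(values, use_bayes=True, use_net=True))
```

For example, with Bayes and network tests both enabled (score set 3), PART_CID_STOCK now contributes 0.000 and PART_CID_STOCK_LESS 0.894, far below the 2.00 each in the original report.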
closing