Bug 4876 - CRM114 Plugin for SpamAssassin (comments, please!)
Summary: CRM114 Plugin for SpamAssassin (comments, please!)
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins
Version: 3.0.3
Hardware: Other Linux
Importance: P5 enhancement
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-04-24 13:29 UTC by Brian White
Modified: 2008-02-12 15:00 UTC (History)
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
CRM114 interface plugin text/plain None Brian White [NoCLA]
CRM114 classification filter text/plain None Brian White [NoCLA]
CRM114 classification configuration text/plain None Brian White [NoCLA]
CRM114 rules scores, flags, etc. text/plain None Brian White [NoCLA]

Description Brian White 2006-04-24 13:29:07 UTC
Our SA implementation (via MailScanner and Exim) is working very well, dropping
some 90% of spam immediately and only missing (i.e. not tagging) some 5% of the
remaining "questionable" stuff.  Still, the amount of mail that gets tagged
(above a score of 4 and below 8) and passed to users is significant.

I wanted to try using CRM114 within SpamAssassin to augment the existing "bayes"
learner.  I wanted it to discriminate between spam/ham for messages that could
not be classified accurately by existing rules and, as such, not waste its
resources for things already handled elsewhere.  I've written this plug-in to
test it out.

Right now, I'm just using the basic "classifymail.crm" script that comes with
CRM114 with a few modifications as to where to find files.

This is my first attempt at a plug-in and working with v3 of SpamAssassin, so I
apologize if it's not as elegant as it could be.

A few notes about the plugin:  .../SpamAssassin/Plugin/CRM114.pm

* It skips itself unless the current score is within the -5 to 15 range.  I
believe this will avoid running it for messages that are already obvious.  I
chose this range on the assumption that the rule weightings would never be more
than +/- 10 and thus would never be able to change the final decision on
messages outside of that range.  I've set the rule priorities to run this rule last.

* I intended to train CRM only with messages that users supply as either false
positives or false negatives.  This contrasts with the standard learning system
that auto-learns from everything.  (I know I can disable auto-learn, but I want
CRM to work on a _different_ problem than the existing rules.)

* I still have to figure out how to actually do that training.  Training on the
original messages would use a different data set than the "rendered" text it's
classifying.  What I need is a method to have SpamAssassin render a message and
dump its output rather than running rules on it.
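
The score gate and the train-on-error rule from the notes above can be sketched as follows.  This is only an illustration in Python, not the attached Perl plugin: every function and constant name is hypothetical, and the spam threshold of 5.0 is an assumption (it is SpamAssassin's default required_score; the report does not say which threshold the range was derived from).

```python
# Illustrative sketch only; names are hypothetical and not taken from CRM114.pm.
GATE_LOW = -5.0          # lower bound of the range described in the notes
GATE_HIGH = 15.0         # upper bound of the range described in the notes
MAX_RULE_WEIGHT = 10.0   # assumed largest magnitude of the CRM114 rule score
SPAM_THRESHOLD = 5.0     # assumption: SpamAssassin's default required_score


def should_run_crm(current_score, low=GATE_LOW, high=GATE_HIGH):
    """Return True if the score so far falls inside the gate range."""
    return low <= current_score <= high


def can_change_verdict(current_score, threshold=SPAM_THRESHOLD,
                       max_weight=MAX_RULE_WEIGHT):
    """Return True if a rule worth at most +/- max_weight could flip the verdict."""
    if current_score >= threshold:
        # Currently spam: only a negative contribution could flip it to ham.
        return current_score - max_weight < threshold
    # Currently ham: only a positive contribution could flip it to spam.
    return current_score + max_weight >= threshold


def should_train(classified_as_spam, user_says_spam):
    """Train only on user-reported errors (false positives or false negatives)."""
    return classified_as_spam != user_says_spam
```

Under those assumptions, a message scoring 20 is skipped because even a -10 contribution leaves it above the threshold, while a message scoring 3 is still worth classifying.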


I'd appreciate any comments people have.  I've placed the plugin code in the
public domain.  The CRM filter file did not have a copyright notice on the
original; since it was an example, I suspect it's also public domain but can't
say for sure.  I am sure, however, that anybody with some CRM knowledge could
write a better classifier than what I present here.
Comment 1 Brian White 2006-04-24 13:31:25 UTC
Created attachment 3484 [details]
CRM114 interface plugin
Comment 2 Brian White 2006-04-24 13:31:57 UTC
Created attachment 3485 [details]
CRM114 classification filter
Comment 3 Brian White 2006-04-24 13:32:22 UTC
Created attachment 3486 [details]
CRM114 classification configuration
Comment 4 Brian White 2006-04-24 13:32:55 UTC
Created attachment 3487 [details]
CRM114 rules scores, flags, etc.
Comment 5 Bas Zoetekouw 2006-04-24 15:36:35 UTC
I've been using CRM114 with SpamAssassin for a while now (though in a simpler
setup, with CRM114 adding a mail header, which SA then matches against). It
is working very well;  much better than SA's default internal Bayesian engine.

In my experience, it's best to train only on the From and Subject headers and
the body.  I'm also auto-training on messages that have low (<0) or very high
(>12) scores (note that crm114 will only learn messages that are misclassified).
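
A minimal sketch of that selection, written in Python with the standard email module rather than taken from any actual setup: the function names are hypothetical, the body is taken as the first text/plain part without transfer decoding, and only the 0 and 12 thresholds come from the description above.

```python
from email import message_from_string


def training_text(raw_message):
    """Keep only the From and Subject headers plus the body for training."""
    msg = message_from_string(raw_message)
    headers = []
    for name in ("From", "Subject"):
        value = msg.get(name)
        if value is not None:
            headers.append(f"{name}: {value}")
    if msg.is_multipart():
        # Simplification: use the first text/plain part, if there is one.
        body = ""
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                body = part.get_payload()
                break
    else:
        body = msg.get_payload()
    return "\n".join(headers) + "\n\n" + body


def auto_train_label(score, ham_below=0.0, spam_above=12.0):
    """Pick an auto-training label from the SA score, or None to skip."""
    if score < ham_below:
        return "ham"
    if score > spam_above:
        return "spam"
    return None
```

Messages scoring between the two thresholds return None and are left for manual training.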
Comment 6 Michael Parker 2006-04-29 13:58:09 UTC
Interesting.  Your best bet is to post this to the wiki:

http://wiki.apache.org/spamassassin/CustomPlugins

and also maybe an announcement on the users list.

In the future I would like to pluginize Bayes as a whole, that would allow you
to write your plugin at a much lower level with more control, and then replace
the existing implementation if you believe it is better.
Comment 7 Justin Mason 2008-01-18 02:19:49 UTC
bug 5293 ("pluginize Bayes") is now about to be applied to 3.3.0 -- so it's now
possible to replace our default Bayes implementation with other classifiers
entirely, at a low level, as Michael mentioned.
Comment 8 Martin Sch 2008-02-12 15:00:26 UTC
If you are interested in CRM114 you might also try my CRM114 plugin, which I
wrote in 2007 without knowing about this Bugzilla entry.

It is available on the Spamassassin-Wiki
(http://wiki.apache.org/spamassassin/CustomPlugins) and at 
http://mschuette.name/wp/crm114-spamassassin-plugin/.