|
SA Bugzilla – Full Text Bug Listing |
Summary: | RFE: add learning support to spamd/spamc | ||
---|---|---|---|
Product: | Spamassassin | Reporter: | Daniel Quinlan <quinlan> |
Component: | spamassassin | Assignee: | Michael Parker <parkerm> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | dev, fede2, groups, james, parkerm, roy |
Priority: | P4 | ||
Version: | SVN Trunk (Latest Devel Version) | ||
Target Milestone: | 3.1.0 | ||
Hardware: | Other | ||
OS: | other | ||
Whiteboard: | Development Started | ||
Attachments: | Unfinished Patch; SpamC-part + SpamD-part (Michael); Finished SpamC-part + SpamD-part (Michael); New spamc part; Patch to fix Binary Compatibility |
Description
Daniel Quinlan
2002-11-11 23:25:03 UTC
Assigning to me.

Outgoing non-spam is quite different from incoming non-spam, and we probably can't train on one to get the other.

Regardless, setting the target milestone to 3.0.

I wrote on SAdev:
> I propose adding a new command to the spamd/spamc protocol:
>
> Base protocol: spamd/README.spamd
> Reference bug: http://www.hughes-family.org/bugzilla/show_bug.cgi?id=1201
>
> ------- start of cut text --------------
> New command:
>
> LEARN <classification> -- Learn the following message for Bayes
>
> The <classification> is either "HAM" or "SPAM".
> ------- end ----------------------------
I think this covers all of the cases mentioned in the bug report just as
well as any other API would.
Assigning back to group.
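The proposal above fixes only the verb and a HAM/SPAM argument. As a rough illustration of what a server would need to validate, here is a minimal C parser for such a request line. It assumes the classification is the token right after `LEARN ` and may be followed by a protocol token such as `SPAMC/1.2`; the enum and function names are hypothetical, and since spamd itself is written in Perl this only sketches the logic, not spamd's code.

```c
#include <string.h>

/* Hypothetical result of parsing the proposed "LEARN <classification>" line. */
enum proposed_class { PROPOSED_INVALID, PROPOSED_HAM, PROPOSED_SPAM };

/* Accepts "LEARN HAM" or "LEARN SPAM", optionally followed by a space and a
 * protocol token and/or a CRLF. Anything else is rejected as invalid. */
static enum proposed_class parse_learn_line(const char *line)
{
    const char *p;
    size_t len;

    if (strncmp(line, "LEARN ", 6) != 0)
        return PROPOSED_INVALID;          /* some other verb */
    p = line + 6;
    len = strcspn(p, " \r\n");            /* length of the classification token */

    if (len == 3 && strncmp(p, "HAM", 3) == 0)
        return PROPOSED_HAM;
    if (len == 4 && strncmp(p, "SPAM", 4) == 0)
        return PROPOSED_SPAM;
    return PROPOSED_INVALID;              /* e.g. "HAMS" or a missing argument */
}
```

Checking the token length as well as its text keeps prefixes such as `HAMS` from being accepted as `HAM`.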
This is now the third time I've hit 'escape' in an IE window, due to my bad vi habits, and destroyed the last ten minutes of text. This now continues in Notepad. Grr. Argh.

A few questions regarding this:
1) Does the learn facility ($spamtest->learn($mail, $msgid, $isspam, $forget)) train AWL as well as Bayes?
2) Is there a way to train and report by calling only the NoMailAudit module? (Or do I completely miss the point of the NoMailAudit module? I'm fuzzy there.)
3) Do you need to specifically forget and re-learn a message when training FPs/FNs, i.e. do I need to make two calls to learn:
   $spamtest->learn($mail, $msgid, 0, 1)
   $spamtest->learn($mail, $msgid, $isspam, 0)
4) When I attempted this the first time around, I noticed something a bit strange. If I trained a message as one type and then simply called learn as the other type, I got a slew of errors. I can repost them, but I have a feeling I am going about this wrong.

*** Bug 1650 has been marked as a duplicate of this bug. ***

Bug #2167 may be of interest. We have code that is significantly more developed than the current patches in BZ, and it could be used to accomplish what you are looking for. In short, if you wanted to do this (regardless of whether it's a good idea or not), it'd be pretty simple to write a server in Perl that wrapped sa-learn in a simple network API that you could connect to with a simple custom client.

I really think we want learning to be an accepted command in the spamd protocol; a separate daemon is superfluous.

Agreed with Dan.

I doubt this'll be ready for 3.0... punting to 3.1.

I don't think we are ready to be reassigning tickets to 3.1 because they don't meet a specific timeframe that is only a proposal.

Hello, what is the current state of this bug (feature request)? Will this feature be introduced in 3.0 or in the later 3.1 release? For me this is a long-awaited feature that would make spamc/spamd really usable in large per-user spamc/spamd environments. Thanks!
I don't think this is going to happen for 3.0.0 -- we have no code to implement it, and there are several other high-priority issues to fix. AFAIK, everyone agrees it'd be a good feature (right devs?), but nobody's got the bandwidth to do it. :(

Subject: Re: RFE: add learning support to spamd/spamc
On Mon, May 03, 2004 at 11:52:19AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> AFAIK, everyone agrees it'd be a good feature (right devs?), but nobody's got
> the bandwidth to do it. :(
I'm +1 on the feature. I started looking into it but decided to wait until the spamd work Theo did was done, but it's pretty far down my list at the moment. I don't see it happening before 3.0.0 unless we delay the release.

Subject: Re: RFE: add learning support to spamd/spamc
On Mon, May 03, 2004 at 11:56:23AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> > AFAIK, everyone agrees it'd be a good feature (right devs?), but nobody's got
> > the bandwidth to do it. :(
This would be a great feature.

*** Bug 3362 has been marked as a duplicate of this bug. ***
*** Bug 3363 has been marked as a duplicate of this bug. ***

I've started working on this, and have most of it done. A few questions/comments; if anyone sees a problem with my assumptions or design decisions, feel free to speak up.
1) Does adding a new command mean that we need to bump the protocol version?
2) My thinking is to add a LEARN command to the protocol and then a Learn-Type header that can contain ham/spam/forget (more likely some numerical value that represents each of them).
3) You can call spamc with -L <type> (i.e. spamc -L ham < hammsg.txt, spamc -L spam < spammsg.txt or spamc -L forget < somemsg.txt). -L == learner, but I'm open to better options.
4) What should the LEARN command return to spamc to indicate success/failure/already learned/etc?

Subject: Re: RFE: add learning support to spamd/spamc
>1) Does adding a new command mean that we need to bump the protocol version?
nope IMO. it should be possible to detect the "protocol error" returned when a new verb is attempted against an older spamd, and fail with an error.
>2) My thinking is to add a LEARN command to the protocol and then a Learn-Type
>header that can contain ham/spam/forget (more likely some numerical value that
>represents each of them).
+1
>3) You can call spamc with -L <type> (ie spamc -L ham < hammsg.txt, spamc -L
>spam < spammsg.txt or spamc -L forget < somemsg.txt). -L == learner, but I'm
>open to better options.
hmm. what about emulation of sa-learn somehow? I suppose we'd need long options...
>4) What should the LEARN command return to spamc to indicate
>success/failure/already learned/etc?
Header: value pairs, I'd suggest; they're more extensible. Also use the HTTP-style status line and status code.
--j.

>>1) Does adding a new command mean that we need to bump the protocol version?
>nope IMO. it should be possible to detect the "protocol error" returned
>when a new verb is attempted against an older spamd, and fail with an
>error.
+1
>>2) My thinking is to add a LEARN command to the protocol and then a Learn-Type
>>header that can contain ham/spam/forget (more likely some numerical value that
>>represents each of them).
>+1
+1
If you need some testing, let me know how I can help!
Greetings, NicoP.

More accuracy and performance bugs going to the 3.1.0 milestone.

As Michael Parker wrote on 2004-06-11 09:15:
>I've started working on this, and have most of it done. A few
>questions/comments, if anyone sees a problem with assumptions of design
>decisions feel free to speak up.
I really don't want to be a nuisance about this feature request, but as Michael Parker
stated, it's already nearly complete.
My reason is that I am currently programming a complete
SpamAssassin-integrated suite for Domino. To complete the project, I'll need to
enable per-user learning into Bayes. Because I want to integrate
SpamAssassin cleanly into Domino, I have to use only spamc.
Would it be possible to hand over the current sources? I want to give it a try
myself to keep the development process going!
Best regards.
Hello Michael, now as plain text.

You stated that development of bug #1201 has started. Because I'm very interested in getting this feature, I want to talk about the implementation. I have already started developing this feature on my own, as stated here before: http://bugzilla.spamassassin.org/show_bug.cgi?id=1201#c21 Perhaps there is some overlap between my development and your road map.

In my DSCLearner (DomSpamC Learner module => SpamC part for Lotus Domino) I've planned to use the "int flags" variable to submit the LEARN command and what I call a "learn option". The "learn option" specifies whether the message will be learned or unlearned. So the protocol for learning a spam message could be "LEARN SPAM SPAMC/1.3". The advantage is that only a small change to libspamc is needed. Would a protocol version change be needed?

A code example of the initial spamc part:

    /* request LEARN mode and mark the message as spam */
    g_DomSpamCConfiguration.flags |= SPAMC_LEARN | LEARN_SPAM;
    if ((ret = transport_setup(&g_DomSpamCConfiguration.trans,
                               g_DomSpamCConfiguration.flags)) == EX_OK) {
        /* send the message to spamd and log spamd's return code */
        ret = message_filter(&g_DomSpamCConfiguration.trans, tempSendToUserName,
                             g_DomSpamCConfiguration.flags, m);
        sprintf(strTemp, "Return-Code (ret) from SpamD: %i", ret);
        WriteToNotesLog(strTemp, 0);
    }

If you're interested, I'll take the spamc part!?
Best regards.
NicoP.

Created attachment 2550 [details]
Unfinished Patch
Here is what I have of the code at the moment. It is not yet complete. I
think I've run at least one test on the code with the Perl-based client, but
honestly don't remember for sure.
A few notes to help guide anyone who wants to pick this up:
1) There is far too much code duplication between the check and learn
functions. I started trying to extract out what I could.
2) The PROTOCOL docs need to be updated, but the protocol I settled on was
noted earlier in the bug, with the response being a header "Learned:
(Yes|No)".
3) I made no progress on the spamc part; it is open for anyone who would like
to tackle that code. Perhaps a working, or at least apparently working, spamc
will inspire me to finish up the spamd code (it is hard to test without a nice
working client).
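A client has to pick the "Learned:" header described in note 2 out of spamd's response. A minimal C sketch, assuming the headers arrive as one NUL-terminated block with CRLF or LF line endings and that the header name is spelled exactly `Learned:` as above; `parse_learned_header` is a hypothetical helper, not part of libspamc.

```c
#include <string.h>

/* Scan a block of CRLF- or LF-terminated response headers for
 * "Learned: Yes" or "Learned: No".
 * Returns 1 for Yes, 0 for No, and -1 if the header is missing or
 * carries any other value. */
static int parse_learned_header(const char *headers)
{
    const char *line = headers;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, "Learned: ", 9) == 0) {
            const char *value = line + 9;
            size_t len = strcspn(value, "\r\n");   /* value up to end of line */

            if (len == 3 && strncmp(value, "Yes", 3) == 0)
                return 1;
            if (len == 2 && strncmp(value, "No", 2) == 0)
                return 0;
            return -1;
        }
        line = strchr(line, '\n');   /* advance to the next header line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Returning -1 for an absent header lets the caller treat "no header at all" separately from an explicit "No".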
Subject: Re: RFE: add learning support to spamd/spamc
On Mon, Dec 06, 2004 at 02:06:15AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> So the protocol could be for learn a spam message "LEARN SPAM SPAMC/1.3"
The protocol is:

    LEARN SPAMC/1.3
    Learn-type: N

where N is 0 to learn as spam, 1 to learn as ham, or 2 to forget.
> Would a protocoll version change needed?
No; since it is an added command, older spamds will just return a protocol error.
Hmm, that makes me think. Maybe we want a way to enable/disable this functionality, so you don't have to accept the LEARN command from a client. This provides an avenue for the paranoid.
> If you're interested i'll take the spamc-part!?
Feel free, I'm really bad with C code.
Michael

Created attachment 2551 [details]
SpamC-part + SpamD-part(Michael)
My first patch for an open-source community, so be careful! :-)
Integrates Michael's spamd patch + my spamc patch.
Currently response headers are not supported; I'll add this tomorrow.
But it should be enough to continue spamd development. :-)
Best regards.
NicoP.
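For reference, the request format Michael gave just before this attachment (`LEARN SPAMC/1.3` plus a `Learn-type` header, with 0 = spam, 1 = ham, 2 = forget) could be produced on the spamc side roughly as follows, taking the word-style `-L ham|spam|forget` argument from his earlier design. This is a sketch, not the code in the attachment: the helper names are hypothetical, and the `Content-length` header is an assumption carried over from the existing spamc request format rather than something stated in this bug.

```c
#include <stdio.h>
#include <string.h>

/* Map a -L argument to the Learn-type number from the protocol above:
 * 0 = learn as spam, 1 = learn as ham, 2 = forget. Returns -1 otherwise. */
static int learn_type_from_arg(const char *arg)
{
    if (strcmp(arg, "spam") == 0)   return 0;
    if (strcmp(arg, "ham") == 0)    return 1;
    if (strcmp(arg, "forget") == 0) return 2;
    return -1;
}

/* Format the request header block that precedes the message body.
 * Returns the header length, or -1 for an unknown type or a buffer that
 * is too small (snprintf reports the length it would have needed). */
static int build_learn_request(char *buf, size_t bufsize,
                               const char *arg, size_t msglen)
{
    int type = learn_type_from_arg(arg);
    int n;

    if (type < 0)
        return -1;
    n = snprintf(buf, bufsize,
                 "LEARN SPAMC/1.3\r\n"
                 "Learn-type: %d\r\n"
                 "Content-length: %zu\r\n"   /* assumed, as for other requests */
                 "\r\n",
                 type, msglen);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return n;
}
```

Note that a later comment in this bug passes the type numerically (`-L 0`); accepting words here simply follows Michael's original `-L ham` examples.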
Created attachment 2555 [details]
Finished SpamC-part + SpamD-part (Michael)
Added more error handling; spamc now interprets the header sent back by
spamd.
Also, one line is printed each time learning/unlearning succeeds or the
message has already been learned. Is that wanted?
Btw, I'll now start to test Michael's spamd part.
Greetings.
NicoP.
Hello Michael, would it be possible for the response header to say something other than "Learned: No" when learning fails? Something like "Learned: Error" would be nice. Greetings. NicoP.

To clarify, I mean spamd should return "Learned: Error" only if the learning process does not complete successfully. This could happen, e.g., due to insufficient rights on the underlying MySQL database. Greetings. NicoP.

Subject: Re: RFE: add learning support to spamd/spamc
>
> would it be possible to return in the respond header not only "Learned: No" if
> the learning doesn't carry out?
>
> Nice would be "Learned: Error", or something.
>
No, on error spamd should set the response code to the error and
return no header. If we need to come up with some new response codes
we can.
Michael
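Under that scheme the client should check the status code on the response line before looking for a `Learned:` header at all. A minimal sketch, assuming a response line of the form `SPAMD/1.1 0 EX_OK` (the version number may differ); `spamd_status_code` is a hypothetical helper.

```c
#include <stdio.h>

/* Extract the numeric status code from a spamd response line such as
 * "SPAMD/1.1 0 EX_OK". Returns the code, or -1 if the line is malformed.
 * A non-zero code means spamd reported an error and sent no Learned
 * header, so the caller should stop there. */
static int spamd_status_code(const char *line)
{
    int major, minor, code;

    if (sscanf(line, "SPAMD/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}
```

A result such as 69 (`EX_UNAVAILABLE` in `<sysexits.h>`) would then be reported as the error directly, without trying to parse any headers.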
Okay, this would be fine. I think the following failures should be captured and reported differently:
- no database (mostly a wrong DSN)
- no access to the db (perhaps wrong privileges or the wrong user)
I think there could be a lot more failures that need to be captured. I'll look for more. NicoP.

Subject: Re: RFE: add learning support to spamd/spamc
> this would be fine. I think following failures should be captured and
> differently reported:
>
> - no database (mostly wrong dsn)
> - no access to db (pherhaps wrong privileges, (wrong user?))
>
> I think there could be a lot more failures needed to be captured. I'll look for
> more.
Just return a service unavailable error, see how the handle_user_sql
stuff is handled. I think you'll find that specific database errors
are not returned back to the caller. It would be up to the site admin
to run spamd in debug mode to discover that.
I think you are making the problem more complicated than it needs to
be.
Michael
Created attachment 2622 [details]
New spamc part
My new spamc patch, which applies cleanly on top of trunk (revision 124864)!

(In reply to comment #31)
I'm submitting a new spamc-part patch which applies cleanly against the current trunk (r124864). As Michael stated above, I have not added handling for any further errors.
- spamc returns EX_LEARNED (5) if spamd learned/unlearned the message successfully.
- It returns EX_NOTLEARNED (6) if spamd did not learn/unlearn the message, perhaps because it had already been learned/unlearned.
- If an error occurs while learning (e.g. a protocol disagreement with an older spamd), spamc returns EX_UNAVAILABLE (69).
- If SPAMC_CHECK_ONLY (-c), SPAMC_REPORT_IFSPAM, SPAMC_REPORT or SPAMC_SYMBOLS is combined with SPAMC_LEARN, spamc logs an error message (if -l is specified) and exits.
Since I've found no more errors in the spamc part, would it be possible to include it in the 3.1 trunk?

To Michael: did you get anywhere with the spamd part? I've found no errors since I integrated your spamd patch. I don't want to bother you, but perhaps this could also be applied to the current trunk!?
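The last rule in the list above, refusing to combine learning with checking or reporting modes, amounts to a bit-mask test on the flags word. A minimal sketch: the numeric flag values below are placeholders (the real definitions live in libspamc.h, and SPAMC_LEARN comes from the patch), and `learn_flags_valid` is a hypothetical name.

```c
/* Placeholder flag bits standing in for the libspamc flags named above;
 * the real values in libspamc.h (and the patch's SPAMC_LEARN) may differ. */
#define SPAMC_CHECK_ONLY     (1 << 0)
#define SPAMC_REPORT         (1 << 1)
#define SPAMC_REPORT_IFSPAM  (1 << 2)
#define SPAMC_SYMBOLS        (1 << 3)
#define SPAMC_LEARN          (1 << 4)

/* Output modes that make no sense together with learning. */
#define SPAMC_LEARN_CONFLICTS \
    (SPAMC_CHECK_ONLY | SPAMC_REPORT | SPAMC_REPORT_IFSPAM | SPAMC_SYMBOLS)

/* Returns 1 if the flag combination is acceptable, 0 if learning was
 * requested together with any checking/reporting mode. */
static int learn_flags_valid(int flags)
{
    if ((flags & SPAMC_LEARN) && (flags & SPAMC_LEARN_CONFLICTS))
        return 0;
    return 1;
}
```

Collecting the conflicting modes in one mask keeps the check to a single test, so a new output mode only has to be added to `SPAMC_LEARN_CONFLICTS`.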
I think this bug addresses a long-awaited feature; a question about it is posted to the SpamAssassin users' list every week. A CLA has already been submitted.
Greetings,
NicoP.

To clarify: attachment id=2622 is the patch mentioned above.

Hello SpamAssassin developers,
I've just read the "Revisiting high-level 3.1 goals" for the 3.1 release and
I'm worried by the following:
> And the anti-goals:
>
> - features: extra options, non-critical changes not related to the
> above goals, etc. (except perhaps in plugins)
> - option bloat (except perhaps in plugins)
Does this patch fall under the above? I hope it doesn't.
The code changes aren't large, and they don't affect the
SpamAssassin core. So it's a low-risk enhancement, and there would be no
reason not to include it in the 3.1 release.
Perhaps there are other reasons, but then please explain them a bit more!
Greetings.
NicoP.
Subject: Re: RFE: add learning support to spamd/spamc
>
> Pherhaps there are other reasons. But then please explain it a bit more!
>
Relax, it is going in. It's just a matter of time at this point.
Nico, can you please fill out and fax/mail in a CLA? http://www.apache.org/licenses/#clas Thanks.

Nico's listed on the committers list -- http://www.apache.org/%7ejim/committers.html -- as now having a CLA on file. I've fixed his bz user record accordingly, and the patch can go in (or whatever you're planning, Michael).

Subject: Re: RFE: add learning support to spamd/spamc
On Wed, Feb 09, 2005 at 01:54:12PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> http://www.apache.org/%7ejim/committers.html -- as now having a CLA on-file.
> I've fixed his bz user record accordingly, and the patch can go in (or whatever
> you're planning, Michael).
Excellent, just thinking ahead at this point, for when I get some time.

Oops, I think something went wrong. No, I'm not John Newman! :-)) NicoP.

uh-oh... theo, any chance you could look at that? Nico's name seems to have changed from "Nico Prenzel" to "John Newman" in the bz database...

Subject: Re: RFE: add learning support to spamd/spamc
On Thu, Feb 10, 2005 at 10:48:13AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> uh-oh... theo, any chance you could look at that? Nico's name seems to have
> changed from "Nico Prenzel" to "John Newman" in the bz database...
How odd! I changed the real name, so it should be set now. I wonder why the change happened... :|

Subject: Re: RFE: add learning support to spamd/spamc
> How odd! I changed the real name, so it should be set now. I wonder why the
> change happened... :|
(a) it's happened before (in pre-HASCLA days) -- remember when my name got changed to some guy from Australia? ;)
(b) this time, it happened about when I went in to edit Nico's HASCLA flag.
--j.

Subject: Re: RFE: add learning support to spamd/spamc
On Thu, Feb 10, 2005 at 11:02:48AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> (a) it's happened before (in pre-HASCLA days) -- remember when my name got
> changed to some guy from Australia? ;)
Yeah, IIRC, that was for emails doing loose matching when BZ was trying to figure out who sent the mail. No data was modified, just the wrong data got selected. This time the database value changed.

I think it would make sense to bring the report-message-as-spam feature (spamassassin --report) into the spamc/spamd learning part. So spamc -d xxx.xxx.xxx.xxx -L 0 -z < SpamMessage.txt would trigger a report to Razor and others. What's your opinion about that?

Subject: Re: RFE: add learning support to spamd/spamc
>
> What's your oppinion about that?
>
As a separate command line argument, sure, open up a separate bug.
Learn should only learn, report will report and learn.
Michael
Committed revision 158029.

The only thing I'm unsure about at this point is the exit codes for spamc: when a message is learned the exit code is 5, when it has already been learned it is 6, and all other errors should be the same as always. Is it acceptable to have a non-zero exit code for a success?

> Is it acceptable to have a non-zero exit code for a success?
no, I don't think that's a good idea. if necessary, output a machine-readable line of text to stdout, which can be parsed to determine if the message was learned or already-seen. but non-zero exit codes really have to mean failure of some form or another, and "message was already learned" is not a failure condition.

Subject: Re: RFE: add learning support to spamd/spamc
> ------- Additional Comments From jm@jmason.org 2005-03-18 10:54 -------
> no, I don't think that's a good idea. if necessary, output a machine-readable
> line of text to stdout, which can be parsed to determine if the message was
> learned or already-seen. but non-zero exit codes really have to mean failure of
> some form or another, and "message was already learned" is not a failure condition.
>
Ok, I changed it so it should exit with a 0 if it learns or was
already learned and pass through whatever error code was received
instead of dumping out with EX_UNAVAILABLE.
Michael
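The policy Michael settled on fits in a few lines of C: exit 0 whether the message was newly learned or already known, and otherwise pass spamd's own error code through. The struct and function names are hypothetical, and the words printed by `print_learn_line` are placeholders for the machine-readable line jm suggested, not spamc's actual output.

```c
#include <stdio.h>

/* Hypothetical summary of one LEARN round trip as seen by spamc. */
struct learn_result {
    int status;   /* numeric code from the spamd response line, 0 = OK */
    int learned;  /* 1 for "Learned: Yes", 0 for "Learned: No" */
};

/* Both success cases exit 0; any spamd error code is passed through
 * unchanged instead of being collapsed into EX_UNAVAILABLE. */
static int learn_exit_code(const struct learn_result *r)
{
    return r->status;
}

/* Distinguish the two success cases on stdout rather than via the exit
 * code. The words used here are placeholders. */
static void print_learn_line(FILE *out, const struct learn_result *r)
{
    if (r->status == 0)
        fprintf(out, "%s\n", r->learned ? "learned" : "already-learned");
}
```

Keeping "already learned" at exit status 0 means shell scripts that only test for success or failure treat both outcomes the same, which is the point jm made above.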
Subject: Re: RFE: add learning support to spamd/spamc
> learned or already-seen. but non-zero exit codes really have to mean failure of
> some form or another, and "message was already learned" is not a failure condition.
Another possibility might be to use positive return values for errors and negative/zero returns for various kinds of goodness.

> Another possibility might be to use positive return values for errors and
> negative/zero returns for various kinds of goodness.
According to the fine manuals for the various shells available on various
unix systems, zero is good and not zero is bad. There is no allowance
for different non-zero values in the automatic error handling. You can
capture the return value in a variable and do your own thing, but this is
normally used to react properly to different kinds of errors.
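The point about shells can be checked directly: on POSIX systems a waiting parent sees only the low 8 bits of a child's exit status, so the "negative returns for goodness" idea from the previous comment cannot survive the trip. A small POSIX-only sketch using `fork()` and `waitpid()`; `observed_exit_status` is a hypothetical helper.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child that exits with `code` and return the status the parent
 * actually observes. Only the low 8 bits survive, so a "negative" code is
 * indistinguishable from a large positive one. Returns -1 on failure. */
static int observed_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: exit immediately with the code */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, `observed_exit_status(-1)` yields 255, exactly what a child exiting with 255 would produce, so a script could not tell the two apart.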
Need to fix the libspamc.c code so it doesn't break binary compatibility.

Removing 3597 from the dependency list, since it's done and we're still working on other aspects of this one.

Created attachment 2727 [details]
Patch to fix Binary Compatibility
Here is a first stab at this. I had to do a couple of things I was
uncomfortable with, but it seems to work and passes the tests.
I'd be grateful if someone would look over the diff and a) make sure I'm doing
the right thing and b) confirm that it actually fixes the binary compatibility issue.
Anyone care to comment on this fix?

Committed revision 159881. If anyone finds any errors, we can address them then. |