Bug 6890 - Allow 'override threshold score' in spamc
Summary: Allow 'override threshold score' in spamc
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
: P2 enhancement
Target Milestone: 3.4.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-13 22:36 UTC by Jez
Modified: 2013-06-21 15:48 UTC (History)
3 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
Proposed patch 1 application/octet-stream None Jez [NoCLA]
Second patch; uses offset value instead of absolute patch None Jez [NoCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Jez 2013-01-13 22:36:42 UTC
Created attachment 5130 [details]
Proposed patch 1

I attach a proposed patch to this bug, for the spamc client which I have tested - it seems to be working well.  It's based off the latest SVN I got from:
http://svn.apache.org/repos/asf/spamassassin/trunk

It adds the option -T to allow the spam threshold value to be overridden, which changes when spamc will return 0/1 to indicate spam, but does *not* change any of SpamAssassin's message rewriting.

The reason I think this change is a good idea is that I (and perhaps others) find myself wanting to be able to define a "grey area" of e-mails, where they are rewritten as spam but not completely discarded.  This can be done by setting the threshold override higher than the normal SpamAssassin threshold.

For example, if you want the grey area to be between 3.0 and 8.0, you'd set the normal SpamAssassin threshold to 3.0, then call spamc with the threshold override of 8.0.  Scores under 3.0 would be OK; spamc returns 0 and the message is not rewritten.  Scores between 3.0 and 8.0 would be "grey area" - spamc still returns 0 (so a shell script will not discard it), but the message *is* rewritten.  It will be delivered to the user's mailbox as a rewritten message.  Remaining scores would cause the message to be rewritten *and* spamc would return 1, allowing a shell script to discard it.
Comment 1 RW 2013-01-14 15:30:06 UTC
As things stand the spam threshold is configurable per user. Shouldn't the threshold used be either the higher of the two, or perhaps an offset from the normal configured spam threshold.
Comment 2 Jez 2013-01-14 17:42:30 UTC
(In reply to comment #1)
> As things stand the spam threshold is configurable per user. Shouldn't the
> threshold used be either the higher of the two, or perhaps an offset from
> the normal configured spam threshold.

I like the idea of making it an offset rather than an absolute value.  So, to clarify, what you're saying is that if your normal spam threshold is 3 and you want the override to be 8, instead of passing -T 8.0, you'd pass -T 5.0?  The one thing is that you wouldn't be able to pass a negative number on the command line, but actually I don't think that makes semantic sense anyway (you want to discard spam that SpamAssassin has *not* marked as spam??) so that may not be a problem.
Comment 3 Jez 2013-01-14 18:21:48 UTC
Created attachment 5131 [details]
Second patch; uses offset value instead of absolute

This version of the patch makes -T into an offset value instead of an absolute threshold override, as discussed previously.
Comment 4 AXB 2013-01-14 18:22:54 UTC
(In reply to comment #2)

imo, absolute value would make it easier for the average user and safer.

Offset can lead to confusion.

if default is 5.0
to stick how several known glue systems' naming change could be:

low/tag spam to 3.0
high/kill spam 15.0

I don't really like the idea of adding complexity to spamc but my voice is one among many.
Comment 5 Jez 2013-01-17 19:16:02 UTC
(In reply to comment #4)
> (In reply to comment #2)
> 
> imo, absolute value would make it easier for the average user and safer.

Well, both patches seem to be bug-free, it's just that the first one uses an absolute value for the threshold and the second uses an offset.  You can choose which one to apply based on what you want -T to do.
Comment 6 Jez 2013-05-09 18:05:46 UTC
As I was advised on the mailing list, I'm just noting here that I have now been running this patch on my server for several months with no problems.  :-)
Comment 7 Karsten Bräckelmann 2013-05-09 21:55:17 UTC
Frankly, I don't like the idea of adding such decision / override option to spamc.

First of all, spamc is meant to be a light-weight client, merely feeding messages to the daemon, and dealing with the result in an absolute minimal way. spamc deliberately does not decide about the score, nor rewrite the message or even add headers. That is entirely left to spamd.

Thus, if a feature like this would be implemented, I am strongly in favor of dealing with it in spamd.

Moreover, this basically duplicates existing functionality, and introduces the notion of "discard" -- something SA does not do. SA scores, but it does not deliver, nor discard.

The environment described in comment 0 is almost identical to adding the X-Spam-Level header, which is not by coincidence meant to easily allow for this shades of gray differentiation. The only difference between this existing behavior and the request is, what the spamc calling tool acts upon -- the $is_spam return value, instead of grepping for a single line.

Changing, overriding the decision (exitcode) whether a message is deemed spam or not in this way seems pretty confusing, too. A message's score exceeding the required_score threshold is classified spam. The return value of spamc should not claim anything else.


Setting Severity back to enhancement, as Jez, the reporter, even set it himself. I don't see how this would be a blocker.
Comment 8 Jez 2013-05-09 22:48:46 UTC
(In reply to comment #7)
> First of all, spamc is meant to be a light-weight client, merely feeding
> messages to the daemon, and dealing with the result in an absolute minimal
> way. spamc deliberately does not decide about the score, nor rewrite the
> message or even add headers. That is entirely left to spamd.

I fail to see the problem.  spamc can optionally care about the score - it's a very convenient place to do it because it already has the score stored in a struct in memory.  Serializing it to text and then deserializing again is horribly inefficient.

> Thus, if a feature like this would be implemented, I am strongly in favor of
> dealing with it in spamd.

By definition this can't be dealt with in spamd, because it is supposed to offer a lightweight way of defining a "grey area" for the spam score.

> Moreover, this basically duplicates existing functionality

No it doesn't, or I wouldn't have had to add it.  Existing functionality is slow, slow, slow.  This is so much quicker that I'd say it counts as new functionality.

> , and introduces
> the notion of "discard" -- something SA does not do. SA scores, but it does
> not deliver, nor discard.

It doesn't really introduce the notion of discard, I just suggested that one interpretation a shell script might have for a return value of 1 is to discard the message - which is what often happens already - but of course the script can do what it wants.

> The environment described in comment 0 is almost identical to adding the
> X-Spam-Level header, which is not by coincidence meant to easily allow for
> this shades of gray differentiation. The only difference between this
> existing behavior and the request is, what the spamc calling tool acts upon
> -- the $is_spam return value, instead of grepping for a single line.

Quite, and when that tool is a shell script, it has very little to work with.  grepping for a score that has been serialized, deserializing it again, and re-converting it to an integer is horrible to do in bash and, apart from anything else, inefficient as hell.  There's a good reason I didn't want to go down that route.

> Changing, overriding the decision (exitcode) whether a message is deemed
> spam or not in this way seems pretty confusing, too. A message's score
> exceeding the required_score threshold is classified spam. The return value
> of spamc should not claim anything else.

If if changed the existing behaviour, I'd agree it could be confusing, but it doesn't.  By default, nothing changes with this patch.  You have to add the extra commandline argument.  Who is going to do that without reading the documentation?  If they randomly add this commandline arg and get confused by it, then I'd say it's their fault.
Comment 9 Karsten Bräckelmann 2013-05-09 23:47:14 UTC
(In reply to comment #8)
> Quite, and when that tool is a shell script, it has very little to work
> with.  grepping for a score that has been serialized, deserializing it
> again, and re-converting it to an integer is horrible to do in bash and,
> apart from anything else, inefficient as hell.  There's a good reason I
> didn't want to go down that route.

The RE is straight forward, and documented in the procmail recipe example. There is nothing to serialize and deserialize, since the RE works on the already existing X-Spam-Level header.

  grep -E '^X-Spam-Level: \*{8,}


BTW, which variant of "rewriting" are you using? Rewriting headers, or attaching the spam to a wrapper message?
Comment 10 Jez 2013-05-10 15:54:49 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Quite, and when that tool is a shell script, it has very little to work
> > with.  grepping for a score that has been serialized, deserializing it
> > again, and re-converting it to an integer is horrible to do in bash and,
> > apart from anything else, inefficient as hell.  There's a good reason I
> > didn't want to go down that route.
> 
> The RE is straight forward, and documented in the procmail recipe example.
> There is nothing to serialize and deserialize, since the RE works on the
> already existing X-Spam-Level header.
> 
>   grep -E '^X-Spam-Level: \*{8,}

So grepping is as efficient as checking a process's return value?

> BTW, which variant of "rewriting" are you using? Rewriting headers, or
> attaching the spam to a wrapper message?

Rewriting headers.
Comment 11 Karsten Bräckelmann 2013-05-10 20:16:01 UTC
(In reply to comment #10)
> >   grep -E '^X-Spam-Level: \*{8,}
> 
> So grepping is as efficient as checking a process's return value?

Of course not. But if *that* difference makes such a huge deal to you, you must be shocked when you figure out SA itself is written in Perl, not C...

> > BTW, which variant of "rewriting" are you using? Rewriting headers, or
> > attaching the spam to a wrapper message?
> 
> Rewriting headers.

Good, so you have the headers I referred to. :)  However, this also makes me wonder why exactly you are using this approach -- other than optimization.

With rewriting headers (report_safe 0), there is about one difference in rewriting: The X-Spam-Report header with rules' scores and descriptions. Your comment 0 suggests rewriting would not happen for ham -- which is false for header rewriting. X-Spam headers are added always, with some (by default) being restricted to spam.

You can change that and add them unconditionally. So your 3..8 gray area will be "rewritten" just the same as spam, which is what you want. (Granted, ham gets this header, too.) By setting required_score 8, spamc exits with the desired code...


Jez, please don't get me wrong -- it is great to see you contributing the patch that fixes a particular issue for you. This is open source at its best.

I am always open to being convinced. However, at this point, this is just an extreme over-optimization edge case, and borderline consistent with SA terms. My personal opinion is against inclusion in upstream.
Comment 12 Kevin A. McGrail 2013-06-21 15:48:47 UTC
Jez, I look forward to other patches but this one can't be accepted upstream.