Bug 6123 - Add "tflags exponential" to allow increasing score for multiple hits
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Score Generation
Version: 3.2.5
Hardware: Other All
Importance: P5 enhancement
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
 
Reported: 2009-06-03 10:20 UTC by John Hardin
Modified: 2019-07-30 09:22 UTC

Description John Hardin 2009-06-03 10:20:25 UTC
"tflags multiple" is very useful, but it would be nice to be able to add a greater penalty the more times a given match is repeated.

I suggest "tflags exponential" as a variant of "tflags multiple". Setting this flag would cause the overall score to be changed by (rule_score * rule_hits_so_far), as opposed to (rule_score * 1) as for the basic "multiple" scoring.

(I know it's not truly exponential scoring...)
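
The difference between the two schemes can be sketched in a few lines of Python. This only illustrates the arithmetic and is not SpamAssassin code: the function names are made up, and it assumes that rule_hits_so_far counts the current hit, so the first hit contributes rule_score * 1.

```python
def multiple_total(rule_score: float, hits: int) -> float:
    """Total under plain "tflags multiple": every hit adds rule_score * 1."""
    return rule_score * hits


def proposed_total(rule_score: float, hits: int) -> float:
    """Total under the proposed flag: the k-th hit adds rule_score * k.

    Assumption: rule_hits_so_far includes the current hit.
    """
    return sum(rule_score * k for k in range(1, hits + 1))


if __name__ == "__main__":
    # Compare both schemes for a few hit counts with an arbitrary score.
    for n in (1, 2, 4, 8):
        print(n, multiple_total(0.1, n), proposed_total(0.1, n))
```

Under that assumption the total after n hits is rule_score * n * (n + 1) / 2, which grows quadratically rather than exponentially.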
Comment 1 Matt Kettler 2009-06-03 19:51:28 UTC
In that case I would suggest calling it "tflags multiply", not "tflags exponential".


Personally, I don't think it's an outright bad idea, but I don't see a whole lot of value in it. Perhaps I just need some examples of how it's really useful.

In my thinking, it might be useful for implementing rules with small scores that don't hurt nonspam, but add up enough to mean something with repeated hits. However, I just can't think of an example where the spread of hit counts would be sufficiently large. That is, I can think of cases where spam might have 3 matches and nonspam only 1, but that's not really a big enough spread. IMO, you'd need something with close to a factor of 10 difference between the typical nonspam hits and typical spam hits.


Do you have some example ideas for rules, and spam messages that this would affect?

I ask largely because it's my impression that while implementing this would not be horribly difficult, it also would not be trivial. However, implementing support for it in the perceptron so it can be used in the base ruleset would be downright complicated.

It should also be noted that this has been suggested numerous times in the past, and nobody has presented a decent case to convince someone to implement it yet. That's not to say it's a bad idea, but it's one that clearly needs a reason, not just a simple "this would be useful".
Comment 2 Karsten Bräckelmann 2009-06-04 02:08:40 UTC
While this probably can be useful in some cases, it makes it really easy for the user to shoot his own foot. The problem is with carefully evaluating how many hits can be considered ok-ish, and where to raise the score beyond all thresholds -- and to craft scores matching that.

If this were implemented, I guess I'd prefer something like the procmail weighted scoring technique, which would cover this -- as well as the currently existing ones as special cases x=0 (plain rule) and x=1 (tflags multiple). More of a gut feeling, though; I haven't thought it through properly yet. ;)
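
For readers unfamiliar with it: in procmail's weighted scoring, a condition weighted as w^x adds w for the first match, w*x for the second, w*x^2 for the third, and so on. A rough Python sketch of that sum follows; the function name is made up for illustration, and this is neither procmail nor SpamAssassin code.

```python
def weighted_total(w: float, x: float, hits: int) -> float:
    """Sum of w * x**k for k = 0 .. hits-1 (procmail-style w^x weighting)."""
    # Python defines 0 ** 0 as 1, so x = 0 still counts the first hit.
    return sum(w * x ** k for k in range(hits))


if __name__ == "__main__":
    # x = 0 behaves like a plain rule, x = 1 like "tflags multiple",
    # and x > 1 gives a growing penalty for each repeated hit.
    for x in (0, 1, 2):
        print(x, [weighted_total(0.2, x, n) for n in range(5)])
```

With x=0 only the first hit scores, with x=1 every hit scores w, and with x greater than 1 each additional hit is worth more than the one before.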

As for the name, tflags multiply is a no-go IMHO. This needs some better distinction from multiple than a single char change.

Matt, can you point us at a previous discussion? On list, or bugzilla?


Anyway, regarding the challenge to come up with an example, the following flexible and easy to grok rules' stub is about what you need to beat. Doesn't it pretty much do what you intend?

tflags __FOO multiple

meta  FOO    ( __FOO )
score FOO    0.2

meta  FOO_4  ( __FOO >= 4 )
score FOO_4  1.0

meta  FOO_8  ( __FOO >= 8 )
score FOO_8  2.5
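
As a rough illustration of what the stub above adds up to, here is a small Python sketch. It assumes that __FOO evaluates to the number of hits inside the meta expressions, and that the plain ( __FOO ) meta is true for any non-zero count; the function name and tier table are made up for the example.

```python
# (minimum hit count, score) pairs mirroring FOO, FOO_4 and FOO_8 in the stub.
TIERS = [(1, 0.2), (4, 1.0), (8, 2.5)]


def tiered_total(hits: int) -> float:
    """Total contributed by all metas whose threshold is met."""
    return sum(score for threshold, score in TIERS if hits >= threshold)


if __name__ == "__main__":
    for n in (0, 1, 3, 4, 7, 8, 20):
        print(n, round(tiered_total(n), 2))
```

Under those assumptions, 1 to 3 hits total 0.2, 4 to 7 hits total 1.2, and 8 or more hits total 3.7, so the contribution is bounded no matter how many times the rule hits.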
Comment 3 Justin Mason 2009-06-04 03:01:03 UTC
I think we've implemented similar features in the past using eval rules, btw.  It'd be easy to do it with an eval-rule plugin, too.  But Karsten's 'tflags multiple' rules are even easier to understand...
Comment 4 John Hardin 2009-06-04 06:31:43 UTC
The example I have in mind is "fill in the form" stuff in frauds and phishes. The more name/address/phone/gender/whatever blanks the spam has for the victim to fill in, the more points I'd like to assign. 6-8 blanks should score much higher than 1 or 2.

Karsten's metas are certainly a workaround, but to my mind it'd be a lot easier to say:

  SCORE 0.15
  TFLAGS exponential

than figure out a set of metarules. This would also keep the total number of rules down.

I hadn't intended this to be used very often, or to be incorporated into the perceptron. I also thought implementation would be pretty easy; I'll poke around and see if that is indeed the case.
Comment 5 Karsten Bräckelmann 2009-06-04 09:44:13 UTC
Granted, this could result in shorter and more concise rules.

A problem I see, however, is the potentially unbounded growth. The next thing to be requested will be a per-rule tflags exponential_max_score with a value, to prevent the rule from single-handedly pushing the score above some high cut-off threshold, as has often been requested for FuzzyOCR.
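
To put rough numbers on that concern, the Python sketch below uses the 0.15 score from comment 4 and SpamAssassin's default required_score of 5.0. It assumes the k-th hit adds rule_score * k, and the cap is purely hypothetical: no such option exists, and the function names are invented for illustration.

```python
def hits_to_reach(rule_score: float, threshold: float) -> int:
    """Smallest hit count at which the growing total reaches the threshold."""
    if rule_score <= 0:
        raise ValueError("rule_score must be positive")
    hits, total = 0, 0.0
    while total < threshold:
        hits += 1
        total += rule_score * hits  # assumed: k-th hit adds rule_score * k
    return hits


def capped_total(rule_score: float, hits: int, max_score: float) -> float:
    """Growing total limited by a hypothetical per-rule maximum."""
    uncapped = rule_score * hits * (hits + 1) / 2
    return min(uncapped, max_score)


if __name__ == "__main__":
    print(hits_to_reach(0.15, 5.0))
    print(capped_total(0.15, 8, 3.0))
```

Under these assumptions a 0.15 rule reaches 5.0 on its own at 8 hits, which is within the 6-8 range mentioned for form-style phishes.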

Yes, this also applies to tflags multiple, to a lesser extent. ;)  Which seems to exclusively be used in a counting fashion like my example above, rather than in a real multiple scoring fashion. At least I don't recall ever seeing it, since the backhair rule-set. (Which, frankly, was almost as unsightly to watch in the status and report header as what it targeted. ;)
Comment 6 Michael Parker 2009-06-04 09:58:17 UTC
This can easily be done in a plugin; in fact, I had the code half written when Firefox decided it didn't want to play well with others, and now I'm too busy to rewrite it.

I think it would be better to just promote that method for folks and put it up on the wiki.  All the tools are there; it's just a matter of someone putting it together.

FYI, my solution did not involve tflags.

Comment 7 Henrik Krohns 2019-07-30 09:22:21 UTC
Closing old stale bug. I see no reason to bloat code for rare cases which can simply be handled with metas.