SA Bugzilla – Bug 304
-S option is utterly and hopelessly broken
Last modified: 2004-05-18 13:45:38 UTC
hi,

i was having great trouble getting whitelist_from to work correctly. then i discovered what the problem was: i was using the -S option to improve performance. doing so, however, also seems to completely disable the whitelist feature. this is because, apparently, the whitelist weighting is only added at the *end* of the scoring process (see example below).

is this really the desired behavior? in light of this, would it not perhaps be better to have spamassassin always add the whitelist score *first* in the scoring process? if so, it probably would also be wise to have -S stop on a configurable minimum threshold as well (e.g., -100); otherwise, -S will be of no benefit to whitelisted messages (i.e., such messages will be let through but will always undergo full content analysis, which isn't needed since they are whitelisted).

cheers!
doug

example:

# cat spam_msg.txt | spamassassin -t -L -D -S
SPAM: -------------------- Start SpamAssassin results ----------------------
SPAM: This mail is probably spam. The original message has been altered
SPAM: so you can recognise or block similar unwanted mail in future.
SPAM: See http://spamassassin.org/tag/ for more details.
SPAM:
SPAM: Content analysis details: (5.6 hits, 5 required)
SPAM: Hit! (4.3 points) Reply-To: is empty
SPAM: Hit! (1.3 points) Received via SMTPD32 server (SMTPD32-n.n)
SPAM:
SPAM: -------------------- End of SpamAssassin results ---------------------

the message is flagged as spam. now, drop the -S option:

SPAM: -------------------- Start SpamAssassin results ----------------------
SPAM: This mail is probably spam. The original message has been altered
SPAM: so you can recognise or block similar unwanted mail in future.
SPAM: See http://spamassassin.org/tag/ for more details.
SPAM:
SPAM: Content analysis details: (-84.1 hits, 5 required)
SPAM: Hit! (4.3 points) Reply-To: is empty
SPAM: Hit! (1.3 points) Received via SMTPD32 server (SMTPD32-n.n)
SPAM: Hit! (1.0 point) From: ends in numbers
SPAM: Hit! (0.5 points) Subject has an exclamation mark
SPAM: Hit! (1.5 points) BODY: Asks you to click below
SPAM: Hit! (0.2 points) BODY: No such thing as a free lunch (1)
SPAM: Hit! (2.3 points) BODY: List removal information
SPAM: Hit! (1.6 points) BODY: Mentions Spam Law "UCE-Mail Act"
SPAM: Hit! (1.0 point) BODY: No such thing as a free lunch (3)
SPAM: Hit! (0.9 points) BODY: Mentions Spam law "H.R. 3113"
SPAM: Hit! (1.3 points) URI: Includes a link to a likely spammer email address
SPAM: Hit! (-100.0 points) From: address is in the user's white-list
SPAM:
SPAM: -------------------- End of SpamAssassin results ---------------------

the message is correctly whitelisted and allowed through.

just for the record, i am running spamassassin v2.20 on redhat 7.2.

[EOF]
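The difference between the two runs can be reproduced with a small sketch. This is not SpamAssassin's code: the function name `score_message` and its `stop_early` flag are hypothetical, the rule order is assumed to match the order of the hits in the second report, and the `>=` comparison against the threshold is also an assumption. Only the scores and the threshold of 5 come from the report.

```python
# Hypothetical sketch, not SpamAssassin code. Scores and threshold are taken
# from the report; the rule order is assumed to match the listing there.
RULES = [
    ("Reply-To: is empty", 4.3),
    ("Received via SMTPD32 server", 1.3),
    ("From: ends in numbers", 1.0),
    ("Subject has an exclamation mark", 0.5),
    ("BODY: Asks you to click below", 1.5),
    ("BODY: No such thing as a free lunch (1)", 0.2),
    ("BODY: List removal information", 2.3),
    ('BODY: Mentions Spam Law "UCE-Mail Act"', 1.6),
    ("BODY: No such thing as a free lunch (3)", 1.0),
    ('BODY: Mentions Spam law "H.R. 3113"', 0.9),
    ("URI: Includes a link to a likely spammer email address", 1.3),
    ("From: address is in the user's white-list", -100.0),
]
REQUIRED = 5.0


def score_message(rules, required, stop_early):
    """Sum rule scores in order; with stop_early, quit once the threshold is reached."""
    total = 0.0
    hits = []
    for name, points in rules:
        total += points
        hits.append(name)
        if stop_early and total >= required:  # assumed comparison
            break
    return round(total, 1), hits


with_s, hits_s = score_message(RULES, REQUIRED, stop_early=True)
without_s, hits_all = score_message(RULES, REQUIRED, stop_early=False)
print(with_s, len(hits_s))       # score and number of rules counted with early stop
print(without_s, len(hits_all))  # score and number of rules counted without it
```

With early stopping the loop ends after the first two rules, so the -100.0 whitelist entry at the end of the list is never reached.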
*** Bug 341 has been marked as a duplicate of this bug. ***
Unless I can't read code (which is a possibility), the -S flag is virtually useless. Instead of running negatively scoring tests, followed by positively scoring tests (like I thought it did), the following is run (more or less):

rbl tests
-ve head tests
+ve head tests
-ve body tests
+ve body tests
-ve uri tests
+ve uri tests
-ve rawbody tests
+ve rawbody tests
-ve rawbody evals
+ve rawbody evals
-ve full tests
+ve full tests
-ve full evals
+ve full evals
-ve head evals
+ve head evals
more rbl tests?
awl test

This is pretty important! Essentially, -S doesn't work the way it's meant to at all.
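The gap between the expected order and the observed one can be illustrated with a short sketch. The phase names follow the list in this comment; the helper functions are hypothetical, and the rbl and awl steps are left out for brevity.

```python
# Hypothetical illustration of the two orderings; not SpamAssassin code.
PHASES = [
    "head tests", "body tests", "uri tests", "rawbody tests",
    "rawbody evals", "full tests", "full evals", "head evals",
]


def interleaved_order(phases):
    """Order described in the comment: -ve then +ve within each phase."""
    return [(sign, phase) for phase in phases for sign in ("-ve", "+ve")]


def sign_first_order(phases):
    """Order that early stopping would need: every -ve group before any +ve group."""
    return [(sign, phase) for sign in ("-ve", "+ve") for phase in phases]


def negatives_after_first_positive(order):
    """Count -ve groups that only run after some +ve group has already run."""
    first_pos = next(i for i, (sign, _) in enumerate(order) if sign == "+ve")
    return sum(1 for sign, _ in order[first_pos:] if sign == "-ve")


print(negatives_after_first_positive(interleaved_order(PHASES)))
print(negatives_after_first_positive(sign_first_order(PHASES)))
```

In the interleaved order, seven of the eight negative groups run after a positive group has had a chance to push the score over the threshold; in the sign-first order none do.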
should we leave the -S option broken, and just ignore it in 2.40?
Subject: Re: [SAdev] -S option is utterly and hopelessly broken

On Wed, Aug 07, 2002 at 07:21:57AM -0700, bugzilla-daemon@hughes-family.org wrote:
> should we leave the -S option broken, and just ignore it in
> 2.40?

I guess that depends when that freeze/2.40 release is planned for...

To sum up the discussion so far, I believe there are three problems that need solving:

1) -S cuts the time for spam, but non-spam will have all the rules run against them.
2) {white,black}list_* entries just add a score whereas they should probably just short-circuit completely.
3) Instead of running all neg. then pos. tests, we run neg/pos head, then neg/pos body, etc.

The third one shouldn't be hard to fix: add a few if statements around the do_* functions, have a "foreach negative, positive" or "foreach both" in check, and pass a variable to each do_ function saying what we should be testing (neg, or pos, or both if -S isn't used).

The second one shouldn't be that bad either: run those tests first and check test_names_hit when it returns. If there were any hits, abort there like -S. I'm thinking of just checking USER_IN_BLACKLIST, USER_IN_WHITELIST, and USER_IN_ALL_SPAM_TO. The rest should probably just stay adding a score.

I don't know how to solve #1, beyond solving #2. Part of the problem of checking spaminess is, well, checking spaminess. We need to check everything before we can properly report spaminess, so ...

Also: if running -S, should AWL function, since the scores will be skewed? I'd say we should avoid AWL if #2 occurs as well (why add the whitelist to the AWL if it's explicitly *listed already?)
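A minimal sketch of the short-circuit idea in point 2, under stated assumptions. The rule names come from this message, but the `check` function here, the `(name, score, predicate)` rule tuples, and the example message are all hypothetical and do not reflect SpamAssassin's actual internals.

```python
# Hypothetical sketch of point 2: run the list rules first and stop on a hit.
SHORT_CIRCUIT_RULES = ("USER_IN_WHITELIST", "USER_IN_BLACKLIST", "USER_IN_ALL_SPAM_TO")


def run_rules(rules, message, names_hit):
    """Apply (name, score, predicate) rules; record hits and return their score sum."""
    score = 0.0
    for name, points, matches in rules:
        if matches(message):
            names_hit.append(name)
            score += points
    return score


def check(list_rules, content_rules, message, required=5.0):
    """Return (verdict, score, names_hit); content rules are skipped on a list hit."""
    names_hit = []
    score = run_rules(list_rules, message, names_hit)
    if not any(name in SHORT_CIRCUIT_RULES for name in names_hit):
        score += run_rules(content_rules, message, names_hit)
    verdict = "spam" if score >= required else "ham"
    return verdict, score, names_hit


# Usage: a whitelisted sender never reaches the content rules.
evaluated = []  # records which content rules were actually evaluated


def empty_reply_to(message):
    evaluated.append("EMPTY_REPLY_TO")
    return message["reply_to"] == ""


WHITELIST = {"friend@example.com"}  # invented address
list_rules = [("USER_IN_WHITELIST", -100.0, lambda m: m["from"] in WHITELIST)]
content_rules = [("EMPTY_REPLY_TO", 4.3, empty_reply_to)]

msg = {"from": "friend@example.com", "reply_to": ""}
verdict, score, names_hit = check(list_rules, content_rules, msg)
print(verdict, score, names_hit, evaluated)
```

Because the list rules decide the outcome on their own, this variant also addresses part of point 1 for whitelisted mail: no content analysis is performed for it at all.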
Subject: Re: [SAdev] -S option is utterly and hopelessly broken

BDFO> should we leave the -S option broken, and just ignore it in
BDFO> 2.40?

Just ignore it and put a notice about this in the manpage. This wouldn't be a real problem - SA won't be as fast as the user expects, but that's all. In contrast, it would be a problem if we leave it as it is, because SA could even mis-identify messages as spam then.

Wasn't -S broken ever since it was added?

tobias
Subject: Re: [SAdev] -S option is utterly and hopelessly broken

I'm not sure it's as badly broken as people make out -- I think it was badly broken in 2.30, possibly 2.31, but I think some reasonable amount of fix up has been done on it in the 2.40CVS -- I'll schedule a bit of time to take a look at it, but I think -S is a high priority. I think Deersoft would even be willing to allocate resources to it if required.
Subject: Re: [SAdev] -S option is utterly and hopelessly broken

It's really broken and the idea is wrong anyway (IMHO). I say this even though I implemented it. I'm willing to bet we'd get more of a speedup if we took that -S stuff *out* of SA.

Let's focus on getting decision trees working - that'll speed things up a hell of a lot faster than -S ever would.

Matt.
Subject: Re: [SAdev] Re: -S option is utterly and hopelessly broken

I had been thinking along these lines too. Scott, I'd love to see any work you end up doing on this. Two tricks are converting from the config file syntax to a lexer, and then realizing that multiple patterns might match the same chunk of text, so the lexer has to be a little smarter. Still, it shouldn't be too complicated to get a huge speedup.

C

On Saturday, August 10, 2002, at 10:20 PM, Scott A Crosby wrote:
> On Thu, 8 Aug 2002 04:47:18 -0700 (PDT), bugzilla-daemon@hughes-family.org writes:
>
>> Let's focus on getting decision trees working - that'll speed
>> things up a hell of a lot faster than -S ever would.
>
> Hell no. Flex-style regexp matching will do regexps 100x
> faster. Then, we can run *all* the regexps at the same time at
> megabytes/second.. I'll have a prototype example sometime in the next
> couple of weeks to demo the possibilities. (My script requires some
> more adaptation.. I can't promise that it'll work, but I'm pretty sure I
> will fulfill my claims.)
>
> I think that that has the greatest win... And to boot, as we always
> have the full regexp results, we can feed them into more interesting
> things, like ANNs, or perceptron networks, or anything else.
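The point about multiple patterns matching the same chunk of text can be shown with a small sketch. It only illustrates the overlap problem: Python's `re` module is a backtracking engine rather than a flex-style automaton, so no speedup is implied, and the rule names and patterns below are invented for illustration.

```python
import re

# Hypothetical rule set; names and patterns are invented for illustration.
PATTERNS = {
    "FREE": r"free",
    "FREE_LUNCH": r"free lunch",
    "CLICK_BELOW": r"click below",
}

# One alternation with a named group per rule, scanned in a single pass.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS.items())
)


def combined_hits(text):
    """Rules reported by a single pass over the combined pattern."""
    return {m.lastgroup for m in COMBINED.finditer(text)}


def separate_hits(text):
    """Rules reported when every pattern is searched on its own."""
    return {name for name, pat in PATTERNS.items() if re.search(pat, text)}


text = "no such thing as a free lunch, click below"
print(sorted(combined_hits(text)))
print(sorted(separate_hits(text)))
```

The single pass reports FREE but not FREE_LUNCH, because only one alternative can claim a given stretch of text. A combined matcher therefore has to track every rule that accepts at a position, which is the "little smarter" part mentioned above.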
status on this? I guess we'll be documenting -S as broken for 2.40, afaics...
Subject: Re: [SAdev] -S option is utterly and hopelessly broken

Awaiting ability to run rules in arbitrary orders (i.e. probably 2.5 or 3.0).
lowering pri
Closing as WONTFIX. -S has been gone for a long time. We'll continue to work on performance, and early stopping of heuristics may be part of the solution, but hopefully it won't require an option, or it may be implemented completely differently. Either way, this one is definitely a WONTFIX.
*** Bug 3109 has been marked as a duplicate of this bug. ***