Bug 3915 - spamassassin sometimes skips to do bayes test
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin
Version: 3.0.0
Hardware: PC Linux
Importance: P2 major
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords: triage
Depends on:
Blocks:
 
Reported: 2004-10-21 03:56 UTC by Łukasz Mach
Modified: 2005-05-24 15:14 UTC

Attachments:
Attachment | Type | Status | Submitter/CLA Status
first arrive, bayes tested | message/rfc822 | None | Łukasz Mach [NoCLA]
second arrived, bayes not tested | message/rfc822 | None | Łukasz Mach [NoCLA]

Description Łukasz Mach 2004-10-21 03:56:14 UTC
I'm using spamassassin in .procmailrc:

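# Filter every message through SpamAssassin, one at a time
# (f = filter, w = wait for the exit code, lockfile serializes runs).
:0fw: .spamassassin.lock
| /usr/bin/spamassassin

# File anything SpamAssassin tagged as spam into the spam folder.
:0:
* ^X-Spam-Status: Yes
mail/spamfolder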


I'm using sa-learn, so the Bayes tests are quite important. It generally works.
But sometimes junk breaks through to the inbox, and most of those false negatives
have no BAYES_XX test. So sometimes spamassassin doesn't check an email for Bayes
probability.

At 200 spams/day, about 10 junk messages reach the inbox, and half of them show
the problem described above.
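
To put numbers on this, one rough approach is to scan a mail folder for messages whose X-Spam-Status header lists no BAYES_nn test. The sketch below assumes the folder is an mbox file and that the header carries the usual tests=... list; the function names and the path in the usage note are made up for illustration.

```python
import mailbox
import re

# Matches rule names such as BAYES_00 or BAYES_99 in the tests= list.
BAYES_RE = re.compile(r"\bBAYES_\d\d\b")


def has_bayes_test(msg):
    """Return True if X-Spam-Status lists any BAYES_nn test."""
    status = msg.get("X-Spam-Status", "")
    return bool(BAYES_RE.search(str(status)))


def messages_without_bayes(mbox_path):
    """Yield (Message-ID, Subject) for each message with no BAYES_nn hit."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            if not has_bayes_test(msg):
                yield msg.get("Message-ID", "?"), msg.get("Subject", "")
    finally:
        box.close()
```

Calling `list(messages_without_bayes("mail/inbox"))` (hypothetical path) gives the messages worth re-running by hand. A missing BAYES_nn entry on its own does not prove a fault, as the comments below point out.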
Comment 1 Łukasz Mach 2004-10-21 15:40:35 UTC
Additional information:
  It seems to happen only if the message is spam and spamassassin doesn't
classify it as spam (it has fewer than 5 points).

If it's a normal message, e.g. from a friend, it has e.g. a BAYES_00 tag. But if
it's spam, it sometimes has no BAYES_XX at all. Some of the messages classified
as spam have no BAYES tag either.
Comment 2 Theo Van Dinter 2004-10-21 15:50:58 UTC
not having a bayes hit is not necessarily a bug.  you need to run the message(s) through spamassassin 
in debug mode (-D) and see what's coming up.  it could just be there aren't enough tokens in the 
message that are also in your DB, so bayes disables since there's no way to give a probability.
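
One way to capture that debug output for a saved message is sketched below. It assumes the spamassassin script is on the PATH and that -D writes its debug lines to standard error. The helper names and the simple substring filter are illustrative and not part of SpamAssassin.

```python
import subprocess


def run_with_debug(message_path, command=("spamassassin", "-D")):
    """Feed a saved message to `command` on stdin.

    Returns (rewritten_message_bytes, debug_text), where debug_text is
    whatever the command printed on standard error.
    """
    with open(message_path, "rb") as fh:
        proc = subprocess.run(
            list(command),
            stdin=fh,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=False,
        )
    return proc.stdout, proc.stderr.decode("utf-8", errors="replace")


def bayes_lines(debug_text):
    """Keep only the debug lines that mention bayes (case-insensitive)."""
    return [line for line in debug_text.splitlines() if "bayes" in line.lower()]
```

For example, `_, debug = run_with_debug("suspect.eml")` followed by `print("\n".join(bayes_lines(debug)))` narrows the log to the Bayes-related lines. The file name is hypothetical.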
Comment 3 Łukasz Mach 2004-10-21 18:05:55 UTC
The problem is that I cannot reproduce the bug by hand. The same message that
arrived without a BAYES test behaves properly when processed again. So the cause
is unknown to me for now; it seems to be random.

Because of that, I don't think it's related to not having enough tokens.
Comment 4 Daryl C. W. O'Shea 2004-10-22 05:37:26 UTC
The message isn't being auto-learned the first time through, is it?  This would
explain bayes assigning a probability the second time through, since it would
then have enough tokens.
Comment 5 Łukasz Mach 2004-10-22 06:57:16 UTC
Hmm, maybe. I have cleaned out my spam folder, so I can't say whether you're right or wrong.

I'll wait for the next such incident and check whether the emails without BAYES
have equivalents that came earlier and were classified properly.
Comment 6 Łukasz Mach 2004-10-25 04:47:23 UTC
Created attachment 2482
first arrive, bayes tested
Comment 7 Łukasz Mach 2004-10-25 04:47:58 UTC
Created attachment 2483
second arrived, bayes not tested
Comment 8 Łukasz Mach 2004-10-25 04:50:14 UTC
Look at the attachments I made. They are almost identical, but the first was
tested by Bayes and the second wasn't.

So I don't think it's true that Bayes doesn't have enough tokens.
Comment 9 Daryl C. W. O'Shea 2004-10-25 05:06:28 UTC
The first message wasn't learned since it scored between the learning thresholds.

The tokens matched in your database must have been from the headers which are
vastly different from the second message.

The examples don't show anything wrong.
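
The "between the learning thresholds" rule can be sketched roughly as follows. The two values are assumed to be the stock defaults for bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam, so check your own configuration. The function name and the boundary handling are illustrative, and the real auto-learn logic applies further conditions that are omitted here.

```python
# Assumed stock defaults; a local configuration may override both.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_decision(score,
                       nonspam_threshold=NONSPAM_THRESHOLD,
                       spam_threshold=SPAM_THRESHOLD):
    """Simplified auto-learn decision for a message score.

    Returns "ham", "spam", or "no" (not learned at all).
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    # Scores in between are ambiguous, so the message is not learned.
    return "no"
```

Under these assumptions a message scoring a few points, whether just above or just below the 5-point spam cutoff, falls in the middle band and adds no tokens to the database.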
Comment 10 Theo Van Dinter 2004-10-25 09:03:35 UTC
Subject: Re:  spamassassin sometimes skips to do bayes test

On Mon, Oct 25, 2004 at 05:06:29AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> The examples don't show anything wrong.

As usual, if there was debug output, we'd know what was going on.

Comment 11 Łukasz Mach 2004-10-25 11:53:37 UTC
OK, I figured out how to turn on debugging while still using spamassassin normally.
I'll post anything interesting that shows up.

Re comment #9:

> The tokens matched in your database must have been from the headers which are
> vastly different from the second message.
> The examples don't show anything wrong.

I don't know why different headers are important here. I thought that the Bayes
test examines the whole body of the message, not only the headers (at least #1 at
http://wiki.apache.org/spamassassin/BayesInSpamAssassin seems to say something
like that). The bodies of both messages are mostly identical. The first message
was classified by Bayes (so there were tokens in the db); the second was not (it
should have used the same tokens as the first message).
Comment 12 Bob Menschel 2005-04-09 17:41:35 UTC
> I don't know why different headers are important here. ...

Yes, Bayes tests the whole message, headers and body text, and it considers all
of that. If in your first message the headers and body together indicated spam,
but in the second message, with a similar body but very different headers, the
body suggested spam and the headers suggested not-spam, then Bayes wouldn't make
a determination.
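
The cancelling effect described above can be illustrated with the classic naive-Bayes combining formula. This is a deliberately simplified sketch and not SpamAssassin's actual combiner; the per-token probabilities are invented for the example.

```python
import math


def combine(token_probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1).

    Uses the naive-Bayes product rule:
        P = prod(p) / (prod(p) + prod(1 - p))
    """
    spam = math.prod(token_probs)
    ham = math.prod(1.0 - p for p in token_probs)
    return spam / (spam + ham)


# Hypothetical tokens: a spammy body alone, then the same body plus
# headers that look like legitimate mail.
body_only = combine([0.9, 0.9, 0.9])
body_plus_hammy_headers = combine([0.9, 0.9, 0.1, 0.1])
```

With these made-up inputs the body-only result is close to 1, while the mixed case lands at about 0.5, that is, no clear verdict in either direction.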

Anyway, in October you were going to see if you could get a full debug output
which would let us know whether Bayes was working and simply not able to make a
determination on the "skipped" messages, or whether there was an actual problem.
Were you able to do this, and make a determination? Is there a problem that
still needs to be pursued? 
Comment 13 Bob Menschel 2005-05-24 23:14:10 UTC
Triage: Closing as WORKSFORME only because there's been no debug output to work
with, as offered in October and requested again last month. If someone can
reproduce this and provide debug output, please reopen.