Bug 5431

Summary: Method to test whether a piece of mail has already been learned.
Product: Spamassassin
Reporter: Faisal N Jawdat <faisal>
Component: spamassassin
Assignee: SpamAssassin Developer Mailing List <dev>
Status: NEW ---    
Severity: enhancement    
Priority: P5    
Version: unspecified   
Target Milestone: Undefined   
Hardware: All   
OS: All   
Whiteboard:

Description Faisal N Jawdat 2007-04-20 22:03:19 UTC
I would like a method to test whether a piece of mail has already been learned (without just re-learning 
the mail).  Ideally this method would:

- be available as an option on sa-learn as well as a function for programmatic use (see below)
- be able to indicate if the already-learned message was learned as spam or ham

this is from a post I made to the users list, and describes a potential use case:

>>>
> Try to learn it, if it comes back with something to the effect of:
> "learned from 0 messages, processed 1.." then it's already been  
> learned.

this seems to be the common suggestion.
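
For illustration, the workaround above can be scripted by parsing the summary line that sa-learn prints. This is a minimal sketch: the exact wording matched by the pattern ("Learned tokens from N message(s) (M message(s) examined)") is an assumption about the installed version, and the function names are hypothetical.

```python
import re

# Assumed summary format; adjust the pattern if your sa-learn prints
# something different.
_SUMMARY = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)


def parse_sa_learn_summary(output):
    """Return (learned, examined) from sa-learn output, or None if absent."""
    match = _SUMMARY.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def already_learned(output):
    """True if messages were examined but none were newly learned."""
    counts = parse_sa_learn_summary(output)
    if counts is None:
        raise ValueError("no sa-learn summary line found")
    learned, examined = counts
    return examined > 0 and learned == 0
```

Note that this still runs a full learn pass per message, so it shares both drawbacks listed below.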

it has a couple drawbacks, as i see it:

1.  it's relatively cpu-intensive if i want to do it all the time  
(e.g. scan my spam folder to learn only the messages which haven't  
already been learned)

2.  which way do i learn it (as ham or as spam)?

to step back a bit, my final goal is to be able to figure out which  
messages in a folder haven't been learned, and learn only those.  in  
the ideal situation i can also figure out (ahead of time), whether a  
learned message was learned as ham or spam.

this may be semi-impossible.

on the other hand, what can i learn from the headers?

e.g. it looks like autolearn=[something] will tell me about the  
autolearner, but is there anything for manual learns?
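
as a rough sketch of what the headers do give me, the autolearn value can be pulled out of X-Spam-Status like this. the header name and the autolearn=... token are taken from what spamassassin normally adds; the function name is made up for illustration, and this says nothing about manual learns.

```python
import email
import re

# Matches the autolearn=... token inside an X-Spam-Status header value.
_AUTOLEARN = re.compile(r"\bautolearn=([A-Za-z_]+)")


def autolearn_status(raw_message):
    """Return the autolearn value (e.g. 'spam', 'ham', 'no') or None."""
    msg = email.message_from_string(raw_message)
    header = msg.get("X-Spam-Status")
    if header is None:
        return None
    # The header may be folded across lines; the token itself is never split.
    match = _AUTOLEARN.search(str(header))
    return match.group(1) if match else None
```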

where i'm going with all this:

i can run a cron job to learn the contents of different mailboxes on  
a regular basis.  what i do now is have a TrainSpam and TrainHam  
mailbox, and when something gets misfiled (in Spam or any ham folder)  
i just move it in there.  every 5 minutes a cron job goes through and  
scans things appropriately.
<http://www.faisal.com/software/sa-harvest/quicktrain.html>

first, i'd like to be able to do that within the mailboxes rather  
than using special mailboxes.

second, i'd like to be able to key off junk mail flags set by the  
client (thunderbird, apple mail).  i'm using dovecot, so it's a  
fairly simple matter of parsing Maildir filenames, but to do it right  
i need to combine the knowledge with what spamassassin thinks.
<<<
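
To make the Maildir part of the use case concrete, here is a minimal sketch of reading client junk flags from filenames. It assumes the usual Maildir layout, where flags follow ":2," in the filename, and assumes Dovecot stores keywords as lowercase letters indexed into a dovecot-keywords file with lines of the form "0 Junk". The keyword names used here (Junk/NonJunk, $Junk/$NotJunk) are assumptions about what the clients set, and all function names are hypothetical.

```python
import os


def maildir_flags(filename):
    """Return the set of flag characters after ':2,' in a Maildir filename."""
    base = os.path.basename(filename)
    _, sep, info = base.rpartition(":2,")
    if not sep:
        return set()  # no info section, e.g. a message still in new/
    return set(info)


def parse_keywords_file(text):
    """Map keyword letters ('a'..'z') to names from dovecot-keywords text."""
    mapping = {}
    for line in text.splitlines():
        index, _, name = line.strip().partition(" ")
        if index.isdigit() and name and int(index) < 26:
            mapping[chr(ord("a") + int(index))] = name
    return mapping


def client_junk_verdict(filename, keywords,
                        junk_names=("Junk", "$Junk"),
                        notjunk_names=("NonJunk", "$NotJunk")):
    """Return 'spam', 'ham', or None based on client-set keywords."""
    names = {keywords[f] for f in maildir_flags(filename) if f in keywords}
    if names & set(junk_names):
        return "spam"
    if names & set(notjunk_names):
        return "ham"
    return None
```

The client verdict alone does not say whether SpamAssassin has already learned the message, which is the gap this request is about.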