5431 – Method to test whether a piece of mail has already been learned.

Status:	NEW

Alias:	None

Product:	Spamassassin
Classification:	Unclassified
Component:	spamassassin
Version:	unspecified
Hardware:	All All

Importance:	P5 enhancement
Target Milestone:	Undefined
Assignee:	SpamAssassin Developer Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2007-04-20 22:03 UTC by Faisal N Jawdat
Modified:	2007-04-20 22:03 UTC (History)
CC List:	0 users

Description Faisal N Jawdat 2007-04-20 22:03:19 UTC

I would like a method to test whether a piece of mail has already been learned (without just re-learning
the mail).  Ideally this method would:

- be available as an option on sa-learn as well as a function for programmatic use (see below)
- be able to indicate if the already-learned message was learned as spam or ham

this is from a post I made to the users list, and describes a potential use case:

>>>
> Try to learn it, if it comes back with something to the effect of:
> "learned from 0 messages, processed 1.." then it's already been  
> learned.

this seems to be the common suggestion.

it has a couple drawbacks, as i see it:

1.  it's relatively cpu-intensive if i want to do it all the time  
(e.g. scan my spam folder to learn only the messages which haven't  
already been learned)

2.  which way do i learn it (as spam or as ham)?
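
for reference, the suggested workaround could be sketched in Python roughly as follows.  this is only a sketch under assumptions: sa-learn is on the PATH, and its summary line looks like "Learned tokens from N message(s) (M message(s) examined)" (the exact wording may differ between versions).  the helper names are made up for illustration.  it also makes both drawbacks visible: every check spawns a full sa-learn run, and the caller has to pick --spam or --ham before asking.

```python
import re
import subprocess

# Assumed shape of sa-learn's summary line; adjust if your version differs.
SUMMARY_RE = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)


def parse_summary(output):
    """Return (learned, examined) from sa-learn output, or None if absent."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def was_already_learned(path, as_spam=True):
    """Hypothetical helper: try to learn `path` and inspect the counts.

    Side effect: if the message was NOT already learned this way,
    it is learned now, which is exactly what the request wants to avoid.
    """
    flag = "--spam" if as_spam else "--ham"
    result = subprocess.run(
        ["sa-learn", flag, path], capture_output=True, text=True, check=True
    )
    counts = parse_summary(result.stdout)
    if counts is None:
        raise ValueError("unrecognized sa-learn output: %r" % result.stdout)
    learned, examined = counts
    # Zero newly learned messages out of one examined: already known.
    return examined > 0 and learned == 0
```

a True result only says the message was already learned in the direction that was tried; it cannot answer "spam or ham?" without guessing a direction first.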

to step back a bit, my final goal is to be able to figure out which  
messages in a folder haven't been learned, and learn only those.  in  
the ideal situation i can also figure out (ahead of time), whether a  
learned message was learned as ham or spam.

this may be semi-impossible.

on the other hand, what can i learn from the headers?

e.g. it looks like autolearn=[something] will tell me about the  
autolearner, but is there anything for manual learns?

where i'm going with all this:

i can run a cron job to learn the contents of different mailboxes on  
a regular basis.  what i do now is have a TrainSpam and TrainHam  
mailbox, and when something gets misfiled (in Spam or any ham folder)  
i just move it in there.  every 5 minutes a cron job goes through and  
scans things appropriately.
<http://www.faisal.com/software/sa-harvest/quicktrain.html>

first, i'd like to be able to do that within the mailboxes rather  
than using special mailboxes.

second, i'd like to be able to key off junk mail flags set by the  
client (thunderbird, apple mail).  i'm using dovecot, so it's a  
fairly simple matter of parsing Maildir filenames, but to do it right  
i need to combine the knowledge with what spamassassin thinks.
<<<
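
The Maildir filename parsing mentioned at the end of the post could look roughly like this. It is a sketch under assumptions, not a tested integration: it assumes Dovecot stores keywords as lowercase letters after ":2," in the filename, with the letter-to-name mapping kept in a dovecot-keywords file ("0 $Junk" meaning letter a, and so on), and that clients mark junk with keywords such as $Junk / Junk and $NotJunk / NonJunk. All function names are hypothetical.

```python
import string

JUNK_KEYWORDS = {"$junk", "junk"}            # assumed client junk markers
NOT_JUNK_KEYWORDS = {"$notjunk", "nonjunk"}  # assumed client not-junk markers


def parse_keywords_file(text):
    """Map keyword letters ('a', 'b', ...) to names from dovecot-keywords text."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        index = int(parts[0])
        if index < len(string.ascii_lowercase):
            mapping[string.ascii_lowercase[index]] = parts[1].strip()
    return mapping


def maildir_flags(filename):
    """Return the flag characters after ':2,' in a Maildir filename."""
    _, sep, info = filename.rpartition(":2,")
    return info if sep else ""


def client_verdict(filename, keyword_map):
    """Return 'spam', 'ham', or None based on client-set keywords."""
    names = {
        keyword_map[c].lower()
        for c in maildir_flags(filename)
        if c in keyword_map
    }
    if names & JUNK_KEYWORDS:
        return "spam"
    if names & NOT_JUNK_KEYWORDS:
        return "ham"
    return None
```

This only recovers what the mail client thinks. Combining it with what SpamAssassin has already learned still needs the query method requested in this bug.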