Bug 6319

Summary: bayes does not tokenize the from name
Product: Spamassassin
Reporter: RW <rwmaillists>
Component: Plugins
Assignee: SpamAssassin Developer Mailing List <dev>
Status: NEW
Severity: normal
CC: antispam
Priority: P5    
Version: 3.2.5   
Target Milestone: Undefined   
Hardware: All   
OS: All   
Whiteboard:
Bug Depends on:    
Bug Blocks: 6315    

Description RW 2010-02-02 05:45:12 UTC
Bayes doesn't tokenize the display-name part of the From header, e.g.:

$ cat /tmp/dummy 
From: v1agra hyehdt <foo@example.com>
Subject: meds  gjguhdo

test krhsye

$ sa-learn --spam   /tmp/dummy
$ spamassassin -D bayes < /tmp/dummy 2>&1 1>/dev/null | grep -Ei "token.*=>"
[5478] dbg: bayes: token 'meds' => 0.999854151320635
[5478] dbg: bayes: token 'H*F:U*foo' => 0.993172413793104
[5478] dbg: bayes: token 'H*F:D*example.com' => 0.993172413793104
[5478] dbg: bayes: token 'H*Ad:D*example.com' => 0.993172413793104
[5478] dbg: bayes: token 'test' => 0.011685356810132
[5478] dbg: bayes: token 'krhsye' => 0.986543689320388
[5478] dbg: bayes: token 'gjguhdo' => 0.986543689320388
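
For illustration only, here is a rough Python sketch of what tokenizing the display name could look like. It is not SpamAssassin's code (which is Perl) and does not reflect how a fix would actually be implemented. The `H*F:U*` and `H*F:D*` prefixes are copied from the debug output above; the `H*F:N*` prefix for name words and the `from_tokens` helper are hypothetical, invented for this example.

```python
from email.utils import parseaddr

# Hypothetical prefix for display-name tokens. This is an assumption made
# for illustration; SpamAssassin may use a different prefix or none at all.
NAME_PREFIX = "H*F:N*"


def from_tokens(header_value):
    """Return illustrative tokens for the value of a From header."""
    # parseaddr splits 'Name <user@domain>' into (name, address).
    name, addr = parseaddr(header_value)
    tokens = []
    # One token per whitespace-separated word of the display name.
    for word in name.split():
        tokens.append(NAME_PREFIX + word)
    # User and domain tokens, using the prefixes seen in the debug output.
    if "@" in addr:
        user, domain = addr.rsplit("@", 1)
        tokens.append("H*F:U*" + user)
        tokens.append("H*F:D*" + domain)
    return tokens


print(from_tokens("v1agra hyehdt <foo@example.com>"))
```

With the test message above, this sketch would emit name tokens for `v1agra` and `hyehdt` in addition to the address tokens, which is the information the current tokenizer appears to discard.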

Comment 1 Adam Katz 2010-02-02 11:21:45 UTC
Interesting.

This is relevant to bug 6315, so I have made it a blocker for that bug.

Also note that the subject is tokenized as if it were part of the body. I was under a different impression: I thought we generated either two tokens (one as if in the body and one subject-specific) or just a subject-specific token. This would be a separate bug.
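
One way to check this kind of thing is to split the debug output into header-style and plain tokens. The following is a minimal Python sketch, assuming that header tokens always start with `H*` (inferred only from the debug output in the description) and that each debug line has the `token '...' => value` shape shown there. The `split_tokens` helper is hypothetical.

```python
import re

# Matches debug lines such as:
#   [5478] dbg: bayes: token 'meds' => 0.999854151320635
TOKEN_RE = re.compile(r"bayes: token '(.*)' => ([0-9.]+)")


def split_tokens(debug_lines):
    """Split bayes debug lines into (header_tokens, plain_tokens)."""
    header, plain = [], []
    for line in debug_lines:
        match = TOKEN_RE.search(line)
        if not match:
            continue  # not a token line
        token = match.group(1)
        # Assumption: header-specific tokens carry an 'H*' prefix.
        if token.startswith("H*"):
            header.append(token)
        else:
            plain.append(token)
    return header, plain


sample = [
    "[5478] dbg: bayes: token 'meds' => 0.999854151320635",
    "[5478] dbg: bayes: token 'H*F:U*foo' => 0.993172413793104",
    "[5478] dbg: bayes: token 'gjguhdo' => 0.986543689320388",
]
header, plain = split_tokens(sample)
print(header)  # ['H*F:U*foo']
print(plain)   # ['meds', 'gjguhdo']
```

In the sample, the subject words `meds` and `gjguhdo` land in the plain list, with no subject-specific counterpart among the header tokens.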