Bug 6319 - bayes does not tokenize the from name
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins
Version: 3.2.5
Hardware: All All
Importance: P5 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 6315
Reported: 2010-02-02 05:45 UTC by RW
Modified: 2010-02-02 11:21 UTC

Description RW 2010-02-02 05:45:12 UTC
Bayes doesn't tokenize the display-name part of the From header, e.g.:

$ cat /tmp/dummy 
From: v1agra hyehdt <foo@example.com>
Subject: meds  gjguhdo

test krhsye

$ sa-learn --spam /tmp/dummy
$ spamassassin -D bayes < /tmp/dummy 2>&1 1>/dev/null | grep -Ei "token.*=>"
[5478] dbg: bayes: token 'meds' => 0.999854151320635
[5478] dbg: bayes: token 'H*F:U*foo' => 0.993172413793104
[5478] dbg: bayes: token 'H*F:D*example.com' => 0.993172413793104
[5478] dbg: bayes: token 'H*Ad:D*example.com' => 0.993172413793104
[5478] dbg: bayes: token 'test' => 0.011685356810132
[5478] dbg: bayes: token 'krhsye' => 0.986543689320388
[5478] dbg: bayes: token 'gjguhdo' => 0.986543689320388
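
For illustration: the From header in the test message has three parts, and only the user and the domain show up in the debug output above (as 'H*F:U*foo' and 'H*F:D*example.com'). The following is a minimal Python sketch, not SpamAssassin code, that splits the header with the standard library's email.utils.parseaddr to show which display-name words produce no token. The function name from_header_parts is hypothetical.

```python
from email.utils import parseaddr


def from_header_parts(header_value):
    """Split a From header value into display-name words, user and domain.

    Illustration only: this is not how SpamAssassin parses headers.
    """
    name, addr = parseaddr(header_value)
    user, _, domain = addr.partition("@")
    return {
        "name_words": name.lower().split(),  # the part missing from the debug output
        "user": user,                        # corresponds to 'H*F:U*foo'
        "domain": domain,                    # corresponds to 'H*F:D*example.com'
    }


if __name__ == "__main__":
    print(from_header_parts("v1agra hyehdt <foo@example.com>"))
```

In the test message the name words would be "v1agra" and "hyehdt", neither of which appears in the token list.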
Comment 1 Adam Katz 2010-02-02 11:21:45 UTC
Interesting.

This is relevant to bug 6315, so I have made it a blocker for that bug.

Also note that the Subject is tokenized as if it were part of the body.  I was under a different impression:  I thought each Subject word produced either two tokens (one body-style token and one Subject-specific token) or only a Subject-specific token.  That would be a separate bug.
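
To make the three possibilities concrete, here is a rough Python sketch, not SpamAssassin code. The function name subject_tokens and the "SUBJ*" prefix are hypothetical and chosen only for illustration; the "body" mode corresponds to what the debug output in the description appears to show ('meds' and 'gjguhdo' with no header prefix).

```python
def subject_tokens(subject, mode):
    """Return tokens for a Subject line under one of three schemes.

    mode = "body":   plain tokens, as if the words were body text
    mode = "header": Subject-specific tokens only (hypothetical "SUBJ*" prefix)
    mode = "both":   one plain token and one Subject-specific token per word
    """
    words = subject.lower().split()
    prefixed = ["SUBJ*" + w for w in words]  # hypothetical prefix, not a real SA token form
    if mode == "body":
        return words
    if mode == "header":
        return prefixed
    if mode == "both":
        return words + prefixed
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    for mode in ("body", "header", "both"):
        print(mode, subject_tokens("meds  gjguhdo", mode))
```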