SA Bugzilla – Bug 6319
bayes does not tokenize the from name
Last modified: 2010-02-02 11:21:45 UTC
Bayes doesn't tokenize the name part of the from header, e.g.:

$ cat /tmp/dummy
From: v1agra hyehdt <foo@example.com>
Subject: meds gjguhdo test krhsye
$ sa-learn --spam /tmp/dummy
$ spamassassin -D bayes < /tmp/dummy 2>&1 1>/dev/null | grep -Ei "token.*=>"
[5478] dbg: bayes: token 'meds' => 0.999854151320635
[5478] dbg: bayes: token 'H*F:U*foo' => 0.993172413793104
[5478] dbg: bayes: token 'H*F:D*example.com' => 0.993172413793104
[5478] dbg: bayes: token 'H*Ad:D*example.com' => 0.993172413793104
[5478] dbg: bayes: token 'test' => 0.011685356810132
[5478] dbg: bayes: token 'krhsye' => 0.986543689320388
[5478] dbg: bayes: token 'gjguhdo' => 0.986543689320388
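To make the gap concrete, here is a minimal Python sketch of what tokenizing the display name could look like. It is not SpamAssassin's code (which is Perl) and is not a proposed patch. The function name `from_header_tokens` and the `H*F:N*` prefix for name words are hypothetical, invented for illustration; only the `H*F:U*` and `H*F:D*` prefixes are taken from the debug output above. The `H*Ad:D*` token seen in that output is left out of the sketch.

```python
from email.utils import parseaddr


def from_header_tokens(header_value, name_prefix="H*F:N*"):
    """Return illustrative Bayes-style tokens for a From header value.

    name_prefix is a hypothetical prefix for display-name words; it does
    not come from SpamAssassin. The H*F:U* and H*F:D* prefixes mirror the
    tokens shown in the debug output of the report.
    """
    # parseaddr splits 'Name <user@domain>' into (display name, address).
    name, addr = parseaddr(header_value)

    tokens = []

    # One token per whitespace-separated word of the display name.
    for word in name.split():
        tokens.append(f"{name_prefix}{word}")

    # Split the address at the last '@' into user and domain parts.
    user, sep, domain = addr.rpartition("@")
    if sep:
        if user:
            tokens.append(f"H*F:U*{user}")
        if domain:
            tokens.append(f"H*F:D*{domain}")

    return tokens


if __name__ == "__main__":
    for token in from_header_tokens("v1agra hyehdt <foo@example.com>"):
        print(token)
```

With the From line from the test message, the sketch yields two extra tokens for `v1agra` and `hyehdt` alongside the user and domain tokens, which are the words the debug output shows as missing.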
Interesting. This is relevant to bug 6315, and I have made it a blocker for that bug. Also note that the subject is read in as if it were part of the body. I was under a different impression: I thought we generated either two tokens (one as if in the body and one subject-specific) or just a single subject-specific token. That would be a separate bug.