Bug 6473 - Making Bayes Learn RelayCountry Metadata
Summary: Making Bayes Learn RelayCountry Metadata
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins
Version: unspecified
Hardware: PC FreeBSD
Importance: P2 enhancement
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Duplicates: 6433
Depends on:
Blocks:
 
Reported: 2010-07-29 19:53 UTC by RW
Modified: 2018-02-21 14:36 UTC

Attachment Type Modified Status Actions Submitter/CLA Status
Patch to add Bayes-specific Relaycountry metadata application/octet-stream None RW [NoCLA]
Updated patch for Bayes-specific Relaycountry metadata patch None RW [NoCLA]

Description RW 2010-07-29 19:53:58 UTC
Created attachment 4794
Patch to add Bayes-specific Relaycountry metadata

Bayes doesn't learn tokens shorter than 3 characters and so discards all the two-letter country codes in the RelayCountry metadata.

As the existing format is well suited to header rules, and to avoid breaking existing local rules, I suggest adding additional metadata specifically for Bayes.

I've attached a patch. It produces a token for the first trusted country, plus a token for each country change, e.g.:

 "US US CA NG"  becomes "Trusted_US USCA CANG"

I think this is better than simply having a token per country, as that loses all information about ordering. For example, if you are running SA in the UK, then "TW" and "CZ TW" might be all spam, but "GB TW" and "US TW" could be less spammy due to travellers using TW IP addresses to connect to their submission servers.
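
The token scheme described above can be sketched roughly as follows. This is only an illustration in Python, not the attached patch: the function name bayes_relay_tokens is made up, and it assumes that the first entry in the RelayCountry string is the first trusted country, which the real plugin may determine differently.

```python
def bayes_relay_tokens(relay_countries: str) -> list[str]:
    """Turn a RelayCountry string into Bayes-friendly tokens.

    Hypothetical helper for illustration only. Assumption: the first
    code in the string is the first trusted country.
    """
    codes = relay_countries.split()
    if not codes:
        return []

    # One token for the (assumed) first trusted country.
    tokens = ["Trusted_" + codes[0]]

    # One ordered-pair token for each change of country between
    # adjacent relays; repeated countries produce no token.
    for prev, cur in zip(codes, codes[1:]):
        if prev != cur:
            tokens.append(prev + cur)
    return tokens


print(" ".join(bayes_relay_tokens("US US CA NG")))  # Trusted_US USCA CANG
print(" ".join(bayes_relay_tokens("GB TW")))        # Trusted_GB GBTW
print(" ".join(bayes_relay_tokens("CZ TW")))        # Trusted_CZ CZTW
```

Every token produced this way is at least four characters long, so none would be dropped by the three-character minimum, and "GB TW" and "CZ TW" yield distinct tokens that Bayes can score separately.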

Ordered pairs are also more resistant to forged headers. If a spammer adds extra Received headers as Bayes poison and sends the message through a foreign country, it will show up as a spammy pair rather than a hammy country code, e.g. CNGB is spammy because the ordering is wrong.
Comment 1 Henrik Krohns 2011-05-25 07:59:07 UTC
*** Bug 6433 has been marked as a duplicate of this bug. ***
Comment 2 Giovanni Bechis 2018-02-03 11:37:58 UTC
I think this could be useful; IMHO, more food for Bayes is better.
Any opinions?
Comment 3 RW 2018-02-03 13:38:06 UTC
Created attachment 5522
Updated patch for Bayes-specific Relaycountry metadata
Comment 4 Bill Cole 2018-02-04 18:59:49 UTC
(In reply to Giovanni Bechis from comment #2)
> I think this could be useful; IMHO, more food for Bayes is better.
> Any opinions?

+1
Comment 5 Kevin A. McGrail 2018-02-21 12:19:58 UTC
RW, any chance we can get an ICLA (https://www.apache.org/licenses/icla.pdf) so that we can consider this patch?
Comment 6 Henrik Krohns 2018-02-21 14:34:32 UTC
Sorry to be a downer, but in the words of Justin Mason, any Bayes modification should go through ten-fold cross-validation (https://wiki.apache.org/spamassassin/TenFoldCrossValidation). A long time ago I messed around with adding all sorts of tokens and ran some 10fcv tests; sometimes the results were even worse. So I wouldn't necessarily go claiming moar crap the better.
Comment 7 Kevin A. McGrail 2018-02-21 14:36:54 UTC
(In reply to Henrik Krohns from comment #6)
> Sorry to be a downer, but in the words of Justin Mason, any Bayes
> modification should go through a
> https://wiki.apache.org/spamassassin/TenFoldCrossValidation. Long time ago I
> messed around adding all sorts of tokens and did some 10fcv tests, sometimes
> results were even worse. So I wouldn't necessarily go claiming moar crap the
> better.

Great point, Henrik. A tfcv run is standard for me to consider the patch.