Bug 7354 - Allow custom addrlists for WLBLEval
Summary: Allow custom addrlists for WLBLEval
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins (show other bugs)
Version: 3.4 SVN branch
Hardware: All All
: P2 enhancement
Target Milestone: 4.0.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-03 12:28 UTC by Steadramon
Modified: 2019-08-01 05:56 UTC (History)
4 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
patch from trunk patch None Steadramon [HasCLA]
patch from trunk patch None Steadramon [HasCLA]
patch from trunk patch None Steadramon [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Steadramon 2016-10-03 12:28:02 UTC
Created attachment 5405 [details]
patch from trunk

On 30/09/16 01:07, Paul Stead wrote:
> Hi guys,
>
> Whilst looking through WLBLEval eval rules I noticed the likes of the
> following:
>
> -- 
>   $self->register_eval_rule("check_from_in_list");
>   $self->register_eval_rule("check_to_in_list");
> -- 
>
> Cool! I can check from and to lists, except I can't find a way to
> add/remove to addrlists like we can by using enlist_uri_host for hosts.
>

This patch adds a new directive "enlist_addrlist" to Conf.pm - after asking on the Dev list I got no answer as to whether this new directive would be best here or within WLBLEval.pm itself.

I've also added a patch to WLBLEval to add the eval check_replyto_in_list which compliments the patch to Conf.pm

Example new rules which could come from this:

---8<---
enlist_addrlist (SUSTLDS) *@*.top
enlist_addrlist (SUSTLDS) *@*.download
enlist_addrlist (SUSTLDS) *@*.science

header __FROM_SUS_TLD eval:check_from_in_list('SUSTLDS') 
---8<---

Another example for something posted to the SA list earlier today:

---8<---
enlist_addrlist (PAYPAL_DOMAINS) *@paypal.com *@paypal.co.uk
header __FROM_PAYPAL eval:check_from_in_list('PAYPAL_DOMAINS')
---8<---

Thoughts?
Comment 1 Steadramon 2016-10-03 15:59:15 UTC
Created attachment 5406 [details]
patch from trunk

Fix paths and slight adjust on readme
Comment 2 Steadramon 2016-10-03 16:01:05 UTC
Created attachment 5407 [details]
patch from trunk

*actually* adjusted readme text
Comment 3 Steadramon 2017-02-22 00:02:18 UTC
(In reply to Steadramon from comment #0)

> Another example for something posted to the SA list earlier today:
> 
> ---8<---
> enlist_addrlist (PAYPAL_DOMAINS) *@paypal.com *@paypal.co.uk
> header __FROM_PAYPAL eval:check_from_in_list('PAYPAL_DOMAINS')
> ---8<---

The person wasn't interested if the email was validly from PayPal, just that the fingerprint was there
Comment 4 Kevin A. McGrail 2018-08-26 22:07:14 UTC
How would this change help you?  What is the goal?  What's a rule(s) that uses it?
Comment 5 Steadramon 2018-08-27 07:05:09 UTC
(In reply to Kevin A. McGrail from comment #4)
> How would this change help you?  What is the goal?  What's a rule(s) that
> uses it?

Hi,

I find this extra function quite handy when adding rules which match headers or body uris, quite a bit easier to set up and cleaner looking that a regex expression.

I have reffered to this BZ a few times on the mailing list:

http://spamassassin.1065346.n5.nabble.com/Blacklist-for-reply-to-td150498.html
http://spamassassin.1065346.n5.nabble.com/Ends-with-string-td139774.html
http://spamassassin.1065346.n5.nabble.com/top-and-other-spammy-TLDs-td124063.html

I have mostly used this to have a list of suspicious TLDs example in first commit shows :

enlist_addrlist (SUSTLDS) *@*.top
enlist_addrlist (SUSTLDS) *@*.download
enlist_addrlist (SUSTLDS) *@*.science

header __FROM_SUS_TLD eval:check_from_in_list('SUSTLDS') 

In KAM.cf we have a similar rule:

header          __KAM_SOMETLD_ARE_BAD_TLD_FROM          From:addr =~ /\.(pw|stream|trade|bid|press|top|date)$/i

---

When this comes to matching more complex patterns in an email address I've seen people on the mailing list with long regexes where a more simple enlist_addrlist could be used, an example I have seen, but can't find at the moment is where someone was wanting to match their own domains, with several TLD variants...

enlist_addrlist (MYDOMAIN) *@example.com
enlist_addrlist (MYDOMAIN) *@example.org
enlist_addrlist (MYDOMAIN) *@example.co.uk

header __FROM_MYDOMAIN eval:check_from_in_list('MYDOMAIN') 
header __REPLYTO_MYDOMAIN eval:check_replyto_in_list('MYDOMAIN') 

---

I don't think it totally replaces From:addr regex but I do think adds a completeness to the WLBLEvals eval ruleset and add an easily understandable, readable set of features for SA.
Comment 6 Steadramon 2018-08-27 07:06:05 UTC
Just to note this BZ adds both enlist_addrlist and eval:check_replyto_in_list
Comment 7 Steadramon 2018-08-27 07:09:39 UTC
Another example on the SA user list which has a "messy" regex which would be a lot easier to create, maintain and read in the future:

http://mail-archives.apache.org/mod_mbox/spamassassin-users/201711.mbox/%3Calpine.LNX.2.00.1711271736050.30613@athena.impsec.org%3E

---8<---

header     FROM_RARE_TLD    From:addr =~ /\.(?:wor(?:k|ld)|space|club|science|pub|red|blue|green|link|ninja|lol|xyz|faith|review|download|top|global|(?:web)?site|tech|party|pro|bid|trade|win|moda|news|online|biz|host|loan|study|click|stream|xxx|date)$/i
describe   FROM_RARE_TLD    From address in rarely-nonspam TLD

---8<---
Comment 8 Kevin A. McGrail 2018-08-27 20:09:31 UTC
Paul, there are three patches attached.  Are the 1st two obsoleted by the 3rd?
Comment 9 John Hardin 2018-08-27 21:18:10 UTC
(In reply to Steadramon from comment #7)
> Another example on the SA user list which has a "messy" regex which would be
> a lot easier to create, maintain and read in the future:
> 
> http://mail-archives.apache.org/mod_mbox/spamassassin-users/201711.mbox/
> %3Calpine.LNX.2.00.1711271736050.30613@athena.impsec.org%3E
> 
> ---8<---
> 
> header     FROM_RARE_TLD    From:addr =~
> /\.(?:wor(?:
> k|ld)|space|club|science|pub|red|blue|green|link|ninja|lol|xyz|faith|review|d
> ownload|top|global|(?:
> web)?site|tech|party|pro|bid|trade|win|moda|news|online|biz|host|loan|study|c
> lick|stream|xxx|date)$/i
> describe   FROM_RARE_TLD    From address in rarely-nonspam TLD
> 
> ---8<---

Any comparative performance data for the different approaches?
Comment 10 Kevin A. McGrail 2018-08-28 02:49:46 UTC
Never mind, I see I had show obsolete turned on.  

Adding this patch but Paul, can you add more examples to the man page, please?  Marking it as a blocker to get that done for 3.4.2

I'll update KAM.cf to use this eval too after it's in 3.4.2

Also add this to your list of potential rules to add to rulesrc like bug 7606.

3.4:
Committed revision 1839390.
trunk:
Committed revision 1839391.
Comment 11 Kevin A. McGrail 2018-09-04 14:26:20 UTC
Pushing to 3.4.3 where Paul can commit this himself :-)
Comment 12 Henrik Krohns 2018-09-26 16:13:17 UTC
Isn't 3.4 supposed to be bug fixes only?

Also I really don't like the whole (GROUP) syntax.
Comment 13 Steadramon 2018-09-26 16:15:20 UTC
(In reply to Henrik Krohns from comment #12)
> Isn't 3.4 supposed to be bug fixes only?
> 
> Also I really don't like the whole (GROUP) syntax.

The (GROUP) syntax follows the same syntax that was already established in this plugin
Comment 14 Kevin A. McGrail 2018-09-26 16:19:12 UTC
3.4.3 is for bug fixes, yes.  The plugin is in 3.4.2 so I think it's ok for 3.4.3.
Comment 15 Henrik Krohns 2018-09-26 16:19:43 UTC
(In reply to Steadramon from comment #13)
> (In reply to Henrik Krohns from comment #12)
> > Isn't 3.4 supposed to be bug fixes only?
> > 
> > Also I really don't like the whole (GROUP) syntax.
> 
> The (GROUP) syntax follows the same syntax that was already established in
> this plugin

I don't understand. I don't see anything like that.

There is call for unifying all these addrlist stuff, so I vote to keep all enhacements to 4.0 and think about the formats carefully. There are already so many "internal regex addrlists" around that I'd also like to optimize the lookups etc.
Comment 16 Steadramon 2018-09-26 16:26:23 UTC
> I don't understand. I don't see anything like that.


Apologies - it replicates the enlist_uri_host syntax
Comment 17 Henrik Krohns 2018-09-26 16:31:54 UTC
(In reply to Steadramon from comment #16)
> > I don't understand. I don't see anything like that.
> 
> 
> Apologies - it replicates the enlist_uri_host syntax

Thanks, I had no idea about that. I guess it's too late to change the syntax. :-) Doesn't matter.

I'll look a bit into these.
Comment 18 Henrik Krohns 2019-06-15 19:49:47 UTC
Isn't this already committed? Targeting 4.0.0, add_to_addrlist() should be optimized to use hashes when possible..
Comment 19 Henrik Krohns 2019-08-01 05:56:07 UTC
Addrlists work fast enough with Revision 1862688, closing.