Bug 4507 - [review] Add support for URIBL.com
Summary: [review] Add support for URIBL.com
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P5 normal
Target Milestone: 3.2.0
Assignee: SpamAssassin Developer Mailing List
URL: http://www.uribl.com
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-07-29 15:38 UTC by Henry Stern
Modified: 2006-07-24 02:04 UTC
Description Henry Stern 2005-07-29 15:38:40 UTC
I'd like to add support for URIBL.com to trunk in anticipation of rescoring for
a 3.2 release.  If we do this early then people who will be contributing logs
for the next scoring run will have data for use with the --reuse option.
Comment 1 Henry Stern 2005-07-29 15:40:03 UTC
From http://www.uribl.com/usage.shtml

urirhssub       URIBL_BLACK  multi.uribl.com.        A   2
body            URIBL_BLACK  eval:check_uridnsbl('URIBL_BLACK')
describe        URIBL_BLACK  Contains an URL listed in the URIBL blacklist
tflags          URIBL_BLACK  net
score           URIBL_BLACK  3.0

urirhssub       URIBL_GREY  multi.uribl.com.        A   4
body            URIBL_GREY  eval:check_uridnsbl('URIBL_GREY')
describe        URIBL_GREY  Contains an URL listed in the URIBL greylist
tflags          URIBL_GREY  net
score           URIBL_GREY  1.0

urirhssub       URIBL_RED  multi.uribl.com.        A   8
body            URIBL_RED  eval:check_uridnsbl('URIBL_RED')
describe        URIBL_RED  Contains an URL listed in the URIBL redlist
tflags          URIBL_RED  net
score           URIBL_RED  1.0
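
The trailing number on each urirhssub line is a bitmask that is tested against the A record returned by multi.uribl.com, so one DNS lookup can answer for all three lists. A minimal Python sketch of that decoding, assuming replies of the form 127.0.0.x with the flags in the last octet; the function name and lookup table are illustrative and are not part of SpamAssassin:

```python
import ipaddress
from typing import List

# Bit values copied from the urirhssub lines above.
URIBL_BITS = {2: "URIBL_BLACK", 4: "URIBL_GREY", 8: "URIBL_RED"}

def decode_uribl_reply(a_record: str) -> List[str]:
    """Return the rule names whose bit is set in the reply's last octet.

    Assumption: the list flags live in the final octet of a 127.0.0.x reply.
    """
    last_octet = ipaddress.IPv4Address(a_record).packed[-1]
    return [name for bit, name in sorted(URIBL_BITS.items()) if last_octet & bit]
```

Under these assumptions a reply of 127.0.0.6 would trigger both URIBL_BLACK and URIBL_GREY, while a reply with none of the three bits set triggers nothing.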
Comment 2 Justin Mason 2005-07-29 16:06:35 UTC
+1

looks good, although the fp rate on the grey list will need a look.  I'd prefer
to set that to something like 0.1 until we have a good idea what it is.
Comment 3 Theo Van Dinter 2005-07-29 16:10:13 UTC
Subject: Re:  Add support for URIBL.com

On Fri, Jul 29, 2005 at 04:06:35PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> looks good, although the fp rate on the grey list will need a look.  I'd prefer
> to set that to something like 0.1 until we have a good idea what it is.

BTW: since the URIBL people don't want to be in the standard distro for now,
we should put them in as 70_uribl.cf or something.  sets up the reuse, but
doesn't get distributed by default.

Comment 4 Chris Santerre 2005-07-30 07:47:08 UTC
Subject: RE:  Add support for URIBL.com

Bah I can't remember my bugzilla password :)

Please do NOT use RED list. It is experimental only.

Although I guess it's fine to test the rates on.

Thanks,

--Chris 
Comment 5 Henry Stern 2005-07-30 11:34:30 UTC
Adding         70_uribl.cf
Transmitting file data .
Committed revision 226562.
Comment 6 Chris 2005-08-01 00:51:54 UTC
Stats/Feedback would be welcomed on the uribl-discuss list.

List-Subscribe: http://lists.uribl.com/mailman/listinfo/uribl-discuss
mailto:uribl-discuss-request@lists.maddoc.net?subject=subscribe
List-Post: uribl-discuss@lists.maddoc.net

Thanks.
Comment 7 Justin Mason 2005-08-01 17:09:26 UTC
I've just fixed up those rules --

  - the rule names had no T_ prefix
  - should not have scores above 0.001 while they're in testing

IMO, they should also have been put in 70_testing.cf, but since Theo suggested
70_uribl.cf in the first place, I'll let you off that one ;)
Comment 8 Dallas Engelken 2005-11-09 19:20:29 UTC
i am speaking on behalf of uribl.com when i say you have our blessing to move 
URIBL rules out of 70_testing.cf and into 25_uribl.cf

thanks..
Comment 9 Justin Mason 2005-11-10 00:30:34 UTC
thanks, Dallas.
Comment 10 Dallas Engelken 2005-11-11 17:04:39 UTC
since the milestone on this shows 3.2.0, do you know how far that might be?  
thanks.


Comment 11 Justin Mason 2005-11-11 19:52:45 UTC
heh -- not a clue ;)

with any luck though we can come up with some form of rule-updating via
"sa-update", I think Theo had that planned....
Comment 12 Theo Van Dinter 2005-11-12 03:38:00 UTC
(In reply to comment #11)
> with any luck though we can come up with some form of rule-updating via
> "sa-update", I think Theo had that planned....

We can definitely put up URIBL rules via sa-update as it doesn't require any extra code.  However, there 
aren't currently any plans for how we should use sa-update, how to promote rules into an update, etc.  I 
have a few ideas about how to do this, but it's not quite ready for use yet.  Maybe this is something we can 
get done at AC2005...?
Comment 13 Justin Mason 2005-11-12 04:08:46 UTC
yeah, definitely one of those things to talk about alright ;)

however I would prefer to talk on the list if possible; quite a few of the
interested parties may not be going to AC2k5.
Comment 14 Henry Stern 2006-01-20 21:20:56 UTC
Freqs from buildbot
(http://buildbot.spamassassin.org/ruleqa/ruleqa?daterev=20060114-r366568-n&s_defcorpus=on&rule=%2FT_URIBL_%28BLACK%7CGREY%7CRED%29&s_zero=on&s_detail=checked+&g=Change)

0.00000  48.6854  0.0976  0.998  0.89  0.01  T_URIBL_BLACK
0.00000   1.0975  0.0488  0.957  0.71  0.01  T_URIBL_GREY
0.00000   0.0248  0.0000  1.000  0.49  0.01  T_URIBL_RED

Using suggested scores from the uribl.com website of 3.0 for black, 1.0 for grey
and 0.0 for red.

Added to active rules in rev. 370897.
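
The fourth column of the freqs above is consistent with S/O computed as spam% / (spam% + ham%), reading the second and third columns as spam% and ham%. A small Python check under that assumption; the column meanings are inferred from the numbers and are not stated in this bug:

```python
def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """Fraction of a rule's hits that landed on spam (S/O)."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule has no hits")
    return spam_pct / total

# (spam%, ham%) pairs copied from the buildbot freqs above.
freqs = {
    "T_URIBL_BLACK": (48.6854, 0.0976),
    "T_URIBL_GREY": (1.0975, 0.0488),
    "T_URIBL_RED": (0.0248, 0.0000),
}

for rule, (spam_pct, ham_pct) in freqs.items():
    print(f"{rule}: {spam_over_overall(spam_pct, ham_pct):.3f}")
```

Rounded to three places, the results match the 0.998, 0.957 and 1.000 shown in the table.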
Comment 15 Theo Van Dinter 2006-03-07 04:11:29 UTC
this was done already. :)
Comment 16 Theo Van Dinter 2006-03-11 23:10:47 UTC
(In reply to comment #8)
> i am speaking on behalf of uribl.com when i say you have our blessing to move 
> URIBL rules out of 70_testing.cf and into 25_uribl.cf
(In reply to comment #10)
> since the milestone on this shows 3.2.0, do you know how far that might be?  

just an update -- I'm about to commit the new URIBL rules to the 3.1.1 update area, and it'll be available 
in the next update (probably when 3.1.1 is released), ie: in the next couple of days. :)
Comment 17 Daniel Quinlan 2006-03-14 05:08:00 UTC
I am -0.998 against deploying this rule sans a perceptron scoring run.  The FP
rate is not to be trifled with and there is a lot of overlap with the SURBL
rules.  I'm concerned that any SURBL false positive will then instantly become
a message false positive (assuming URIBL.com black is more aggressive) when it
hits both lists.
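
The concern rests on additive scoring: a message is tagged once the sum of its rule scores reaches the threshold, so two URI blacklists hitting the same domain stack. A sketch assuming SpamAssassin's default required_score of 5.0 and the 3.0 URIBL_BLACK score proposed earlier in this bug; the SURBL rule name and its 3.0 score are hypothetical placeholders, not values from this bug:

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default threshold

# URIBL_BLACK uses the score proposed in this bug; the SURBL entry is a
# hypothetical placeholder standing in for "some SURBL rule".
SCORES = {"URIBL_BLACK": 3.0, "HYPOTHETICAL_SURBL_RULE": 3.0}

def is_tagged(hits, scores=SCORES, required=REQUIRED_SCORE):
    """Sum the scores of the rules that hit and compare to the threshold."""
    return sum(scores[rule] for rule in hits) >= required
```

With these illustrative numbers, either rule alone stays below the threshold, but a domain listed on both lists pushes the message over it with no other evidence.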
Comment 18 Sidney Markowitz 2006-03-14 05:26:19 UTC
Does it make sense to deploy the rules with scores of 0.03 for black, 0.01 for
grey, and 0 for red so people have the option of using them if they wish to by
simply rescoring after they have gathered statistics for themselves?

Comment 19 Justin Mason 2006-03-14 11:06:36 UTC
Dan, we discussed this on the list -- in fact, most FPs on the SURBL and URIBL
rules are *not* due to hits across multiple URIBLs; while that does occur
frequently in spam, it occurs very infrequently in ham.

Sidney -- I'm not keen on that idea.  SpamAssassin's design philosophy is to not
require configuration where possible.
Comment 20 Henry Stern 2006-03-14 14:44:19 UTC
Where URIBL_GREY is concerned, Dan has a point.  URIBL_GREY deliberately
contains domains like doubleclick.net and geocities.com and it may indeed cause
some false positives.

I'd be +1 on shipping URIBL_BLACK without a perceptron run but not URIBL_GREY.
Comment 21 Theo Van Dinter 2006-03-14 16:48:23 UTC
(In reply to comment #17)
> I am -0.998 against deploying this rule sans a perceptron scoring run.  The FP
> rate is not to be trifled with and there is a lot of overlap with the SURBL
> rules.  I'm concerned that any SURBL false positive will then instantly become
> a message false positive (assuming URIBL.com black is more aggressive) when it
> hits both lists.

A few thoughts:

These rules are already deployed via 3.1.1 updates.  The scores for the URIBL_* rules aren't any more 
important than the scores for any rules we're making available.  Right now, I've been guessing at scores 
based on hit frequency results, but we need to come up with a better way to set the default scores.  The 
subject of this ticket is about adding in URIBL* rules, which has been done, so we should move to 
another venue (dev@ is suggested) to discuss how to better set default scores/do more frequent score 
generation.
Comment 22 Dallas Engelken 2006-07-24 05:10:26 UTC
are there recent freqs for T_URIBL_RED on buildbot somewhere?  i can't seem to
find them.
Comment 23 Justin Mason 2006-07-24 09:04:39 UTC
yeah -- it got promoted (?) so it's now called just "URIBL_RED":

http://ruleqa.spamassassin.org/20060722-r424538-n/URIBL_RED/detail

(it has a score of 0.001, in the main ruleset, in rules/25_uribl.cf .)