SA Bugzilla – Bug 4507
[review] Add support for URIBL.com
Last modified: 2006-07-24 02:04:39 UTC
I'd like to add support for URIBL.com to trunk in anticipation of rescoring for a 3.2 release. If we do this early then people who will be contributing logs for the next scoring run will have data for use with the --reuse option.
From http://www.uribl.com/usage.shtml:

urirhssub URIBL_BLACK multi.uribl.com. A 2
body URIBL_BLACK eval:check_uridnsbl('URIBL_BLACK')
describe URIBL_BLACK Contains an URL listed in the URIBL blacklist
tflags URIBL_BLACK net
score URIBL_BLACK 3.0

urirhssub URIBL_GREY multi.uribl.com. A 4
body URIBL_GREY eval:check_uridnsbl('URIBL_GREY')
describe URIBL_GREY Contains an URL listed in the URIBL greylist
tflags URIBL_GREY net
score URIBL_GREY 1.0

urirhssub URIBL_RED multi.uribl.com. A 8
body URIBL_RED eval:check_uridnsbl('URIBL_RED')
describe URIBL_RED Contains an URL listed in the URIBL redlist
tflags URIBL_RED net
score URIBL_RED 1.0
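On the `A 2`, `A 4` and `A 8` values in the `urirhssub` lines above: as we understand the URIDNSBL plugin, a plain decimal subtest is treated as a bitmask against the last octet of a single returned A record. The Python sketch below assumes, as the rules imply, that multi.uribl.com answers with one 127.0.0.x record in which bit 2 means black, 4 means grey and 8 means red. `URIBL_MASKS` and `matching_rules` are hypothetical names for illustration, not SpamAssassin's own code.

```python
# Sketch (assumption): decode a multi.uribl.com reply the way a bitmask
# subtest would, using the mask values from the urirhssub lines above.
URIBL_MASKS = {
    "URIBL_BLACK": 2,
    "URIBL_GREY": 4,
    "URIBL_RED": 8,
}


def matching_rules(a_record: str) -> list[str]:
    """Return the rule names whose mask bit is set in the reply's last octet."""
    last_octet = int(a_record.split(".")[-1])
    return [name for name, mask in URIBL_MASKS.items() if last_octet & mask]


if __name__ == "__main__":
    for reply in ("127.0.0.2", "127.0.0.4", "127.0.0.10"):
        print(reply, matching_rules(reply))
```

Under this reading, a combined reply such as 127.0.0.10 (8 + 2) sets two bits, so both URIBL_BLACK and URIBL_RED would fire for the same domain.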
+1, looks good, although the FP rate on the grey list will need a look. I'd prefer to set that score to something like 0.1 until we have a good idea of what it is.
Subject: Re: Add support for URIBL.com

On Fri, Jul 29, 2005 at 04:06:35PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> looks good, although the fp rate on the grey list will need a look. I'd prefer
> to set that to something like 0.1 until we have a good idea what it is.

BTW: since the URIBL people don't want to be in the standard distro for now, we should put them in as 70_uribl.cf or something. That sets up the reuse, but doesn't get distributed by default.
Subject: RE: Add support for URIBL.com

Bah, I can't remember my Bugzilla password :)

Please do NOT use the RED list. It is experimental only, although I guess it's fine to test the rates on.

Thanks,
--Chris
Adding         70_uribl.cf
Transmitting file data .
Committed revision 226562.
Stats/feedback would be welcome on the uribl-discuss list.

List-Subscribe: http://lists.uribl.com/mailman/listinfo/uribl-discuss
  mailto:uribl-discuss-request@lists.maddoc.net?subject=subscribe
List-Post: uribl-discuss@lists.maddoc.net

Thanks.
I've just fixed up those rules:

- the rule names had no T_ prefix
- they should not have scores above 0.001 while they're in testing

IMO, they should also have been put in 70_testing.cf, but since Theo suggested 70_uribl.cf in the first place, I'll let you off that one ;)
I am speaking on behalf of uribl.com when I say you have our blessing to move URIBL rules out of 70_testing.cf and into 25_uribl.cf. Thanks.
thanks, Dallas.
Since the milestone on this shows 3.2.0, do you know how far off that might be? Thanks.
heh -- not a clue ;) with any luck though we can come up with some form of rule-updating via "sa-update", I think Theo had that planned....
(In reply to comment #11)
> with any luck though we can come up with some form of rule-updating via
> "sa-update", I think Theo had that planned....

We can definitely put up URIBL rules via sa-update, as it doesn't require any extra code. However, there aren't currently any plans for how we should use sa-update, how to promote rules into an update, etc. I have a few ideas about how to do this, but it's not quite ready for use yet. Maybe this is something we can get done at AC2005...?
yeah, definitely one of those things to talk about alright ;) however I would prefer to talk on the list if possible; quite a few of the interested parties may not be going to AC2k5.
Freqs from buildbot (http://buildbot.spamassassin.org/ruleqa/ruleqa?daterev=20060114-r366568-n&s_defcorpus=on&rule=%2FT_URIBL_%28BLACK%7CGREY%7CRED%29&s_zero=on&s_detail=checked+&g=Change):

MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME
0.00000  48.6854   0.0976   0.998  0.89   0.01   T_URIBL_BLACK
0.00000   1.0975   0.0488   0.957  0.71   0.01   T_URIBL_GREY
0.00000   0.0248   0.0000   1.000  0.49   0.01   T_URIBL_RED

Using the scores suggested on the uribl.com website: 3.0 for black, 1.0 for grey and 0.0 for red. Added to active rules in rev. 370897.
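The S/O column in the freqs above appears to be the spam share of a rule's hits, spam% / (spam% + ham%); recomputing it from the listed SPAM% and HAM% values reproduces 0.998, 0.957 and 1.000. A short Python sketch of that arithmetic, where `spam_ratio` is a hypothetical helper name:

```python
# Recompute the S/O column from the SPAM% and HAM% columns of the freqs above.
def spam_ratio(spam_pct: float, ham_pct: float) -> float:
    """S/O: fraction of a rule's (percentage-normalized) hits that are spam."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


FREQS = {
    "T_URIBL_BLACK": (48.6854, 0.0976),
    "T_URIBL_GREY": (1.0975, 0.0488),
    "T_URIBL_RED": (0.0248, 0.0000),
}

if __name__ == "__main__":
    for rule, (spam, ham) in FREQS.items():
        print(f"{rule:15} S/O = {spam_ratio(spam, ham):.3f}")
```

Because S/O is a ratio, it says nothing about volume: T_URIBL_RED scores a perfect 1.000 while hitting only 0.0248% of spam, which is part of why it makes sense to weigh hit rate alongside S/O when picking scores.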
this was done already. :)
(In reply to comment #8)
> i am speaking on behalf of uribl.com when i say you have our blessing to move
> URIBL rules out of 70_testing.cf and into 25_uribl.cf

(In reply to comment #10)
> since the milestone on this shows 3.2.0, do you know how far that might be?

Just an update -- I'm about to commit the new URIBL rules to the 3.1.1 update area, and they'll be available in the next update (probably when 3.1.1 is released), i.e. in the next couple of days. :)
I am -0.998 against deploying this rule without a perceptron scoring run. The FP rate is not to be trifled with, and there is a lot of overlap with the SURBL rules. I'm concerned that any SURBL false positive will then instantly become a message false positive (assuming URIBL.com black is more aggressive) when it hits both lists.
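The concern rests on SpamAssassin's additive scoring: a message's score is the sum of the scores of every rule it hits, and it is tagged as spam once that sum reaches required_score (5.0 by default). The Python sketch below illustrates the effect with a hypothetical rule name `SURBL_RULE` and a hypothetical score of 2.5; only the 3.0 for URIBL_BLACK comes from this bug.

```python
# Illustration of additive scoring; SURBL_RULE and its 2.5 score are hypothetical.
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

RULE_SCORES = {
    "URIBL_BLACK": 3.0,  # suggested score from this bug
    "SURBL_RULE": 2.5,   # hypothetical SURBL rule and score
}


def is_spam(rule_scores: dict[str, float], hits: set[str]) -> bool:
    """Sum the scores of all rules hit and compare against the threshold."""
    total = sum(rule_scores[name] for name in hits)
    return total >= REQUIRED_SCORE


if __name__ == "__main__":
    # A ham message whose URL is wrongly listed on one list vs. on both.
    print(is_spam(RULE_SCORES, {"SURBL_RULE"}))
    print(is_spam(RULE_SCORES, {"SURBL_RULE", "URIBL_BLACK"}))
```

With these numbers, a domain wrongly listed only in SURBL leaves the message at 2.5, but the same domain also listed in URIBL black pushes it to 5.5 and over the threshold. How often that happens to ham is exactly the empirical question the rest of the thread debates.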
Does it make sense to deploy the rules with scores of 0.03 for black, 0.01 for grey, and 0 for red, so people have the option of using them, if they wish, by simply rescoring after they have gathered statistics for themselves?
Dan, we discussed this on the list -- in fact, most FPs on the SURBL and URIBL rules are *not* due to hits across multiple URIBLs; while that does occur frequently in spam, it occurs very infrequently in ham. Sidney -- I'm not keen on that idea. SpamAssassin's design philosophy is to not require configuration where possible.
Where URIBL_GREY is concerned, Dan has a point. URIBL_GREY deliberately contains domains like doubleclick.net and geocities.com and it may indeed cause some false positives. I'd be +1 on shipping URIBL_BLACK without a perceptron run but not URIBL_GREY.
(In reply to comment #17)
> I am -0.998 against deploying this rule sans a perceptron scoring run. The FP
> rate is not to be triffled with and there is a lot of overlap with the SURBL
> rules. I'm concerned that any SURBL false positive will then instantly become
> a message false positive (assuming URIBL.com black is more aggressive) when it
> hits both lists.

A few thoughts:

These rules are already deployed via 3.1.1 updates. The scores for the URIBL_* rules aren't any more important than the scores for any other rules we're making available. Right now, I've been guessing at scores based on hit-frequency results, but we need to come up with a better way to set the default scores.

The subject of this ticket is adding the URIBL_* rules, which has been done, so we should move to another venue (dev@ is suggested) to discuss how to better set default scores and do more frequent score generation.
Are there recent freqs for T_URIBL_RED on buildbot somewhere? I can't seem to find them.
yeah -- it got promoted (?) so it's now called just "URIBL_RED": http://ruleqa.spamassassin.org/20060722-r424538-n/URIBL_RED/detail (it has a score of 0.001, in the main ruleset, in rules/25_uribl.cf .)