Bug 668 - Some rules
Summary: Some rules
Status: RESOLVED INVALID
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
Importance: P4 enhancement
Target Milestone: ---
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-08-07 10:17 UTC by Tobias v. Koch
Modified: 2002-08-22 20:08 UTC
Description Tobias v. Koch 2002-08-07 10:17:49 UTC
Some rules. Could somebody with a big corpus test these?

uri LINK_TO_EXE     /\.exe$/i
describe LINK_TO_EXE    Contains link to a windows executable
test LINK_TO_EXE ok http://cut/member46/test-zugang/free-sexsoftware.exe

body NIGERIAN_SCAM_ATTN /^ATTN:.{0,20}\/C\W{0,3}E\W{0,3}O/mi
describe NIGERIAN_SCAM_ATTN Frequent introduction in Nigerian scam
test NIGERIAN_SCAM_ATTN ok Attn:managing director/CEO
test NIGERIAN_SCAM_ATTN ok ATTN: DIRECTOR/C.E.O

body LIVE_SOMETHING_CAPS /\bLIVE\s*[A-Z]{3,}\b/
describe LIVE_SOMETHING_CAPS Talks about LIVE[...] in all caps
test LIVE_SOMETHING_CAPS ok zur geilsten LIVE LESBEN SHOW???
test LIVE_SOMETHING_CAPS fail live on CNN
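
For anyone without a corpus at hand, the "test" lines above can at least be checked against the patterns directly. The following is a minimal sketch in Python, not SpamAssassin itself: the patterns are hand-translated from the Perl syntax (the `/i` and `/m` flags become `re.IGNORECASE` and `re.MULTILINE`), and it ignores how SpamAssassin extracts URIs and renders the body before matching. The names `RULES`, `CASES`, `rule_hits` and `run_cases` are made up for this illustration.

```python
import re

# Hand-translated approximations of the three proposed patterns.
RULES = {
    "LINK_TO_EXE": re.compile(r"\.exe$", re.IGNORECASE),
    "NIGERIAN_SCAM_ATTN": re.compile(
        r"^ATTN:.{0,20}/C\W{0,3}E\W{0,3}O", re.IGNORECASE | re.MULTILINE
    ),
    "LIVE_SOMETHING_CAPS": re.compile(r"\bLIVE\s*[A-Z]{3,}\b"),
}

# (rule, should it hit?, sample text), copied from the "test" lines above:
# "ok" means the rule is expected to hit, "fail" that it must not.
CASES = [
    ("LINK_TO_EXE", True, "http://cut/member46/test-zugang/free-sexsoftware.exe"),
    ("NIGERIAN_SCAM_ATTN", True, "Attn:managing director/CEO"),
    ("NIGERIAN_SCAM_ATTN", True, "ATTN: DIRECTOR/C.E.O"),
    ("LIVE_SOMETHING_CAPS", True, "zur geilsten LIVE LESBEN SHOW???"),
    ("LIVE_SOMETHING_CAPS", False, "live on CNN"),
]


def rule_hits(name, text):
    """Return True if the named pattern matches anywhere in text."""
    return RULES[name].search(text) is not None


def run_cases():
    """Return (rule, passed) for every sample in CASES."""
    return [(name, rule_hits(name, text) == expected)
            for name, expected, text in CASES]


if __name__ == "__main__":
    for name, passed in run_cases():
        print(name, "ok" if passed else "FAILED")
```

This only confirms that the patterns behave as the samples claim; it says nothing about hit rates on real mail, which is what the corpus runs are for.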
Comment 1 Theo Van Dinter 2002-08-07 11:41:24 UTC
Subject: Re: [SAdev]  New: Some rules

On Wed, Aug 07, 2002 at 10:17:49AM -0700, bugzilla-daemon@hughes-family.org wrote:
> Some rules. Could somebody with a big corpus test these?
> 
> uri LINK_TO_EXE     /\.exe$/i
> describe LINK_TO_EXE    Contains link to a windows executable
> body NIGERIAN_SCAM_ATTN /^ATTN:.{0,20}\/C\W{0,3}E\W{0,3}O/mi
> describe NIGERIAN_SCAM_ATTN Frequent introduction in Nigerian scam
> body LIVE_SOMETHING_CAPS /\bLIVE\s*[A-Z]{3,}\b/
> describe LIVE_SOMETHING_CAPS Talks about LIVE[...] in all caps

OVERALL     SPAM  NONSPAM     S/O   SCORE  NAME
  13027     4446     8581    0.34    0.00  (all messages)
     19       19        0    1.00    1.00  LINK_TO_EXE
     16       14        2    0.93    1.00  LIVE_SOMETHING_CAPS
      9        9        0    1.00    1.00  NIGERIAN_SCAM_ATTN
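
A note on reading the S/O column: for the rule rows it does not appear to be spam hits divided by total hits. For LIVE_SOMETHING_CAPS that would be 14/16 = 0.875, yet the table shows 0.93. The figures are consistent with S/O being computed from hit rates, i.e. each count is first divided by the size of its own corpus. This is inferred from the numbers in the table, not from the hit-frequencies source; the function name `so_ratio` is made up for this sketch.

```python
def so_ratio(spam_hits, ham_hits, n_spam, n_ham):
    """Spam share of a rule's hits after normalising by corpus size.

    Assumption: S/O = spam_rate / (spam_rate + nonspam_rate), where each
    rate is hits divided by the number of messages of that type.
    """
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return None  # the rule never hit, so the ratio is undefined
    return spam_rate / (spam_rate + ham_rate)


# Corpus sizes from the "(all messages)" row of the table above.
N_SPAM, N_HAM = 4446, 8581

if __name__ == "__main__":
    rows = [
        ("LINK_TO_EXE", 19, 0),
        ("LIVE_SOMETHING_CAPS", 14, 2),
        ("NIGERIAN_SCAM_ATTN", 9, 0),
    ]
    for name, spam_hits, ham_hits in rows:
        print(f"{so_ratio(spam_hits, ham_hits, N_SPAM, N_HAM):.2f}  {name}")
```

Under this reading a rule that hits the same fraction of spam and nonspam scores 0.5 regardless of how lopsided the corpus is.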

Comment 2 Rod Begbie 2002-08-07 14:56:21 UTC
OVERALL     SPAM  NONSPAM     S/O   SCORE  NAME
  11744     3414     8330    0.29    0.00  (all messages)
    101       38       63    0.60    1.00  LIVE_SOMETHING_CAPS
     32        5       27    0.31    1.00  LINK_TO_EXE

NIGERIAN_SCAM_ATTN didn't trigger.

LINK_TO_EXE hit in my nonspam on a lot of software release announcements
(e.g. "get it now:  http://www.ephpod.com/ephpod240.exe").

LIVE_SOMETHING_CAPS hit on CD new-release newsletters
(e.g. "322114 R ASIA - LIVE AT BUDOKAN CD 8.99").
Comment 3 Tobias v. Koch 2002-08-08 02:00:04 UTC
Subject: Re:  Some rules

BDFO> ------- Additional Comments From rOD-spamassassin@arsecandle.org 2002-08-07 14:56 -------
BDFO> OVERALL     SPAM  NONSPAM     S/O   SCORE  NAME
BDFO>   11744     3414     8330    0.29    0.00  (all messages)
BDFO>     101       38       63    0.60    1.00  LIVE_SOMETHING_CAPS
BDFO>      32        5       27    0.31    1.00  LINK_TO_EXE
BDFO> 
BDFO> NIGERIAN_SCAM_ATTN didn't trigger.

Hmm, doesn't sound very good. Better INVALID this bug.

Comment 4 Michael Moncur 2002-08-22 04:24:09 UTC
These didn't fare too well on my corpus.

OVERALL     SPAM  NONSPAM     S/O   SCORE  NAME
  12121     7739     4382    0.64    0.00  (all messages)
     34       33        1    0.95    1.00  LINK_TO_EXE
     10       10        0    1.00    1.00  NIGERIAN_SCAM_ATTN
     52       42       10    0.70    1.00  LIVE_SOMETHING_CAPS
Comment 5 Justin Mason 2002-08-22 04:49:10 UTC
Guys -- thanks a million for doing rule-QA on these.  But would it
be possible to use "hit-frequencies -x -p"?  The extra stats and
the normalisation to percentages make it easier to compare
the results across corpora.
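
To illustrate why percentages help, here is a small sketch that normalises the LINK_TO_EXE counts already posted in this bug by the size of each corpus. It does not reproduce the output of "hit-frequencies -x -p"; the helper name `as_percentages` and the dictionary layout are made up for this example.

```python
# (spam hits, nonspam hits, spam messages, nonspam messages) for LINK_TO_EXE,
# copied from the three tables posted in this bug.
LINK_TO_EXE = {
    "comment 1": (19, 0, 4446, 8581),
    "comment 2": (5, 27, 3414, 8330),
    "comment 4": (33, 1, 7739, 4382),
}


def as_percentages(spam_hits, ham_hits, n_spam, n_ham):
    """Return (percent of spam hit, percent of nonspam hit)."""
    return 100.0 * spam_hits / n_spam, 100.0 * ham_hits / n_ham


if __name__ == "__main__":
    for corpus, counts in LINK_TO_EXE.items():
        spam_pct, ham_pct = as_percentages(*counts)
        print(f"{corpus}: {spam_pct:.3f}% of spam, {ham_pct:.3f}% of nonspam")
```

The raw counts 19, 5 and 33 are hard to compare because the three spam corpora differ in size; dividing by corpus size puts them on the same scale.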
Comment 6 Michael Moncur 2002-08-22 05:12:42 UTC
Will do in future. Is there some place where the hit-frequencies options are 
documented? I've seen people posting percentages but had no idea how to do so.
Comment 7 Justin Mason 2002-08-22 05:26:06 UTC
er, no, just in the script itself :(
Comment 8 Craig Hughes 2002-08-23 04:08:47 UTC
Is there a way to get hit-frequencies to give the %ages *and* the raw numbers? 
Percentages are nice, but the raw numbers give a much better sense of
significance to the percentages.
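
Until that question is answered, a row carrying both figures is easy to produce from the raw table. The sketch below is a standalone Python helper, not a hit-frequencies option; the name `format_row` and the column layout are invented for illustration.

```python
def format_row(name, spam_hits, ham_hits, n_spam, n_ham):
    """Format one rule as raw counts with percentages in parentheses."""
    spam_pct = 100.0 * spam_hits / n_spam
    ham_pct = 100.0 * ham_hits / n_ham
    return (f"{name:<22}"
            f"{spam_hits:>6} ({spam_pct:6.3f}%)"
            f"{ham_hits:>6} ({ham_pct:6.3f}%)")


if __name__ == "__main__":
    # Counts and corpus sizes taken from the table in comment 1.
    print(f"{'NAME':<22}{'SPAM':>16}{'NONSPAM':>16}")
    print(format_row("LINK_TO_EXE", 19, 0, 4446, 8581))
    print(format_row("LIVE_SOMETHING_CAPS", 14, 2, 4446, 8581))
    print(format_row("NIGERIAN_SCAM_ATTN", 9, 0, 4446, 8581))
```

Keeping the count next to the percentage makes it obvious when a clean-looking rate rests on only a handful of messages.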