Bug 850 - Improve SUBJECT_FREQ
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: unspecified
Hardware: Other other
Importance: P2 enhancement
Target Milestone: ---
Assignee: SpamAssassin Developer Mailing List
Duplicates: 933
Reported: 2002-09-07 21:19 UTC by Matthew Cline
Modified: 2002-12-18 21:24 UTC
Description Matthew Cline 2002-09-07 21:19:01 UTC
The SUBJECT_FREQ rule seems likely to run into a lot of false positives,
as I have these matching subjects in my spam corpus:

Subject: Earn 36% monthly through fully secured accounts receivable acquisitions
Subject: This could be your message viewed by millions daily21296
Subject: Refinance and Reduce Monthly Payments                               
Subject: RE: Monthly Cash Deposited-->$16,468 1620CmYu9--9
Subject: (WSCH.OB) Weekly Hot Stock               ATRS
Subject: Monthly income $. 2,500 or more!!!
Subject: Earn $54,420.00 Monthly For Practically Doing NOTHING
Subject: Earn $3 to $5 for each envelope you stuff! $2,000+ weekly.
Subject: Sick of the daily grind?  Getaway now
Subject: Earn $54,420.00 Monthly For Practically Doing NOTHING
Subject: re:  eBay, MAKE A SERIOUS MONTHLY INCOME              5780

Improving the rule wouldn't be that hard if a single header rule could
match multiple headers (or the same header multiple times); otherwise it
would require meta rules, and that just seems a bit messy.
Comment 1 Daniel Quinlan 2002-09-07 22:41:37 UTC
It's just not a particularly good rule. Our methods for compensating for
newsletters leave much to be desired.

OVERALL%   SPAM% NONSPAM%     S/O   SCORE  NAME
  11424     3726     7698    0.33    0.00  (all messages)
100.000   32.616   67.384    0.33    0.00  (all messages as %)
  0.902    0.429    1.130    0.28   -1.92  SUBJECT_FREQ

It's hard to believe that we have a rule in there assigning -1.92 to 0.429% of
my spam.
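
For reference, the S/O column can be reproduced from the SPAM% and NONSPAM%
columns. The sketch below is only an illustration: it assumes S/O is computed
as spam% / (spam% + nonspam%), which matches the figures in the table above
but is not stated in this bug, and the function name so_ratio is hypothetical.

```python
def so_ratio(spam_pct: float, nonspam_pct: float) -> float:
    """Assumed definition: share of a rule's combined hit rate that comes from spam."""
    total = spam_pct + nonspam_pct
    if total == 0:
        raise ValueError("rule hit no messages")
    return spam_pct / total


# Figures taken from the table above.
all_messages = so_ratio(32.616, 67.384)
subject_freq = so_ratio(0.429, 1.130)

print(f"(all messages) S/O = {all_messages:.2f}")
print(f"SUBJECT_FREQ   S/O = {subject_freq:.2f}")
```

A value well below the corpus-wide ratio means the rule hits proportionally
more nonspam than spam, which is the intent of a "nice" rule.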
Comment 2 Daniel Quinlan 2002-09-18 19:49:18 UTC
From bug #933 (a duplicate)

--------------------------------------------------------------------------------

Removed SUBJECT_FREQ from HEAD cvs.

hit frequencies:

OVERALL%   SPAM% NONSPAM%     S/O    RANK   SCORE  NAME
  0.849    0.585    0.901    0.39    0.33    0.00  SUBJECT_FREQ

test code from all files in rules dir:

header SUBJECT_FREQ             Subject =~ /\b(?:monday|daily|weekly|monthly)\b/i
describe SUBJECT_FREQ           Subject contains a frequency - probable newsletter
tflags SUBJECT_FREQ		nice
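
As a rough illustration of what the pattern above matches, here is a Python
sketch that applies an equivalent regular expression to a few subjects. The
first three subjects come from the spam corpus listed in the description; the
last one is a hypothetical newsletter subject added for contrast. Python's re
module stands in for the Perl matching that SpamAssassin actually uses.

```python
import re

# Python translation of the Perl pattern; re.IGNORECASE mirrors the /i flag.
SUBJECT_FREQ = re.compile(r"\b(?:monday|daily|weekly|monthly)\b", re.IGNORECASE)

subjects = [
    "Refinance and Reduce Monthly Payments",   # spam, from the description
    "Sick of the daily grind?  Getaway now",   # spam, from the description
    "Monthly income $. 2,500 or more!!!",      # spam, from the description
    "Example Project weekly newsletter",       # hypothetical newsletter subject
]

# Collect every subject in which the pattern is found anywhere.
hits = [s for s in subjects if SUBJECT_FREQ.search(s)]
print(f"{len(hits)} of {len(subjects)} subjects match")
```

The pattern alone cannot tell the spam subjects from the newsletter one, which
is the weakness described in this bug.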


If you want to re-add this test to SpamAssassin, please follow
up this bug entry, improving the code until the S/O ratio
goes above 0.7 (or below 0.3 for nice tests).

(automated submission)
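
The acceptance criterion in the message above can be written as a small check.
This is a sketch only: the function name meets_so_threshold is hypothetical,
and the thresholds are exactly the ones quoted (above 0.7 for ordinary tests,
below 0.3 for nice tests).

```python
def meets_so_threshold(so: float, nice: bool) -> bool:
    """Return True if an S/O ratio satisfies the quoted re-add criterion."""
    # Nice tests should hit mostly nonspam, so they need a low S/O.
    if nice:
        return so < 0.3
    # Ordinary tests should hit mostly spam, so they need a high S/O.
    return so > 0.7


# S/O reported for SUBJECT_FREQ in the hit frequencies above; it is a nice test.
print(meets_so_threshold(0.39, nice=True))
```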
Comment 3 Daniel Quinlan 2002-09-18 19:49:54 UTC
*** Bug 933 has been marked as a duplicate of this bug. ***
Comment 4 Justin Mason 2002-12-19 06:24:43 UTC
not up to scratch -- dropping