Bug 913 - INVALID_MSGID low-performing rule pruned
Summary: INVALID_MSGID low-performing rule pruned
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P4 trivial
Target Milestone: ---
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Duplicates: 824
Depends on:
Blocks:
 
Reported: 2002-09-17 09:37 UTC by Justin Mason
Modified: 2002-09-24 18:37 UTC



Description Justin Mason 2002-09-17 09:37:01 UTC
Removed INVALID_MSGID from CVS HEAD.

hit frequencies:

OVERALL%   SPAM% NONSPAM%     S/O    RANK   SCORE  NAME
  4.745    8.776    3.958    0.69    0.37    0.00  INVALID_MSGID

test code from all files in rules dir:

header INVALID_MSGID		Message-Id !~ /^<(?:[a-zA-Z0-9.!\#$%&'*\+\/=?\^_{}|~-]+|\".+\")\@(?:[a-zA-Z0-9.-]+|\[\d{1,3}(?:\.\d{1,3}){3}\])>(?:\s*\(.*\))?\s*$/ [if-unset: <NO@MSGID>]
describe INVALID_MSGID		Message-Id is not valid, according to RFC 2822
lang de describe INVALID_MSGID            Message-Id ist laut RFC-2822 nicht gueltig
lang es describe INVALID_MSGID		Message-Id no válido, de acuerdo al RFC-2822
lang fr describe INVALID_MSGID  L'entête Message-ID: ne suit pas la norme RFC-2822
lang pl describe INVALID_MSGID		Message-Id jest nie zgodne ze standardem RFC2822


If you want to re-add this test to SpamAssassin, please follow
up this bug entry, improving the code until the S/O ratio
goes above 0.7 (or below 0.3 for nice tests).
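
For reference, the S/O column in the table above appears to be SPAM% divided by (SPAM% + NONSPAM%). That reading is an assumption, but it reproduces the 0.69 shown for INVALID_MSGID. A minimal Python sketch of the check (the function name is made up for illustration):

```python
def s_o_ratio(spam_pct: float, nonspam_pct: float) -> float:
    """Share of a rule's hits that land on spam, from per-class hit percentages.

    Assumed formula: SPAM% / (SPAM% + NONSPAM%).
    """
    total = spam_pct + nonspam_pct
    if total == 0:
        raise ValueError("rule never hit; S/O is undefined")
    return spam_pct / total


# Figures taken from the hit-frequencies table for INVALID_MSGID.
ratio = s_o_ratio(8.776, 3.958)
print(f"{ratio:.2f}")  # 0.69

# Threshold from the note above: keep a rule only if S/O > 0.7,
# or < 0.3 for nice tests.
keep = ratio > 0.7 or ratio < 0.3
print(keep)  # False
```

At 0.69 the rule falls just short of the 0.7 cutoff, which is why it was pruned.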

(automated submission)
Comment 1 Daniel Quinlan 2002-09-18 19:55:37 UTC
info from bug 824:

--------------------------------------------------------------------------------

Unfortunately, backslashing the dollar sign kills the effectiveness of the
rule.  Before just fixing this, someone with a large corpus should really
try to figure out exactly what is going on and which characters are really
allowed.

Only mess with this in HEAD, I think.

------- Additional Comments From Albert Meltzer 2002-09-11 10:50 -------

The fix seems to be as follows: change

[a-zA-Z0-9.!\#$%&'*\+\/=?\^_{}|~-]

to

[-a-zA-Z0-9.!\#%&'*\+\/=?\^_{}|~$]

'$' only seems to work when it is at the end of the set; however, "~-$" would be
considered a range, so '-' is moved to the front. I tested this with Perl 5.6.1.
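
The range problem can be reproduced outside Perl. The sketch below uses Python's `re` module purely as an illustration. Python does not interpolate variables into patterns, so it only shows the character-class side of the issue (the "~-$" range and where the hyphen goes), not the Perl-specific handling of '$'. The variable names are made up for this example.

```python
import re

# "~-$" inside a class is parsed as a range from "~" (0x7E) to "$" (0x24).
# The range is reversed, so the pattern is rejected at compile time.
try:
    re.compile(r"[|~-$]")
    range_ok = True
except re.error:
    range_ok = False

# Variant 1: hyphen moved to the front, "$" left at the end (the proposed fix).
leading_hyphen = re.compile(r"[-a-zA-Z0-9.!#%&'*+/=?^_{}|~$]")

# Variant 2: "$" backslashed in place, hyphen kept at the end.
escaped_dollar = re.compile(r"[a-zA-Z0-9.!#\$%&'*+/=?^_{}|~-]")

for ch in "-$~":
    # Both variants should accept "-", "$" and "~" as literal characters.
    print(ch, bool(leading_hyphen.fullmatch(ch)), bool(escaped_dollar.fullmatch(ch)))
```

In this sketch both variants accept the same literal characters, so the choice between them comes down to how the Perl version in use treats an unescaped '$' inside the class.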



------- Additional Comments From Daniel Quinlan 2002-09-11 12:06 -------

Subject: Re:  INVALID_MSGID - dollar sign in rule is not backslashed

> '$' only seems work when at the end of the set; however, "~-$" would
> be considered a range, and so '-' is moved to the front. I tested
> this with 5.6.1.

You can just backslash the $.  That works fine.
Comment 2 Daniel Quinlan 2002-09-18 19:55:51 UTC
*** Bug 824 has been marked as a duplicate of this bug. ***
Comment 3 Malte S. Stretz 2002-09-25 02:37:24 UTC
These characters are allowed in Message-Ids, according to RFC 2822: 
 
atext           =       ALPHA / DIGIT / ; Any character except controls, 
                        "!" / "#" /     ;  SP, and specials. 
                        "$" / "%" /     ;  Used for atoms 
                        "&" / "'" / 
                        "*" / "+" / 
                        "-" / "/" / 
                        "=" / "?" / 
                        "^" / "_" / 
                        "`" / "{" / 
                        "|" / "}" / 
                        "~" 
 
I escaped the dollar sign, so this rule should be fixed now.
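
As a cross-check, the atext list above can be turned into a character class mechanically, so that no character has to be escaped by hand. The following Python sketch is a simplified approximation and not the SpamAssassin rule itself: it only accepts the `<dot-atom@dot-atom>` form and leaves out the quoted local part, the bracketed IP-address domain, and the trailing comment that the original rule allows. The names `ATEXT_SPECIALS`, `MSGID_RE`, and `looks_valid` are hypothetical.

```python
import re

# atext specials copied from the RFC 2822 excerpt above.
ATEXT_SPECIALS = "!#$%&'*+-/=?^_`{|}~"

# re.escape() takes care of "$", "-", "^" and the other regex-significant
# characters, so their position inside the class no longer matters.
ATEXT_CLASS = "[A-Za-z0-9" + re.escape(ATEXT_SPECIALS) + "]"

# Simplified msg-id shape (an assumption for this sketch): <dot-atom@dot-atom>.
DOT_ATOM = f"{ATEXT_CLASS}+(?:\\.{ATEXT_CLASS}+)*"
MSGID_RE = re.compile(f"<{DOT_ATOM}@{DOT_ATOM}>")


def looks_valid(msgid: str) -> bool:
    """Return True if msgid matches the simplified pattern above."""
    return MSGID_RE.fullmatch(msgid.strip()) is not None


print(looks_valid("<abc$def@example.com>"))  # True
print(looks_valid("<a`b@example.com>"))      # True
print(looks_valid("<a b@example.com>"))      # False
print(looks_valid("abc@example.com"))        # False
```

Note that the backtick is part of atext, while the character class in the rule as quoted in the description does not list it. Whether that matters in practice would need checking against a mail corpus.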