Bug 7748 - TVD_APPROVED too loose
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2019-08-26 19:17 UTC by Stuart P. Bentley
Modified: 2019-08-26 22:45 UTC

Description Stuart P. Bentley 2019-08-26 19:17:08 UTC
I had an important travel confirmation email filtered as spam under the TVD_APPROVED rule for including these lines:

>> you will receive a separate email receipt after you are charged.
>>
>> TripIt Approved

The `/you.{1,2}re .{0,20}approved/i` test in https://svn.apache.org/repos/asf/spamassassin/trunk/rulesrc/sandbox/felicity/70_other.cf could probably be tightened to avoid false positives like this: maybe use `[^\n]{0,20}` instead, so the test won't span unrelated lines?
Comment 1 Bill Cole 2019-08-26 21:36:31 UTC
Normally rule FPs are not considered bugs per se, since it is expected that spam rules will match on some ham and nice rules will match on some spam, with the *AGGREGATE* scores being what matters. In principle, rules get rescored algorithmically based on their quality as reported in masscheck reports. HOWEVER, this rule has inexplicably been pegged at fairly high scores for almost 10 years in 50_scores.cf, immune to RuleQA:

score TVD_APPROVED 2.356 2.599 2.599 2.090 # n=2

Current and recent RuleQA results cannot justify that, so I have removed the fixed scores in r1865956. I expect the rule will be scored down on the next rescoring run.

Modifying the rule to not span lines would make it not match much of the spam that it has matched in the past, so it is not clear that there would be any point to a modified rule.
Comment 2 RW 2019-08-26 22:45:02 UTC
[^\n]{0,20} won't make any difference, as body rules run against individual paragraphs that have been flattened into a single line.

Presumably the '>>' quoting, or something similar, was in the original. Otherwise it would be two paragraphs.
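
To illustrate the point about flattening, here is a rough sketch in Python (not SpamAssassin's own Perl code). The `flatten` helper is a hypothetical stand-in that simply collapses whitespace runs into single spaces, and the sample text is an assumed reconstruction of the message without quote markers; the real body rendering may differ. Under those assumptions, the original and proposed patterns behave identically, both on the raw two-line text and on the flattened paragraph:

```python
import re

# The pattern quoted in the report, and the proposed tightening.
ORIGINAL = re.compile(r"you.{1,2}re .{0,20}approved", re.IGNORECASE)
PROPOSED = re.compile(r"you.{1,2}re [^\n]{0,20}approved", re.IGNORECASE)


def flatten(paragraph: str) -> str:
    """Hypothetical stand-in for paragraph flattening: collapse every
    whitespace run, including newlines, into a single space."""
    return " ".join(paragraph.split())


# Assumed sample: two lines of one paragraph, with no quote markers.
raw = "after you are charged.\nTripIt Approved"
flat = flatten(raw)

# Without a "dot matches newline" flag, "." already stops at "\n",
# so neither pattern can bridge the line break in the raw text.
raw_hits = (bool(ORIGINAL.search(raw)), bool(PROPOSED.search(raw)))

# Once flattened there is no "\n" left for [^\n] to exclude,
# so the two patterns match the same text.
flat_hits = (bool(ORIGINAL.search(flat)), bool(PROPOSED.search(flat)))

print("raw: ", raw_hits)
print("flat:", flat_hits)
```

If this sketch is representative, excluding newlines inside the pattern cannot narrow the rule; any tightening would have to come from the bounded gap or the surrounding words instead.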