Bug 6675 - HTML_TITLE_SUBJ_DIFF hits subject "<3"
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.4.0
Hardware: All All
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-13 17:17 UTC by Darxus
Modified: 2019-06-19 15:01 UTC

Description Darxus 2011-10-13 17:17:57 UTC
I just got a false positive on an email, largely because its entire subject was "<3" (an ASCII heart). That subject hits HTML_TITLE_SUBJ_DIFF, which has a score of 2.2.

Doesn't seem like it should have hit that rule.


In 20_html_tests.cf:
meta HTML_TITLE_SUBJ_DIFF      __HTML_TITLE_SUBJ_DIFF && !__MIME_ATTACHMENT
body __HTML_TITLE_SUBJ_DIFF    eval:html_title_subject_ratio('3.5')

No description.

html_title_subject_ratio seems to come from the HTMLEval plugin (Mail::SpamAssassin::Plugin::HTMLEval), which lacks a man page. It looks like it's converting "<3" to HTML, ending up with an empty string by ignoring the fact that there is no ">", and then deciding that the length ratio of that empty string to the original 2-character string is bad.
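
A minimal sketch of the behaviour guessed at above, written in Python rather than Perl. This is not the HTMLEval code: naive_strip_tags and length_ratio_exceeds are hypothetical names, and both the lenient tag handling and the direction of the ratio are assumptions made only to illustrate how "<3" could collapse to an empty string and trip a length-ratio threshold such as 3.5.

```python
import re


def naive_strip_tags(text: str) -> str:
    """Hypothetical lenient tag stripper (assumption, not SpamAssassin code).

    Treats '<' as the start of a tag even when no closing '>' follows,
    so an unterminated "<..." is swallowed up to the end of the string.
    """
    return re.sub(r"<[^>]*(>|$)", "", text)


def length_ratio_exceeds(subject: str, threshold: float) -> bool:
    """Hypothetical check: is len(original) / len(stripped) above threshold?"""
    stripped = naive_strip_tags(subject)
    if not stripped:
        # Nothing survived stripping: treat the ratio as unbounded
        # for any non-empty subject.
        return len(subject) > 0
    return len(subject) / len(stripped) > threshold


if __name__ == "__main__":
    for subject in ("<3", "Lunch on Friday?"):
        print(repr(subject),
              repr(naive_strip_tags(subject)),
              length_ratio_exceeds(subject, 3.5))
```

Under these assumptions the "<3" subject is reduced to nothing and flagged, while an ordinary subject with no "<" passes through unchanged. A stripper that only removed properly closed tags would leave "<3" intact.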

http://ruleqa.spamassassin.org/?daterev=20111008-r1180336-n&rule=HTML_TITLE_SUBJ_DIFF&srcpath=&g=Change
The ham to spam ratio of this rule is terrible.  Why is it in the default rule set, and why does it have such a high score?
Comment 1 Dave Jones 2018-01-28 19:28:13 UTC
This rule currently has a score of 1.15 so it's possible that this has been corrected in the past 6 years.
Comment 2 Henrik Krohns 2019-06-19 15:01:55 UTC
Closing old bug. Works fine here.