Bug 5887 - FP FB_CIALIS_LEO3 matches "FINANCIAL is"
Summary: FP FB_CIALIS_LEO3 matches "FINANCIAL is"
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.2.4
Hardware: Other All
Importance: P5 enhancement
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-19 03:05 UTC by Sidney Markowitz
Modified: 2008-04-24 13:58 UTC



Description Sidney Markowitz 2008-04-19 03:05:05 UTC
I got an FP on this rule in some email from a company named something Financial with a line like "Soandso Financial is committed to ..."

I would just go ahead and change the rule so that it does not match anything beginning with "financial", but I'm not enough of a Perl regexp expert to be confident of the cleanest way to do it. The rule already begins with /(?!CIALIS)C so that it does not match a non-fuzzy CIALIS. What would be the right way to also exclude FINANCIAL?
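To make the question concrete, here is a minimal Python sketch. The pattern FUZZY is a simplified, hypothetical stand-in for the rule: the real FB_CIALIS_LEO3 regexp is not quoted in this report, and only the leading (?!CIALIS)C is taken from the description above. The (?<!FINAN) lookbehind is one possible way to exclude FINANCIAL, shown as an assumption and not as the change that was eventually made.

```python
import re

# Hypothetical, simplified stand-in for the rule body (NOT the real regexp).
# (?!CIALIS) rejects the plain spelling; [\W_]{0,3} allows filler between letters.
FUZZY = (r"(?!CIALIS)C[\W_]{0,3}[Ii1!|][\W_]{0,3}[Aa4@][\W_]{0,3}"
         r"[Ll][\W_]{0,3}[Ii1!|][\W_]{0,3}[Ss]")

fuzzy = re.compile(FUZZY)
# Assumed alternative: refuse to start the match right after "FINAN".
no_financial = re.compile(r"(?<!FINAN)" + FUZZY)

samples = [
    "Buy CIALIS today",               # plain spelling, skipped by the lookahead
    "Buy C1ALIS today",               # obfuscated spelling
    "E*TRADE FINANCIAL is committed"  # the reported false positive
]

for text in samples:
    print(f"{text!r}: fuzzy={bool(fuzzy.search(text))} "
          f"no_financial={bool(no_financial.search(text))}")
```

In this sketch the false positive arises because "CIAL is" inside "FINANCIAL is" spells the target letters with a space as filler. The lookbehind handles that one word only; it does nothing for other words that happen to contain the same letter sequence.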
Comment 1 Sidney Markowitz 2008-04-21 04:00:14 UTC
I was sloppy in reporting this bug. The rule requires the initial C to be upper case, and indeed the FP was on a sentence that began "E*TRADE FINANCIAL is".

I changed the summary to be more accurate.

Someone sent a letter to sa-dev complaining about an FP on this rule matching the word Catalise, which a Google search reveals is the name of some companies and a commonly appearing spelling variant of the Portuguese word for catalyst.
Comment 2 Sidney Markowitz 2008-04-23 14:24:51 UTC
I tested what would happen if I put a \b before and after the existing rule, which would eliminate these two FPs. I would like opinions on whether the result is worth putting in. On last night's mass check the test rule eliminates 7 out of the 16 FPs while hitting 158 fewer spams (out of 19708). That gives it a better S/O of 0.991 instead of 0.9845.

I added a rule to allow looking at the hits that are eliminated by the rule change. Looking at its details, it appears that most of the FNs for the new rule are high scoring spam anyway.

So any opinions as to whether I should change the rule to require word boundaries?

SPAM% (of 1165303)  HAM% (of 61395)  S/O%   RANK  NAME
1.6776 (19549)      0.0147 (9)       0.991  0.90  T_SIDNEY_FB_CIALIS_LEO3
1.6912 (19708)      0.0261 (16)      0.985  0.89  FB_CIALIS_LEO3
0.0136 (158)        0.0114 (7)       0.545  0.56  T_SIDNEY_FB_CIAL_NONWORD
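The effect of the word boundaries proposed in this comment can be sketched in Python as follows. FUZZY is a simplified, hypothetical stand-in for the rule body (the real FB_CIALIS_LEO3 regexp is not quoted in this report), and the sample strings are invented for illustration. The sketch shows both sides of the trade-off: the FINANCIAL false positive goes away, and so do hits on obfuscated spellings embedded inside longer runs of word characters.

```python
import re

# Hypothetical, simplified stand-in for the rule body (NOT the real regexp).
FUZZY = (r"(?!CIALIS)C[\W_]{0,3}[Ii1!|][\W_]{0,3}[Aa4@][\W_]{0,3}"
         r"[Ll][\W_]{0,3}[Ii1!|][\W_]{0,3}[Ss]")

unbounded = re.compile(FUZZY)
# The proposal: require a word boundary before and after the match.
bounded = re.compile(r"\b" + FUZZY + r"\b")

samples = [
    "E*TRADE FINANCIAL is committed",  # false positive without \b
    "Buy C1ALIS now",                  # stand-alone obfuscated word
    "C.I.A.L.I.S",                     # punctuation used as filler
    "buyC1ALISnow",                    # embedded in other word characters
]

for text in samples:
    print(f"{text!r}: unbounded={bool(unbounded.search(text))} "
          f"bounded={bool(bounded.search(text))}")
```

The last sample is the kind of hit the bounded pattern gives up, which corresponds to the spam hits lost in the mass-check figures above.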
Comment 3 Justin Mason 2008-04-24 01:13:42 UTC
+1, go ahead and replace it.
Comment 4 Sidney Markowitz 2008-04-24 13:42:36 UTC
Committed to the update channel (branches/3.2/72_active.cf) as revision 651411.

Is that the only place I need to change it? Does the version in
rules/trunk/sandbox/emailed/00_FVGT_File001.cf get automatically promoted somehow?


Comment 5 Sidney Markowitz 2008-04-24 13:58:27 UTC
I see that it has to be done in the sandbox too in order to get into the trunk:

rules/trunk/sandbox/emailed/00_FVGT_File001.cf
Committed revision 651418.