Bug 6183 - ISO-2022-JP false positives on FM_FRM_RN_L_BRACK
Summary: ISO-2022-JP false positives on FM_FRM_RN_L_BRACK
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.3.0
Hardware: Other All
Importance: P2 normal
Target Milestone: 3.3.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-20 18:04 UTC by Warren Togami
Modified: 2009-09-03 14:21 UTC
CC List: 0 users



Attachment | Type | Status | Submitter/CLA Status
mbox containing 4 samples showing FM_FRM_RN_L_BRACK bug | application/mbox | None | Warren Togami [HasCLA]
Example ISO-2022-JP "From" that does not trigger FM_FRM_RN_L_BRACK | message/rfc822 | None | Warren Togami [HasCLA]

Description Warren Togami 2009-08-20 18:04:44 UTC
FM_FRM_RN_L_BRACK is described as firing when the From name has a > without a <.

It seems some legitimate ISO-2022-JP mail can trigger this rule in error.  Attaching a few examples.
Comment 1 Warren Togami 2009-08-20 18:11:18 UTC
Created attachment 4522 [details]
mbox containing 4 samples showing FM_FRM_RN_L_BRACK bug
Comment 2 Justin Mason 2009-08-21 04:19:04 UTC
great! keep 'em coming ;)
Comment 3 Justin Mason 2009-08-31 16:06:15 UTC
if we want to change this for 3.3.0, it needs to be in SVN by this Thursday; see bug 6155.
Comment 4 Warren Togami 2009-08-31 17:06:02 UTC
describe FM_FRM_RN_L_BRACK	From name has > but not <

I have access to 13 of the 45 spam hits of FM_FRM_RN_L_BRACK in my own corpus.  They are confirmed spam, but all of them are Japanese ISO-2022-JP mail without the unbalanced brackets that the rule description implies.

Are the other spam hits of this rule also Japanese ISO-2022-JP mail without unbalanced brackets?
Comment 5 Warren Togami 2009-08-31 17:11:22 UTC
Created attachment 4525 [details]
Example ISO-2022-JP "From" that does not trigger FM_FRM_RN_L_BRACK
Comment 6 Warren Togami 2009-08-31 18:26:16 UTC
rulesrc/sandbox/emailed/00_FVGT_File001.cf

header   __FROM_LEFT_BRACK      From:name =~ /</
header   __FROM_RIGH_BRACK      From:name =~ />/
meta     FM_FRM_RN_L_BRACK      (__FROM_RIGH_BRACK && !__FROM_LEFT_BRACK)
describe FM_FRM_RN_L_BRACK      From name has > but not <

__FROM_LEFT_BRACK is somehow broken?  Any ideas?

If we can't fix this, perhaps we are better off disabling this rule.  All of the ham and spam in my corpus that trigger FM_FRM_RN_L_BRACK show that the rule is incorrect: it isn't identifying spam, it is identifying a certain subset of ISO-2022-JP Japanese mail.  In the past, all Japanese mail may have been in the spam corpus with no ham samples, so we didn't notice this problem.
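The thread never pins down why __FROM_LEFT_BRACK appears to misfire. One hypothesis, assumed here and not confirmed anywhere in this bug, is that the name was matched as raw 7-bit ISO-2022-JP text instead of decoded Unicode: JIS X 0208 encodes each character as two bytes in the printable ASCII range 0x21-0x7E, so a byte pair can contain > (0x3E) with no < anywhere in the string. The Python sketch below re-implements the three rules with `re` (standing in for SpamAssassin's Perl regexes; these single-character patterns behave the same in both) and feeds it one hiragana character in both forms. The helper name `fm_frm_rn_l_brack` is hypothetical.

```python
import re

# Sub-rules from rulesrc/sandbox/emailed/00_FVGT_File001.cf, assuming they
# are applied to the From display-name string exactly as written.
FROM_LEFT_BRACK = re.compile(r"<")   # __FROM_LEFT_BRACK
FROM_RIGH_BRACK = re.compile(r">")   # __FROM_RIGH_BRACK


def fm_frm_rn_l_brack(name: str) -> bool:
    """Hypothetical helper: meta (__FROM_RIGH_BRACK && !__FROM_LEFT_BRACK)."""
    return bool(FROM_RIGH_BRACK.search(name)) and not FROM_LEFT_BRACK.search(name)


# U+305E HIRAGANA LETTER ZO is JIS X 0208 code 0x24 0x3E, i.e. "$>" in ASCII.
decoded_name = "\u305e"
raw_bytes = decoded_name.encode("iso2022_jp")   # b'\x1b$B$>\x1b(B'

# Assumed failure mode: the name is matched before charset decoding,
# so the escape sequences and JIS byte pairs are seen as plain ASCII.
undecoded_name = raw_bytes.decode("ascii")

print(fm_frm_rn_l_brack(decoded_name))    # False
print(fm_frm_rn_l_brack(undecoded_name))  # True
```

Under this assumption, decoding the name to Unicode before testing it would be enough to stop the false positive on this sample.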
Comment 7 AXB 2009-08-31 23:40:24 UTC
(In reply to comment #6)
> rulesrc/sandbox/emailed/00_FVGT_File001.cf
> 
> header   __FROM_LEFT_BRACK      From:name =~ /</
> header   __FROM_RIGH_BRACK      From:name =~ />/
> meta     FM_FRM_RN_L_BRACK      (__FROM_RIGH_BRACK && !__FROM_LEFT_BRACK)
> describe FM_FRM_RN_L_BRACK      From name has > but not <
> 
> __FROM_LEFT_BRACK is somehow broken?  Any ideas?
> 
> If we can't fix this, perhaps we are better off disabling this rule.  All of
> the ham and spam in my corpus that triggers FM_FRM_RN_L_BRACK show that the
> rule is incorrect.  This rule isn't identifying spam.  It is identifying a
> certain subset of ISO-2022-JP Japanese mail.  In the past all Japanese mail
> might have been in the spam corpus, without ham samples, so we didn't notice
> this problem.

May I suggest:

header   __FROM_LEFT_BRACK      From:name =~ /^</
header   __FROM_RIGH_BRACK      From:name =~ />$/
meta     FM_FRM_RN_L_BRACK      (__FROM_RIGH_BRACK && !__FROM_LEFT_BRACK)
describe FM_FRM_RN_L_BRACK      From name has > but not <

comments?
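A sketch of the anchored variant suggested in this comment, again in Python as a stand-in for the Perl rules; without a multiline flag, `^` and `$` mean the same thing in both languages (`$` also matches before a final newline). The meta rule now fires only when the name ends in > and does not start with <. A raw ISO-2022-JP name such as ESC $ B $ > ESC ( B ends in an escape sequence rather than >, so it no longer matches, but a > in the middle of the name is no longer caught either. The helper name and sample names are made up for illustration.

```python
import re

# Anchored forms of the sub-rules as proposed in comment 7.
FROM_LEFT_BRACK_ANCHORED = re.compile(r"^<")   # __FROM_LEFT_BRACK  /^</
FROM_RIGH_BRACK_ANCHORED = re.compile(r">$")   # __FROM_RIGH_BRACK  />$/


def fm_frm_rn_l_brack_anchored(name: str) -> bool:
    """Hypothetical helper: (__FROM_RIGH_BRACK && !__FROM_LEFT_BRACK), anchored."""
    return (bool(FROM_RIGH_BRACK_ANCHORED.search(name))
            and not FROM_LEFT_BRACK_ANCHORED.search(name))


samples = [
    "Bargain Shop>",      # trailing > with no leading <
    "<Bargain Shop>",     # starts with <, so the meta rule is suppressed
    "Bargain > Shop",     # > in the middle of the name
    "\x1b$B$>\x1b(B",     # raw ISO-2022-JP bytes for U+305E, read as ASCII
]
for s in samples:
    print(repr(s), fm_frm_rn_l_brack_anchored(s))
```

Only the first sample matches; the third shows the trade-off of anchoring.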
Comment 8 Warren Togami 2009-09-01 17:31:32 UTC
http://ruleqa.spamassassin.org/20090831-r809502-n/
http://ruleqa.spamassassin.org/20090901-r809894-n
FM_FRM_RN_L_BRACK disappeared between these two masscheck runs.  What happened?
Comment 9 Warren Togami 2009-09-01 17:37:21 UTC
Bug #5201 and Bug #6082 are the same issue.  I now understand that this rule tests not the address portion of From but the free-text display name, which usually precedes the address.

Because the rule disappeared from the masscheck results, and given my limited understanding of this code, I am unable to test the rule suggested in Comment #7.
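The distinction between the display name and the address can be illustrated with Python's standard `email.utils.parseaddr`, used here only as an analogy for SpamAssassin's `From:name` header modifier; the two parsers are not guaranteed to agree on malformed input. In this analogy, the angle brackets that delimit the address are consumed by the parser and never appear in the name, so any < or > a name test sees comes from the name text itself. The sample address is made up.

```python
from email.utils import parseaddr

# parseaddr splits a From value into (display name, address).
name, addr = parseaddr("Taro Yamada <taro@example.jp>")

print(repr(name))  # 'Taro Yamada'
print(repr(addr))  # 'taro@example.jp'

# The < and > that wrap the address are delimiters, not part of the name.
print("<" in name or ">" in name)  # False
```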
Comment 10 Justin Mason 2009-09-03 14:21:03 UTC
(In reply to comment #7)
> May I suggest:
> 
> header   __FROM_LEFT_BRACK      From:name =~ /^</
> header   __FROM_RIGH_BRACK      From:name =~ />$/
> meta     FM_FRM_RN_L_BRACK      (__FROM_RIGH_BRACK && !__FROM_LEFT_BRACK)
> describe FM_FRM_RN_L_BRACK      From name has > but not <
> 
> comments?

unfortunately that misses the two good hits I have in my corpus.  the > is halfway through the line.

here's a fix:


: 45...; svn commit -m "bug 6183: avoid ISO-2022-JP FPs on FM_FRM_RN_L_BRACK rule"
Sending        rulesrc/sandbox/emailed/00_FVGT_File001.cf
Adding         t.rules/FM_FRM_RN_L_BRACK
Adding         t.rules/FM_FRM_RN_L_BRACK/bug6183_hit1
Adding         t.rules/FM_FRM_RN_L_BRACK/bug6183_hit2
Adding         t.rules/FM_FRM_RN_L_BRACK/fp_bug6183_att4522_1
Adding         t.rules/FM_FRM_RN_L_BRACK/fp_bug6183_att4522_2
Adding         t.rules/FM_FRM_RN_L_BRACK/fp_bug6183_att4522_3
Adding         t.rules/FM_FRM_RN_L_BRACK/fp_bug6183_att4522_4
Adding         t.rules/FM_FRM_RN_L_BRACK/fp_bug6183_att4525
Transmitting file data ........
Committed revision 811129.