Bug 7192

Summary: US Dollars rules FP
Product: Spamassassin Reporter: Alex <mysqlstudent>
Component: Rules    Assignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: jhardin, kmcgrail, mysqlstudent, sidney
Priority: P2    
Version: 3.4.2   
Target Milestone: 3.4.2   
Hardware: PC   
OS: Linux   
Whiteboard:

Description Alex 2015-05-12 01:59:35 UTC
This evening I pulled a few messages out of quarantine that hit NA_DOLLARS, US_DOLLARS_3 and MILLION_USD solely because they contained a legitimate discussion between a CFO and their accounting firm.

Is it really the case that all that's necessary for an email to be considered spam is a discussion of large sums of money without any other classifier?

I realize I could of course lower the scores, but that's not a good answer, guys. Any Nigerian scam or other effort to extort or steal money would surely involve some other characteristic, no?

Is there anything more that can be done here?

I bet TxRep might help, but should that be necessary?

__FRAUD_KDT ======> got hit: "USD $6,000,000"
MILLION_USD ======> got hit: "million U.S. dollars (USD"
__LOTSA_MONEY_01 ======> got hit: "$ 6,000,000"
__hk_bigmoney ======> got hit: "$ 6,000,000"
__FRAUD_DBI ======> got hit: "Euros"
NA_DOLLARS ======> got hit: "million U.S. dollar"
__LOTSA_MONEY_04 ======> got hit: "million Euros"
US_DOLLARS_3 ======> got hit: "$ 6,000,000"
__KAM_REFI4 ======> got hit: "$6,000"
__FRAUD_LTX ======> got hit: "million U.S. dollars"
__HUSH_HUSH ======> got hit: "confidentiality"

X-Spam-Report:
 * -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
 *      [208.70.234.51 listed in wl.mailspike.net]
 *  0.0 RELAYCOUNTRY_US Relayed through United States
 * -0.0 SPF_PASS SPF: sender matches SPF record
 * -0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay
 *      domain
 *  3.6 NA_DOLLARS BODY: Talks about a million North American dollars
 *  1.8 US_DOLLARS_3 BODY: Mentions millions of $ ($NN,NNN,NNN.NN)
 *  3.2 MILLION_USD BODY: Talks about millions of dollars
 *  0.0 HTML_MESSAGE BODY: HTML included in message
 *  0.1 LOC_CDIS_INLINE BODY: Content-Disposition: inline
 * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
 *      [score: 0.0000]
 *  0.0 LOTS_OF_MONEY Huge... sums of money
 * -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
 *  0.1 LOC_IMGSPAM Probably inline image
 *  0.0 SAGREY Adds 0.01 to spam from first-time senders
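To make the complaint concrete, the per-rule scores in the X-Spam-Report above can simply be totalled. This is a minimal sketch, assuming the site uses SpamAssassin's default required_score of 5.0 and that a message is flagged once its total reaches that threshold (the quarantining site may use a different value). The scores are the rounded values printed in the report, so the totals are approximate.

```python
# Per-rule scores copied from the X-Spam-Report above (already rounded to 0.1).
REPORT_SCORES = {
    "RCVD_IN_MSPIKE_H3": -0.0,
    "RELAYCOUNTRY_US": 0.0,
    "SPF_PASS": -0.0,
    "T_RP_MATCHES_RCVD": -0.0,
    "NA_DOLLARS": 3.6,
    "US_DOLLARS_3": 1.8,
    "MILLION_USD": 3.2,
    "HTML_MESSAGE": 0.0,
    "LOC_CDIS_INLINE": 0.1,
    "BAYES_00": -1.9,
    "LOTS_OF_MONEY": 0.0,
    "RCVD_IN_MSPIKE_WL": -0.0,
    "LOC_IMGSPAM": 0.1,
    "SAGREY": 0.0,
}

# The three rules this bug is about.
MONEY_RULES = ("NA_DOLLARS", "US_DOLLARS_3", "MILLION_USD")

# Assumption: SpamAssassin's default required_score; the site may differ.
REQUIRED_SCORE = 5.0


def total(scores, exclude=()):
    """Sum rule scores, skipping any rule named in `exclude`; round to 0.1."""
    return round(sum(v for k, v in scores.items() if k not in exclude), 1)


overall = total(REPORT_SCORES)
money_only = round(sum(REPORT_SCORES[r] for r in MONEY_RULES), 1)
without_money = total(REPORT_SCORES, exclude=MONEY_RULES)

print(f"total={overall} flagged={overall >= REQUIRED_SCORE}")
print(f"money rules={money_only}")
print(f"without money rules={without_money} flagged={without_money >= REQUIRED_SCORE}")
```

With these numbers the three money rules contribute 8.6 points, enough to outweigh BAYES_00 (-1.9) and lift the message to about 6.9; without them it would score about -1.7.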
Comment 1 Alex 2015-05-12 02:01:15 UTC
KAM's additional suggestion was to move the rules from the stock ruleset to the sandbox so that they are evaluated for rule promotion and given better scores.
Comment 2 Kevin A. McGrail 2015-05-13 17:58:35 UTC
I've created 20_rules_to_sandbox.cf in my kmcgrail sandbox and removed the force_publish for these 3 rules (MILLION_USD, NA_DOLLARS & US_DOLLARS) so that they go through ruleqa, etc., get better scores, and have their S/O analyzed for promotion.

If this works, we should look at moving all rules to the sandbox so everything goes through ruleqa.

svn commit -m 'Bug 7192 moving MILLION_USD, NA_DOLLARS & US_DOLLARS to sandbox for ruleqa/promotion, etc.'
Sending        rules/20_phrases.cf
Sending        rules/30_text_de.cf
Sending        rules/30_text_fr.cf
Sending        rules/30_text_nl.cf
Sending        rules/30_text_pl.cf
Sending        rules/30_text_pt_br.cf
Sending        rules/50_scores.cf
Sending        rulesrc/10_force_active.cf
Adding         rulesrc/sandbox/kmcgrail/20_rules_to_sandbox.cf
Transmitting file data .........
Committed revision 1679253.

regards,
KAM
Comment 3 Alex 2015-05-14 14:02:33 UTC
No updates at all in the last 24hrs. Just to clarify, should I expect to see 20_rules_to_sandbox.cf with the next sa-update and the MILLION_USD, NA_DOLLARS & US_DOLLARS rules removed from normal distribution?
Comment 4 Kevin A. McGrail 2015-05-14 18:29:06 UTC
(In reply to Alex from comment #3)
> No updates at all in the last 24hrs. Just to clarify, should I expect to see
> 20_rules_to_sandbox.cf with the next sa-update and the MILLION_USD,
> NA_DOLLARS & US_DOLLARS rules removed from normal distribution?

You will not see 20_rules_to_sandbox.cf.

The rules are now in the sandbox, so they are subject to ruleqa, and their promotion to live rules is determined by their merit.

Rules are published nightly *if* everything goes well. Last night, the run didn't have enough spam:

 HAM: 227525 (150000 required)
SPAM: 145299 (150000 required)
Insufficient spam corpus to generate scores; aborting.
Exit Status 9 is not zero for do-nightly-rescore-example
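The abort in the log above can be read as a simple gate on corpus size. The sketch below is hypothetical: it is not the actual do-nightly-rescore-example script, and it assumes that both the ham and spam corpora must reach the same minimum of 150000 messages reported in the log, with a count equal to the minimum counting as sufficient.

```python
from dataclasses import dataclass

# Minimum taken from the "(150000 required)" lines in the log; the real
# script's exact comparison is an assumption here.
MIN_CORPUS = 150_000


@dataclass
class CorpusCounts:
    ham: int
    spam: int


def shortfalls(counts: CorpusCounts, required: int = MIN_CORPUS) -> dict:
    """Return {corpus_name: messages_missing} for each corpus below the minimum."""
    sizes = {"ham": counts.ham, "spam": counts.spam}
    return {name: required - n for name, n in sizes.items() if n < required}


def can_rescore(counts: CorpusCounts, required: int = MIN_CORPUS) -> bool:
    """Assumed rule: rescoring runs only when no corpus is short."""
    return not shortfalls(counts, required)


# Counts from the failed nightly run quoted above.
nightly = CorpusCounts(ham=227_525, spam=145_299)
if not can_rescore(nightly):
    print(f"Insufficient corpus, aborting: {shortfalls(nightly)}")
```

For that night the ham corpus clears the bar easily, while the spam corpus is 4701 messages short, so no new scores would be generated.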

If you join the ruleqa@ list, you can see these reported.

Anyway, give it a few more days and let me know if you see any of the rules disappear or the scores change.

This is considered resolved, but the rules update depends on that process, and we can tweak things based on the results when it runs.

Regards,
KAM
Comment 5 Kevin A. McGrail 2015-05-15 14:24:14 UTC
None of the rules were considered for automatic promotion and all are removed from the latest rule update: http://ruleqa.spamassassin.org/?daterev=20150514-r1679324-n&rule=MILLION_USD+NA_DOLLARS+US_DOLLARS_3&srcpath=&g=Change

They might be worth pushing out with a lower score ceiling if someone has any input.
Comment 6 John Hardin 2015-05-15 15:14:07 UTC
(In reply to Kevin A. McGrail from comment #5)
> None of the rules were considered for automatic promotion and all are
> removed from the latest rule update:
> http://ruleqa.spamassassin.org/?daterev=20150514-r1679324-n&rule=MILLION_USD+NA_DOLLARS+US_DOLLARS_3&srcpath=&g=Change
> 
> They might be worth pushing out with a lower score ceiling if someone has
> any input.

They overlap with LOTSA_MONEY, which does go out.
Comment 7 Sidney Markowitz 2017-04-15 03:27:58 UTC
This was labeled as version 3.4.1 but was committed and closed after the 3.4.1 release and was never ported to the 3.4 branch. I'm changing the target version to 3.4.2 and will port the changes to the branch.
Comment 8 Sidney Markowitz 2017-04-15 04:41:58 UTC
Committed to the 3.4 branch (merged from trunk) as revision 1791448.