Bug 7585

Summary: No rule for mails from the far future
Product: Spamassassin Reporter: apachebugs
Component: Rules Assignee: SpamAssassin Developer Mailing List <dev>
Status: NEW ---    
Severity: normal CC: jhardin, rwmaillists
Priority: P2    
Version: 3.4.1   
Target Milestone: Undefined   
Hardware: PC   
OS: Linux   
Whiteboard:

Description apachebugs 2018-05-16 14:17:05 UTC
ACTUAL BEHAVIOR:

I have an email like this: (note the dates in Received: and Date:)

Return-Path: <***>
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ##C##
X-Spam-Level: ***
X-Spam-Status: No, score=3.4 required=5.0 tests=BAYES_50,RCVD_IN_PBL,
	RDNS_DYNAMIC autolearn=no autolearn_force=no version=3.4.1
X-Spam-Report: 
	*  0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
	*      [score: 0.5014]
	*  1.5 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
	*      [** listed in zen.spamhaus.org]
	*  1.1 RDNS_DYNAMIC Delivered to internal network by host with
	*      dynamic-looking rDNS
Delivered-To: ***@##C##
Received: from ##B## (***)
	by ##C## (Postfix) with ESMTP id ***
	for <***>; Wed, 16 May 2018 16:13:40 +0200 (CEST)
Received: from ##A## (***)
   by ##B## (Postfix) with ESMTPS id ***
   for <***>; Mon, 16 May 2033 14:56:04 +0200
Date: Mon, 16 May 2033 14:50:01 +0200

(*** denotes stuff removed for privacy, ##A/B/C## are the three server names involved)

EXPECTED BEHAVIOR: 
give this some spam points

In my experience spammers use this trick quite often (to force the mail to be shown at the very top of the inbox), yet there is no SpamAssassin rule that matches it.

The wiki documents such a rule (https://wiki.apache.org/spamassassin/Rules/FH_DATE_PAST_20XX), but it seems this rule has since been removed (probably in the aftermath of bug 5852 from 2010-01-01).
Comment 1 RW 2018-05-16 19:24:21 UTC
There is a rule that has been turned off:

#score DATE_IN_FUTURE_96_XX 2.614 3.028 2.851 3.087
score DATE_IN_FUTURE_96_XX 0

possibly because of the rule DATE_IN_FUTURE_96_Q which is limited to 4 months in the future.  IMO there should be a DATE_IN_FUTURE_Q_XX to complement this.
Comment 2 RW 2018-05-16 20:32:04 UTC
I see there is a T_DATE_IN_FUTURE_Q_PLUS that didn't fire. The code works out the time difference between the Date header and the Received header with the closest time. In this case it looks to be taking the time from the untrusted Received header claiming 2033.

IMO if a received header claims to be later than the first trusted relay it should be ignored.
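
As a rough illustration of that suggestion (a sketch only, not SpamAssassin's actual code), the filter could look like the following Python. The helper names `to_epoch` and `drop_implausible` are hypothetical, and the sketch assumes the timestamp of the first trusted relay is already known.

```python
from email.utils import parsedate_to_datetime
from typing import List


def to_epoch(header_date: str) -> int:
    """Convert an RFC 5322 date string to seconds since the Unix epoch."""
    return int(parsedate_to_datetime(header_date).timestamp())


def drop_implausible(trusted_date: str, untrusted_dates: List[str]) -> List[str]:
    """Keep only untrusted Received dates that are not later than the
    timestamp written by the first trusted relay (hypothetical helper)."""
    limit = to_epoch(trusted_date)
    return [d for d in untrusted_dates if to_epoch(d) <= limit]


# Timestamps taken from the sample message in the description.
trusted = "Wed, 16 May 2018 16:13:40 +0200"
untrusted = ["Mon, 16 May 2033 14:56:04 +0200"]
print(drop_implausible(trusted, untrusted))  # []
```

With the 2033 header dropped, only the trusted 2018 timestamp would remain for comparison against the Date header.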
Comment 3 John Hardin 2018-05-16 20:52:38 UTC
(In reply to RW from comment #2)
> IMO if a received header claims to be later than the first trusted relay it
> should be ignored.

Bear in mind timezones. Would "more than 12h later" (absent making TZ adjustments before comparing) reduce the effectiveness of such a check? The sample at hand is *years* later, is that common?

It looks like the (un)trusted and internal/external relays pseudoheaders do not include datetime info. Would it be a good/reasonable/useful idea to extract the relay datetime into those headers (using a consistent normalized-to-UTC format perhaps like "utctime=yyyy-mm-ddThh:mm:ss") in addition to what's already being extracted?
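
To make the proposed format concrete, here is a minimal Python sketch of the normalization. The function name `utctime_field` is hypothetical, and nothing here reflects how SpamAssassin builds its relay pseudoheaders; it only shows the conversion from a Received timestamp to the suggested "utctime=" string.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def utctime_field(received_date: str) -> str:
    """Render an RFC 5322 date as 'utctime=yyyy-mm-ddThh:mm:ss' in UTC.

    Assumes the input carries an explicit numeric UTC offset.
    """
    dt = parsedate_to_datetime(received_date).astimezone(timezone.utc)
    return "utctime=" + dt.strftime("%Y-%m-%dT%H:%M:%S")


# Timestamp of the trusted relay in the sample message.
print(utctime_field("Wed, 16 May 2018 16:13:40 +0200"))
# utctime=2018-05-16T14:13:40
```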

Or is this a case of SQUIRREL! ?
Comment 4 RW 2018-05-18 16:28:08 UTC
(In reply to John Hardin from comment #3)
> (In reply to RW from comment #2)
> > IMO if a received header claims to be later than the first trusted relay it
> > should be ignored.
> 
> Bear in mind timezones. Would "more than 12h later" (absent making TZ
> adjustments before comparing) reduce the effectiveness of such a check? 

It already works in epoch seconds.

> The sample at hand is *years* later, is that common?

This applies to any offset. The offset is calculated between the Date header time and the Received header time that is closest. The bottom Received header is skipped if it has an offset of zero, but otherwise the spammer can just forge a Received header and skip these rules entirely.
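
The behaviour described above can be sketched in Python as follows. This is a simplified model written from the description in this comment, not a port of the real Perl code; `to_epoch` and `closest_offset` are hypothetical names.

```python
from email.utils import parsedate_to_datetime
from typing import List, Optional


def to_epoch(header_date: str) -> int:
    """Convert an RFC 5322 date string to seconds since the Unix epoch."""
    return int(parsedate_to_datetime(header_date).timestamp())


def closest_offset(date_hdr: str, received_dates: List[str]) -> Optional[int]:
    """Return (Date - Received) in seconds for the Received header whose
    time is closest to the Date header. received_dates is top-first."""
    date = to_epoch(date_hdr)
    offsets = [date - to_epoch(r) for r in received_dates]
    # Skip the bottom Received header if its offset is exactly zero.
    if offsets and offsets[-1] == 0:
        offsets.pop()
    if not offsets:
        return None
    return min(offsets, key=abs)


# Timestamps taken from the sample message in the description.
date_hdr = "Mon, 16 May 2033 14:50:01 +0200"
trusted = "Wed, 16 May 2018 16:13:40 +0200"
forged = "Mon, 16 May 2033 14:56:04 +0200"

print(closest_offset(date_hdr, [trusted, forged]))  # -363
```

With the forged header present, the closest offset is about six minutes in the past, so a "96 hours or more in the future" test has nothing to match. With only the trusted header, the offset would be roughly 15 years.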

> It looks like the (un)trusted and internal/external relays pseudoheaders do
> not include datetime info. 

I think just using the top Received header would be fine. However, it might be better to leave it as it is and have a separate rule to find cases where *both* the Date header and a Received header are well ahead (maybe 24h) of the top Received header.  This targets a specific spamming trick and not just a client with the wrong time set.
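
A possible shape for such a rule, sketched in Python under the assumption that the top Received header is the one written by a trusted relay. The name `date_and_relay_in_future` and the 24h default are illustrative only.

```python
from email.utils import parsedate_to_datetime
from typing import List

AHEAD_SECONDS = 24 * 3600  # the "maybe 24h" threshold; an assumption


def to_epoch(header_date: str) -> int:
    """Convert an RFC 5322 date string to seconds since the Unix epoch."""
    return int(parsedate_to_datetime(header_date).timestamp())


def date_and_relay_in_future(date_hdr: str, received_dates: List[str],
                             ahead: int = AHEAD_SECONDS) -> bool:
    """True if both the Date header and at least one lower Received header
    are more than `ahead` seconds later than the top Received header.
    received_dates is top-first."""
    if len(received_dates) < 2:
        return False
    limit = to_epoch(received_dates[0]) + ahead
    date_ahead = to_epoch(date_hdr) > limit
    relay_ahead = any(to_epoch(r) > limit for r in received_dates[1:])
    return date_ahead and relay_ahead


# Timestamps taken from the sample message in the description.
top = "Wed, 16 May 2018 16:13:40 +0200"
forged = "Mon, 16 May 2033 14:56:04 +0200"
date_hdr = "Mon, 16 May 2033 14:50:01 +0200"

print(date_and_relay_in_future(date_hdr, [top, forged]))  # True
```

A client whose clock is simply wrong would typically set only the Date header ahead, so the check returns False when no lower Received header is also in the future.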
Comment 5 John Hardin 2018-05-18 21:22:38 UTC
(In reply to RW from comment #4)
> (In reply to John Hardin from comment #3)
> > Bear in mind timezones. Would "more than 12h later" (absent making TZ
> > adjustments before comparing) reduce the effectiveness of such a check? 
> 
> It already works in epoch seconds.

D'oh... too much SQL lately.