SA Bugzilla – Bug 7933
Catch really old mails
Last modified: 2021-10-06 08:53:04 UTC
Maybe old dates like:
Date: Mon, 06 Jul 2020 11:09:58 -0700 (PDT)
should trigger something. "Hopdelta" says:

Sender                                      Recipient                            Time                 Delta
Start                                       gmail.com                            02:09:58 2020/07/07
[127.0.1.1]                                 smtp.gmail.com                       02:09:58 2020/07/07  0s
PDT                                         mail-wm1-x331.google.com             02:09:58 2020/07/07  0s
mail-wm1-x331.google.com                    shenron.openstreetmap.org            02:10:00 2020/07/07  2s
shenron.openstreetmap.org                   100.96.133.195                       22:53:01 2021/10/05  1s 43m 20h 455d
postfix-inbound-0.inbound.mailchannels.net  pdx1-sub0-mail-mx22.g.dreamhost.com  22:53:02 2021/10/05  1s

Maybe even a 0.1 score would be good. No I don't know what is old enough: one week, one month, one year? Maybe separate rules for each. Also some folks would in fact like to give it a negative score. Well if there was a rule for it then they could. Else they would need to make a fancy parser...
(In reply to jidanni from comment #0)
> Maybe old dates like:
> Date: Mon, 06 Jul 2020 11:09:58 -0700 (PDT)
> should trigger something.

Like the DATE_IN_PAST_* rules?

Can you provide an example of a message that doesn't hit any of those which you think should be hit by a new rule?
Created attachment 5755 [details]
Old mail not detected

Why doesn't this trigger

header DATE_IN_PAST_96_XX eval:check_for_shifted_date('undef', '-96')
describe DATE_IN_PAST_96_XX Date: is 96 hours or more before Received: date
(In reply to jidanni from comment #2)
> Created attachment 5755 [details]
> Old mail not detected
>
> Why doesn't this trigger
>
> header DATE_IN_PAST_96_XX eval:check_for_shifted_date('undef', '-96')
> describe DATE_IN_PAST_96_XX Date: is 96 hours or more before Received: date

Good question...

If I'm reading the code correctly, the reason for this is that there are plausible and parseable Received headers which have times close to the Date header. If I strip out the Received headers from 2020, it triggers that rule. The comments in the code imply that not using the smallest Date/Received difference resulted in false positives.

Since DATE_IN_PAST_96_XX and its siblings are fairly strong rules with scores set by the RuleQA process (current scores for DATE_IN_PAST_96_XX: 2.600 2.070 1.233 3.405) I do not believe it would be polite to users to modify the behavior of the underlying eval function at this point. It currently is a measurement of the apparent delay between message composition and initial submission, not of total transit time. RuleQA shows that metric correlating rather well with spamminess.

It may be useful to add a different test that looks at a more strictly specified date comparison, such as using the last Received header or the last "trusted" Received header instead of the current practice of using the smallest time delta in a parseable Received header relative to the Date header. That would require a new eval in Plugin/HeaderEval.pm. Whether a measurement of putative total transit time actually correlates either way to ham or spam is anyone's guess. In the sample case, it seems likely to me that the message is not spam, but rather some sort of re-injected mail originally sent to a discussion list.
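
To make the difference between the two measurements concrete, here is a rough Python sketch. It is not the code in Plugin/HeaderEval.pm and does not claim to reproduce its parsing; it only contrasts "smallest Date/Received delta" with "delta to the latest Received", as described in this comment. The function names are made up for illustration, the hop times are copied from the hopdelta output in comment 0, and treating those times as UTC+8 is an assumption (it is the offset at which the first hop lines up with the Date header).

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Date: header value quoted in comment 0 (trailing "(PDT)" comment omitted).
DATE_HEADER = "Mon, 06 Jul 2020 11:09:58 -0700"

# Hop timestamps taken from the hopdelta output in comment 0.
# Assumption: hopdelta printed them in UTC+8.
TZ = timezone(timedelta(hours=8))
HOPS = [
    datetime(2020, 7, 7, 2, 9, 58, tzinfo=TZ),
    datetime(2020, 7, 7, 2, 9, 58, tzinfo=TZ),
    datetime(2020, 7, 7, 2, 10, 0, tzinfo=TZ),
    datetime(2021, 10, 5, 22, 53, 1, tzinfo=TZ),
    datetime(2021, 10, 5, 22, 53, 2, tzinfo=TZ),
]


def smallest_delta_hours(date, received):
    """Smallest absolute gap between Date and any Received time, in hours."""
    return min(abs((r - date).total_seconds()) for r in received) / 3600


def latest_delta_hours(date, received):
    """Gap between Date and the most recent Received time, in hours."""
    return (max(received) - date).total_seconds() / 3600


date = parsedate_to_datetime(DATE_HEADER)
print("smallest delta >= 96h:", smallest_delta_hours(date, HOPS) >= 96)
print("latest delta   >= 96h:", latest_delta_hours(date, HOPS) >= 96)
```

With these inputs the smallest delta is zero, because the 2020 hops sit right next to the Date header, while the delta to the latest hop is well over a year. That is consistent with the sample not hitting a 96-hour test built on the smallest delta.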
Well fine, perhaps change
> describe DATE_IN_PAST_96_XX Date: is 96 hours or more before Received: date
< describe DATE_IN_PAST_96_XX Date: is 96 hours or more before EARLIEST Received: date

Anyway maybe there should be a
< describe DATE_IN_PAST_96_XX2 Date: is 96 hours or more before LATEST Received: date
to really catch them all, even if they aren't spam. Perhaps score 0.1 for now.
I would agree that having one or more checks against the latest received date would be handy. I've also seen a few cases where even the latest received date is bogus (I'm not an ISP), so an ability to check against the SA system date would be nice. I trust my own system date.
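
A check against the local system date could look roughly like the Python sketch below. It is only an illustration of the idea, not an existing SpamAssassin eval: the function name `age_bucket` and the week/month/year thresholds are hypothetical (comment 0 leaves open what counts as old enough), and the clock defaults to the scanning host's own time.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical thresholds, largest first; the thread does not settle on any.
THRESHOLDS = [
    ("year", timedelta(days=365)),
    ("month", timedelta(days=30)),
    ("week", timedelta(days=7)),
]


def age_bucket(date_header, now=None):
    """Return the largest threshold label the Date: header exceeds, or None."""
    if now is None:
        # Trust the local system clock, as suggested above.
        now = datetime.now(timezone.utc)
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # No usable zone in the header: treat the time as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    age = now - sent
    for label, limit in THRESHOLDS:
        if age >= limit:
            return label
    return None


# Example: the Date header from comment 0, checked at a fixed "now".
scan_time = datetime(2021, 10, 5, 14, 53, 2, tzinfo=timezone.utc)
print(age_bucket("Mon, 06 Jul 2020 11:09:58 -0700", now=scan_time))
```

Each label could map to its own low-scored rule, so a site could score old mail up or down as it sees fit.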