Bug 6678 - FAKE_REPLY_C triggered by MSOE6 replies without References
Summary: FAKE_REPLY_C triggered by MSOE6 replies without References
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.3.2
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-19 20:07 UTC by Ned Slider
Modified: 2011-10-20 08:22 UTC
CC List: 2 users



Description Ned Slider 2011-10-19 20:07:36 UTC
I have a few cases of FAKE_REPLY_C being triggered in ham by MSOE6 replies.

72_active.cf:meta     FAKE_REPLY_C              (__SUBJ_RE && __MISSING_REF && __NO_INR_YES_REF)

My examples all appear to hit __SUBJ_RE, and since References is unset they also hit __MISSING_REF. The final part of the meta:

72_active.cf:meta     __NO_INR_YES_REF  (__XM_GNUS || __XM_MSOE5 || __XM_MSOE6 || __XM_MOZ4 || __XM_SKYRI || __XM_WWWMAIL || __UA_GNUS || __UA_KNODE || __UA_MUTT || __UA_PAN || __UA_XNEWS)

seems to match against __XM_MSOE6:

X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-Mailer: Microsoft Outlook Express 6.00.2900.3664
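
To make the AND/OR structure of the two meta rules above concrete, here is a minimal Python sketch of how they combine. It is an illustration under stated assumptions, not SpamAssassin's implementation: the regexes standing in for __SUBJ_RE, __MISSING_REF and __XM_MSOE6 are hypothetical approximations (the real 72_active.cf patterns are not quoted here), and only the MSOE6 member of the __NO_INR_YES_REF OR-list is modelled. The sample headers are reduced from the redacted example in comment 2.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical stand-ins for the sub-rules. The real 72_active.cf patterns
# are not quoted in this report and may well differ.
SUBJ_RE = re.compile(r"^\s*re:", re.IGNORECASE)          # ~ __SUBJ_RE
XM_MSOE6 = re.compile(r"^Microsoft Outlook Express 6\.")  # ~ __XM_MSOE6


def no_inr_yes_ref(msg: Message) -> bool:
    """~ __NO_INR_YES_REF: an OR over MUA signatures; only MSOE6 is modelled."""
    return bool(XM_MSOE6.search(msg.get("X-Mailer", "")))


def fake_reply_c(msg: Message) -> bool:
    """~ FAKE_REPLY_C = __SUBJ_RE && __MISSING_REF && __NO_INR_YES_REF."""
    subj_re = bool(SUBJ_RE.search(msg.get("Subject", "")))
    missing_ref = msg.get("References") is None  # ~ __MISSING_REF
    return subj_re and missing_ref and no_inr_yes_ref(msg)


# The redacted example from comment 2, reduced to the headers that matter.
SAMPLE = (
    'From: "Chris" <redacted@redacted>\n'
    "Subject: Re: redacted\n"
    "X-Mailer: Microsoft Outlook Express 6.00.2900.5931\n"
    "\n"
    "This is a multi-part message in MIME format.\n"
)

if __name__ == "__main__":
    print(fake_reply_c(message_from_string(SAMPLE)))  # True
    print(fake_reply_c(message_from_string("References: <x@Chris>\n" + SAMPLE)))  # False
```

In this model a single References header is enough to stop the hit, and dropping the MSOE6 clause (the "obvious" fix discussed below) would make every OE6 message fail __NO_INR_YES_REF, genuine fakes included.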

I have around a dozen examples matching the above profile over the last year. I don't have a huge pool of replies sent from MSOE.

The "obvious" solution appears to be to drop __XM_MSOE6 from the meta rule __NO_INR_YES_REF but I'm not really sure what __NO_INR_YES_REF is designed to achieve.

What else would you like from me?
Comment 1 Karsten Bräckelmann 2011-10-19 20:38:17 UTC
> The "obvious" solution appears to be to drop __XM_MSOE6 from the meta rule
> __NO_INR_YES_REF but I'm not really sure what __NO_INR_YES_REF is designed to
> achieve.

Going by the comment in the sandbox .cf file and its use in FAKE_REPLY_C, I'd say it's a meta for MUAs that set a References header, but no In-Reply-To header, when replying.

Are these *real* replies, or did the sender perhaps add some "Re:"-style prefix to the Subject of an otherwise new mail himself?

Any chance an overly aggressive and paranoid SMTP relay messed with the mail, censoring and stripping the References header?
Comment 2 Ned Slider 2011-10-19 21:46:05 UTC
These certainly look like *real* replies, as in conversations where one user has replied to another. The Subjects all begin with "Re:".

They all have no In-Reply-To header and no References header, and all appear to be from MSOE6 MUA.

Yet I have other examples of replies from the same user with the same X-Mailer relayed through the same ISP that do have the References header.

BTW - the FP hits are not just from a single user. I have examples from a couple of different users relaying through different ISPs.


Here are the redacted headers from one example:

From - Fri Jan 14 10:35:29 2011
X-Account-Key: account13
X-UIDL: 00027563483efc4b
X-Mozilla-Status: 0011
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:                                                                                 
Return-Path: <redacted@redacted>
X-Original-To: redacted@pendre.co.uk
Delivered-To: redacted@pendre.co.uk
Received: from localhost (Quad [127.0.0.1])
	by quad.pendre.co.uk (Postfix) with ESMTP id 7EDC22F42D5
	for <redacted@pendre.co.uk>; Fri, 14 Jan 2011 09:41:45 +0000 (GMT)
X-Virus-Scanned: amavisd-new at pendre.co.uk
X-Spam-Flag: NO
X-Spam-Score: -3.523
X-Spam-Level: 
X-Spam-Status: No, score=-3.523 tagged_above=-999 required=5
	tests=[BAYES_00=-5, FAKE_REPLY_C=1.486, HTML_MESSAGE=0.001,
	RCVD_IN_DNSWL_NONE=-0.0001, T_RP_MATCHES_RCVD=-0.01]
	autolearn=disabled
Received: from quad.pendre.co.uk ([127.0.0.1])
	by localhost (quad.pendre.co.uk [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id qkhobyogXUNG; Fri, 14 Jan 2011 09:41:43 +0000 (GMT)
Received: from mail.btconnect.com (c2bthomr13.btconnect.com [213.123.20.131])
	by quad.pendre.co.uk (Postfix) with ESMTP id CECBC2F42D4
	for <redacted@redacted>; Fri, 14 Jan 2011 09:41:42 +0000 (GMT)
Received: from host81-158-xxx-xxx.range81-158.btcentralplus.com (EHLO Chris) ([81.158.xxx.xxx])
	by c2bthomr13.btconnect.com
	with ESMTP id BHY37705;
	Fri, 14 Jan 2011 09:40:29 +0000 (GMT)
Received: from 127.0.0.1 (AVG SMTP 9.0.872 [271.1.1/3378]); Fri, 14 Jan 2011 09:37:41 +0000
Message-ID: <redacted@Chris>
From: "Chris" <redacted@redacted>
To: "redacted" <redacted@redacted>
Subject: Re: redacted
Date: Fri, 14 Jan 2011 09:37:41 -0000
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0046_01CBB3CE.B0481520"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5994
X-Mirapoint-IP-Reputation: reputation=Neutral-1,
	source=Queried,
	refid=tid=0001.0A0B0302.4D301A0C.0172,
	actions=tag
X-Junkmail-Status: score=10/50, host=c2bthomr13.btconnect.com
X-Junkmail-Signature-Raw: score=unknown,
	refid=str=0001.0A0B0208.4D301A55.0274,ss=1,fgs=0,
	ip=0.0.0.0,
	so=2010-07-22 22:03:31,
	dmn=2009-09-10 00:05:08,
	mode=single engine
X-Junkmail-IWF: false

This is a multi-part message in MIME format.
Comment 3 Karsten Bräckelmann 2011-10-19 22:12:12 UTC
(In reply to comment #2)
> These certainly look like *real* replies, as in it's a conversation where one
> user has replied to another. Subjects all begin with Re:

> Yet I have other examples of replies from the same user with the same X-Mailer
> relayed through the same ISP that do have the References header.

Can you tell whether this was the first "reply" in the thread, or a reply to a mail that had already been replied to?

I'm asking because, given comment 2, we can most likely rule out the SMTP relay stripping the header. The difference would then be whether the user manually added the "Re:" prefix or just copied the Subject verbatim.

In either case, it looks like the sender did not actually reply, but composed the message some other way: the reverse of the thread-hijacking habit of "reply, and prune subject and body to get a 'clean' message". Users have done stranger things, and I wouldn't be surprised if some of your samples turn out to be content copied from a reply into a newly composed message...

Or maybe some "helpful" third-party tool that hooks into OE6 was used.


Anyway, it would be an edge case, and single rules FP'ing does happen... From your comments, it certainly doesn't seem to be a bad rule that systematically triggers falsely.

> X-Spam-Status: No, score=-3.523 tagged_above=-999 required=5
>     tests=[BAYES_00=-5, FAKE_REPLY_C=1.486, HTML_MESSAGE=0.001,
>     RCVD_IN_DNSWL_NONE=-0.0001, T_RP_MATCHES_RCVD=-0.01]
Comment 4 D. Stussy 2011-10-20 00:44:33 UTC
MSOE release 6.00.2800.1478 (stock - no plugins) does generate these headers on replies.  I agree that if they are not being generated in your release and something [third-party] was installed, then it's not SA's problem.
Comment 5 Karsten Bräckelmann 2011-10-20 02:19:38 UTC
(In reply to comment #4)
> MSOE release 6.00.2800.1478 (stock - no plugins) does generate these headers on
> replies.  I agree that if they are not being generated in your release and
> something [third-party] was installed, then it's not SA's problem.

Heh. Thanks for the confirmation, but I guess we all agree that OE6 does generate these headers. It's a rather old rule that has been in use for years...

Though I may have hinted at it, I wasn't outright implying "not a SA problem". Still, you got what I meant. :)  If this is merely a rare edge case, it's not worth addressing: the score is low-ish, and there's no way this rule can make the mail FP on its own. Mass-check and score generation should help here, too.
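
The "no way it can make the mail FP on its own" point can be checked with simple arithmetic on the X-Spam-Status header quoted above. The Python sketch below just sums the per-rule scores from that header; treating a message as spam once the total reaches required=5 is the assumed threshold semantics, and the helper names are hypothetical.

```python
# Per-rule scores copied from the X-Spam-Status header quoted in comments 2 and 3.
HITS = {
    "BAYES_00": -5.0,
    "FAKE_REPLY_C": 1.486,
    "HTML_MESSAGE": 0.001,
    "RCVD_IN_DNSWL_NONE": -0.0001,
    "T_RP_MATCHES_RCVD": -0.01,
}
REQUIRED = 5.0  # "required=5" from the same header


def total(hits: dict) -> float:
    """Sum the scores of all rules that hit."""
    return sum(hits.values())


def is_spam(hits: dict, required: float = REQUIRED) -> bool:
    """Assumed threshold semantics: spam once the total reaches `required`."""
    return total(hits) >= required


if __name__ == "__main__":
    print(round(total(HITS), 3))  # -3.523, matching the reported score
    print(is_spam({"FAKE_REPLY_C": 1.486}))  # False: 1.486 is far below 5
    # Even without the strongly negative BAYES_00, the total stays low.
    without_bayes = {k: v for k, v in HITS.items() if k != "BAYES_00"}
    print(round(total(without_bayes), 4), is_spam(without_bayes))  # 1.4769 False
```

In other words, FAKE_REPLY_C alone would need roughly 3.5 more points from other rules before this message could be tagged, so an isolated misfire stays harmless.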

This is in no way meant to devalue the report, or your documenting of it, both of which are much appreciated. It might, however, mean there's not much we can do, and we will probably just ignore this occasional single-rule misfiring. That's normal for scoring systems, and nothing to worry about.

Basically, regardless of bug or not, my aim was to identify the cause for the missing header.
Comment 6 Ned Slider 2011-10-20 05:41:11 UTC
@Karsten,

Agreed, this does seem to be some weird edge case rather than a rule routinely misfiring due to a missing header, and as such I'm less concerned. Of course, my primary reason for filing the bug report is to try to improve the quality of a rule that occasionally misfires, but without a clearer understanding of exactly why it's misfiring, that's not so easy.

On a small server with a handful of domains, I have seen only a dozen "misfires" over the last year. The rule hits 37 spam messages in a corpus of 6500, and it heavily overlaps with FORGED_MUA_OUTLOOK.

If I am able to track down any more information as to why the rule is misfiring I'll be sure to document it here.

Thanks.
Comment 7 D. Stussy 2011-10-20 08:22:34 UTC
Note/aside: Sometimes I use MSOE to post to Usenet, and I set Reply-To to a special mailbox for which I have programmed my MTA to accept only reply messages. It determines whether a message is a reply by actually scanning the References and In-Reply-To headers for a message ID issued by my host or NNTP server. When it fails to find one, it rejects the message at the SMTP stage. Therefore, I am quite certain that MSOE generates these headers properly (since that's what I used to generate test messages for my MTA rulesets). I do not require that the subject start with "RE:", because a reply could change the topic and thus follow a format of "<new_subject> - was RE: <old_subject>".

I have seen spammers try to send to my reserved mailbox after harvesting the address from Usenet, and in every case their message was rejected for having neither of the ref/IRT headers. I do look carefully at my logs when this happens, and I have yet to see a false positive. So far, I have not had to examine the local-part of the ref/IRT message IDs to verify that a detected spam referenced a message I actually sent. (That doesn't mean that I don't examine the local-parts; it only means that whenever spam was detected, the domain-part didn't match, was absent, or the headers were missing. I have yet to see a spam with a matching domain-part, though that could happen.)
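
The reply-only mailbox check described in the two paragraphs above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the actual MTA ruleset (which is not shown): LOCAL_DOMAINS, the msg-id regex and the function names are all assumptions. It accepts a message only if some Message-ID in References or In-Reply-To carries one of the local domain-parts, and, like the setup described, it does not inspect local-parts.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical: domain-parts this host or NNTP server uses in the
# Message-IDs it issues. The real MTA ruleset is not described in the report.
LOCAL_DOMAINS = {"news.example.net"}

# Rough msg-id matcher: <local-part@domain-part>, no whitespace inside.
MSG_ID = re.compile(r"<([^<>@\s]+)@([^<>\s]+)>")


def referenced_ids(msg: Message) -> list:
    """Collect (local-part, domain-part) pairs from References and In-Reply-To."""
    ids = []
    for field in ("References", "In-Reply-To"):
        for value in msg.get_all(field, []):
            ids.extend(MSG_ID.findall(value))
    return ids


def accept_as_reply(msg: Message, local_domains=LOCAL_DOMAINS) -> bool:
    """Accept only if a referenced Message-ID carries one of our domain-parts.

    Missing headers, absent domain-parts and foreign domain-parts all lead
    to rejection; the local-part is not checked here.
    """
    return any(domain.lower() in local_domains for _, domain in referenced_ids(msg))


if __name__ == "__main__":
    reply = message_from_string(
        "Subject: Re: test\nReferences: <abc.123@news.example.net>\n\nbody\n"
    )
    harvested = message_from_string("Subject: Re: test\n\nbody\n")
    print(accept_as_reply(reply), accept_as_reply(harvested))  # True False
```

Note that the Subject plays no part in the decision, matching the point that a reply may legitimately drop or change its "RE:" prefix.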

Therefore, I suggest that starting a subject with "Re:" is some spammers' attempt to bypass simple filters that may skip certain spam checks on the grounds that the message is a reply (especially C/R-based systems, which expect a reply in band). "Re:" is merely an optional convention (RFC 5322 says only that a reply's Subject MAY start with "Re: "), but the Ref/IRT headers have been in the RFCs (5322 -> 2822 -> 822 -> 733 -> 724 [12 May 1977], Sections II.C.2.b and II.C.2.c) for 34 years. "By definition," a reply will have at least one, if not both, of these headers, even if it lacks "Re:" in the subject. Furthermore, any "true" reply that lacks both of these headers is probably a fake or comes from a noncompliant mail user agent; either way, I don't see the triggering of this rule as false.
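
The "by definition" argument rests on how RFC 5322 (section 3.6.4) describes the threading fields: a reply's In-Reply-To holds the parent's Message-ID, and its References holds the parent's References (or, failing that, a single-identifier In-Reply-To) followed by the parent's Message-ID. The Python sketch below builds those two fields for a reply. It is illustrative only and makes no claim about how OE6 or any other MUA implements this; the function name and the "count the < characters" single-identifier check are simplifying assumptions.

```python
from email import message_from_string
from email.message import Message


def reply_headers(parent: Message) -> dict:
    """Sketch of RFC 5322 section 3.6.4: threading fields for a reply to `parent`."""
    headers = {}
    parent_id = parent.get("Message-ID")
    if parent_id:
        # In-Reply-To carries the parent's Message-ID.
        headers["In-Reply-To"] = parent_id.strip()
    refs = parent.get("References")
    if refs is None:
        # Parent has no References: fall back to its In-Reply-To, but only if
        # that holds a single identifier (counting "<" is a rough heuristic).
        irt = parent.get("In-Reply-To", "")
        refs = irt if irt.count("<") == 1 else None
    parts = [p for p in (refs, parent_id) if p]
    if parts:
        # References = parent's References (or In-Reply-To) + parent's Message-ID;
        # split/join also unfolds any header continuation whitespace.
        headers["References"] = " ".join(" ".join(parts).split())
    return headers


if __name__ == "__main__":
    parent = message_from_string(
        "Message-ID: <2@example.org>\nReferences: <1@example.org>\n\nbody\n"
    )
    print(reply_headers(parent))
    # {'In-Reply-To': '<2@example.org>', 'References': '<1@example.org> <2@example.org>'}
```

Under these rules even a first reply to a message that has only a Message-ID gets both fields, which is why "Re:" replies from OE6 that carry neither header stand out.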