Bug 6429

Summary: [review] vger.kernel.org triggers KB_DATE_CONTAINS_TAB and TAB_IN_FROM
Product: Spamassassin
Reporter: Tony Finch <dot>
Component: Rules
Assignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED FIXED
Severity: normal
CC: antispam, Darxus, gdt, mdorman, rlpowell, themistocles82
Priority: P2    
Version: 3.3.1   
Target Milestone: 3.4.0   
Hardware: PC   
OS: All   
Whiteboard:
Attachments: example message from the git list
mails from kvm@vger that got marked incorrectly
*correctly* marked mails from kvm@vger
suggested rules adjustment
suggested rules adjustment, take two

Description Tony Finch 2010-05-05 11:23:55 UTC
Created attachment 4757 [details]
example message from the git list

All messages arriving from the vger.kernel.org mailing lists have tab characters in the From: and Date: lines which cause 5 points to be added to their score. Not sure what the right solution is (other than whitelist_from_rcvd in my local.cf).
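For anyone wanting to confirm the symptom on their own mail, here is a minimal Python sketch that reports which of the From: and Date: header lines contain a tab. The function name `tabbed_headers` and the sample message are hypothetical, and the sketch assumes the rules fire on any tab in the first physical line of the header; it is not how SpamAssassin itself evaluates KB_DATE_CONTAINS_TAB or TAB_IN_FROM.

```python
def tabbed_headers(raw_message, names=("from", "date")):
    """Return the names of the selected headers whose first line contains a tab.

    Assumption: only the first physical line of each header is inspected;
    folded continuation lines (which may legitimately start with a tab)
    are skipped.
    """
    hits = []
    for line in raw_message.splitlines():
        if line == "":
            break  # blank line marks the end of the header block
        if line[0] in " \t":
            continue  # folded continuation line, ignored in this sketch
        name, sep, value = line.partition(":")
        if sep and name.lower() in names and "\t" in value:
            hits.append(name)
    return hits


# Hypothetical message with a tab after the colon in From: and Date:
sample = (
    "From:\tTony <tony@example.org>\n"
    "Date:\tWed, 5 May 2010 11:23:55 +0000\n"
    "Subject: test\n"
    "\n"
    "body\twith a tab that should not count\n"
)
print(tabbed_headers(sample))
```

Parsing the raw text by hand is deliberate here: Python's `email` parser strips leading whitespace from header values by default, which would hide a tab placed directly after the colon.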
Comment 1 Peter Alfredsen 2010-05-06 00:36:28 UTC
Funny, I was about to report the same thing, but then discovered that no recent ham was hit by this rule. It looks like my newly-setup spamass-milter (w/postfix) eats all header-starting tabs in the copy of the mail that spamass-milter examines, though the final copy is fine. I'm unsure if this is a milter/postfix/spamass-milter bug.

Anyway, they're running zmailer and have RFC822TABS enabled. Hit them with a cluebat about <stretches arms> ye large and explain to them that modifying headers in transit will make them go blind and make hair grow on their palms.
Comment 2 Adam Katz 2011-03-22 16:07:10 UTC
Does ALL mail from that list trigger these rules, or just some?  Is the User-Agent header always "Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)" when it triggers?
Comment 3 Greg Troxel 2011-03-22 16:14:09 UTC
It seems the issue is that the mailing list is putting tabs in various fields, not that the original senders are doing that. Looking at the 217 messages in my IMAP folder that have TAB_IN_FROM, the User-Agent distribution is quite varied:

  41 User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
   9 User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.15) Gecko/20110305 Remi/fc14 Lightning/1.0b3pre Thunderbird/3.1.9
   8 User-Agent: Mutt/1.5.21 (2010-09-15)
   7 User-Agent: Mutt/1.5.20 (2009-06-14)
   7 User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.15) Gecko/20110303 Lightning/1.0b2 Thunderbird/3.1.9
   7 User-Agent: Alpine 1.00 (DEB 882 2007-12-20)
   6 User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.14) Gecko/20110223 Thunderbird/3.1.8
   6 User-Agent: Mozilla/5.0 (X11; U; Linux i686; de; rv:1.9.2.15) Gecko/20110303 Thunderbird/3.1.9
   6 User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)
   3 User-Agent: Opera Mail/11.01 (Linux)
   2 User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b2 Thunderbird/3.1.7
   2 User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.15) Gecko/20110303 Lightning/1.0b2 Thunderbird/3.1.9
   2 User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.15) Gecko/20110303 Thunderbird/3.1.9
   2 User-Agent: KMail/1.9.3
   2 User-Agent: G2/1.0
   1 User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.9pre) Gecko/20100806 Lightning/1.0b2pre Lanikai/3.1.2
   1 User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.9) Gecko/20100918 Icedove/3.1.4
   1 User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Fedora/3.0.10-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.10
   1 User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100528 Thunderbird/3.0.5
   1 User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.4) Gecko/20100608 Lightning/1.0b2 Thunderbird/3.1
   1 User-Agent: Loom/3.14 (http://gmane.org/)
   1 User-Agent: KMail/1.13.5 (Linux/2.6.32-4-amd64; KDE/4.4.5; x86_64; ; )
   1 User-Agent: KMail/1.13.3 (Linux/2.6.32-bpo.3-amd64; KDE/4.4.3; x86_64; ; )
   1 User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)
   1 User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
   1 User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)
   1 User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.2 (berkeley-unix)
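A tally like the one above could be produced along these lines. This is only a sketch: the function name `user_agent_tally` is hypothetical, and it assumes each stored message carries an X-Spam-Status header listing the rules that hit, which depends on local SpamAssassin settings.

```python
from collections import Counter
from email import message_from_string


def user_agent_tally(raw_messages, rule="TAB_IN_FROM"):
    """Count User-Agent values among messages whose X-Spam-Status names `rule`.

    Assumption: the rule name appears verbatim in X-Spam-Status and is not
    split across a folded line.
    """
    tally = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        if rule in str(msg.get("X-Spam-Status", "")):
            tally[str(msg.get("User-Agent", "(none)"))] += 1
    # Most frequent first, like `sort | uniq -c | sort -rn`
    return tally.most_common()
```

Feeding it the raw text of every message in the folder gives (User-Agent, count) pairs ordered by frequency.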
Comment 4 Adam Katz 2011-03-22 16:25:36 UTC
Please attach some varying examples so we might be able to isolate the list engine's telltale signs.
Comment 5 Robin Lee Powell 2011-09-26 20:16:35 UTC
Created attachment 4975 [details]
mails from kvm@vger that got marked incorrectly
Comment 6 Robin Lee Powell 2011-09-26 20:18:16 UTC
Created attachment 4976 [details]
*correctly* marked mails from kvm@vger
Comment 7 Robin Lee Powell 2011-09-26 20:19:08 UTC
I just got bit by this.  I've attached both working and broken emails.

-Robin
Comment 8 Mark Martinec 2011-10-04 15:07:55 UTC
Created attachment 4977 [details]
suggested rules adjustment
Comment 9 AXB 2011-10-04 15:21:06 UTC
(In reply to comment #8)
> Created attachment 4977 [details]
> suggested rules adjustment

+1 (if required)
Comment 10 Darxus 2011-10-28 17:40:47 UTC
(In reply to comment #8)
> Created attachment 4977 [details]
> suggested rules adjustment

Looks like this is done, and should be committed and closed.
Comment 11 Adam Katz 2011-10-28 18:13:14 UTC
(In reply to comment #10)
> (In reply to comment #8)
> > Created attachment 4977 [details]
> > suggested rules adjustment
> 
> Looks like this is done, and should be committed and closed.

-1

I disagree.  This is an undesirable workaround as it fails to address the source of the issue.  Has anybody researched what mailing list software is doing this?  The exception should target a signature of that software rather than this particular instantiation of it.
Comment 12 Darxus 2011-10-28 18:19:32 UTC
(In reply to comment #11)
> I disagree.  This is an undesirable workaround as it fails to address the
> source of the issue.  Has anybody researched what mailing list software is
> doing this?  The exception should target a signature of that software rather
> than this particular instantiation of it.

It's running majordomo:  http://vger.kernel.org/vger-lists.html
technical-alerts@us-cert.gov also runs majordomo, and doesn't hit either of these TAB rules.  Seems likely to be a problem specific to the configuration at vger.kernel.org.  I just sent a subscription request to lkml.
Comment 13 Darxus 2011-10-28 18:31:11 UTC
Subscription request is being slow, and I realized I'm on git@vger.kernel.org.  Confirmed it's hitting KB_DATE_CONTAINS_TAB and TAB_IN_FROM.  

So, the problem is specific to vger.kernel.org, as Mark's rule specifies, and it should be committed and this ticket should be closed?
Comment 14 Michael Alan Dorman 2011-10-28 18:45:14 UTC
The problem isn't majordomo, or even vger.kernel.org, per se.

The issue is zmailer.  Specifically their configuration of zmailer.  Check out http://www.zmailer.org/man/zmailer.conf.5zm.html and search for
RFC822TABS.

The only thing I have been able to find that seems characteristic of zmailer, as opposed to vger.kernel.org, is that it seems to have very unusual components in its received lines:

Received: from mx1.redhat.com ([209.132.183.28]:13864 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754077Ab1J1SeM (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Fri, 28 Oct 2011 14:34:12 -0400

The 'ORCPT <rfc822;' bit seems to be characteristic of zmailer.
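To check that signature against a local corpus, something like the following Python sketch could be used. The function name `relayed_by_zmailer` is hypothetical, and treating the substring 'ORCPT <rfc822;' in any Received header as a zmailer marker is exactly the assumption made above, not a documented property of zmailer.

```python
from email import message_from_string

# Assumption: this substring in a Received header indicates a zmailer hop.
ZMAILER_MARKER = "ORCPT <rfc822;"


def relayed_by_zmailer(raw_message):
    """Return True if any Received header contains the assumed zmailer marker."""
    msg = message_from_string(raw_message)
    return any(ZMAILER_MARKER in str(value)
               for value in msg.get_all("Received", []))


# The Received header quoted above, followed by a hypothetical From: and body
sample = (
    'Received: from mx1.redhat.com ([209.132.183.28]:13864 "EHLO mx1.redhat.com"\n'
    "\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n"
    "\tid S1754077Ab1J1SeM (ORCPT <rfc822;linux-kernel@vger.kernel.org>);\n"
    "\tFri, 28 Oct 2011 14:34:12 -0400\n"
    "From: someone@example.org\n"
    "\n"
    "body\n"
)
print(relayed_by_zmailer(sample))
```

Counting how many list messages and how many non-list hams return True gives a rough measure of how specific the marker is.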
Comment 15 Darxus 2011-10-28 18:47:21 UTC
Does anybody other than vger.kernel.org use zmailer (with RFC822TABS)?
Comment 16 Darxus 2011-10-28 18:49:26 UTC
(In reply to comment #14)
> The 'ORCPT <rfc822;' bit seems to be characteristic of zmailer.

That matches 100% of the last 1902 emails I've gotten from the git list, and 0 of my last 1050 non-list hams.
Comment 17 Mark Martinec 2011-10-28 19:05:34 UTC
Created attachment 4991 [details]
suggested rules adjustment, take two

Ok, how about this one then.
Comment 18 Michael Alan Dorman 2011-10-28 19:10:22 UTC
> Ok, how about this one then.

Seems reasonable to me, though I would still hope that if we could show them that they are the only large purveyor of ham with those tabs, they would make this small config change.  Does anyone have any numbers?

Mike.
Comment 19 Darxus 2011-10-28 19:20:28 UTC
(In reply to comment #18)
> Seems reasonable to me, though I would still hope that if we could show them
> that they are the only large purveyor of ham with those tabs, they would make
> this small config change.  Does anyone have any numbers?

Of the 12201 emails in my ham corpus, all of them that hit either of these two rules are from vger.kernel.org.  From ruleqa:

  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
      0   0.6930   0.0143   0.980    0.77    3.80  KB_DATE_CONTAINS_TAB  
      0   0.6924   0.0143   0.980    0.77    0.26  TAB_IN_FROM  

The fact that the ham hit rates for those two rules are exactly the same could be additional evidence of 

Hit rate is 13 of 91022.  That's fewer than the (22) hams that hit that I currently have in my corpus, so it could be that all hams in the corpora that hit these rules are from my corpus and from vger.
Comment 20 Adam Katz 2011-10-28 20:19:27 UTC
Mike wrote in comment #14
> The issue is zmailer.  Specifically their configuration of zmailer.  Check out
> http://www.zmailer.org/man/zmailer.conf.5zm.html and search for
> RFC822TABS.
...
> The 'ORCPT <rfc822;' bit seems to be characteristic of zmailer.

Mark wrote in comment #17
> Created attachment 4991 [details]
> suggested rules adjustment, take two
> 
> Ok, how about this one then.

Looks like we have our silver bullet.  Thanks Mike and Mark!

Mark, I'll let you check that in yourself.
Comment 21 Mark Martinec 2011-10-28 20:28:59 UTC
trunk:
  Bug 6429: vger.kernel.org triggers KB_DATE_CONTAINS_TAB and TAB_IN_FROM
  Sending rules/20_head_tests.cf
  Sending rulesrc/sandbox/hege/20_hk.cf
  Sending rulesrc/sandbox/kb/20_header.cf
Committed revision 1190550.