Bug 2875 - Lot of mPOP generated spam not matched: new rules included
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 2.61
Hardware: PC Linux
Importance: P5 normal
Target Milestone: 3.0.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Duplicates: 2879
Depends on:
Blocks:
 
Reported: 2003-12-30 07:02 UTC by Jeroen van Wolffelaar
Modified: 2004-03-03 12:39 UTC (History)



Description Jeroen van Wolffelaar 2003-12-30 07:02:17 UTC
These three rules work very well for me; currently no other SpamAssassin rules
give enough points for this kind of spam.

The first two tests may only reflect the mailing software used, which I have
not positively identified as ratware.

I have seen the third and last test match only on spam.

full J_SPAM_HTML        /<HTML><HEAD>\n<BODY>\n<p>/s
describe J_SPAM_HTML    Typical ratware usage of HTML headers
score J_SPAM_HTML       2

full J_SPAM_HTML_2      /<HTML><HEAD>\n\n<\/HEAD>\n<BODY>\n\n  <p>/s
describe J_SPAM_HTML_2  Typical ratware usage of HTML headers (2)
score J_SPAM_HTML_2     2
                                                                                
header J_SUBJ_ID        Subject =~ /^Re: [A-Z]{3,10}, [a-z' ]{10,50}$/
describe J_SUBJ_ID      Subj: "Re: CAPITALS, lowercase normal text"
score J_SUBJ_ID         2
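
A rough way to try J_SUBJ_ID outside SpamAssassin is to run the same pattern
through Python's re module against the Subject of the sample message in this
report. This is only a sketch: it assumes Python and Perl treat this particular
pattern the same way, the helper name matches_subj_id is made up, and the
second subject is an invented legitimate-looking example.

```python
import re

# J_SUBJ_ID pattern copied from the rule above.
J_SUBJ_ID = re.compile(r"^Re: [A-Z]{3,10}, [a-z' ]{10,50}$")

def matches_subj_id(subject: str) -> bool:
    """Return True if the Subject header value fits the J_SUBJ_ID pattern."""
    return J_SUBJ_ID.search(subject) is not None

# Subject of the sample spam in this report.
print(matches_subj_id("Re: FYKMM, fence far from"))
# Invented legitimate-looking subject (assumption, not from the bug).
print(matches_subj_id("Re: Meeting notes for Tuesday"))
```

The first call reports a match and the second does not, because the rule
requires a run of at least three capitals followed by a comma.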

Received: from unknown (HELO c-24-0-9-243.client.comcast.net) (24.0.9.243)
  by pb2.pair.com with SMTP; 29 Dec 2003 10:04:03 -0000
Received: from [24.0.9.243] by 530000x.netIP with HTTP;
    Mon, 29 Dec 2003 01:59:57 +0400
From: "Brewster Vince" <rhbqmneojg@el-nacional.com >
To: jeroen@php.net
Subject: Re: FYKMM, fence far from
Mime-Version: 1.0
X-Mailer: mPOP Web-Mail 2.19
X-Originating-IP: [530000x.netIP]
Date: Sun, 28 Dec 2003 23:01:57 +0100
Reply-To: "Brewster Vince" <rhbqmneojg@el-nacional.com >
Content-Type: multipart/alternative;
    boundary="--ALT--WJFS68409810618156"
Message-Id: <EOYNFZX-0005031522027@minerva>

----ALT--WJFS68409810618156
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
 
menopause tid choice magma abrasive postmortem
chairman downpour crabmeat gelatine
somersault polymer marjorie shun discus armistice cantaloupe authenticate
 
----ALT--WJFS68409810618156
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 8bit
 
<HTML><HEAD>
 
</HEAD>
<BODY>
 
  <p>O</comprehend>ur U</smith>S Li</electoral>censed Doc</dredge>tors
wi</stretch>ll<BR>
Comment 1 Kjetil Kjernsmo 2003-12-30 09:29:44 UTC
I see a lot of these spams too. They are caught by my Bayesian filter, though.
Clearly the random words are intended to confuse Bayes, but if you've trained it
well with your own ham and spam, you don't need to worry. But since these don't
score sufficiently to be rejected at SMTP time, they are annoying.

There should be more to match here: The weird X-Originating-IP, the bogus SGML
end tags.  

One thing that's probably harder to match is the single paragraph in plain text,
followed by a nonsensical HTML version.

I think this deserves some more attention, but your summary wasn't very
descriptive, perhaps change it?
Comment 2 Jeroen van Wolffelaar 2003-12-30 10:28:16 UTC
> There should be more to match here: The weird X-Originating-IP, the
> bogus SGML end tags.

I think it would be useful to have a copy of the software used... Googling for
it only turns up links to spamware, and
http://www.dataart.com/products/mpop/

> One thing that's probably harder to match is the single paragraph in
> plain text, followed by a non-sensical HTML version.

And the usage of a lot of invalid (closing) HTML tags inside a sentence, like
in http://article.gmane.org/gmane.mail.spam.spamassassin.general/37120:

<p>Fr</cog>ee Ca</coronary>bleTV!N</fold>o mo</theft>re
p</beaten>ay!!</p>

> I think this deserves some more attention, but your summary wasn't very
> descriptive, perhaps change it?
                                                                                
Fixed
Comment 3 Malte S. Stretz 2003-12-30 16:23:44 UTC
*** Bug 2879 has been marked as a duplicate of this bug. ***
Comment 4 Malte S. Stretz 2003-12-30 16:24:46 UTC
In bug 2879, Sidney said: 
| mPOP Web-Mail does have some legitimate users, as you can see at 
|  
| http://www.gnu-pascal.de/crystal/gpc/en/raw-mail7003.html 
|  
| It does look like a lot of the recent spate of random word spam is using a 
| provider who is running mPOP Web-Mail, but it is not a perfect spam signal. 
Comment 5 Lee Howard 2003-12-30 16:36:10 UTC
These domains:

 2004hosting.net
 e-hostzz.net
 530000x.com

are all registered to the same entity, and the whois information is 
meaningless.  Seems to have been established early this month solely for the 
purpose of hiding the identity of the sender.  The spam may be exploiting an 
open relay hole in mPOP (given the randomness of the IPs the spam comes from) 
and it may use mPOP for the way that it allows the origination IP to be hidden 
by the DNS entry.
Comment 6 Kenneth Porter 2003-12-30 19:33:38 UTC
See the messages on SA-Talk with "Ruleset for RND UC CHAR spam" in the subject.

Another feature is that the text/plain body has 3 lines of real lowercase words,
no punctuation except for apostrophes in contractions. Sometimes a blank line
with one space precedes the 3 lines of text.
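
The text/plain pattern described above could be checked roughly as follows. This
is a hypothetical heuristic in Python, not a SpamAssassin rule from this bug;
the names WORD_LINE and looks_like_random_word_body are invented, and short
legitimate mails written in lowercase could trip it too.

```python
import re

# Hypothetical heuristic: a line made only of lowercase words separated by
# single spaces, with apostrophes allowed inside the words.
WORD_LINE = re.compile(r"^[a-z']+( [a-z']+)*$")

def looks_like_random_word_body(text: str) -> bool:
    """True if the text/plain part is exactly three lowercase word lines,
    ignoring blank or whitespace-only lines around them."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return len(lines) == 3 and all(WORD_LINE.match(ln) for ln in lines)

# text/plain part of the sample message in the description.
sample = (
    " \n"
    "menopause tid choice magma abrasive postmortem\n"
    "chairman downpour crabmeat gelatine\n"
    "somersault polymer marjorie shun discus armistice cantaloupe authenticate\n"
    " \n"
)
print(looks_like_random_word_body(sample))
# Invented ordinary message (assumption, not from the bug).
print(looks_like_random_word_body("Hi Bob,\nSee you at 5.\nThanks"))
```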
Comment 7 Kenneth Porter 2003-12-30 19:36:18 UTC
Additional domains seen in the body:

 2004hosting.org
 530000x.net
Comment 8 Sidney Markowitz 2003-12-30 23:31:22 UTC
I wonder if the spammer behind these is subscribed to this mailing list... I
just got a new one with the typical three lines of random words, but this time
it said

X-Mailer: procure

My guess is that they are sticking a random word in the X-Mailer header now.

It does still have one characteristic that shows the web-based mailer
being used:

Received: from [0.64.184.216] by 24.118.250.42 with HTTP;
        Wed, 31 Dec 2003 05:32:23 +0100

Let's see if they get rid of that now that I've mentioned it.
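
A sketch of how that Received line could be matched, written in Python for
illustration. The pattern and the name has_http_hop are assumptions, not an
existing rule, and a legitimate web-mail system may well add a similar header,
so on its own this would be a weak signal at best.

```python
import re

# Hypothetical pattern for a Received header value of the form
# "from [a.b.c.d] by <host> with HTTP;" as seen in the samples in this bug.
RECEIVED_HTTP = re.compile(r"^from \[\d{1,3}(\.\d{1,3}){3}\] by \S+ with HTTP;")

def has_http_hop(received_value: str) -> bool:
    """Return True if the Received value shows a bracketed IP handed over via HTTP."""
    return RECEIVED_HTTP.search(received_value) is not None

# Value quoted in this comment.
print(has_http_hop("from [0.64.184.216] by 24.118.250.42 with HTTP;"))
# SMTP hop from the sample in the description.
print(has_http_hop("from unknown (HELO c-24-0-9-243.client.comcast.net) "
                   "(24.0.9.243) by pb2.pair.com with SMTP;"))
```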
Comment 9 Jeroen van Wolffelaar 2003-12-31 03:59:43 UTC
From my personal spam archive.

Note: 
- In headers, it says: "Full Name <email@address >" (with a space before the last >)
- Use of %RND_UC_CHAR[2-8], I suggest changing the '{3,10}' to a '+' in the
J_SUBJ_ID rule
- The X-Originating-IP is unroutable and nonexistent (according to whois)

Since mPOP might have normal use, you cannot match for that, but you can match
for that strange space, for J_SUBJ_ID, and:

full J_OBFUSCATING_SGML /([^<>]*[^<> ]<\/[a-z]+>[^<> ]){3}/
describe J_OBFUSCATING_SGML Contains 3 successive mid-text SGML closing tags
score J_OBFUSCATING_SGML 2
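
To see what J_OBFUSCATING_SGML catches, the same pattern can be run in Python
against the two HTML fragments quoted earlier in this bug. This is only a
sketch: the third fragment is an invented piece of ordinary HTML, and Python is
assumed to treat the pattern the same way Perl does.

```python
import re

# Pattern copied from J_OBFUSCATING_SGML; only the Perl delimiters and the
# "\/" escape are dropped, which Python does not need.
OBFUSCATING_SGML = re.compile(r"([^<>]*[^<> ]</[a-z]+>[^<> ]){3}")

samples = {
    "spam, description": "<p>O</comprehend>ur U</smith>S Li</electoral>censed Doc</dredge>tors",
    "spam, comment 2": "<p>Fr</cog>ee Ca</coronary>bleTV!N</fold>o mo</theft>re",
    # Invented ordinary HTML (assumption, not from the bug).
    "invented ham": "<p>Plain <b>bold</b> text and <i>italic</i> text.</p>",
}
for label, html in samples.items():
    print(label, bool(OBFUSCATING_SGML.search(html)))
```

Both spam fragments match; the ordinary fragment does not, because its closing
tags are followed by a space or by the end of the text rather than by more
letters.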

Delivered-To: jeroen@php.net
Received: (qmail 1911 invoked from network); 20 Dec 2003 13:39:32 -0000
Received: from unknown (HELO 216.92.131.5) (61.177.183.229)
  by pb2.pair.com with SMTP; 20 Dec 2003 13:39:32 -0000
Received: from [61.177.183.229] by 56.218.116.136 with HTTP;
        Fri, 19 Dec 2003 21:31:18 -0400
From: "Fair Courtney" <gurxvuqplowy@terra.com >
To: jens@php.net, jeroen@php.net
Subject: Re: %RND_UC_CHAR[2-8], was he drinking
Mime-Version: 1.0
X-Mailer: mPOP Web-Mail 2.19
X-Originating-IP: [196.56.86.173]
Date: Sat, 20 Dec 2003 00:36:18 -0100
Reply-To: "Fair Courtney" <gurxvuqplowy@terra.com >
Content-Type: multipart/alternative;
        boundary="--ALT--TKEQ09634790224014"
Message-Id: <WOKZZSI-0005997098858@emeritus>
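
The "strange space" in the From and Reply-To headers above could be matched
along these lines. This is a hypothetical Python check, not a rule posted in
this bug; the pattern and the name has_space_before_gt are assumptions.

```python
import re

# Hypothetical check: an address in angle brackets with a space just
# before the closing ">".
SPACE_BEFORE_GT = re.compile(r"<[^<>\s]+@[^<>\s]+ >")

def has_space_before_gt(header_value: str) -> bool:
    """Return True if the header value contains '<user@host >'."""
    return SPACE_BEFORE_GT.search(header_value) is not None

# From header of the sample above.
print(has_space_before_gt('"Fair Courtney" <gurxvuqplowy@terra.com >'))
# Same address without the stray space.
print(has_space_before_gt('"Fair Courtney" <gurxvuqplowy@terra.com>'))
```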
Comment 10 Lee Howard 2003-12-31 10:42:24 UTC
Jeroen, the J_OBFUSCATING_SGML does not appear on any of my mPOP spam and was 
probably a limited characteristic.

Sidney, I see lots of spam coming in with bogus X-Mailer identification headers 
such as "antonia", "alma", "iconoclasm ironic", etc.  I don't think that these 
are the same mPOP spam that this bug report is focused on, but yes it is 
another thing that could be addressed.

For the moment I am doing this...

full LINK_2004HOSTING_NET /href=.*2004hosting/s
describe LINK_2004HOSTING_NET Abusive domain - 2004hosting.net
score LINK_2004HOSTING_NET 4.0

for each abusive spam domain.  It works, but it's tedious labor to do this with 
every domain that sends spam that is not catchable by other rules.  I'm sure 
there's a better way to do it than this.  Could someone enlighten me?
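
One possible way to cut down the per-domain work is to keep the domains in a
list and build a single alternation from it. The sketch below does that in
Python for illustration; the names ABUSIVE_DOMAINS, LINK_ABUSIVE and
links_to_abusive_domain are made up, the test URLs are invented, and nothing
here is an existing SpamAssassin feature.

```python
import re

# Domains reported in comments 5 and 7 of this bug.
ABUSIVE_DOMAINS = [
    "2004hosting.net",
    "2004hosting.org",
    "e-hostzz.net",
    "530000x.com",
    "530000x.net",
]

# One alternation instead of one rule per domain; re.escape keeps the
# dots and hyphens literal.
LINK_ABUSIVE = re.compile(
    r"href=[^>]*\b(?:" + "|".join(re.escape(d) for d in ABUSIVE_DOMAINS) + r")\b",
    re.IGNORECASE,
)

def links_to_abusive_domain(html: str) -> bool:
    """Return True if an href in the HTML points at one of the listed domains."""
    return LINK_ABUSIVE.search(html) is not None

# Invented URLs for illustration only.
print(links_to_abusive_domain('<a href="http://www.2004hosting.net/x">click</a>'))
print(links_to_abusive_domain('<a href="http://www.example.org/">click</a>'))
```

Adding a domain then means adding one line to the list rather than writing
three new rule lines.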
Comment 11 Justin Mason 2004-01-05 14:52:07 UTC
>I wonder if the spammer behind these is subscribed to this mailing list... I
>just got a new one with the typical three lines of random words, but this time
>it said
>X-Mailer: procure
>My guess is that they are sticking a random word in the X-Mailer header now.

There are almost definitely multiple versions of this spamware about.
There's also differences in the HTML formatting.

--j.

Comment 12 Gregor Lawatscheck 2004-01-19 05:17:51 UTC

Courtesy of http://www.mail-archive.com/spamassassin-talk@lists.sourceforge.net/msg28113.html

This is from SA-Talk but would be nice to see it in CVS!

Seems to work rather well for all this Re: [2-8] uppercase stuff - The 
originating IP is in fact a domain with IP tacked at the end of it - I too 
verified this to be true!:

header SUBJ_RND_UC_CHAR_L       Subject =~ /\%RND_UC_CHAR/
describe SUBJ_RND_UC_CHAR_L     Subject contains literal RND_UC_CHAR tag
score SUBJ_RND_UC_CHAR_L        5.0

header SUBJ_RND_UC_CHAR         Subject =~ /^Re:\s[A-Z]{2,8},\s[a-z]+\s[a-z]+\s[a-z]+\s*$/
describe SUBJ_RND_UC_CHAR       Subject fits RND_UC_CHAR pattern
score SUBJ_RND_UC_CHAR          3.0

header XOIP_RND_UC_CHAR         X-Originating-IP =~ /\[.*\.(com|net|org|biz).*IP\]/
describe XOIP_RND_UC_CHAR       X-Originating-IP fits RND_UC_CHAR pattern
score XOIP_RND_UC_CHAR          2.0
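
As a quick check of the header rules above, the patterns can be run in Python
against header values taken from the samples in this bug. This is only a
sketch, assuming Python and Perl handle these patterns the same way; the
variable names are invented.

```python
import re

# Patterns copied from the SUBJ_RND_UC_CHAR_L and XOIP_RND_UC_CHAR rules.
SUBJ_LITERAL = re.compile(r"%RND_UC_CHAR")
XOIP = re.compile(r"\[.*\.(com|net|org|biz).*IP\]")

# X-Originating-IP value from the sample in the description.
print(bool(XOIP.search("[530000x.netIP]")))
# X-Originating-IP value from the sample in comment 9 (a plain IP).
print(bool(XOIP.search("[196.56.86.173]")))
# Subject from the sample in comment 9, with the unexpanded template tag.
print(bool(SUBJ_LITERAL.search("Re: %RND_UC_CHAR[2-8], was he drinking")))
```

The domain-plus-"IP" value and the literal tag are both hit, while the plain
numeric X-Originating-IP is not.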

Comment 13 Fred T 2004-01-19 06:04:51 UTC
The first rule posted (J_SUBJ_ID) allows for apostrophes (').
The current rule does not handle this case.
Example subjects:

Re: BLQNZO, i'll cut your
Re: XG, indeed? and what

I am fairly sure I've seen quotes around two words, like:
Re: DGFJ, "then she" away
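
The gap can be reproduced with the subjects above. The sketch below, in Python,
compares the SUBJ_RND_UC_CHAR pattern from comment 12 with a widened variant
that also accepts apostrophes, double quotes and question marks inside the
three words. The widened pattern is an illustration only, not a tested or
published rule, and may raise false positives.

```python
import re

# Pattern from the SUBJ_RND_UC_CHAR rule in comment 12.
STRICT = re.compile(r"^Re:\s[A-Z]{2,8},\s[a-z]+\s[a-z]+\s[a-z]+\s*$")

# Hypothetical widened variant: each word may also contain ' " or ?
WIDENED = re.compile(r"""^Re:\s[A-Z]{2,8},\s[a-z'"?]+(?:\s[a-z'"?]+){2}\s*$""")

subjects = [
    "Re: FYKMM, fence far from",
    "Re: BLQNZO, i'll cut your",
    "Re: XG, indeed? and what",
    'Re: DGFJ, "then she" away',
]
for subject in subjects:
    print(subject, bool(STRICT.search(subject)), bool(WIDENED.search(subject)))
```

Only the first subject matches the strict pattern; all four match the widened
one.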
Comment 14 Brent J. Nordquist 2004-01-19 06:52:06 UTC
If you want the current version of the RND_UC_CHAR rules in comment #12, they
are at http://kepler.acns.bethel.edu/~bjn/spamassassin/rnd_uc_char.cf

The latest version takes punctuation into account (see comment #13), and also
finds other recent samples where there are more than three random words.
Comment 15 Justin Mason 2004-03-03 21:39:13 UTC
FWIW, we seem to have a good set in current CVS -- I've been using them and
haven't had an mPOP spam get past in a while.  So, closing.