SA Bugzilla – Bug 2875
Lot of mPOP generated spam not matched: new rules included
Last modified: 2004-03-03 12:39:13 UTC
These three rules work very well for me; currently no other SpamAssassin rules give enough points for this kind of spam. The first two tests may simply reflect the mailing software used, which I have not positively identified as ratware. The third and last test is something I have only ever seen on spam.

full J_SPAM_HTML /<HTML><HEAD>\n<BODY>\n<p>/s
describe J_SPAM_HTML Typical ratware usage of HTML headers
score J_SPAM_HTML 2

full J_SPAM_HTML_2 /<HTML><HEAD>\n\n<\/HEAD>\n<BODY>\n\n <p>/s
describe J_SPAM_HTML_2 Typical ratware usage of HTML headers (2)
score J_SPAM_HTML_2 2

header J_SUBJ_ID Subject =~ /^Re: [A-Z]{3,10}, [a-z' ]{10,50}$/
describe J_SUBJ_ID Subj: "Re: CAPITALS, lowercase normal text"
score J_SUBJ_ID 2

Received: from unknown (HELO c-24-0-9-243.client.comcast.net) (24.0.9.243)
  by pb2.pair.com with SMTP; 29 Dec 2003 10:04:03 -0000
Received: from [24.0.9.243] by 530000x.netIP with HTTP;
  Mon, 29 Dec 2003 01:59:57 +0400
From: "Brewster Vince" <rhbqmneojg@el-nacional.com >
To: jeroen@php.net
Subject: Re: FYKMM, fence far from
Mime-Version: 1.0
X-Mailer: mPOP Web-Mail 2.19
X-Originating-IP: [530000x.netIP]
Date: Sun, 28 Dec 2003 23:01:57 +0100
Reply-To: "Brewster Vince" <rhbqmneojg@el-nacional.com >
Content-Type: multipart/alternative;
  boundary="--ALT--WJFS68409810618156"
Message-Id: <EOYNFZX-0005031522027@minerva>

----ALT--WJFS68409810618156
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit

menopause tid choice magma abrasive postmortem chairman downpour crabmeat gelatine somersault polymer marjorie shun discus armistice cantaloupe authenticate

----ALT--WJFS68409810618156
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 8bit

<HTML><HEAD>
</HEAD>
<BODY>
<p>O</comprehend>ur U</smith>S Li</electoral>censed Doc</dredge>tors wi</stretch>ll<BR>
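As a quick sanity check on J_SUBJ_ID, here is a minimal Python sketch that runs the same regular expression against the Subject of the sample message. It assumes Python's `re` behaves like Perl for this simple pattern; the helper name `subject_hits` and the "ham" subject are hypothetical and only there for contrast.

```python
import re

# J_SUBJ_ID pattern copied from the rule above (Perl /.../ delimiters dropped).
J_SUBJ_ID = re.compile(r"^Re: [A-Z]{3,10}, [a-z' ]{10,50}$")


def subject_hits(subject: str) -> bool:
    """Return True when the Subject value matches the J_SUBJ_ID pattern."""
    return J_SUBJ_ID.search(subject) is not None


spam_subject = "Re: FYKMM, fence far from"     # Subject of the sample message
ham_subject = "Re: Meeting notes for Tuesday"  # hypothetical legitimate subject

print(subject_hits(spam_subject))  # True
print(subject_hits(ham_subject))   # False
```

The lowercase tail has to be 10 to 50 characters long, so very short spam subjects would slip past this rule.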
I see a lot of these spams too. They are caught by my Bayesian filter, though. Clearly the random words are intended to confuse Bayes, but if you've trained it well with your own ham and spam, you don't need to worry. But since these don't score high enough to be rejected at SMTP time, they are annoying.

There should be more to match here: The weird X-Originating-IP, the bogus SGML end tags. One thing that's probably harder to match is the single paragraph in plain text, followed by a non-sensical HTML version.

I think this deserves some more attention, but your summary wasn't very descriptive, perhaps change it?
> There should be more to match here: The weird X-Originating-IP, the
> bogus SGML end tags.

I think it would be useful to have a copy of the software used... Googling for it only reveals links to spamware, and http://www.dataart.com/products/mpop/

> One thing that's probably harder to match is the single paragraph in
> plain text, followed by a non-sensical HTML version.

And the usage of a lot of invalid (closing) HTML tags inside a sentence, like in http://article.gmane.org/gmane.mail.spam.spamassassin.general/37120:

<p>Fr</cog>ee Ca</coronary>bleTV!N</fold>o mo</theft>re p</beaten>ay!!</p>

> I think this deserves some more attention, but your summary wasn't very
> descriptive, perhaps change it?

Fixed
*** Bug 2879 has been marked as a duplicate of this bug. ***
In bug 2879, Sidney said:

| mPOP Web-Mail does have some legitimate users, as you can see at
|
| http://www.gnu-pascal.de/crystal/gpc/en/raw-mail7003.html
|
| It does look like a lot of the recent spate of random word spam is using a
| provider who is running mPOP Web-Mail, but it is not a perfect spam signal.
These domains:

2004hosting.net
e-hostzz.net
530000x.com

are all registered to the same entity, and the whois information is meaningless. They seem to have been established early this month solely for the purpose of hiding the identity of the sender.

The spam may be exploiting an open relay hole in mPOP (given the randomness of the IPs the spam comes from), and it may use mPOP because it allows the originating IP to be hidden by the DNS entry.
See the messages on SA-Talk with "Ruleset for RND UC CHAR spam" in the subject. Another feature is that the text/plain body has 3 lines of real lowercase words, no punctuation except for apostrophes in contractions. Sometimes a blank line with one space precedes the 3 lines of text.
Additional domains seen in the body: 2004hosting.org 530000x.net
I wonder if the spammer behind these is subscribed to this mailing list... I just got a new one with the typical three lines of random words, but this time it said

X-Mailer: procure

My guess is that they are sticking a random word in the X-Mailer header now.

It does still have one characteristic that still shows the web based mailer being used:

Received: from [0.64.184.216] by 24.118.250.42 with HTTP;
  Wed, 31 Dec 2003 05:32:23 +0100

Let's see if they get rid of that now that I've mentioned it.
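To illustrate the Received characteristic mentioned above, here is a minimal Python sketch. The pattern `WEBMAIL_RECEIVED` is an assumption made for illustration, not a rule from this report: it looks for a bracketed IPv4 literal handed over "with HTTP". It is checked against Received values quoted in this bug. Legitimate web mail can produce the same shape, so on its own this is unlikely to be a safe spam signal.

```python
import re

# Hypothetical pattern (not an existing rule): "from [a.b.c.d] by <host> with HTTP;"
WEBMAIL_RECEIVED = re.compile(
    r"^from \[\d{1,3}(?:\.\d{1,3}){3}\] by \S+ with HTTP;"
)

# Received header values quoted in this report (header name stripped).
received_values = [
    "from [0.64.184.216] by 24.118.250.42 with HTTP; Wed, 31 Dec 2003 05:32:23 +0100",
    "from [24.0.9.243] by 530000x.netIP with HTTP; Mon, 29 Dec 2003 01:59:57 +0400",
    "from unknown (HELO c-24-0-9-243.client.comcast.net) (24.0.9.243) "
    "by pb2.pair.com with SMTP; 29 Dec 2003 10:04:03 -0000",
]

hits = [WEBMAIL_RECEIVED.search(value) is not None for value in received_values]
print(hits)  # [True, True, False]
```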
From my personal spam archive. Note:

- In headers, it says: "Full Name <email@address >" (with a space before the last >)
- Use of %RND_UC_CHAR[2-8], I suggest changing the '{3,10}' to a '+' in the J_SUBJ_ID rule
- The X-Originating IP is unroutable and nonexistent (according to whois)

Since mPOP might have normal use, you cannot match for that, but you can match for that strange space, for J_SUBJ_ID, and:

full J_OBFUSCATING_SGML /([^<>]*[^<> ]<\/[a-z]+>[^<> ]){3}/
describe J_OBFUSCATING_SGML Contains 3 successive mid-text SGML closing tags
score J_OBFUSCATING_SGML 2

Delivered-To: jeroen@php.net
Received: (qmail 1911 invoked from network); 20 Dec 2003 13:39:32 -0000
Received: from unknown (HELO 216.92.131.5) (61.177.183.229)
  by pb2.pair.com with SMTP; 20 Dec 2003 13:39:32 -0000
Received: from [61.177.183.229] by 56.218.116.136 with HTTP;
  Fri, 19 Dec 2003 21:31:18 -0400
From: "Fair Courtney" <gurxvuqplowy@terra.com >
To: jens@php.net, jeroen@php.net
Subject: Re: %RND_UC_CHAR[2-8], was he drinking
Mime-Version: 1.0
X-Mailer: mPOP Web-Mail 2.19
X-Originating-IP: [196.56.86.173]
Date: Sat, 20 Dec 2003 00:36:18 -0100
Reply-To: "Fair Courtney" <gurxvuqplowy@terra.com >
Content-Type: multipart/alternative;
  boundary="--ALT--TKEQ09634790224014"
Message-Id: <WOKZZSI-0005997098858@emeritus>
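A minimal Python sketch of two of the checks suggested above. `OBFUSCATING_SGML` is the J_OBFUSCATING_SGML expression with the Perl delimiters removed. `SPACE_BEFORE_GT` is a hypothetical pattern for the stray space before the closing ">", written here as an assumption about how it might be matched. The "normal" inputs are invented for contrast.

```python
import re

# J_OBFUSCATING_SGML from the comment above (no Perl delimiters, so "/" is unescaped).
OBFUSCATING_SGML = re.compile(r"([^<>]*[^<> ]</[a-z]+>[^<> ]){3}")

# Hypothetical pattern: a space right before the final ">" of an address header.
SPACE_BEFORE_GT = re.compile(r"\S >$")

spam_html = "<p>Fr</cog>ee Ca</coronary>bleTV!N</fold>o mo</theft>re p</beaten>ay!!</p>"
normal_html = "<p>Hello <b>world</b> and <i>more</i> text</p>"  # invented example

spam_from = '"Fair Courtney" <gurxvuqplowy@terra.com >'
normal_from = '"Jane Doe" <jane@example.org>'                   # invented example

print(bool(OBFUSCATING_SGML.search(spam_html)))    # True
print(bool(OBFUSCATING_SGML.search(normal_html)))  # False
print(bool(SPACE_BEFORE_GT.search(spam_from)))     # True
print(bool(SPACE_BEFORE_GT.search(normal_from)))   # False
```

The SGML pattern only fires when a closing tag is wedged inside a word, that is, with a non-space character on both sides, three times in a row. Ordinary markup, where closing tags are followed by a space or another tag, does not trigger it.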
Jeroen, the J_OBFUSCATING_SGML does not appear on any of my mPOP spam and was probably a limited characteristic.

Sidney, I see lots of spam coming in with bogus X-Mailer identification headers such as "antonia", "alma", "iconoclasm ironic", etc. I don't think that these are the same mPOP spam that this bug report is focused on, but yes it is another thing that could be addressed.

For the moment I am doing this...

full LINK_2004HOSTING_NET /href=.*2004hosting/s
describe LINK_2004HOSTING_NET Abusive domain - 2004hosting.net
score LINK_2004HOSTING_NET 4.0

for each abusive spam domain. It works, but it's tedious labor to do this with every domain that sends spam that is not catchable by other rules. I'm sure there's a better way to do it than this. Could someone enlighten me?
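One possible way to cut down the per-domain labor is to generate a single alternation from a list of domains and paste the result into one rule. The Python sketch below is only an illustration. `build_link_pattern` is a hypothetical helper, the domain list is taken from the comments in this report, and whether one combined rule is preferable to separate rules is left open.

```python
import re

# Domains named in this report; extend the list as new ones show up.
ABUSIVE_DOMAINS = [
    "2004hosting.net",
    "2004hosting.org",
    "e-hostzz.net",
    "530000x.com",
    "530000x.net",
]


def build_link_pattern(domains):
    """Build one href regex covering every listed domain (hypothetical helper)."""
    alternation = "|".join(re.escape(domain) for domain in sorted(domains))
    # [^>]* keeps the match inside a single tag instead of spanning the message.
    return r"href=[^>]*\b(?:" + alternation + r")\b"


pattern = build_link_pattern(ABUSIVE_DOMAINS)
link_re = re.compile(pattern, re.IGNORECASE)

print(pattern)  # regex text that could be pasted between the slashes of a rule
print(bool(link_re.search('<a href="http://www.2004hosting.net/x">')))  # True
print(bool(link_re.search('<a href="http://example.org/">')))           # False
```

Unlike `/href=.*2004hosting/s`, the generated pattern does not let `.*` run across the rest of the message, which may reduce accidental matches.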
> I wonder if the spammer behind these is subscribed to this mailing list... I
> just got a new one with the typical three lines of random words, but this time
> it said
> X-Mailer: procure
> My guess is that they are sticking a random word in the X-Mailer header now.

There are almost definitely multiple versions of this spamware about. There are also differences in the HTML formatting.

--j.
Courtesy of http://www.mail-archive.com/spamassassin-talk@lists.sourceforge.net/msg28113.html

This is from SA-Talk, but it would be nice to see it in CVS! It seems to work rather well for all this "Re: [2-8] uppercase" stuff. The originating IP is in fact a domain with "IP" tacked onto the end of it; I verified this to be true too:

header SUBJ_RND_UC_CHAR_L Subject =~ /\%RND_UC_CHAR/
describe SUBJ_RND_UC_CHAR_L Subject contains literal RND_UC_CHAR tag
score SUBJ_RND_UC_CHAR_L 5.0

header SUBJ_RND_UC_CHAR Subject =~ /^Re:\s[A-Z]{2,8},\s[a-z]+\s[a-z]+\s[a-z]+\s*$/
describe SUBJ_RND_UC_CHAR Subject fits RND_UC_CHAR pattern
score SUBJ_RND_UC_CHAR 3.0

header XOIP_RND_UC_CHAR X-Originating-IP =~ /\[.*\.(com|net|org|biz).*IP\]/
describe XOIP_RND_UC_CHAR X-Originating-IP fits RND_UC_CHAR pattern
score XOIP_RND_UC_CHAR 2.0
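A minimal Python sketch that tries the literal-tag and X-Originating-IP expressions on header values quoted earlier in this report. The patterns are the ones from the rules above with the e-mail line-wrap spaces removed and the Perl delimiters dropped; the variable names are only for illustration.

```python
import re

# Patterns from SUBJ_RND_UC_CHAR_L and XOIP_RND_UC_CHAR (line-wrap spaces removed).
SUBJ_LITERAL = re.compile(r"%RND_UC_CHAR")
XOIP = re.compile(r"\[.*\.(com|net|org|biz).*IP\]")

# Header values taken from the two sample messages in this report.
unexpanded_subject = "Re: %RND_UC_CHAR[2-8], was he drinking"
domain_style_xoip = "[530000x.netIP]"
numeric_xoip = "[196.56.86.173]"

print(bool(SUBJ_LITERAL.search(unexpanded_subject)))  # True
print(bool(XOIP.search(domain_style_xoip)))           # True
print(bool(XOIP.search(numeric_xoip)))                # False
```

The last line shows that the X-Originating-IP rule only covers the "domain plus IP" variant; the sample with a plain numeric address is not matched by it.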
The first rule posted takes the apostrophe (') into account. The current rule does not handle this exception. Example subjects:

Re: BLQNZO, i'll cut your
Re: XG, indeed? and what

I am fairly sure I've also seen quotes between 2 words, like:

Re: DGFJ, "then she" away
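To make the difference concrete, here is a minimal Python sketch that runs both Subject expressions from this report (J_SUBJ_ID, the first rule posted, and SUBJ_RND_UC_CHAR from the SA-Talk set) over the example subjects. It assumes Python's `re` agrees with Perl on these simple patterns.

```python
import re

# Both patterns as posted in this report (Perl delimiters dropped).
J_SUBJ_ID = re.compile(r"^Re: [A-Z]{3,10}, [a-z' ]{10,50}$")
SUBJ_RND_UC_CHAR = re.compile(r"^Re:\s[A-Z]{2,8},\s[a-z]+\s[a-z]+\s[a-z]+\s*$")

subjects = [
    "Re: FYKMM, fence far from",
    "Re: BLQNZO, i'll cut your",
    "Re: XG, indeed? and what",
    'Re: DGFJ, "then she" away',
]

# Map each subject to (J_SUBJ_ID hit, SUBJ_RND_UC_CHAR hit).
results = {
    subject: (
        J_SUBJ_ID.search(subject) is not None,
        SUBJ_RND_UC_CHAR.search(subject) is not None,
    )
    for subject in subjects
}

for subject, (first_rule, current_rule) in results.items():
    print(f"{subject!r}: J_SUBJ_ID={first_rule} SUBJ_RND_UC_CHAR={current_rule}")
```

Only the apostrophe case is picked up by J_SUBJ_ID and missed by SUBJ_RND_UC_CHAR. The subjects with "?" or double quotes get past both, so a fix needs to allow more punctuation than just the apostrophe.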
If you want the current version of the RND_UC_CHAR rules in comment #12, they are at http://kepler.acns.bethel.edu/~bjn/spamassassin/rnd_uc_char.cf The latest version takes punctuation into account (see comment #13), and also finds other recent samples where there are more than three random words.
FWIW, we seem to have a good set in current CVS -- I've been using them and haven't had an mPOP spam get past in a while. so, closing