Bug 3013 - fp: Opengroupware mailer
Summary: fp: Opengroupware mailer
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: unspecified
Hardware: All All
Importance: P5 normal
Target Milestone: 3.1.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-02-05 12:57 UTC by Steve Sether
Modified: 2005-03-11 09:04 UTC
0 users




Description Steve Sether 2004-02-05 12:57:11 UTC
SpamAssassin has two rules, RATWARE_HASH_2 and RATWARE_HASH_2_V2, which are
triggered by X-Mailer: headers longer than 16 and 14 characters, respectively,
that consist only of A-Z, a-z, 0-9, '.', and '_'.  The opengroupware mailer uses
the following header:

X-Mailer: OpenGroupware.org

which has 17 of the above characters in its tag, triggering both rules. 
Opengroupware generates legitimate email, is not a spam mailer, and shouldn't be
rated with a relatively high spam score.
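To make the character count concrete, here is a minimal Python sketch of the two rules as described in this report. The regexes are hypothetical approximations built from the description above ("longer than 16" and "longer than 14" characters from the listed set), not SpamAssassin's actual rule text, and `rules_hit` is an illustrative helper name.

```python
import re

# Hypothetical approximations of the two rules as described in this report,
# not copied from SpamAssassin's rule files: the whole X-Mailer value must
# consist of A-Z, a-z, 0-9, '.' and '_', and be longer than 16 characters
# (RATWARE_HASH_2) or longer than 14 characters (RATWARE_HASH_2_V2).
RULES = {
    "RATWARE_HASH_2": re.compile(r"[A-Za-z0-9._]{17,}"),
    "RATWARE_HASH_2_V2": re.compile(r"[A-Za-z0-9._]{15,}"),
}


def rules_hit(x_mailer):
    """Return the names of the approximated rules that match an X-Mailer value."""
    value = x_mailer.strip()
    return [name for name, pattern in RULES.items() if pattern.fullmatch(value)]


print(len("OpenGroupware.org"))        # 17
print(rules_hit("OpenGroupware.org"))  # ['RATWARE_HASH_2', 'RATWARE_HASH_2_V2']
```

Because '.' is in the allowed set, the dot in "OpenGroupware.org" does not break the run, so all 17 characters count toward both thresholds.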

The opengroupware maintainers have been contacted (see
http://bugzilla.opengroupware.org/bugzilla/show_bug.cgi?id=607), but
understandably feel that the header is completely legitimate and this problem
should be fixed in SpamAssassin.

This is a recurring problem, as seen in
http://bugzilla.spamassassin.org/show_bug.cgi?id=2108.  Can these rules have a
list of exceptions added to them?  I'm not a regular expression expert, so I
don't know whether a maintainable exception list is feasible.
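One way an exception list could stay maintainable is to keep known-good X-Mailer values as plain strings and check them before the pattern, rather than folding them into the regex. This is a hedged Python sketch, not a SpamAssassin rule: the regex is a hypothetical approximation of RATWARE_HASH_2_V2 as described in this report, and `KNOWN_GOOD_MAILERS` and `ratware_hash_hit` are illustrative names.

```python
import re

# Hypothetical approximation of RATWARE_HASH_2_V2: an X-Mailer longer than 14
# characters drawn from A-Z, a-z, 0-9, '.' and '_'. Not SpamAssassin's rule text.
HASH_LIKE = re.compile(r"[A-Za-z0-9._]{15,}")

# Hypothetical exception list of known-legitimate mailers, kept as literal
# strings so entries can be added without touching (or escaping for) the regex.
KNOWN_GOOD_MAILERS = {
    "OpenGroupware.org",
}


def ratware_hash_hit(x_mailer):
    """True if the header looks hash-like and is not on the exception list."""
    value = x_mailer.strip()
    if value in KNOWN_GOOD_MAILERS:
        return False
    return HASH_LIKE.fullmatch(value) is not None
```

The trade-off is that every new legitimate mailer needs its own entry, which is the maintenance burden the question above is worried about.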
Comment 1 Theo Van Dinter 2004-02-05 14:19:17 UTC
I was just noticing that my results for those rules kind of suck:

OVERALL%   SPAM%    HAM%      S/O   RANK   SCORE  NAME
  0.090   0.1192   0.0235    0.835   1.00    1.00  RATWARE_HASH_2_V2
  0.067   0.0867   0.0235    0.787   0.79    1.00  RATWARE_HASH_2

So I don't know if this is really a problem or not.
Comment 2 Theo Van Dinter 2004-02-05 18:12:38 UTC
for example, my FPs include:

X-Mailer: ClassifiedVentures
X-Mailer: com.reunion.site.mail
X-Mailer: com.snowball.mail
X-Mailer: webmail.delfi.lt


none of my valid hits use '_' or '.', so removing them would fix the bottom three there as well as opengroupware
and the other ticket mentioned before.  ClassifiedVentures is still a problem, but I can't think of a way
to see that as different from 'ckGmqXGFWNfaNAxRse', really...
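The proposal above can be sketched in Python by comparing the character class with and without '.' and '_'. The patterns are hypothetical approximations of RATWARE_HASH_2_V2's "longer than 14 characters" form as described in this report, not SpamAssassin's actual rule text; the sample values are the ones quoted in this thread.

```python
import re

# Hypothetical approximation of RATWARE_HASH_2_V2 (longer than 14 characters),
# before and after dropping '.' and '_' from the allowed characters.
WITH_PUNCT = re.compile(r"[A-Za-z0-9._]{15,}")
LETTERS_DIGITS_ONLY = re.compile(r"[A-Za-z0-9]{15,}")

samples = [
    "ClassifiedVentures",
    "com.reunion.site.mail",
    "com.snowball.mail",
    "webmail.delfi.lt",
    "OpenGroupware.org",
    "ckGmqXGFWNfaNAxRse",  # the random-looking example from this comment
]

for value in samples:
    before = WITH_PUNCT.fullmatch(value) is not None
    after = LETTERS_DIGITS_ONLY.fullmatch(value) is not None
    print(f"{value:24} before={before} after={after}")
```

Every sample matches the original class; with the narrower class only "ClassifiedVentures" and the random-looking string still match, which is exactly the pair a regular expression cannot tell apart.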

fyi, if I remove the underscore and period:

OVERALL%   SPAM%    HAM%      S/O   RANK   SCORE  NAME
  0.080   0.1087   0.0153    0.877   1.00    0.01  T_RATWARE_HASH_2_V2
  0.061   0.0814   0.0153    0.842   0.86    0.01  T_RATWARE_HASH_2
  0.090   0.1192   0.0235    0.835   0.83    2.67  RATWARE_HASH_2_V2
  0.067   0.0867   0.0235    0.787   0.66    0.00  RATWARE_HASH_2

so for me they're a net plus.  the FPs are the same for v2 and non-v2: "ClassifiedVentures"
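The fourth column in these hit-frequency tables appears to be S/O, the share of a rule's hits that are spam, computed from the second (spam %) and third (ham %) columns as spam% / (spam% + ham%). Recomputing it from the numbers above reproduces the printed 0.835, 0.877, 0.787 and 0.842, and shows why dropping '.' and '_' helps: ham hits fall from 0.0235% to 0.0153% while spam hits fall proportionally less. `spam_ratio` is an illustrative helper name, not part of SpamAssassin's tooling.

```python
def spam_ratio(spam_pct, ham_pct):
    """Share of a rule's hits that are spam, using percentages so corpus sizes cancel."""
    return spam_pct / (spam_pct + ham_pct)


# (spam %, ham %) pairs copied from the tables in this comment.
rows = {
    "RATWARE_HASH_2_V2": (0.1192, 0.0235),
    "T_RATWARE_HASH_2_V2": (0.1087, 0.0153),
    "RATWARE_HASH_2": (0.0867, 0.0235),
    "T_RATWARE_HASH_2": (0.0814, 0.0153),
}

for name, (spam_pct, ham_pct) in rows.items():
    print(f"{name:20} S/O = {spam_ratio(spam_pct, ham_pct):.3f}")
```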
Comment 3 Steve Sether 2004-02-06 11:39:12 UTC
Removing the . and _ sounds like a decent solution to me (though my experience
with SpamAssassin is limited to the past week).  With web addresses so common, .
has become a common delimiter to separate words.  For instance, Java uses the
tld.domain.project.subproject naming scheme for classes.  It's no accident that
many X-Mailer: headers use internet domains as their identifiers.

I think you're right that there's no simple way of distinguishing
'ckGmqXGFWNfaNAxRse' from ClassifiedVentures using regular expressions.
Assuming what you're really looking for is either randomly generated X-Mailer
strings (or some ratware guy just hitting keys on his keyboard), you might just
look at the "information content" of the string.  'ckGmqXGFWNfaNAxRse' is a
random string of upper/lowercase text, whereas 'ClassifiedVentures' is not
random at all.  The random string contains more "information"; the
non-random one contains less.  A simple test might be trying to compress the
string.  If it's very compressible, it has low information content and wasn't
generated randomly.  If it's not very compressible, it has high information
content and was probably generated randomly.
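The compression idea can be sketched with Python's standard `zlib` module: compress the string and compare sizes. The demonstration inputs and the `compression_ratio` helper are hypothetical. The last line illustrates a real limitation for this use case: zlib adds fixed overhead (a 2-byte header, a 4-byte Adler-32 checksum, and block framing), so a string as short as an X-Mailer value comes out larger than it went in, whether or not it is random.

```python
import random
import string
import zlib


def compression_ratio(text):
    """Compressed size divided by original size; lower means more redundant."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical demonstration inputs of equal length: one highly repetitive,
# one pseudo-random (fixed seed so the run is reproducible).
repetitive = "ab" * 100
rng = random.Random(42)
random_like = "".join(rng.choice(string.ascii_letters + string.digits) for _ in range(200))

print(f"repetitive:  {compression_ratio(repetitive):.2f}")
print(f"random-like: {compression_ratio(random_like):.2f}")

# Caveat: for 18-character values zlib's fixed overhead outweighs any savings,
# so both strings discussed here compress to more bytes than they started with.
print(compression_ratio("ClassifiedVentures") > 1, compression_ratio("ckGmqXGFWNfaNAxRse") > 1)
```

On longer text the ratio separates redundant from random input clearly; on a single short header the idea would need a measure with less fixed overhead.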

Slightly off topic, but could this kind of test be applied to other parts
of a message too?  I've noticed a lot of spam with random strings inserted
in an attempt to get past filters.  If you could identify these strings as
random, you could add to a mail's spam score.
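Scanning a message body for random-looking tokens could be sketched as follows. Because compression has too much fixed overhead on short tokens, this sketch swaps in a related measure: per-character Shannon entropy estimated from each token's own letter counts. The 12-character minimum, the 3.8 bits/character threshold, and the helper names are all hypothetical. With these values 'ClassifiedVentures' (about 3.53 bits/char) passes and 'ckGmqXGFWNfaNAxRse' (about 3.95) is flagged. Since an n-character token can never exceed log2(n) bits/char, the threshold interacts with token length and would need tuning on real mail.

```python
import math
import re
from collections import Counter


def shannon_entropy(token):
    """Bits per character, estimated from the token's own character counts."""
    counts = Counter(token)
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical parameters: only alphanumeric runs of 12+ characters are
# considered, and those above 3.8 bits/character are flagged.
TOKEN = re.compile(r"[A-Za-z0-9]{12,}")
THRESHOLD = 3.8


def suspicious_tokens(body):
    """Return tokens in body whose estimated entropy exceeds THRESHOLD."""
    return [t for t in TOKEN.findall(body) if shannon_entropy(t) > THRESHOLD]


print(suspicious_tokens("Hello ClassifiedVentures ckGmqXGFWNfaNAxRse"))
```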
Comment 4 Daniel Quinlan 2004-08-27 17:18:11 UTC
more accuracy and performance bugs going to 3.1.0 milestone
Comment 5 Justin Mason 2005-02-07 21:42:44 UTC
testing now... should have results in a day or so.
Comment 6 Justin Mason 2005-03-11 18:04:29 UTC
ok, current versions ignore X-Mailer lines with a "."