Bug 1158 - New eval rule to detect a base64 body when some other Content-Type is specified
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules (Eval Tests)
Version: 2.31
Hardware: Sun Solaris
Importance: P3 enhancement
Target Milestone: ---
Assignee: Daniel Quinlan
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-10-25 15:28 UTC by Erik
Modified: 2002-12-07 17:04 UTC

Attachment Type Modified Status Actions Submitter/CLA Status
A sample email message that triggers this eval rule text/plain None Erik [NoCLA]

Description Erik 2002-10-25 15:28:50 UTC
I get several spam emails per week that have a base64-encoded body but declare
text/html as the Content-Type.  I can only assume this works because Outlook or
some other common MUA auto-decodes text/html bodies that look base64 encoded.
  The result of the improper encoding is that SA can't properly apply the
various body rules, since it doesn't decode the body.  To compensate for this,
I'm using the following rule (modified from one supplied by Daniel Quinlan):

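sub check_for_surprise_base64 {
  my ($self) = @_;

  # a declared base64 encoding is not a surprise, so skip the check
  return 0 if $self->{found_encoding_base64};

  # count consecutive body lines that look like base64 text
  my $count = 0;
  for (@{$self->{msg}->get_body()}) {
    if (/^[A-Za-z0-9\+]{60,77}$/) {
      $count++;
      # more than 5 such lines in a row: treat the body as base64
      return 1 if $count > 5;
    } else {
      # any other line resets the run
      $count = 0;
    }
  }
  return 0;
}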

with these details:

body     TYPE_BASE64 eval:check_for_surprise_base64()
describe TYPE_BASE64 Body has base64 content that does not match the Content-Type
score    TYPE_BASE64 4

I can't guarantee there are no false positives, so a run through a larger
corpus would be advisable.  It might also help performance to check only the
first 100 lines of the body, or similar.
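
The line cap suggested above could look roughly like the sketch below.  It is
only an illustration: it is a standalone function that takes an array
reference of body lines instead of reading $self->{msg}, and the name
surprise_base64_capped and the $max_lines parameter are hypothetical, not part
of SpamAssassin.

```perl
use strict;
use warnings;

# Hypothetical standalone variant of check_for_surprise_base64 that
# stops scanning after $max_lines lines (default 100).
sub surprise_base64_capped {
    my ($lines, $max_lines) = @_;
    $max_lines = 100 unless defined $max_lines;

    my $count = 0;    # current run of consecutive base64-looking lines
    my $seen  = 0;    # number of lines examined so far
    for my $line (@$lines) {
        last if $seen++ >= $max_lines;
        # same character class as the rule above ("/" still omitted)
        if ($line =~ /^[A-Za-z0-9+]{60,77}$/) {
            $count++;
            return 1 if $count > 5;
        } else {
            $count = 0;
        }
    }
    return 0;
}
```

With the cap in place, a base64 run that starts after the first 100 lines is
no longer detected, so the limit trades some coverage for speed.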
Comment 1 Erik 2002-10-25 15:32:22 UTC
Created attachment 424
A sample email message that triggers this eval rule
Comment 2 Daniel Quinlan 2002-10-25 16:58:25 UTC
I'll take the bug.  One question, why did you drop "/" from the base64
regular expression?  "/" is part of the base64 character set.

Also, are all of your examples similar to the one you attached or are any
significantly different?
Comment 3 Erik 2002-10-26 00:45:10 UTC
  Go ahead and add back the "/" - I just didn't see it in my sample emails (but
now I see it is valid).  All of the spam emails I get like this one are very
similar - likely from the same spammer, but I can attach more examples if you
want more to test with or for the corpus.
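
The effect of adding "/" back can be seen in a small sketch like the one
below.  The sample line is made up for illustration and is not taken from the
attached message; the variable names are likewise hypothetical.

```perl
use strict;
use warnings;

# Character class from the original report (no "/").
my $without_slash = qr/^[A-Za-z0-9+]{60,77}$/;
# With "/" added back, covering the 64-character base64 alphabet.
# The "=" padding, which can only end the encoded data, is not handled here.
my $with_slash = qr/^[A-Za-z0-9+\/]{60,77}$/;

# Made-up 64-character line that contains "/".
my $line = "QUJD/0VG" x 8;

print "without slash: ", ($line =~ $without_slash ? "match" : "no match"), "\n";
print "with slash:    ", ($line =~ $with_slash    ? "match" : "no match"), "\n";
```

Only the second pattern matches this line, so a body whose lines happen to
contain "/" would reset the counter under the original expression.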
Comment 4 Daniel Quinlan 2002-10-26 02:07:08 UTC
Added the rule to CVS.  I made some more improvements to the function.  It
should catch more types of this spam trick (if they exist).  The only example
I have to go by is yours.
Comment 5 Daniel Quinlan 2002-11-10 20:21:39 UTC
Results not so good.  Removing test and closing as WONTFIX.

OVERALL%   SPAM%  NONSPAM%     S/O   RANK   SCORE  NAME
  0.035   0.0123   0.0438    0.220   0.01    0.01  T_MIME_EXCESSIVE_BASE64
  0.115   0.0000   0.1482    0.000   0.00    0.01  T_MIME_EXCESSIVE_BASE64:daf
  0.048   0.0000   0.0515    0.000   0.00    0.01  T_MIME_EXCESSIVE_BASE64:easmith
  0.000   0.0000   0.0000    0.500   0.12    0.01  T_MIME_EXCESSIVE_BASE64:quinlan
  0.022   0.0476   0.0120    0.798   0.49    0.01  T_MIME_EXCESSIVE_BASE64:rODbegbie
  0.028   0.0000   0.0418    0.000   0.00    0.01  T_MIME_EXCESSIVE_BASE64:theo
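
One way to read these figures, assuming the second and third columns are the
spam and nonspam hit percentages and the fourth is the ratio
spam / (spam + nonspam), is sketched below.  The column meanings and the 0.5
value for a rule with no hits are assumptions inferred from the numbers, and
s_over_o is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed meaning of the fourth column: spam% / (spam% + nonspam%).
# A rule with no hits at all is given 0.5 here, matching the quinlan row.
sub s_over_o {
    my ($spam_pct, $nonspam_pct) = @_;
    my $total = $spam_pct + $nonspam_pct;
    return 0.5 if $total == 0;
    return $spam_pct / $total;
}

printf "%.3f\n", s_over_o(0.0123, 0.0438);    # combined row
printf "%.3f\n", s_over_o(0.0476, 0.0120);    # rODbegbie row
printf "%.3f\n", s_over_o(0.0000, 0.1482);    # daf row
```

The results come out close to the listed 0.220, 0.798 and 0.000; small
differences are expected because the percentages shown are already rounded.
On this reading the rule hit more nonspam than spam overall, which fits the
"not so good" verdict.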

Comment 6 Daniel Quinlan 2002-11-10 20:32:51 UTC
Errm... can't quite bring myself to close this since it seemed like a
good idea.  I will ask someone who has a good number of both nonspam and
spam hits to look into it.
Comment 7 Daniel Quinlan 2002-12-08 02:04:34 UTC
Removed the rule after further deliberation.