SA Bugzilla – Bug 1158
New eval rule to detect a base64 body when some other Content-Type is specified
Last modified: 2002-12-07 17:04:34 UTC
I get several spam emails per week that have a base64 body, but have text/html as the Content-Type. I can only assume this is because Outlook or some other common MUAs auto-decode text/html bodies that look base64 encoded. The result of the improper encoding is that SA can't properly apply the various body rules, since it doesn't decode the body. To compensate for this, I'm using the following rule (modified from one supplied by Daniel Quinlan):

    sub check_for_surprise_base64 {
      my ($self) = @_;
      return 0 if $self->{found_encoding_base64};
      my $count = 0;
      for (@{$self->{msg}->get_body()}) {
        if (/^[A-Za-z0-9\+]{60,77}$/) {
          $count++;
          return 1 if $count > 5;
        }
        else {
          $count = 0;
        }
      }
      return 0;
    }

with these details:

    body TYPE_BASE64 eval:check_for_surprise_base64()
    describe TYPE_BASE64 Body has base64 content that does not match the Content-Type
    score TYPE_BASE64 4

I can't guarantee no false positives, so a run thru a larger corpus would be advisable. It might also help performance to only check the first 100 lines of the body, or similar.
Created attachment 424 [details] A sample email message that triggers this eval rule
I'll take the bug. One question, why did you drop "/" from the base64 regular expression? "/" is part of the base64 character set. Also, are all of your examples similar to the one you attached or are any significantly different?
Go ahead and add back the "/" - I just didn't see it in my sample emails (but now I see it is valid). All of the spam emails I get like this one are very similar - likely from the same spammer, but I can attach more examples if you want more to test with or for the corpus.
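For reference, the two changes discussed so far (restoring "/" in the character class, and only scanning the first 100 lines of the body) could be sketched roughly as below. This is a standalone illustration, not the version that went into CVS: the function name looks_like_surprise_base64, the array-ref argument, and the $max_lines parameter are all assumptions made for the example, and it does not touch the SA message object or the found_encoding_base64 flag.

```perl
use strict;
use warnings;

# Hypothetical standalone variant of the reported check (names are assumptions).
#   $lines     - array ref of raw body lines
#   $max_lines - optional cap on how many lines to scan (defaults to 100)
sub looks_like_surprise_base64 {
    my ($lines, $max_lines) = @_;
    $max_lines = 100 unless defined $max_lines;

    my $run  = 0;    # consecutive base64-looking lines seen so far
    my $seen = 0;    # total lines examined
    for my $line (@$lines) {
        last if ++$seen > $max_lines;

        (my $text = $line) =~ s/\r?\n\z//;    # drop the line ending, if any

        # Same 60-77 length window as the original rule, with "/" restored.
        if ($text =~ m/^[A-Za-z0-9+\/]{60,77}$/) {
            return 1 if ++$run > 5;           # fires on the 6th line in a row
        }
        else {
            $run = 0;                         # any other line breaks the run
        }
    }
    return 0;
}
```

Called as looks_like_surprise_base64(\@body_lines), it returns 1 once six consecutive lines look like base64 within the scanned window, and 0 otherwise. The line cap trades some coverage for speed: a base64 block that starts after the cap is not seen.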
Added the rule to CVS. I made some more improvements to the function. It should catch more types of this spam trick (if they exist). The only example I have to go by is yours.
Results not so good. Removing test and closing as WONTFIX.

OVERALL%   SPAM%  NONSPAM%    S/O   RANK  SCORE  NAME
   0.035  0.0123    0.0438  0.220   0.01   0.01  T_MIME_EXCESSIVE_BASE64
   0.115  0.0000    0.1482  0.000   0.00   0.01  T_MIME_EXCESSIVE_BASE64:daf
   0.048  0.0000    0.0515  0.000   0.00   0.01  T_MIME_EXCESSIVE_BASE64:easmith
   0.000  0.0000    0.0000  0.500   0.12   0.01  T_MIME_EXCESSIVE_BASE64:quinlan
   0.022  0.0476    0.0120  0.798   0.49   0.01  T_MIME_EXCESSIVE_BASE64:rODbegbie
   0.028  0.0000    0.0418  0.000   0.00   0.01  T_MIME_EXCESSIVE_BASE64:theo
Errm... can't quite bring myself to close this since it seemed like a good idea. I will ask someone who has a good number of both nonspam and spam hits to look into it.
removed rule after further deliberation