Bug 1149 - MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits?
Summary: MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting ...
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
Importance: P2 major
Target Milestone: ---
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-10-24 11:38 UTC by Allen Smith
Modified: 2002-10-24 09:46 UTC
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
Example of spam setting off MIME_SUSPECT_NAME without virus present text/plain None Allen Smith [NoCLA]
Patch to EvalTests.pm to reduce MIME_SUSPECT_NAME FPs patch None Allen Smith [NoCLA]

Description Allen Smith 2002-10-24 11:38:21 UTC
Hi. I think I've found out why X-GCMulti: isn't getting more hits - it's
because most if not all of the spams with it are also setting off
MIME_SUSPECT_NAME and getting _falsely_ labelled as viruses and excluded from
mass-check. I'll add an instance as an attachment;
it gets:

.  2 spam.nonvirus:<200210222227542699@64.121.88.72>
BULK_EMAIL,MIME_SUSPECT_NAME,REMOVE_SUBJ,T_NONSENSE_FROM_00_10

The problem appears to be that one part of the message is sent with a
content type of "image/jpg" instead of "image/jpeg", so a patch to
EvalTests.pm is needed; I will attach it as well.

	-Allen
Comment 1 Allen Smith 2002-10-24 11:39:37 UTC
Created attachment 419
Example of spam setting off MIME_SUSPECT_NAME without virus present
Comment 2 Allen Smith 2002-10-24 11:53:42 UTC
Created attachment 420
Patch to EvalTests.pm to reduce MIME_SUSPECT_NAME FPs
Comment 3 Daniel Quinlan 2002-10-24 15:47:03 UTC
It looks like the source of problems for you is MIME parts with a Content-Type
of "application/mac-binhex".  Can you run this rule on your corpus to see what
the effect of excluding all application/mac-binhex messages would be?

rawbody T_MACBINHEX                             /application\/mac-binhex/i
score T_MACBINHEX 0.01

I haven't tested that rule, so it might not work, but it should be easy to
write a real version of that rule into _check_attachments().
Comment 4 Allen Smith 2002-10-24 16:38:05 UTC
Subject: Re: [SAdev]  MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits?

In message <20021024224703.5E30C231C1@belphegore.hughes-family.org> (on 24
October 2002 15:47:03 -0700), bugzilla-daemon@hughes-family.org
(bugzilla-daemon@hughes-family.org) wrote:
> http://www.hughes-family.org/bugzilla/show_bug.cgi?id=1149
> 
> 
> 
> 
> 
> ------- Additional Comments From quinlan@pathname.com  2002-10-24 15:47 -------
> It looks like the source of problems for you is MIME parts with a
> Content-Type of "application/mac-binhex".

No. There are no such Content-Types in the mislabelled non-virus
messages. application/mac-binhex has nothing to do with the reported
problem.

> Can you run this rule on your corpus to see what
> the effect of excluding all application/mac-binhex messages would be?

Why? The patch I already sent fixes the problem, and I wouldn't _want_ to
exclude non-virus messages from inclusion in the spam corpus.

> rawbody T_MACBINHEX                             /application\/mac-binhex/i
> score T_MACBINHEX 0.01
> 
> I haven't tested that rule, so it might not work, but it should be easy to
> write a real version of that rule into _check_attachments().

It's already in there. application/mac-binhex is already considered an
acceptable MIME type for filenames ending in jpe?g or gif - it doesn't set
off MIME_SUSPECT_NAME, and _shouldn't_.

	   -Allen

Comment 5 Daniel Quinlan 2002-10-24 17:12:35 UTC
Subject: Re: [SAdev]  MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits?

Ed Allen Smith <easmith@beatrice.rutgers.edu> writes:

> It's already in there. application/mac-binhex is already considered
> an acceptable MIME type for filenames ending in jpe?g or gif - it
> doesn't set off MIME_SUSPECT_NAME, and _shouldn't_.

Sorry, I confused your changes with some earlier changes made to the
code that I hadn't noticed until now.

But I still don't understand why you are changing so many lines of code.
If you just want to allow "image/jpg" (which is not a standard
content-type, by the way), then just add a '?' after the "e" in jpeg.
It's a one line patch.  Your patch changes the function in other ways
not explained by your bug report or the example.

-	|| ($name =~ /^jpe?g$/ && $ctype !~ m@^image/p?jpeg@
+	|| ($name =~ /^jpe?g$/ && $ctype !~ m@^image/p?jpe?g@

Dan
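
For illustration, the effect of the one-line change above can be sketched outside SpamAssassin. This is a minimal Python sketch, not the EvalTests.pm code: the two patterns are copied from the diff, but the function name `jpeg_name_is_suspect` is hypothetical, and it assumes the attachment's extension and Content-Type arrive already lowercased.

```python
import re

# Patterns taken from the one-line diff above; everything else here is
# a hypothetical harness, not SpamAssassin's actual code.
JPEG_EXT = re.compile(r"^jpe?g$")
OLD_JPEG_CTYPE = re.compile(r"^image/p?jpeg")   # before the patch
NEW_JPEG_CTYPE = re.compile(r"^image/p?jpe?g")  # after the patch


def jpeg_name_is_suspect(ext, ctype, ctype_re):
    """Return True when a JPEG-like extension is paired with a
    Content-Type that the given pattern does not accept.

    Assumes ext and ctype are already lowercased.
    """
    return bool(JPEG_EXT.search(ext)) and not ctype_re.search(ctype)


if __name__ == "__main__":
    for ctype in ("image/jpeg", "image/pjpeg", "image/jpg",
                  "application/octet-stream"):
        old = jpeg_name_is_suspect("jpg", ctype, OLD_JPEG_CTYPE)
        new = jpeg_name_is_suspect("jpg", ctype, NEW_JPEG_CTYPE)
        print(f"{ctype:28} old={old} new={new}")
```

Only the non-standard "image/jpg" case changes: the old pattern flags it and the new one accepts it, while a genuinely mismatched type such as "application/octet-stream" is still flagged by both.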

Comment 6 Allen Smith 2002-10-24 17:24:20 UTC
Subject: Re:  MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits?

In message <20021025001236.312512332B@belphegore.hughes-family.org> (on 24 October 2002 17:12:36 -0700), bugzilla-daemon@hughes-family.org (bugzilla-daemon@hughes-family.org) wrote:
> http://www.hughes-family.org/bugzilla/show_bug.cgi?id=1149
> 
> 
> 
> 
> 
> ------- Additional Comments From quinlan@pathname.com  2002-10-24 17:12 -------
> Subject: Re: [SAdev]  MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason
>why X-GCMulti not getting more hits?
> 
> Ed Allen Smith <easmith@beatrice.rutgers.edu> writes:
> 
> > It's already in there. application/mac-binhex is already considered
> > an acceptable MIME type for filenames ending in jpe?g or gif - it
> > doesn't set off MIME_SUSPECT_NAME, and _shouldn't_.
> 
> Sorry, I confused your changes with some earlier changes made to the
> code that I hadn't noticed until now.

Understood - everyone makes mistakes. I made one in not explaining myself
better, for instance.

> But I still don't understand why you are changing so many lines of code.
> If you just want to allow "image/jpg" (which is not a standard
> content-type, by the way),

Yes, but I rather doubt it's going to get interpreted as an executable file
in any case.

> then just add a '?' after the "e" in jpeg. It's a one line patch.

Yes, but that makes it more likely that another spammer will do something
else that a browser may well interpret, but spamassassin will exclude as a
virus.

> Your patch changes the function in other ways not explained

Yes, I should have explained the patch more fully - sorry! I was more
worrying about going ahead and getting out the bug report, given that the
spams in question were advertisements for spamware - and spamware that is
making an effort to appear legit (which it can't be, of course) and appears
likely to generate the same sort of spam that may slip by SA.

> by your bug report or the example.
>
> -	|| ($name =~ /^jpe?g$/ && $ctype !~ m@^image/p?jpeg@
> +	|| ($name =~ /^jpe?g$/ && $ctype !~ m@^image/p?jpe?g@

If people want to be conservative about this change, fine, but accumulating
yet more lines of different regexes when they can be combined - another
thing that I was trying to do - is going to slow things down further. I can
submit this as a separate bug report if you like.

      -Allen

Comment 7 Daniel Quinlan 2002-10-24 17:46:50 UTC
> Yes, I should have explained the patch more fully - sorry! I was more
> worrying about going ahead and getting out the bug report, given that the
> spams in question were advertisements for spamware - and spamware that is
> making an effort to appear legit (which it can't be, of course) and appears
> likely to generate the same sort of spam that may slip by SA.
>
> If people want to be conservative about this change, fine, but accumulating
> yet more lines of different regexes when they can be combined - another
> thing that I was trying to do - is going to slow things down further. I can
> submit this as a separate bug report if you like.

I applied the one line patch and also reformatted those lines a bit to fit
in under 80 columns.  Making the test a bit more generic might be a good idea,
although I don't really want to combine text/html/jpeg/gif into a single
paren grouping -- I think that's a bit confusing.  I'd also like to see some
examples of when it helps.  I usually generalize this sort of thing only when
the exemption list gets too long to maintain.  Maybe those types are getting
to that point.  Anyway, let's do it as a separate bug.  :-)

Thanks.

Comment 8 Allen Smith 2002-10-24 18:35:06 UTC
Subject: Re:  MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits?

In message <20021025004650.E245A18BD@belphegore.hughes-family.org> (on 24
October 2002 17:46:50 -0700), bugzilla-daemon@hughes-family.org
(bugzilla-daemon@hughes-family.org) wrote:

>> Yes, I should have explained the patch more fully - sorry! I was more
>>worrying about going ahead and getting out the bug report, given that the
>>spams in question were advertisements for spamware - and spamware that is
>>making an effort to appear legit (which it can't be, of course) and
>>appears likely to generate the same sort of spam that may slip by SA.

>> If people want to be conservative about this change, fine, but
>>accumulating yet more lines of different regexes when they can be combined
>>- another thing that I was trying to do - is going to slow things down
>>further. I can submit this as a separate bug report if you like.
> 
> I applied the one line patch and also reformatted those lines a bit to fit
> in under 80 columns.

Thank you.

> Making the test a bit more generic might be a good idea,
> although I don't really want to combine text/html/jpeg/gif into a single
> paren grouping -- I think that's a bit confusing.

Agreed - combining html with other text, and jpeg with gif, seems to make
sense, but not all 4 together, certainly.

> I'd also like to see some examples of when it helps.

It'll help if one wishes to add png and tiff?, for instance. Overall, making
a distinction between different image types doesn't seem to me to make much
sense; there may be something I'm missing, of course.

> I usually generalize this sort of thing only when
> the exemption list gets too long to maintain.  Maybe those types are getting
> to that point.

I'd say so, but unless someone wants to do some benchmarking, this may be a
matter of individual style...

> Anyway, let's do it as a separate bug.  :-)

No problem.

> Thanks.

Quite welcome.

      -Allen

Comment 9 Justin Mason 2002-10-25 03:49:04 UTC
Subject: Re: [SAdev]  MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits? 


> It'll help if one wishes to add png and tiff?, for instance. Overall, making
> a distinction between different image types doesn't seem to me to make much
> sense; there may be something I'm missing, of course.

Good point.  Are there any exploits for image types we should filter for?

All I know of is PostScript, which can be abused to run shell commands
under some interpreters, and JPEG, which had a buffer overflow in some
old decoders (now fixed).  Nothing else.

Given that, maybe "image/.*" is an acceptable type to allow.
"application/.*" and "text/.*" are a different matter altogether! ;)

--j.
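
A rough sketch of the more generic check discussed in the last few comments (accept any "image/..." Content-Type for image-like filenames, including png and tiff?) might look like the following. It is written in Python purely for illustration; the function name `image_name_is_suspect` and the exact extension list are assumptions rather than anything committed to EvalTests.pm, and it deliberately leaves out the existing application/mac-binhex exemption mentioned in comment 4.

```python
import re

# Hypothetical generalization: one extension group for images, and any
# image/* Content-Type counts as consistent with it.
IMAGE_EXT = re.compile(r"^(?:jpe?g|gif|png|tiff?)$")
IMAGE_CTYPE = re.compile(r"^image/")


def image_name_is_suspect(ext, ctype):
    """Return True when an image-like extension is paired with a
    Content-Type outside image/*.

    Assumes ext and ctype are already lowercased. Extensions that are
    not in IMAGE_EXT are not judged here at all (returns False).
    """
    return bool(IMAGE_EXT.search(ext)) and not IMAGE_CTYPE.search(ctype)


if __name__ == "__main__":
    samples = [
        ("jpg", "image/jpg"),
        ("png", "image/png"),
        ("tif", "image/tiff"),
        ("gif", "application/octet-stream"),
        ("gif", "text/html"),
    ]
    for ext, ctype in samples:
        print(f"{ext:4} {ctype:26} suspect={image_name_is_suspect(ext, ctype)}")
```

The trade-off is the one raised in the thread: a single image group is shorter to maintain, but it no longer distinguishes between image types, so a ".gif" name on an "image/png" part would pass.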