SA Bugzilla – Bug 1149
MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits?
Last modified: 2002-10-24 09:46:50 UTC
Hi. I think I've found out why X-GCMulti: isn't getting more hits: most if not all of the spams with it also set off MIME_SUSPECT_NAME, so they get _falsely_ labelled as viruses and excluded from mass-check. I'll add an instance as an attachment; it gets:

. 2 spam.nonvirus:<200210222227542699@64.121.88.72> BULK_EMAIL,MIME_SUSPECT_NAME,REMOVE_SUBJ,T_NONSENSE_FROM_00_10

The problem appears to be that one part of the message is sent with a content type of "image/jpg" instead of "image/jpeg", so a patch is needed to EvalTests.pm; I will send it as an attachment as well.

-Allen
Created attachment 419 [details] Example of spam setting off MIME_SUSPECT_NAME without virus present
Created attachment 420 [details] Patch to EvalTests.pm to reduce MIME_SUSPECT_NAME FPs
It looks like the source of problems for you is MIME parts with a Content-Type of "application/mac-binhex". Can you run this rule on your corpus to see what the effect of excluding all application/mac-binhex messages would be?

  rawbody T_MACBINHEX /application\/mac-binhex/i
  score T_MACBINHEX 0.01

I haven't tested that rule, so it might not work, but it should be easy to write a real version of that rule into _check_attachments().
Subject: Re: [SAdev] MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits?

In message <20021024224703.5E30C231C1@belphegore.hughes-family.org> (on 24 October 2002 15:47:03 -0700), bugzilla-daemon@hughes-family.org (bugzilla-daemon@hughes-family.org) wrote:

> http://www.hughes-family.org/bugzilla/show_bug.cgi?id=1149
>
> ------- Additional Comments From quinlan@pathname.com 2002-10-24 15:47 -------
> It looks like the source of problems for you is MIME parts with a
> Content-Type of "application/mac-binhex".

No. There are no such Content-Types in the mislabelled non-virus messages. application/mac-binhex has nothing to do with the reported problem.

> Can you run this rule on your corpus to see what
> the effect of excluding all application/mac-binhex messages would be?

Why? The patch I already sent fixes the problem, and I wouldn't _want_ to exclude non-virus messages from inclusion in the spam corpus.

> rawbody T_MACBINHEX /application\/mac-binhex/i
> score T_MACBINHEX 0.01
>
> I haven't tested that rule, so it might not work, but it should be easy to
> write a real version of that rule into _check_attachments().

It's already in there. application/mac-binhex is already considered an acceptable MIME type for filenames ending in jpe?g or gif - it doesn't set off MIME_SUSPECT_NAME, and _shouldn't_.

-Allen
Subject: Re: [SAdev] MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits?

Ed Allen Smith <easmith@beatrice.rutgers.edu> writes:

> It's already in there. application/mac-binhex is already considered
> an acceptable MIME type for filenames ending in jpe?g or gif - it
> doesn't set off MIME_SUSPECT_NAME, and _shouldn't_.

Sorry, I confused your changes with some earlier changes made to the code that I hadn't noticed until now.

But I still don't understand why you are changing so many lines of code. If you just want to allow "image/jpg" (which is not a standard content-type, by the way), then just add a '?' after the "e" in jpeg. It's a one-line patch. Your patch changes the function in other ways not explained by your bug report or the example.

-      || ($name =~ /^jpe?g$/ && $ctype !~ m@^image/p?jpeg@
+      || ($name =~ /^jpe?g$/ && $ctype !~ m@^image/p?jpe?g@

Dan
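To make the effect of the one-line change concrete, here is a minimal sketch in Python rather than the Perl of EvalTests.pm. Only the two regexes come from the patch quoted above; the helper name `is_suspect`, the assumption that `$name` holds a bare file extension, and the assumption that the content type arrives lowercased are illustrative and not taken from the SpamAssassin source.

```python
import re

# Patterns transcribed from the quoted one-line patch (Perl m@...@ syntax
# rewritten for Python's re module).
OLD_JPEG_CTYPE = re.compile(r"^image/p?jpeg")   # before the patch
NEW_JPEG_CTYPE = re.compile(r"^image/p?jpe?g")  # after the patch
JPEG_NAME = re.compile(r"^jpe?g$")


def is_suspect(ext, ctype, ctype_pattern):
    """Hypothetical helper: True when a jpg/jpeg extension is paired with a
    Content-Type that the given pattern does not accept."""
    return bool(JPEG_NAME.search(ext)) and not ctype_pattern.search(ctype)


if __name__ == "__main__":
    for ctype in ("image/jpeg", "image/pjpeg", "image/jpg"):
        print(ctype,
              "old:", is_suspect("jpg", ctype, OLD_JPEG_CTYPE),
              "new:", is_suspect("jpg", ctype, NEW_JPEG_CTYPE))
```

Under these assumptions, "image/jpg" is the only one of the three types whose result differs between the old and new patterns, which matches the false positive described in the report.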
Subject: Re: MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits?

In message <20021025001236.312512332B@belphegore.hughes-family.org> (on 24 October 2002 17:12:36 -0700), bugzilla-daemon@hughes-family.org (bugzilla-daemon@hughes-family.org) wrote:

> http://www.hughes-family.org/bugzilla/show_bug.cgi?id=1149
>
> ------- Additional Comments From quinlan@pathname.com 2002-10-24 17:12 -------
> Subject: Re: [SAdev] MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason
> why X-GCMulti not getting more hits?
>
> Ed Allen Smith <easmith@beatrice.rutgers.edu> writes:
>
> > It's already in there. application/mac-binhex is already considered
> > an acceptable MIME type for filenames ending in jpe?g or gif - it
> > doesn't set off MIME_SUSPECT_NAME, and _shouldn't_.
>
> Sorry, I confused your changes with some earlier changes made to the
> code that I hadn't noticed until now.

Understood - everyone makes mistakes. I made one in not explaining myself better, for instance.

> But I still don't understand why are you changing so many lines of code.
> If you just want to allow "image/jpg" (which is not a standard
> content-type, by the way),

Yes, but I rather doubt it's going to get interpreted as an executable file in any case.

> then just add a '?' after the "e" in jpeg. It's a one line patch.

Yes, but that makes it more likely that another spammer will do something else that a browser may well interpret, but spamassassin will exclude as a virus.

> Your patch changes the function in other ways not explained

Yes, I should have explained the patch more fully - sorry! I was more worried about going ahead and getting out the bug report, given that the spams in question were advertisements for spamware - and spamware that is making an effort to appear legit (which it can't be, of course) and appears likely to generate the same sort of spam that may slip by SA.

> by your bug report or the example.
>
> - || ($name =~ /^jpe?g$/ && $ctype !~ m@^image/p?jpeg@
> + || ($name =~ /^jpe?g$/ && $ctype !~ m@^image/p?jpe?g@

If people want to be conservative about this change, fine, but accumulating yet more lines of different regexes when they can be combined - another thing that I was trying to do - is going to slow things down further. I can submit this as a separate bug report if you like.

-Allen
> Yes, I should have explained the patch more fully - sorry! I was more
> worrying about going ahead and getting out the bug report, given that the
> spams in question were advertisements for spamware - and spamware that is
> making an effort to appear legit (which it can't be, of course) and appears
> likely to generate the same sort of spam that may slip by SA.
>
> If people want to be conservative about this change, fine, but accumulating
> yet more lines of different regexes when they can be combined - another
> thing that I was trying to do - is going to slow things down further. I can
> submit this as a separate bug report if you like.

I applied the one-line patch and also reformatted those lines a bit to fit in under 80 columns.

Making the test a bit more generic might be a good idea, although I don't really want to combine text/html/jpeg/gif into a single paren grouping -- I think that's a bit confusing. I'd also like to see some examples of when it helps. I usually generalize this sort of thing only when the exemption list gets too long to maintain. Maybe those types are getting to that point.

Anyway, let's do it as a separate bug. :-)

Thanks.
Subject: Re: MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits?

In message <20021025004650.E245A18BD@belphegore.hughes-family.org> (on 24 October 2002 17:46:50 -0700), bugzilla-daemon@hughes-family.org (bugzilla-daemon@hughes-family.org) wrote:

>> Yes, I should have explained the patch more fully - sorry! I was more
>> worrying about going ahead and getting out the bug report, given that the
>> spams in question were advertisements for spamware - and spamware that is
>> making an effort to appear legit (which it can't be, of course) and
>> appears likely to generate the same sort of spam that may slip by SA.
>> If people want to be conservative about this change, fine, but
>> accumulating yet more lines of different regexes when they can be combined
>> - another thing that I was trying to do - is going to slow things down
>> further. I can submit this as a separate bug report if you like.
>
> I applied the one line patch and also reformatted those lines a bit to fit
> in under 80 columns.

Thank you.

> Making the test a bit more generic might be a good idea,
> although I don't really want to combine text/html/jpeg/gif into a single
> paren grouping -- I think that's a bit confusing.

Agreed - combining html with other text, and jpeg with gif, seems to make sense, but not all 4 together, certainly.

> I'd also like to see some examples of when it helps.

It'll help if one wishes to add png and tiff?, for instance. Overall, making a distinction between different image types doesn't seem to me to make much sense; there may be something I'm missing, of course.

> I usually generalize this sort of thing only when
> the exemption list gets too long to maintain. Maybe those types are getting
> to that point.

I'd say so, but unless someone wants to do some benchmarking, this may be a matter of individual style...

> Anyway, let's do it as a separate bug. :-)

No problem.

> Thanks.

Quite welcome.

-Allen
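The grouping suggested above (jpeg with gif, extended to png and tiff?) might look roughly like the following. This is a hedged Python sketch, not the contents of attachment 420 or of EvalTests.pm: the names `IMAGE_EXT`, `IMAGE_CTYPE`, and `image_name_mismatch` are hypothetical, inputs are assumed to be lowercased, and the existing application/mac-binhex allowance mentioned earlier in the thread is left out for brevity.

```python
import re

# Hypothetical grouped check: one extension pattern and one Content-Type
# pattern for all image types, instead of a separate regex pair per type.
IMAGE_EXT = re.compile(r"^(?:jpe?g|gif|png|tiff?)$")
IMAGE_CTYPE = re.compile(r"^image/(?:p?jpe?g|gif|png|tiff?)")


def image_name_mismatch(ext, ctype):
    """True when an image-style extension carries a Content-Type outside the
    listed image types. Image types are not distinguished from one another,
    so a .gif sent as image/jpeg is accepted."""
    return bool(IMAGE_EXT.search(ext)) and not IMAGE_CTYPE.search(ctype)
```

Adding another image type then means touching two alternations rather than adding a new line of regexes, which is the maintenance trade-off being debated; whether it is measurably faster would still need the benchmarking mentioned above.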
Subject: Re: [SAdev] MICROSOFT_EXECUTABLE|MIME_SUSPECT_NAME FP - reason why X-GCMulti not getting more hits?

> It'll help if one wishes to add png and tiff?, for instance. Overall, making
> a distinction between different image types doesn't seem to me to make much
> sense; there may be something I'm missing, of course.

Good point. Are there any exploits for image types we should filter for? All I know of is Postscript, which can be abused to run shell commands under some interpreters, and JPEG, which had a buffer overflow in some old decoders (now fixed). Nothing else.

Given that, maybe "image/.*" is an acceptable type to allow. "application/.*" and "text/.*" are a different matter altogether! ;)

--j.
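The "image/.*" idea can be sketched as a prefix test on the content type. Again this is Python for illustration only, with a hypothetical extension list and helper name, and it assumes lowercased input; it is not code from SpamAssassin.

```python
import re

# Hypothetical extension list; the thread names jpe?g, gif, png and tiff?.
IMAGE_EXT = re.compile(r"^(?:jpe?g|gif|png|tiff?)$")
# The "image/.*" proposal: accept any subtype under image/.
ANY_IMAGE_CTYPE = re.compile(r"^image/")


def mismatch_unless_any_image(ext, ctype):
    """True when an image-style extension is paired with a Content-Type that
    is not under image/ at all (for example application/* or text/*)."""
    return bool(IMAGE_EXT.search(ext)) and not ANY_IMAGE_CTYPE.search(ctype)
```

With this form, a nonstandard subtype such as "image/jpg" needs no special case, while "application/..." and "text/..." types paired with an image extension are still reported as mismatches.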