Bug 3750 - decode_attachments not used
Summary: decode_attachments not used
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries
Version: unspecified
Hardware: All All
Importance: P3 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-09-04 14:31 UTC by Julian Field
Modified: 2004-09-04 20:58 UTC
Description Julian Field 2004-09-04 14:31:09 UTC
I am the author of MailScanner. In previous versions I have added a
decode_attachments property which tells SA to decode binary attachments so I can
find readable ASCII strings inside them.
3.0.0-rc3 includes the "decode_attachments" property, but then does nothing with
it. If you could make this property cause the decoding of all attachments, and
not just text ones, it would be great.
I can't work out in 3 where to add my own patch for decoding all attachments. A
pointer where to patch this would suffice, but an integrated solution would be best.
Many thanks!
Jules.
Comment 1 Theo Van Dinter 2004-09-04 15:33:18 UTC
Subject: Re:  New: decode_attachments not used

On Sat, Sep 04, 2004 at 02:31:10PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> 3.0.0-rc3 includes the "decode_attachments" property, but then does nothing with
> it. If you could make this property cause the decoding of all attachments, and
> not just text ones, it would be great.

I'm missing something.  If you added in the property (grep returns no hits for
"decode_attachments" in 2.6x and 3.x code), then we're not going to be able to
change the behavior of the code that was added.

> I can't work out in 3 where to add my own patch for decoding all attachments. A
> pointer where to patch this would suffice, but an integrated solution would be best.

You don't need to patch 3.0 for that. :)

The internal representation of the message is a tree with all the MIME
parts in different nodes.  If you want to go through and get the decoded
versions of all the parts that are leaf nodes (aka: not multipart/*,
subparsed message/*, etc), you could use find_parts() to return the
nodes you're interested in, and then call decode() on each part.  Ala:

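my $sa = Mail::SpamAssassin->new(...);
# $raw_message (name assumed here) holds the unparsed message text;
# the second argument asks parse() to build the MIME tree immediately.
my $msg = $sa->parse($raw_message, 1);
# qr/./ matches any content type; the 1 restricts the result to leaf parts.
foreach my $p ($msg->find_parts(qr/./, 1)) {
  my $decoded_part = $p->decode();
  ...
}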

$decoded_part is a scalar which has the decoded part in it.  The
Mail::SpamAssassin::Message and Mail::SpamAssassin::Message::Node POD has more
information about this stuff.

It's sounding more like you have questions related to updating MailScanner to
work with the 3.0 API, rather than this being a bug or enhancement request.
If so, we can close this ticket as WFM and take the discussion to the
dev@spamassassin.apache.org list.

Comment 2 Julian Field 2004-09-04 15:39:36 UTC
Okay, it must have appeared in Conf.pm after my (partially unsuccessful) attempt
to apply my patch to Conf.pm. Sorry about that.
I will read the docs for the new API rather more closely and see how to read
individual non-text attachments, as this has changed somewhat since 2.6x.

Sorry for bothering you, but thank you for your time.
Jules.
Comment 3 Julian Field 2004-09-04 15:45:08 UTC
Just had another think. What I need is to be able to apply all the SA rules in
the current object to non-text attachments. 2.6x only worked on text attachments
(and html and variants of course). I need to apply my rules to *all*
attachments, regardless of whether they are text or not. I need to work on
Microsoft Word document attachments, Excel spreadsheets, all sorts of stuff like
that.

Will 3.x work on all binary attachments? If not, how do I persuade it to do it?

Thanks.
Comment 4 Theo Van Dinter 2004-09-04 16:00:55 UTC
Subject: Re:  decode_attachments not used

On Sat, Sep 04, 2004 at 03:45:09PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Just had another think. What I need is to be able to apply all the SA rules in
> the current object to non-text attachments. 2.6x only worked on text attachments

Well, what you really want is to apply all of _your_ rules to all attachments.
You definitely don't want the standard SA rules to do so.

> (and html and variants of course). I need to apply my rules to *all*
> attachments, regardless of whether they are text or not. I need to work on
> Microsoft Word document attachments, Excel spreadsheets, all sorts of stuff like
> that.
>
> Will 3.x work on all binary attachments? If not, how do I persuade it to do it?

Not for standard rule types (header, body, rawbody, uri), and you can't.  The
code very deliberately only looks at leaf node text/* and message/* parts.

It sounds like you want to write a plugin that has custom eval code to go
through the message parts and apply your non-text-based rules.
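
A minimal sketch of what such a plugin might look like, combining the find_parts()/decode() loop from Comment 1 with the standard plugin skeleton. The package name KeywordScan, the method name check_attachment_keywords and the keyword list are all hypothetical, and the sketch assumes get_message() is available on the per-message status object in the installed version:

```perl
package KeywordScan;   # hypothetical plugin name

use strict;
use warnings;
use Mail::SpamAssassin::Plugin;
our @ISA = qw(Mail::SpamAssassin::Plugin);

# Hypothetical keyword list; a real plugin would read this from configuration.
my @KEYWORDS = ('bluebird', 'redwing');

sub new {
  my ($class, $mailsa) = @_;
  $class = ref($class) || $class;
  my $self = $class->SUPER::new($mailsa);
  bless($self, $class);
  # Make the method below callable from rules as eval:check_attachment_keywords()
  $self->register_eval_rule('check_attachment_keywords');
  return $self;
}

# Returns 1 if any leaf MIME part, text or binary, contains a keyword.
sub check_attachment_keywords {
  my ($self, $pms) = @_;
  my $msg = $pms->get_message();   # assumed accessor for the parsed message
  foreach my $part ($msg->find_parts(qr/./, 1)) {
    my $decoded = $part->decode();
    next unless defined $decoded;
    foreach my $kw (@KEYWORDS) {
      return 1 if $decoded =~ /\Q$kw\E/i;
    }
  }
  return 0;
}

1;
```

A plugin like this would normally be enabled with a loadplugin line in the configuration and attached to a rule whose test is eval:check_attachment_keywords(); the rule name and score are left to the site.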

Comment 5 Julian Field 2004-09-05 04:51:26 UTC
Subject: Re:  decode_attachments not used

At 00:00 05/09/2004, you wrote:
>http://bugzilla.spamassassin.org/show_bug.cgi?id=3750
>
>------- Additional Comments From felicity@kluge.net  2004-09-04 16:00 -------
>Subject: Re:  decode_attachments not used
>
>On Sat, Sep 04, 2004 at 03:45:09PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> > Just had another think. What I need is to be able to apply all the SA rules in
> > the current object to non-text attachments. 2.6x only worked on text attachments
>
>Well, what you really want is to apply all of _your_ rules to all attachments.
>You definitely don't want the standard SA rules to do so.

Correct. What I am doing is using the SA engine as a tool working with a 
totally separate rulebase, which looks for specific things anywhere in the 
decoded message, including any readable ascii contained in Word documents 
and things like that.
Corporates like to have "keyword spotting" in their security products so 
they can watch out for key project codenames and stuff like that. So I have 
a few rules which look for specific keywords in the entire message 
(including binary attachments) and quarantine messages containing them.

I don't personally think keyword-spotting is very useful at all, but the 
corporates like it and they use it as a key differentiator between 
competing products. So I do it for them.
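
The keyword spotting described above can be sketched in plain Perl, independent of SpamAssassin: pull the runs of printable ASCII out of a decoded binary part, much as the Unix strings tool does, then look for keywords in those runs. The function names, the minimum run length of 4 and the sample keywords are assumptions made for illustration, not MailScanner's actual code.

```perl
use strict;
use warnings;

# Return every run of at least $min_len printable ASCII characters
# (space through tilde, plus tab) found in $data.
sub printable_strings {
    my ($data, $min_len) = @_;
    $min_len = 4 unless defined $min_len;
    my @runs = $data =~ /([\x20-\x7e\t]{$min_len,})/g;
    return @runs;
}

# Return those keywords that occur, case-insensitively, in a printable run.
sub spot_keywords {
    my ($data, @keywords) = @_;
    my @runs = printable_strings($data);
    my @hits;
    for my $kw (@keywords) {
        push @hits, $kw if grep { /\Q$kw\E/i } @runs;
    }
    return @hits;
}

# Sample binary blob with one readable run embedded in it.
my $blob = "\x00\x01PK\x03\x04Project Bluebird budget\x00\xff\x10ok\x00";
my @hits = spot_keywords($blob, 'bluebird', 'redwing');
print "matched: @hits\n";   # prints "matched: bluebird"
```

This only finds text stored as single-byte ASCII. Text that a document format stores as UTF-16, or inside a compressed container, would need to be converted or unpacked first.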

> > (and html and variants of course). I need to apply my rules to *all*
> > attachments, regardless of whether they are text or not. I need to work on
> > Microsoft Word document attachments, Excel spreadsheets, all sorts of stuff like
> > that.
> >
> > Will 3.x work on all binary attachments? If not, how do I persuade it to do it?
>
>Not for standard rule types (header, body, rawbody, uri), and you can't.  The
>code very deliberately only looks at leaf node text/* and message/* parts.

So I see.

>It sounds like you want to write a plugin that has custom eval code to go
>through the message parts and apply your non-text-based rules.

I may have to write it like this. Fortunately the plugin architecture looks 
fairly easy to use.

Many thanks for your help.
--
Julian Field                Teaching Systems Manager
jkf@ecs.soton.ac.uk         Dept. of Electronics & Computer Science
Tel. 023 8059 2817          University of Southampton
                             Southampton SO17 1BJ

Comment 6 Julian Field 2004-09-05 04:58:37 UTC
I have found an easy way to implement the functionality I need. Thanks to all
for your help.