SA Bugzilla – Bug 3750
decode_attachments not used
Last modified: 2004-09-04 20:58:37 UTC
I am the author of MailScanner. In previous versions I have added a decode_attachments property which tells SA to decode binary attachments so I can find readable ASCII strings inside them. 3.0.0-rc3 includes the "decode_attachments" property, but then does nothing with it. If you could make this property cause the decoding of all attachments, and not just text ones, it would be great. I can't work out in 3 where to add my own patch for decoding all attachments. A pointer where to patch this would suffice, but an integrated solution would be best. Many thanks! Jules.
Subject: Re: New: decode_attachments not used

On Sat, Sep 04, 2004 at 02:31:10PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> 3.0.0-rc3 includes the "decode_attachments" property, but then does nothing with
> it. If you could make this property cause the decoding of all attachments, and
> not just text ones, it would be great.

I'm missing something. If you added in the property (grep returns no hits for "decode_attachments" in 2.6x and 3.x code), then we're not going to be able to change the behavior of the code that was added.

> I can't work out in 3 where to add my own patch for decoding all attachments. A
> pointer where to patch this would suffice, but an integrated solution would be best.

You don't need to patch 3.0 for that. :) The internal representation of the message is a tree with all the MIME parts in different nodes. If you want to go through and get the decoded versions of all the parts that are leaf nodes (aka: not multipart/*, subparsed message/*, etc), you could use find_parts() to return the nodes you're interested in, and then call decode() on each part. Ala:

  my $sa = new Mail::SpamAssassin(...);
  my $msg = $sa->parse($msg, 1);
  foreach my $p ($msg->find_parts(qr/./, 1)) {
    my $decoded_part = $p->decode();
    ...
  }

$decoded_part is a scalar which has the decoded part in it. The Mail::SpamAssassin::Message and Mail::SpamAssassin::Message::Node POD has more information about this stuff.

It's sounding more like you have questions related to updating MailScanner to work with the 3.0 API, rather than this being a bug or enhancement request. If so, we can close this ticket as WFM and take the discussion to the dev@spamassassin.apache.org list.
Okay, it must have appeared in Conf.pm after I (partially) unsuccessfully tried to apply my patch to Conf.pm. Sorry about that. I will read the docs for the new API rather more closely and see how to read individual non-text attachments, as this has changed somewhat since 2.6x. Sorry for bothering you, but thank you for your time. Jules.
Just had another think. What I need is to be able to apply all the SA rules in the current object to non-text attachments. 2.6x only worked on text attachments (and html and variants of course). I need to apply my rules to *all* attachments, regardless of whether they are text or not. I need to work on Microsoft Word document attachments, Excel spreadsheets, all sorts of stuff like that. Will 3.x work on all binary attachments? If not, how do I persuade it to do it? Thanks.
Subject: Re: decode_attachments not used

On Sat, Sep 04, 2004 at 03:45:09PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Just had another think. What I need is to be able to apply all the SA rules in
> the current object to non-text attachments. 2.6x only worked on text attachments

Well, what you really want is to apply all of _your_ rules to all attachments. You definitely don't want the standard SA rules to do so.

> (and html and variants of course). I need to apply my rules to *all*
> attachments, regardless of whether they are text or not. I need to work on
> Microsoft Word document attachments, Excel spreadsheets, all sorts of stuff like
> that.
>
> Will 3.x work on all binary attachments? If not, how do I persuade it to do it?

Not for standard rule types (header, body, rawbody, uri), and you can't. The code very deliberately only looks at leaf node text/* and message/* parts.

It sounds like you want to write a plugin that has custom eval code to go through the message parts and apply your non-text-based rules.
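A minimal sketch of the plugin approach suggested above, assuming the SpamAssassin 3.x Mail::SpamAssassin::Plugin base class is installed. The package name KeywordSpotter, the eval function check_keyword_in_parts, and the rule name below are hypothetical, and reaching the parsed message through $pms->{msg} is an assumption about PerMsgStatus internals that should be checked against the POD for the version in use. The find_parts() and decode() calls are the ones described earlier in this thread.

```perl
package KeywordSpotter;   # hypothetical plugin name

use strict;
use warnings;
use Mail::SpamAssassin::Plugin;

our @ISA = qw(Mail::SpamAssassin::Plugin);

sub new {
    my ($class, $mailsa) = @_;
    $class = ref($class) || $class;
    my $self = $class->SUPER::new($mailsa);
    bless($self, $class);

    # Make the function below callable from a rule as eval:check_keyword_in_parts(...)
    $self->register_eval_rule("check_keyword_in_parts");
    return $self;
}

# Returns 1 if $keyword occurs (case-insensitively) in the decoded content
# of any leaf MIME part, text or binary; returns 0 otherwise.
sub check_keyword_in_parts {
    my ($self, $pms, $keyword) = @_;
    return 0 unless defined $keyword && length $keyword;

    my $msg = $pms->{msg};   # assumption: parsed message object is stored here
    return 0 unless $msg;

    # qr/./ matches every content type; the second argument restricts to leaf nodes
    foreach my $part ($msg->find_parts(qr/./, 1)) {
        my $decoded = $part->decode();
        next unless defined $decoded;
        return 1 if index(lc $decoded, lc $keyword) >= 0;
    }
    return 0;
}

1;
```

A matching configuration fragment might look like the following, with the path, rule name, keyword, and score all placeholders:

```
loadplugin KeywordSpotter /path/to/KeywordSpotter.pm
header     KEYWORD_FALCON  eval:check_keyword_in_parts('falcon')
score      KEYWORD_FALCON  5.0
```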
Subject: Re: decode_attachments not used

At 00:00 05/09/2004, you wrote:
>http://bugzilla.spamassassin.org/show_bug.cgi?id=3750
>
>------- Additional Comments From felicity@kluge.net 2004-09-04 16:00 -------
>Subject: Re: decode_attachments not used
>
>On Sat, Sep 04, 2004 at 03:45:09PM -0700,
>bugzilla-daemon@bugzilla.spamassassin.org wrote:
> > Just had another think. What I need is to be able to apply all the SA rules in
> > the current object to non-text attachments. 2.6x only worked on text attachments
>
>Well, what you really want is to apply all of _your_ rules to all attachments.
>You definitely don't want the standard SA rules to do so.

Correct. What I am doing is using the SA engine as a tool working with a totally separate rulebase, which looks for specific things anywhere in the decoded message, including any readable ASCII contained in Word documents and things like that. Corporates like to have "keyword spotting" in their security products so they can watch out for key project codenames and stuff like that. So I have a few rules which look for specific keywords in the entire message (including binary attachments) and quarantine messages containing them.

I don't personally think keyword-spotting is very useful at all, but the corporates like it and they use it as a key differentiator between competing products. So I do it for them.

> > (and html and variants of course). I need to apply my rules to *all*
> > attachments, regardless of whether they are text or not. I need to work on
> > Microsoft Word document attachments, Excel spreadsheets, all sorts of stuff like
> > that.
> >
> > Will 3.x work on all binary attachments? If not, how do I persuade it to do it?
>
>Not for standard rule types (header, body, rawbody, uri), and you can't. The
>code very deliberately only looks at leaf node text/* and message/* parts.

So I see.

>It sounds like you want to write a plugin that has custom eval code to go
>through the message parts and apply your non-text-based rules.

I may have to write it like this. Fortunately the plugin architecture looks fairly easy to use. Many thanks for your help.

--
Julian Field
Teaching Systems Manager
jkf@ecs.soton.ac.uk
Dept. of Electronics & Computer Science
Tel. 023 8059 2817
University of Southampton
Southampton SO17 1BJ
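The keyword spotting described above, finding readable ASCII inside a decoded binary attachment, can be sketched in plain Perl. This is an illustration only and not MailScanner's actual code: the function names printable_runs and spot_keywords, the 4-character minimum run length, and the sample data are all assumptions made for the example.

```perl
use strict;
use warnings;

# Return every run of at least $min_len printable ASCII characters in $bytes,
# similar in spirit to the Unix strings(1) tool.
sub printable_runs {
    my ($bytes, $min_len) = @_;
    $min_len = 4 unless defined $min_len;
    my @runs = $bytes =~ /([\x20-\x7e]{$min_len,})/g;
    return @runs;
}

# Return those keywords that occur, case-insensitively, in the printable
# runs extracted from $bytes.
sub spot_keywords {
    my ($bytes, @keywords) = @_;
    my $text = lc join("\n", printable_runs($bytes));
    return grep { index($text, lc $_) >= 0 } @keywords;
}

# Made-up binary blob standing in for a decoded attachment.
my $blob = "\x00\x01\xd0\xcfProject Falcon budget\x00\xff\x02ok\x00";
my @hits = spot_keywords($blob, "falcon", "osprey");
print "matched: @hits\n";   # prints "matched: falcon"
```

In the context of this thread, $bytes would be the scalar returned by decode() for each leaf part. One limitation of this sketch: text stored as UTF-16 inside a document has NUL bytes between the ASCII characters, so it would not form a printable run here and would need separate handling.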
I have found an easy way to implement the functionality I need. Thanks to all for your help.