Bug 3692 - [review] check_dcc() code shortcut with header-RE doesn't work
Summary: [review] check_dcc() code shortcut with header-RE doesn't work
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules (Eval Tests) (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
: P3 normal
Target Milestone: 3.0.0
Assignee: SpamAssassin Developer Mailing List
URL: http://article.gmane.org/gmane.mail.s...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-08-17 07:57 UTC by Malte S. Stretz
Modified: 2004-08-17 06:02 UTC (History)
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
suggested patch patch None Theo Van Dinter [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Malte S. Stretz 2004-08-17 07:57:29 UTC
From the mailinglist (archive see URL): 
 
On Tuesday 17 August 2004 04:29 CET Theo Van Dinter wrote: 
> On Tue, Aug 17, 2004 at 12:33:28AM +0200, Malte S. Stretz wrote: 
> > While I tracked another possible bug (see [1]), I had a look how 
> > check_dcc works.  And was completely puzzled by the code. 
> 
> Heh. 
> 
> > check_dcc() calls 
> >   $self->get('X-DCC-(?:[^:]+-)?Metrics') 
> > to shortcut some ocde path.  That's obviously intended to be a RE.  But 
> > on the whole path through the code I can't find the point where the 
> > header is matched via a RE. 
> 
> Well, now that you mention it...  PMS::get() doesn't accept an RE. 
> 
> > Follow me through the code, maybe you can enlighten me: 
> > 2. PerMsgStatus.pm:1175: get('X-DCC-(?:[^:]+-)?Metrics', undef) 
> 
> Yeah, that's not going to work. 
> 
> > -> PerMsgStatus.pm:1227: calls Message::Node::get_header() 
> 
> doesn't do an RE either. 
> 
> > Ok, so where's the RE applied?  Even if other code paths were taken I 
> > can't find any point where the string is made an RE and I can't believe 
> > that the mail parser puts the "Header" check_dcc() wants into the 
> > cache-hash. 
> 
> Yeah, so we ought to replace: 
> 
>   $_ = $self->get('X-DCC-(?:[^:]+-)?Metrics'); 
>   return 1 if /bulk/; 
> 
> with: 
> 
>   return 1 if grep(/^X-DCC-(?:[^:]+-)?Metrics:/ && /bulk/, 
> @{$self->{msg}->get_all_headers()}); 
 
On Tuesday 17 August 2004 16:04 CET Theo Van Dinter wrote: 
> On Mon, Aug 16, 2004 at 08:56:58PM -0700, Dan Quinlan wrote: 
> > >   return 1 if grep(/^X-DCC-(?:[^:]+-)?Metrics:/ && /bulk/, 
> > > @{$self->{msg}->= get_all_headers()}); 
> > 
> > If we can expand out the list of headers, we should just go with that. 
> > Fetching the entire list of headers is definitely inefficient unless we 
> > really need it. 
> 
> Sure.  I was reproducing the initial behavior, but if we know the list 
> of headers, that'd be better.  The issue, however, is: 
> 
> [...] 
>      DCC SMTP header lines are of the form: 
> 
>      X-DCC-brand-Metrics: chost server-ID; bulk cknm1=count cknm2=count 
> ... where 
>         brand   is the "brand name" of the DCC server, such as 
> "RHYOLITE". [...] 
> 
> So "brand" is open to how things are configured and which DCC server is 
> being used.  I think we have to do the get_all_headers() call there.
Comment 1 Theo Van Dinter 2004-08-17 08:56:57 UTC
d'oh.  there shouldn't be an array dereference bit around get_all_headers... <sigh>

I'll be putting up a proper (and tested :)) patch shortly.
Comment 2 Theo Van Dinter 2004-08-17 09:15:32 UTC
Created attachment 2265 [details]
suggested patch

I did a very quick manual test of this and the new version works, so that's a
plus.  :)
Comment 3 Theo Van Dinter 2004-08-17 09:16:22 UTC
it's not a major issue, but we may as well fix it before 3.0.0 since it's known.
Comment 4 Malte S. Stretz 2004-08-17 09:48:01 UTC
What do you think if we made it possible to send a real RE to those routines?  
It would look like 
  $self->get(qr/X-DCC-(?:[^:]+-)?Metrics/) 
 
The routines (probably hahnded through till M::N::headers()) then check  
  if ref ($request) eq "Regexp" 
and grep the wanted headers from the ->{headers} hash. 
 
We could also introduce another suffix :re which would be more consistent with 
the :raw etc. ones though a bit more computation overhead.  It would look like 
  $self->get('X-DCC-(?:[^:]+-)?Metrics:re') 
 
I prefer the first one. 
Comment 5 Theo Van Dinter 2004-08-17 11:53:04 UTC
Subject: Re:  check_dcc() code shortcut with header-RE doesn't work

On Tue, Aug 17, 2004 at 09:48:02AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> What do you think if we made it possible to send a real RE to those routines?  
> It would look like 
>   $self->get(qr/X-DCC-(?:[^:]+-)?Metrics/) 

I think it would end up slowing down the most common case which is
grabbing specific headers.

In a quick look at the code...  The DCC thing is is the only line I can
find calling "get()" with an RE.  There are 3 calls to get_all_headers()
right now, none of them do the "match multiple headers" type thing.

Comment 6 Theo Van Dinter 2004-08-17 13:10:21 UTC
changing to review status.
Comment 7 Malte S. Stretz 2004-08-17 13:17:59 UTC
I doubt that the two if()s necessary will slow down checking significantly.  
OTOH ain't I sure if not fetching all headers in the first place will make any 
difference either. 
 
Anyway, +1 on the patch as it at least makes the stuff work ;) 
Comment 8 Daniel Quinlan 2004-08-17 13:23:31 UTC
Subject: Re:  check_dcc() code shortcut with header-RE doesn't work

>> It would look like 
>>   $self->get(qr/X-DCC-(?:[^:]+-)?Metrics/) 

> I think it would end up slowing down the most common case which is
> grabbing specific headers.

I agree with Theo.  Keep the common case fast.  I don't mind doing a
get_all_headers() with DCC too much.  DCC is already dirt slow.

Daniel

Comment 9 Justin Mason 2004-08-17 13:35:09 UTC
+1
Comment 10 Daniel Quinlan 2004-08-17 13:40:21 UTC
+1

minor suggestion: I'd replace the + with {1,80}
Comment 11 Theo Van Dinter 2004-08-17 14:02:24 UTC
committed with quinlan's suggested + -> {1,80} change.  r36540
Comment 12 Theo Van Dinter 2004-08-17 14:02:46 UTC
maybe I'll close the ticket this time too. <G>