Bug 1606 - A pattern that works in a Perl script does not work in a "body" rule
Status: RESOLVED INVALID
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 2.43
Hardware: PC Linux
Importance: P5 normal
Target Milestone: 2.60
Assignee: SpamAssassin Developer Mailing List
Reported: 2003-03-05 09:20 UTC by Quentin Campbell
Modified: 2003-05-19 05:22 UTC
Description Quentin Campbell 2003-03-05 09:20:52 UTC
The rule

  body TEST /tokenA.{1,1000}tokenB/is

does not match as expected if in the body of a message "tokenA" and "tokenB" 
are separated by multiple NEWLINES (\n). The rule is being tested 
using "spamassassin -t <test.txt". 

The rule is needed to match sub-strings near the beginning and near the end 
of message bodies that contain multiple paragraphs.

Note that the same pattern when used in a Perl script _does_ work as expected 
if the same text is input to it. 

Some additional data
--------------------

The relevant extract of the contents of "test.txt" looks like

  ...
  This is tokenA separated from

  tokenB by a blank line.
  ... 

The output from "od -c <test.txt" for the same extract is:

  ...is tokenA separated from\n\ntokenB by a...

If the contents of "test.txt" are edited to remove the blank line (i.e. remove 
one NEWLINE), thus

  ...
  This is tokenA separated from
  tokenB by a blank line.
  ...
 
so that the "od" command output looks like:

  ...is tokenA separated from\ntokenB by a...

then the pattern works OK in the "body" rule.
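
For reference, here is a minimal sketch of the whole-text behaviour described above, written in Python rather than Perl. The function and variable names are illustrative and are not taken from the reporter's script. When the dot is allowed to match newlines and the pattern is applied to the complete text in one pass, it matches both extracts, with or without the blank line:

```python
import re

# Same pattern as the "body" rule; re.S plays the role of Perl's /s
# (dot matches newline) and re.I the role of /i (case-insensitive).
PATTERN = re.compile(r"tokenA.{1,1000}tokenB", re.I | re.S)

# The two extracts of test.txt, as shown by "od -c".
with_blank_line = "This is tokenA separated from\n\ntokenB by a blank line."
without_blank_line = "This is tokenA separated from\ntokenB by a blank line."


def matches_whole_text(text):
    """Apply the pattern to the whole text at once, as a standalone script would."""
    return PATTERN.search(text) is not None


print(matches_whole_text(with_blank_line))
print(matches_whole_text(without_blank_line))
```

Both calls report a match, so the difference seen with "spamassassin -t" has to come from how the text is fed to the rule, not from the pattern itself.
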
Comment 1 Antony Mawer 2003-03-05 11:52:53 UTC
Subject: Re: [SAdev]  New: A pattern that works in a Perl script does not work in a "body" rule 


Use a meta test instead.  Huge spanning patterns in the body are
(a) incredibly slow and (b), as a result, not really supported.

If you *really* want to use a huge-spanning match, use "rawbody" or
"full" test types instead.

--j.

Comment 2 Quentin Campbell 2003-03-10 01:38:25 UTC
Subject: RE:  A pattern that works in a Perl script does not work in a "body" rule

"j"

Thanks for the reply.

I am not sure what you mean by a "meta test". Can you point me at the
SpamAssassin docs where I can find further info please?

The spanning match I described fails in the same way whether I use a
"body" or "rawbody" test. 

Quentin
---
PHONE: +44 191 222 8209    Computing Service, University of Newcastle
FAX:   +44 191 222 8765    Newcastle upon Tyne, United Kingdom, NE1 7RU.
------------------------------------------------------------------------
"Any opinion expressed above is mine. The University can get its own." 

Comment 3 Theo Van Dinter 2003-03-10 03:40:52 UTC
Subject: Re: [SAdev]  A pattern that works in a Perl script does not work in a "body" rule

On Mon, Mar 10, 2003 at 01:38:25AM -0800, bugzilla-daemon@hughes-family.org wrote:
> I am not sure what you mean by a "meta test". Can you point me at the
> SpamAssassin docs where I can find further info please?

"perldoc Mail::SpamAssassin::Conf"

> The spanning match I described fails in the same way whether I use a
> "body" or "rawbody" test. 

"body" doesn't work for you because it does things in a per-paragraph
manner.  "rawbody" does the same.  However, "full" should work if you
really want the large spanning ability.

That said, it's more efficient to use 2 body rules and a meta to put
them together than a large regexp in a full rule, e.g.:

body __FOUND_FOO    /\bfoo\b/i
body __FOUND_BAR    /\bbar\b/i
meta FOUND_FOO_BAR  __FOUND_FOO && __FOUND_BAR
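
To illustrate why the spanning pattern fails while the two-rules-plus-meta approach works, here is a rough Python model of per-paragraph matching. It is only a sketch: splitting on blank lines is an assumption made for illustration, not SpamAssassin's actual paragraph handling, and split_paragraphs, body_rule and full_rule are hypothetical helpers, not part of any SpamAssassin API. The "full" model here also sees only the body text.

```python
import re

FLAGS = re.I | re.S  # rough equivalents of Perl's /i and /s


def split_paragraphs(body):
    """Assumed model of per-paragraph processing: split on blank lines."""
    return [p for p in re.split(r"\n\s*\n", body) if p.strip()]


def body_rule(pattern, body):
    """Hit only if the pattern matches inside one single paragraph."""
    rx = re.compile(pattern, FLAGS)
    return any(rx.search(p) for p in split_paragraphs(body))


def full_rule(pattern, body):
    """Hit if the pattern matches anywhere in the undivided text."""
    return re.search(pattern, body, FLAGS) is not None


body = "This is tokenA separated from\n\ntokenB by a blank line."

# The spanning pattern never sees both tokens in the same paragraph.
spanning_body = body_rule(r"tokenA.{1,1000}tokenB", body)
# Applied to the undivided text, the same pattern matches.
spanning_full = full_rule(r"tokenA.{1,1000}tokenB", body)

# Two small per-paragraph rules combined like a meta rule.
found_a = body_rule(r"\btokenA\b", body)
found_b = body_rule(r"\btokenB\b", body)
meta_hit = found_a and found_b

print(spanning_body, spanning_full, meta_hit)
```

In this model the spanning pattern misses as a per-paragraph rule but hits on the undivided text, and the meta-style combination hits because each small pattern only has to match within its own paragraph.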

Comment 4 Theo Van Dinter 2003-05-19 13:22:14 UTC
Body rules are run per paragraph, so looking for phrases in different 
paragraphs isn't going to work.  Try using two body rules and a meta, or a 
full rule if you really want to.