SA Bugzilla – Bug 1606
A pattern that works in a Perl script does not work in a "body" rule
Last modified: 2003-05-19 05:22:14 UTC
The rule

    body TEST /tokenA.{1,1000}tokenB/is

does not match as expected if, in the body of a message, "tokenA" and "tokenB" are separated by multiple NEWLINES (\n). The rule is being tested using "spamassassin -t <test.txt".

The rule is needed to match sub-strings near the beginning and near the end of message bodies which contain multiple paragraphs.

Note that the same pattern, when used in a Perl script, _does_ work as expected if the same text is input to it.

Some additional data
--------------------

The relevant extract of the contents of "test.txt" looks like

...
This is tokenA separated from

tokenB by a blank line.
...

The output from "od -c <test.txt" for the same extract is:

...is tokenA separated from\n\ntokenB by a...

If the contents of "test.txt" are edited to remove the blank line (i.e. remove one NEWLINE) thus

...
This is tokenA separated from
tokenB by a blank line.
...

so that the "od" command output looks like:

...is tokenA separated from\ntokenB by a...

then the pattern works OK in the "body" rule.
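The standalone behaviour described above can be approximated outside SpamAssassin. The following is a minimal sketch in Python rather than Perl, assuming that `re.IGNORECASE | re.DOTALL` is an adequate stand-in for the `/is` modifiers. The variable names are illustrative and are not taken from the report. It only shows that the pattern matches both extracts when applied to the whole text at once; it says nothing about how SpamAssassin feeds text to the rule.

```python
import re

# Python stand-in for the Perl pattern /tokenA.{1,1000}tokenB/is.
# DOTALL lets "." match newlines, as the /s modifier does in Perl.
PATTERN = re.compile(r"tokenA.{1,1000}tokenB", re.IGNORECASE | re.DOTALL)

# The two extracts from the report, rebuilt from the "od -c" output.
blank_line_text = "This is tokenA separated from\n\ntokenB by a blank line.\n"
single_newline_text = "This is tokenA separated from\ntokenB by a blank line.\n"

# Applied to the whole text, the pattern matches in both cases.
matches_blank_line = PATTERN.search(blank_line_text) is not None
matches_single_newline = PATTERN.search(single_newline_text) is not None

print(matches_blank_line, matches_single_newline)
```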
Subject: Re: [SAdev] New: A pattern that works in a Perl script does not work in a "body" rule

use a meta test instead. Huge spanning patterns in the body are (a) incredibly slow and (b) not supported really as a result.

If you *really* want to use a huge-spanning match, use "rawbody" or "full" test types instead.

--j.
Subject: RE: A pattern that works in a Perl script does not work in a "body" rule

> http://www.hughes-family.org/bugzilla/show_bug.cgi?id=1606
>
> ------- Additional Comments From ajmawer@optusnet.com.au 2003-03-05 11:52 -------
>
> use a meta test instead. Huge spanning patterns in the body are
> (a) incredibly slow and (b) not supported really as a result.
>
> If you *really* want to use a huge-spanning match, use
> "rawbody" or "full" test types instead.
>
> --j.

"j"

Thanks for the reply.

I am not sure what you mean by a "meta test". Can you point me at the SpamAssassin docs where I can find further info please?

The spanning match I described fails in the same way whether I use a "body" or "rawbody" test.

Quentin
---
PHONE: +44 191 222 8209    Computing Service, University of Newcastle
FAX:   +44 191 222 8765    Newcastle upon Tyne, United Kingdom, NE1 7RU.
------------------------------------------------------------------------
"Any opinion expressed above is mine. The University can get its own."
Subject: Re: [SAdev] A pattern that works in a Perl script does not work in a "body" rule

On Mon, Mar 10, 2003 at 01:38:25AM -0800, bugzilla-daemon@hughes-family.org wrote:
> I am not sure what you mean by a "meta test". Can you point me at the
> SpamAssassin docs where I can find further info please?

"perldoc Mail::SpamAssassin::Conf"

> The spanning match I described fails in the same way whether I use a
> "body" or "rawbody" test.

"body" doesn't work for you because it does things in a per-paragraph manner. "rawbody" does the same. However, "full" should work if you really want the large spanning ability.

That said, it's more efficient to use 2 body rules and a meta to put them together than a large regexp in a full rule. ie:

    body __FOUND_FOO    /\bfoo\b/i
    body __FOUND_BAR    /\bbar\b/i
    meta FOUND_FOO_BAR  __FOUND_FOO && __FOUND_BAR
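The per-paragraph explanation above can be illustrated with a small simulation. This is only a sketch in Python, not SpamAssassin's actual implementation. It assumes that "per-paragraph" can be approximated by splitting the body on blank lines, and the helpers `split_paragraphs` and `rule_hits` are hypothetical names introduced here for illustration.

```python
import re

def split_paragraphs(body):
    """Split text on blank lines (assumed approximation of per-paragraph handling)."""
    return [p for p in re.split(r"\n\s*\n", body) if p.strip()]

def rule_hits(pattern, paragraphs):
    """A simulated 'body' rule hits if any single paragraph matches."""
    return any(pattern.search(p) for p in paragraphs)

# The spanning pattern from the report, plus two single-token patterns
# playing the role of the __FOUND_* sub-rules.
SPANNING = re.compile(r"tokenA.{1,1000}tokenB", re.IGNORECASE | re.DOTALL)
FOUND_A = re.compile(r"\btokenA\b", re.IGNORECASE)
FOUND_B = re.compile(r"\btokenB\b", re.IGNORECASE)

body = "This is tokenA separated from\n\ntokenB by a blank line.\n"
paragraphs = split_paragraphs(body)

# Spanning pattern run per paragraph: no single paragraph holds both tokens.
spanning_hit = rule_hits(SPANNING, paragraphs)

# Two independent sub-rules combined the way a meta rule combines them.
meta_hit = rule_hits(FOUND_A, paragraphs) and rule_hits(FOUND_B, paragraphs)

# Spanning pattern run once over the whole text, as a "full" rule would.
full_hit = SPANNING.search(body) is not None

print(spanning_hit, meta_hit, full_hit)
```

Under these assumptions the spanning pattern misses when run paragraph by paragraph, while the two-rule combination and the whole-text match both hit. That matches the behaviour reported in this bug.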
Body rules are run per-paragraph, so looking for phrases in different paragraphs isn't going to work. Try using two body rules and a meta, or a full rule if you really want to.