SA Bugzilla – Full Text Bug Listing
Summary: | /ss/i interpreted as /(?:ss|ß)/, Variable length lookbehind not implemented in regex | | |
---|---|---|---|
Product: | Spamassassin | Reporter: | Mark Martinec <Mark.Martinec> |
Component: | Libraries | Assignee: | SpamAssassin Developer Mailing List <dev> |
Status: | RESOLVED FIXED | | |
Severity: | normal | CC: | apache, kmcgrail |
Priority: | P2 | | |
Version: | 3.4 SVN branch | | |
Target Milestone: | 4.0.0 | | |
Hardware: | All | | |
OS: | All | | |
Whiteboard: | | | |
Description
Mark Martinec
2012-06-04 17:26:37 UTC
I had been wondering when that email you wrote about chickenpox was going to rear its head. Unfortunately, this issue is WAY over my regexp head. Does it only rear its head with compiled rules?

> Does it only rear its head with compiled rules?

It showed up in the --lint phase of install, with non-compiled rules. (Compiled rules are possibly affected too, but these may already be broken due to changes in perl debug output across versions, Bug 6649.)

As a quick and dirty hack I just replaced 'ss' with 's[s]' in these three rules, so that installation/lint does not barf:

Bug 6802: a hack on three J_CHICKENPOX_* rules, replacing ss with s[s] avoids interpreting "ss" as "sharp s"
Sending rulesrc/sandbox/khopesh/20_chickenpox.cf
Committed revision 1346064.

Don't know where else we may encounter effects of these changes in perl. These three rules were just 'lucky' in using a construct which involves an additional internal check on string lengths.

So far so good with 5.16.0, things appear to be working normally.

> Don't know where else we may encounter effects of these changes
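For anyone following along, here is a small sketch of the effect being discussed, not taken from the SpamAssassin sources. It shows that under Unicode rules a case-insensitive "ss" also matches sharp s (U+00DF), which is what makes a lookbehind containing "ss" look variable-length to perl. The helper name `compiles` and the two probe patterns are made up for illustration. Whether the probes compile depends on the perl version, so the script reports the result instead of assuming it.

```perl
use strict;
use warnings;

# U+00DF LATIN SMALL LETTER SHARP S, written as an escape so that the
# encoding of this source file does not matter.
my $sharp_s = "\x{DF}";

# Under Unicode rules (/u), case-insensitive matching uses full case
# folding, and sharp s folds to the two-character sequence "ss".
my $folds_to_ss = ($sharp_s =~ /ss/iu) ? 1 : 0;
print "sharp s matches /ss/iu: $folds_to_ss\n";

# Hypothetical helper: returns 1 if this perl accepts the pattern, else 0.
sub compiles {
    my ($pattern) = @_;
    # Newer perls may only warn about variable-length lookbehind;
    # silence warnings here so that only hard errors count as a rejection.
    local $SIG{__WARN__} = sub { };
    my $ok = eval { my $re = qr/$pattern/; 1 };
    return $ok ? 1 : 0;
}

# Illustrative patterns only: a lookbehind with "ss", and the same
# lookbehind using the s[s] workaround from this bug.
for my $pattern ('(?i)(?<!css)x', '(?i)(?<!cs[s])x') {
    printf "%-20s %s\n", $pattern,
        compiles($pattern) ? 'compiles' : 'rejected';
}
```

Running this on the perl used for `--lint` should show quickly whether that perl is affected.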
Should we be adding a:
  use re "/aa";
in the code sections which interpret regexps in rules,
to avoid surprises with Unicode semantics, or should we deal with
specific problems as we come across them?
Adding an aa modifier directly in rules would break
these regexps for versions of perl older than 5.14.
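As a rough illustration of what the pragma would change (a sketch only, unrelated to the actual SpamAssassin code): `use re '/aa'` is lexically scoped, so only regexps compiled inside the enclosing block pick up the ASCII-restricted folding. The variable names are made up for the example.

```perl
use strict;
use warnings;

# "use re '/flags'" and the /aa modifier both need perl 5.14 or later.
BEGIN { die "perl 5.14 or later required\n" if $] < 5.014 }

my $sharp_s = "\x{DF}";    # U+00DF, sharp s

# Explicit /u: Unicode rules, so /ss/i also matches sharp s.
my $unicode_hit = ($sharp_s =~ /ss/iu) ? 1 : 0;

my $ascii_hit;
{
    # Every regexp compiled inside this block defaults to /aa, which
    # forbids case-insensitive matches between ASCII and non-ASCII.
    use re '/aa';
    $ascii_hit = ($sharp_s =~ /ss/i) ? 1 : 0;
}

print "with /u:            $unicode_hit\n";
print "under use re '/aa': $ascii_hit\n";
```

The pragma only affects code compiled inside that scope, so it can be confined to the routine that turns rule text into regexps.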
Added /aa in compile_regex() for perl >=5.14. Tested with mass-check runs; it has no effect on rule hits or runtimes, so I deem it safe to use.

Sending spamassassin-3.4/lib/Mail/SpamAssassin/Util.pm
Sending trunk/lib/Mail/SpamAssassin/Util.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1863788.

Rolled it back, it was buggy.

https://bugzilla.redhat.com/show_bug.cgi?id=731062

So it can really only be used from perl 5.16 onward. Also I'm not sure if it should be enabled or disabled depending on normalize_charset. Needs a little bit more testing.

One can already use /aa in rules, so it's not a very big problem.

body FOO m/(?<!abc|css)/aa

Btw, using qr//aa for all regexes speeds up total runtime by ~6% !! I was wondering why things slowed down when I removed the patch. :-)

I'll play with it a bit in trunk; we can probably use it as long as the textual body is handled as bytes (could change later to utf8 for normalize_charset?).

New try.

Sending trunk/lib/Mail/SpamAssassin/Util.pm
Transmitting file data .done
Committing transaction...
Committed revision 1864964.
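To make the idea concrete, here is a minimal sketch of a version-guarded compile step. This is not the code committed to Util.pm. The function name `compile_with_optional_aa` and its interface are hypothetical, and the 5.16 cutoff is simply the one mentioned above. It adds /aa only when the perl is new enough and the rule has not chosen a character-set modifier itself.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the SpamAssassin compile_regex() implementation.
# Takes a pattern string and a string of modifier letters, and returns a
# compiled regexp, or undef if the pattern does not compile.
sub compile_with_optional_aa {
    my ($pattern, $flags) = @_;
    $flags = '' unless defined $flags;

    # Assumption from the discussion above: /aa is only relied on from
    # perl 5.16. Skip it if the rule already carries one of the mutually
    # exclusive charset modifiers (a, d, l, u).
    $flags .= 'aa' if $] >= 5.016 && $flags !~ /[adlu]/;

    my $prefix = length($flags) ? "(?$flags)" : '';
    my $re = eval { qr/$prefix$pattern/ };
    return $re;
}

# Usage: a case-insensitive rule body, compiled with the default policy.
my $re = compile_with_optional_aa('ss', 'i');
print "compiled as: $re\n" if defined $re;
```

Doing this at the compile step keeps the rule files themselves loadable on older perls, which adding /aa directly in the rules would not.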