Bug 5354 - sa-compile fatally hangs on execute
Summary: sa-compile fatally hangs on execute
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: sa-compile
Version: SVN Trunk (Latest Devel Version)
Hardware: Macintosh Mac OS X
Importance: P5 normal
Target Milestone: 3.2.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-02-27 06:48 UTC by snowcrash+apache
Modified: 2007-03-12 06:39 UTC




Description snowcrash+apache 2007-02-27 06:48:08 UTC
installed,

  spamassassin v320/trunk, r511659

on osx 10.4.8.

  sa-compile --sudo -D

hangs at,

	[24174] dbg: zoom: NO /[-<!\s]\w{0,10}kn[bcdfhjklmnqsvwxz]\w{0,10}[->\s]/
	[24174] dbg: zoom: NO /[-<!\s]\w{0,10}eb[jqx]\w{0,10}[->\s]/
	[24174] dbg: zoom: NO /V[ij]+AGRA/i
	[24174] dbg: zoom: NO /[-<!\s]\w{0,10}hr[cjqx]\w{0,10}[->\s]/
	[24174] dbg: zoom: NO /[\s-][a-z01]{1,10}1cr[a-z01]{1,10}[,.?!]*[\s-]/i
	[24174] dbg: zoom: NO /\doo[o0] d[o0][l|][l|]ars/i
	[24174] dbg: zoom: NO /[\s-][a-z01]{1,10}[1|]ve[a-z01]{0,10}[,.?!]*[\s-]/i
	[24174] dbg: generic: giving up on that direction: brace mismatch in '((d|t|e|e|i|r|u|x|a|b)($|' at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 516.
	[24174] dbg: zoom: NO /\b(?:(?:d|t|e|e|I|R|U|x|a|b)(?:$|\s{1,30})){10}/i
	[24174] dbg: generic: giving up on that direction: brace mismatch in 'mill(i|)on)' at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 516.
	[24174] dbg: zoom: NO
/\bU\.?S\.?(?:D\.?)?\s*(?:\$\s*)?(?:\d+,\d+,\d+|\d+\.\d+\.\d+|\d+(?:\.\d+)?\s*milli?on)/i
	[24174] dbg: generic: giving up on that direction: brace mismatch in 'f(' at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 516.
	[24174] dbg: zoom: NO /(?!\bfun?ck)\bf.?u.?c.?k/i
	[24174] dbg: zoom: NO /Pri{1,2}c[a-z]?e/i
	[24174] dbg: zoom: NO
/(?!guarantee)[gk6]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[uv\xb5\xd9\xda\xdb\xdc\xfc\xfb\xfa\xf9\xfd]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[gra\@\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xe4\xe3\xe2\xe0\xe1\xe2\xe3\xe4\xe5\xe60o]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?r{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[gra\@\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xe4\xe3\xe2\xe0\xe1\xe2\xe3\xe4\xe5\xe60o]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[n\xd1\xf1]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[t|]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[e3\xc8\xc9\xca\xcb\xe8\xe9\xea\xeb\xa4]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[e3\xc8\xc9\xca\xcb\xe8\xe9\xea\xeb\xa4]{1,2}/i
	[24174] dbg: zoom: NO
/\b(?!investor)[ilt|!1y?\xcc\xcd\xce\xcf\xec\xed\xee\xef]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[n\xd1\xf1]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?(?:[vu]|\\\/){1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[e3\xc8\xc9\xca\xcb\xe8\xe9\xea\xeb\xa4]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[sz5\xa6\xa7]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[t|]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[go0\xd2\xd3\xd4\xd5\xd6\xd8\xf0\xf2\xf3\xf4\xf5\xf6\xf8]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?r{1,2}/i

requiring a force-quit to exit.

in the same sa-compile exec output, i also see numerous,

	[24192] dbg: generic: giving up on that direction: brace mismatch in '|internet|candidate|sir(s|)|madam|investor|travel(l|)er|car shopper|web)' at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 516.

seemingly non-fatal errors at line 516 of BodyRuleBaseExtractor.pm.
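Each fragment quoted in the "brace mismatch" messages has unbalanced parentheses: '((d|t|e|e|i|r|u|x|a|b)($|' opens three groups and closes one, and 'mill(i|)on)' closes one more than it opens. The sketch below is a hypothetical checker that makes this visible. It is not BodyRuleBaseExtractor.pm's own logic, and it simplifies by honouring backslash escapes while ignoring character classes.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from BodyRuleBaseExtractor.pm: returns true
# when the unescaped parentheses in a regexp fragment are balanced.
# Simplification: backslash escapes are skipped, but character classes
# such as [()] are NOT understood and would be miscounted.
sub parens_balanced {
    my ($frag) = @_;
    my @chars = split //, $frag;
    my $depth = 0;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($c eq '\\') { $i++; next; }   # skip the escaped character
        if ($c eq '(') {
            $depth++;
        }
        elsif ($c eq ')') {
            $depth--;
            return 0 if $depth < 0;       # a close with no matching open
        }
    }
    return $depth == 0 ? 1 : 0;
}

# The first four fragments are quoted in the debug output above;
# the last comes from a complete rule and serves as a balanced control.
for my $frag ('((d|t|e|e|i|r|u|x|a|b)($|',
              'mill(i|)on)',
              'f(',
              '|internet|candidate|sir(s|)|madam|investor|travel(l|)er|car shopper|web)',
              '(?:D\.?)?') {
    printf "%-10s %s\n", (parens_balanced($frag) ? 'balanced' : 'mismatch'), $frag;
}
```

As noted later in this thread, these messages are expected and harmless; the extractor simply skips that direction for the rule.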
Comment 1 Justin Mason 2007-02-27 07:06:36 UTC
how long do you wait?  It can take a long time.  also, please attach the entire
debug output.
Comment 2 snowcrash+apache 2007-02-27 07:55:25 UTC
admittedly, i'd not waited beyond ~ 5 mins.

retrying, and keeping an eye on the relevant PID,

% ps -ax | grep -i sa-compile
      24284  p1  R+     9:18.02 /usr/local/perl5/bin/perl -T -w /usr/local/spamassassin/bin/sa-compile --sudo -D

in,

% top
   ...
   24284 perl        66.3%  22:19.97   1    15   193  36.8M  4.33M  36.9M  71.9M
   ...

after 22+ mins, it's still working ...

(fwiw, this is compiling on an 'old workhorse' 500MHz PPC-G3 + 800 MB RAM,
running OSX 10.4.8 ... so, _not_ the quickest box ...)

in the meantime, here's the console output to this point -- i.e., where it's
"hung": http://rafb.net/p/CEpzgx93.html

Comment 3 Justin Mason 2007-02-27 08:08:24 UTC
weird.  sounds like it needs testing with these add-on rulesets:

[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_adult_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_bayes_poison_nxm_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_evilnum0_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_evilnum1_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_genlsubj_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_header_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_header_eng_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_html_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_obfu_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_oem_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_random_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_specific_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_spoof_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_stocks_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_unsub_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_uri_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sc_top200_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/72_sare_bml_post25x_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/72_sare_redirect_post3_0_0_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/99_fvgt_tripwire_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/99_sare_fraud_post25x_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/bogus-virus-warnings_cf_sare_sa-update_dostech_net.cf
Comment 4 snowcrash+apache 2007-02-27 08:14:44 UTC
to that end, is there something you need _me_ to do at this point?
Comment 5 Justin Mason 2007-02-27 08:35:19 UTC
afraid not... I think it's up to me now ;)

you could try leaving it running to see if it does complete, btw.
Comment 6 snowcrash+apache 2007-02-27 08:46:03 UTC
> afraid not... I think it's up to me now ;)

ok. thx.

> you could try leaving it running to see if it does complete, btw.

i'd forgotten to kill it, so it has been.  @ 70+ minutes, it's still 'grinding'.
 i suspect it's getting nowhere ...
Comment 7 Justin Mason 2007-02-27 11:05:46 UTC
it completes here with that ruleset -- stops at that point for a long time, as
expected, but after about 1 minute it then carries on to the next stage and to
completion.  (It produces an error due to misparsing one of the REs
unfortunately, so I'll have to fix that... but that's a different bug.)

my hardware is a 1.7GHz Centrino laptop fwiw.

can you provide me with a tarball of your rules dir in /var/lib/spamassassin ,
/etc/mail/spamassassin etc.?
Comment 8 Justin Mason 2007-02-27 15:12:15 UTC
yep -- completes in 1 minute 37 secs here with the (mailed separately) tarball
of /var/lib/spamassassin .  Could you try it on another machine?
Comment 9 snowcrash+apache 2007-02-27 15:35:48 UTC
this 'hang' is consistently reproducible on four different boxes.

admittedly, similarly configured, but nonetheless ...

sounds like it's environmental.  now the fun part :-/

ideas?

thanks.
Comment 10 Justin Mason 2007-02-27 16:38:55 UTC
I'd say it *must* be some additional rules -- could be perl version, alternatively.
Comment 11 snowcrash+apache 2007-02-27 16:44:36 UTC
hm. well, you should have all my rules in the tarball.

as for perl, fyi,

% perl -V
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=darwin, osvers=8.8.0, archname=darwin-thread-multi-2level
    uname='darwin server 8.8.0 darwin kernel version 8.8.0: fri sep 8 17:18:57 pdt 2006; root:xnu-792.12.6.obj~1release_ppc power macintosh powerpc '
    ...
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/bdb/include -I/usr/local/include',
    optimize='-O3',
    cppflags='-no-cpp-precomp -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/bdb/include -I/usr/local/include'
    ccversion='', gccversion='4.0.1 (Apple Computer, Inc. build 5363)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.4 cc', ldflags ='-L/usr/local/bdb/lib -L/usr/local/lib -L/usr/lib'
    libpth=/usr/local/bdb/lib /usr/local/lib /usr/lib
    libs=-ldb -lc -lm -ldl
    perllibs=-lc -lm -ldl
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=true, libperl=libperl.dylib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-L/usr/local/bdb/lib -L/usr/local/lib -L/usr/lib -bundle -undefined dynamic_lookup'
Characteristics of this binary (from libperl): 
  Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT
                        PERL_MALLOC_WRAP USE_ITHREADS USE_LARGE_FILES
                        USE_PERLIO USE_REENTRANT_API
  Built under darwin
  Compiled at Oct 24 2006 14:50:50
  ...

Comment 12 Loren Wilton 2007-02-28 15:07:20 UTC
FWIW, we have indications that SARE_ADULT is *very* unhappy on 3.2, getting 
hundreds of syntax errors that were never there before.  Seems to be related to 
high-bit characters in the obfuscation REs.

I vaguely recall some change that went into SA to deal with a perl bug or the 
like in that general area, but can't really recall any more than that.
Comment 13 snowcrash+apache 2007-03-06 14:48:57 UTC
(In reply to comment #8)
> yep -- completes in 1 minute 37 secs here with the (mailed separately) tarball
> of /var/lib/spamassassin .  Could you try it on another machine?

noting the 'prerelease' announcements on the list, i did a fresh co & build of
32trunk/r515318.

with the compile rule disabled in init.pre, spamassassin --lint runs without error.

enabled, the 'hang' is as reported.

i've also, now, consistently reproduced this on four different boxes -- though
admittedly all osx boxes.
Comment 14 Sidney Markowitz 2007-03-06 18:34:30 UTC
On my MacOS 10.4.8 on an Intel MacBook, without the extra ruleset,  when I try
sa-compile --sudo -D using svn trunk I get to the end and then see

sudo make install
Installing /var/lib/spamassassin/compiled/3.002000/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.bundle
Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
Installing /tmp/.spamassassin13558foZOGptmp/ignored/man/man3/Mail::SpamAssassin::CompiledRegexps::body_0.3pm
Writing /var/lib/spamassassin/compiled/3.002000/auto/Mail/SpamAssassin/CompiledRegexps/body_0/.packlist
Appending installation info to /tmp/.spamassassin13558foZOGptmp/ignored/System/Library/Perl/5.8.6/darwin-thread-multi-2level/perllocal.pod
sudo rm -rf /tmp/.spamassassin13558foZOGptmp
rm: fts_read: No such file or directory
command failed! at /usr/bin/sa-compile line 236.

Google found nothing that helped regarding the "rm: fts_read: No such file or
directory" message. When I tried on the command line sudo rm -rf
/tmp/.spamassassin13558foZOGptmp it worked ok.
Comment 15 Justin Mason 2007-03-07 03:09:48 UTC
(In reply to comment #14)
> On my MacOS 10.4.8 on an Intel MacBook, without the extra ruleset,  when I try
> sa-compile --sudo -D using svn trunk I get to the end

Which is good -- I suggest you and snowcrash now compare perl/library
versions... ;)

You could also try using the additional rulesets: I've tarred them up
at http://taint.org/x/2007/bug5354-all-rulesets.tgz to save you downloading
each one.  Note though that this works fine for me with those sets on
linux (aside from the different bug in RE compilation later).

> sudo rm -rf /tmp/.spamassassin13558foZOGptmp
> rm: fts_read: No such file or directory
> command failed! at /usr/bin/sa-compile line 236.
> 
> Google found nothing that helped regarding the "rm: fts_read: No such file or
> directory" message. When I tried on the command line sudo rm -rf
> /tmp/.spamassassin13558foZOGptmp it worked ok.

http://developer.apple.com/documentation/Darwin/Reference/Manpages/man3/fts_read.3.html

     The functions fts_read() and fts_children() may fail and set errno for
     any of the errors specified for the library functions chdir(2),
     malloc(3), opendir(3), readdir(3) and stat(2).

Weird.  The only thing I can think of is that something else
deleted the dir while the "rm" was proceeding, but you're saying
the directory was still there afterwards?

Can you trace it using strace/truss/ftrace or whatever the MacOS
equivalent is?



(PS: Loren:

> FWIW, we have indications that SARE_ADULT is *very* unhappy on 3.2, getting 
> hundreds of syntax errors that were never there before.  Seems to be related to 
> high-bit characters in the obfuscation REs.

SARE_ADULT1 or SARE_ADULT2?  a rule called "SARE_ADULT" isn't showing up
in the rules from the updates channel. Sounds like a UTF-8 problem though...
but this is a separate issue and should not be followed up in this bug.)

Comment 16 snowcrash+apache 2007-03-07 06:18:22 UTC
(In reply to comment #14)
> On my MacOS 10.4.8 on an Intel MacBook, without the extra ruleset,  when I try
> sa-compile --sudo -D using svn trunk I get to the end and then see

"... extra ruleset ..."

which ruleset are you referring to here? SARE_ADULT?
Comment 17 Sidney Markowitz 2007-03-07 12:20:19 UTC
(in reply to comment #15)

sa-compile --sudo -D --siteconfigpath=~/tmp/bug5354/tstrules -C ~/tmp/bug5354/tstrules

completed in a bit more than a minute, ending with the same strange error in the
sudo rm -rf of the temporary directory.


Here is the output of perl -V, which does show the results of having some things
installed under fink:

Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
  Platform:
    osname=darwin, osvers=8.0, archname=darwin-thread-multi-2level
    uname='darwin b01.apple.com 8.0 darwin kernel version 8.3.0: mon oct 3 20:04:04 pdt 2005; root:xnu-792.6.22.obj~2release_ppc power macintosh powerpc '
    config_args='-ds -e -Dprefix=/usr -Dccflags=-g  -pipe -Dldflags=-Dman3ext=3pm -Duseithreads -Duseshrplib'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-g -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include',
    optimize='-O3',
    cppflags='-no-cpp-precomp -g -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='4.0.1 (Apple Computer, Inc. build 5363)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags ='-L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-ldbm -ldl -lm -lc
    perllibs=-ldl -lm -lc
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=true, libperl=libperl.dylib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-bundle -undefined dynamic_lookup -L/usr/local/lib'


Characteristics of this binary (from libperl): 
  Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
  Locally applied patches:
        23953 - fix for File::Path::rmtree CAN-2004-0452 security issue
        33990 - fix for setuid perl security issues
        SPRINTF0 - fixes for sprintf formatting issues - CVE-2005-3962
  Built under darwin
  Compiled at Oct 16 2006 22:54:34
  %ENV:
    PERL5LIB="/sw/lib/perl5:/sw/lib/perl5/darwin"
  @INC:
    /sw/lib/perl5/5.8.6/darwin-thread-multi-2level
    /sw/lib/perl5/5.8.6
    /sw/lib/perl5/darwin-thread-multi-2level
    /sw/lib/perl5
    /sw/lib/perl5/darwin
    /System/Library/Perl/5.8.6/darwin-thread-multi-2level
    /System/Library/Perl/5.8.6
    /Library/Perl/5.8.6/darwin-thread-multi-2level
    /Library/Perl/5.8.6
    /Library/Perl
    /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
    /Network/Library/Perl/5.8.6
    /Network/Library/Perl
    /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
    /System/Library/Perl/Extras/5.8.6
    /Library/Perl/5.8.1
    .
Comment 18 Justin Mason 2007-03-08 07:32:01 UTC
'the same strange error in the sudo rm -rf of the temporary directory'

btw, I think this may be fixed as of r516072...
Comment 19 snowcrash+apache 2007-03-08 10:45:08 UTC
(In reply to comment #18)
fyi, after building a fresh co of r516120,

	% /usr/local/spamassassin/bin/sa-compile --sudo -D

gets somewhat further, but, eventually hangs @ (a different point ...),

	(complete detailed output -> http://rafb.net/p/1LVkgU42.html )
	...
	[1074] dbg: zoom: NO
/\b(?!investor)[ilt|!1y?\xcc\xcd\xce\xcf\xec\xed\xee\xef]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[n\xd1\xf1]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?(?:[vu]|\\\/){1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[e3\xc8\xc9\xca\xcb\xe8\xe9\xea\xeb\xa4]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[sz5\xa6\xa7]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[t|]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[go0\xd2\xd3\xd4\xd5\xd6\xd8\xf0\xf2\xf3\xf4\xf5\xf6\xf8]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?r{1,2}/i
	[1074] dbg: zoom: extracted 867 bases, skipped 804

with cpu still chugging away at ~ 40 mins,

%  ps -ax | grep sa-compile
 1074  p1  R+     4:47.68 /usr/local/perl5/bin/perl -T -w /usr/local/spamassassin/bin/sa-compile --sudo -D

% top
  PID COMMAND  %CPU   TIME      #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
 ...
 1074 perl     77.9%  39:19.63   1    15   193   40.5M  5.93M  24.6M  71.9M
 ...
Comment 20 Sidney Markowitz 2007-03-08 12:07:27 UTC
> I think this may be fixed as of r516072

Yes, that took care of that error. I still don't see the hang that this bug is
about.
Comment 21 snowcrash+apache 2007-03-08 12:20:52 UTC
(In reply to comment #20)
> I still don't see the hang that this bug is about.

in the output i'd referenced, i see lots of,

   "generic: giving up on regexp ..."

could that be an issue?  i'm just noting.

what info can i provide, if any, to help track this down & fix it?  it's clear
it "works for some", but, again, i see the same problem on multiple boxes, so,
i'd guess it's 'environmental'.

nothing about these boxes is particularly atypical -- the src-built perl env is
widely used with a number of opensource & commercial apps with no trouble.  as
such, i'm not convinced that anything we have here is 'broken', rather, just
different.

thanks.
Comment 22 Justin Mason 2007-03-08 13:13:08 UTC
> in the output i'd referenced, i see lots of,
> 
>    "generic: giving up on regexp ..."
> 
> could that be an issue?  i'm just noting.

no, that's fine.

> what info can i provide, if any, to help track this down & fix it?  it's clear
> it "works for some", but, again, i see the same problem on multiple boxes, so,
> i'd guess it's 'environmental'.
> 
> nothing about these boxes is particularly atypical -- the src-built perl env is
> widely used with a number of opensource & commercial apps with no trouble.  as
> such, i'm not convinced that anything we have here is 'broken', rather, just
> different.

this is the thing -- we need to figure out how it differs ;)  Have you tried on
non-MacOS-X boxes there?
Comment 23 snowcrash+apache 2007-03-08 13:25:15 UTC
> >    "generic: giving up on regexp ..."
> > 
> > could that be an issue?  i'm just noting.
> 
> no, that's fine.

ok.

> this is the thing -- we need to figure out how it differs ;)

of course :-) which is why i'm offering what info i can -- as _i_ am most
certainly not able to decipher this by myself :-/

> Have you tried on non-MacOS-X boxes there?

nope. unfortunately -- well, actually, fortunately, imho -- all our servers are osx.

we could arguably cobble up a *nix box, but as that's not an env we serve/dev
in, that's likely to introduce as many problems as it may solve.

another approach, perhaps?  since it's hanging reproducibly in the same place,
on multiple boxes, i'll propose that there's specific code/rule/etc/ it's
choking on.

b4 id'ing _what_ the problem is, can you determine _where_ the problem occurs?
if you need more/specific info to do so, i can provide it.
Comment 24 Justin Mason 2007-03-08 13:33:51 UTC
I'd suggest doing a binary search of the lines between

  dbg ("zoom: extracted $yes bases, skipped $no");

and

  info ("zoom: base extraction complete for $ruletype: yes=$yes no=$no\n");

peppering statements like

  warn "JMD";

all over that area should produce some kind of indication of where it's spending
all the time.
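A minimal sketch of the same bisection idea, assuming one wants timings rather than bare markers: a hypothetical checkpoint() helper (not part of SpamAssassin) that reports its call site the way a bare warn does, plus the seconds elapsed since the previous checkpoint, so a slow stretch between the two dbg()/info() lines shows up as a large delta.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical debugging helper: like warn "JMD", but each checkpoint
# also reports seconds elapsed since the previous one.
my $last_checkpoint = time;

sub checkpoint {
    my ($label) = @_;
    my $now = time;
    my $elapsed = $now - $last_checkpoint;
    $last_checkpoint = $now;
    my (undef, $file, $line) = caller;      # where checkpoint() was called
    warn sprintf "CHECKPOINT %s at %s line %d (+%.3fs)\n",
        $label, $file, $line, $elapsed;
    return $elapsed;
}

# Usage: sprinkle calls through the suspect region and bisect.
checkpoint("start");
my $sum = 0;
$sum += $_ for 1 .. 100_000;   # stand-in for the code under suspicion
checkpoint("after loop");
```

Because the message ends in a newline, warn does not append its own "at ... line" suffix; the helper supplies the caller's location instead.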
Comment 25 snowcrash+apache 2007-03-08 14:24:49 UTC
(In reply to comment #24)
> I'd suggest doing a binary search of the lines between

per suggestion, here's a 1st stab @ where the hang happens ...

  ...
  ?^#~\xa1`'+-]?r{1,2}/i at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 529.
  [20780] dbg: zoom: NO
/\b(?!investor)[ilt|!1y?\xcc\xcd\xce\xcf\xec\xed\xee\xef]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[n\xd1\xf1]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?(?:[vu]|\\\/){1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[e3\xc8\xc9\xca\xcb\xe8\xe9\xea\xeb\xa4]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[sz5\xa6\xa7]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[t|]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[go0\xd2\xd3\xd4\xd5\xd6\xd8\xf0\xf2\xf3\xf4\xf5\xf6\xf8]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?r{1,2}/i
  [20780] dbg: zoom: extracted 868 bases, skipped 804
  WARN 167 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 167.
  WARN 203 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 203.
  WARN 209 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 209.
  WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
  WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
  WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
  WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
  WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
  WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
  WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
  WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
  WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
  WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
  WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
  WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
  WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
  WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
  WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
  WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
  WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
  WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
  WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
  WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
  WARN 228 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 228.
  WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
  WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
  ...
  etc etc etc
  ...

where, in "BodyRuleBaseExtractor.pm",

  ======================================
  warn "WARN 203";                              # line 203
  foreach my $set1 (@good_bases) {
    my $base1 = $set1->{base};
    my $orig1 = $set1->{orig};
    my $key1  = $set1->{name};
    next if ($base1 eq '' or $key1 eq '');
    warn "WARN 209";                            # line 209
    $conf->{base_orig}->{$ruletype}->{$key1} = $orig1;

    foreach my $set2 (@good_bases) {
      next if ($set1 == $set2);
      warn "WARN 214";                          # line 214
      # clobber exact dups; this can happen if a regexp outputs the
      # same base string multiple times
      if ($set1->{name} eq $set2->{name} &&
          $set1->{base} eq $set2->{base} &&
          $set1->{orig} eq $set2->{orig})
      {
        $set2->{name} = '';       # clobber
        $set2->{base} = '';
      }
      warn "WARN 224";                          # line 224
      # skip if either already contains the other rule's name
      next if ($set1->{name} =~ /\b\Q$set2->{name}\E\b/);
      next if ($set2->{name} =~ /\b\Q$set1->{name}\E\b/);
      warn "WARN 228";                          # line 228
      my $base2 = $set2->{base};

      next if ($base2 eq '');
      next if (length $base1 < length $base2);
      next if ($base1 !~ /\Q$base2\E/);
      warn "WARN 234";                          # line 234
      $set1->{name} .= " ".$set2->{name};

      # base2 is just a subset of base1
      # dbg("zoom: subsuming '$base2' into '$base1': $set1->{name}");
    }
  }
  warn "WARN 241";                              # line 241
  ======================================


i.e., cycling between lines 214, 224 & 228
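The cycling between 214, 224 and 228 is what a nested pass over @good_bases looks like from the outside: every base is compared with every other base. The sketch below only counts inner-loop iterations for a loop of the same shape; with the 868 bases reported above that is 868 x 867 = 752,556 passes through the inner body, each doing up to three regexp matches in the real code. It is an upper bound, since the outer loop skips entries whose base or name is empty, and the stand-in records are an assumption for illustration.

```perl
use strict;
use warnings;

# Counts how often the inner loop body would run, past the
# "next if ($set1 == $set2)" self-skip, for a loop shaped like the
# @good_bases pass quoted above.
sub inner_iterations {
    my ($n) = @_;
    my @sets;
    push @sets, { id => $_ } for 1 .. $n;   # stand-ins for the base records
    my $count = 0;
    foreach my $set1 (@sets) {
        foreach my $set2 (@sets) {
            next if ($set1 == $set2);       # same reference test as the original
            $count++;
        }
    }
    return $count;
}

# 868 is the base count reported by "zoom: extracted 868 bases".
my $n = 868;
printf "%d bases: %d inner iterations (n * (n - 1) = %d)\n",
    $n, inner_iterations($n), $n * ($n - 1);
```

Because the work grows with the square of the base count, a larger add-on ruleset on a slower CPU stretches this phase much more than linearly.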
Comment 26 snowcrash+apache 2007-03-08 14:47:30 UTC
and, with a little 'finer grain' ...

...
[22433] dbg: zoom: extracted 868 bases, skipped 804
WARN 167 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 167.
WARN 203 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 203.
WARN 209 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 209.
WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
(snip)
WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
WARN 228 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 228.
WARN 228A at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 230.
WARN 228B at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 232.
WARN 228C at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 234.
WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
WARN 228 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 228.
WARN 228A at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 230.
WARN 228B at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 232.
WARN 228C at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 234.
WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
WARN 228 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 228.
WARN 228A at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 230.
WARN 228B at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 232.
WARN 228C at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 234.
WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
...
etc etc etc
...



where

    ======================================================
    warn "SNOWCRASH 228";      # line 228
    my $base2 = $set2->{base};
    warn "SNOWCRASH 228A";
    next if ($base2 eq '');
    warn "SNOWCRASH 228B";
    next if (length $base1 < length $base2);
    warn "SNOWCRASH 228C";
    next if ($base1 !~ /\Q$base2\E/);
    warn "SNOWCRASH 234";
    $set1->{name} .= " ".$set2->{name};
    ======================================================


leads me to suspect, iiuc, that the "where" is:

	  warn "SNOWCRASH 228C";
-->	      next if ($base1 !~ /\Q$base2\E/);
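The flagged line is a literal substring test: \Q...\E quotes every metacharacter, so /\Q$base2\E/ only asks whether $base1 contains $base2. Because $base2 is interpolated, Perl compiles a new pattern whenever it changes, which in this loop is nearly every time the line is reached. The sketch below compares that form with the built-in index(), which answers the same question without a regexp. It only illustrates that the two agree for non-empty strings; it is not the change that resolved this bug, and this thread does not establish that pattern compilation was the cause of the slowdown.

```perl
use strict;
use warnings;

# Two ways to ask "does $base1 contain $base2 as a literal substring?"
# Callers must skip an empty $base2, as the original loop does: if an
# interpolated pattern ends up empty, Perl reuses the last successfully
# matched pattern instead of matching the empty string.
sub contains_re {
    my ($base1, $base2) = @_;
    # \Q...\E quotes metacharacters; the pattern is rebuilt from $base2
    # and recompiled whenever $base2 differs from the previous call.
    return $base1 =~ /\Q$base2\E/ ? 1 : 0;
}

sub contains_index {
    my ($base1, $base2) = @_;
    # index() does a plain substring search; no pattern is compiled.
    return index($base1, $base2) >= 0 ? 1 : 0;
}

# In the loop above, the line
#     next if ($base1 !~ /\Q$base2\E/);
# could then be written as
#     next if (index($base1, $base2) < 0);

for my $pair (['viagra pills', 'viagra'], ['investor', 'vest'],
              ['axb', 'a.b'], ['abc', 'abd']) {
    printf "%-14s %-8s re=%d index=%d\n", @$pair,
        contains_re(@$pair), contains_index(@$pair);
}
```

The 'axb' / 'a.b' pair shows why the quoting matters: unquoted, the '.' would match any character and the regexp form would report a false containment.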
Comment 27 Justin Mason 2007-03-11 05:50:42 UTC
could you try with current svn trunk?  it has a new progress-display system...
Comment 28 snowcrash+apache 2007-03-11 19:55:59 UTC
up'ing to sa-trunk/r516950

build as usual.

then,

	date > compile_time.txt
	/usr/local/spamassassin/bin/sa-compile --sudo -D
	date >> compile_time.txt

note the new progress display.  helpful, thanks.

on an 800MHz G4, the compile now completes without error !! and,

	cat compile_time.txt

reports,

	Sun Mar 11 19:10:19 PDT 2007
	Sun Mar 11 19:41:10 PDT 2007

i.e., an ~ 31 minute compile time.
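The two date stamps differ by 30 min 51 s (19:41:10 minus 19:10:19), consistent with the "~ 31 minute" figure. A minimal Perl sketch of the same measurement taken from inside one script, using Time::HiRes; the time_command name is hypothetical, and the sa-compile invocation is shown only as a comment so the demo runs anywhere.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Runs a command and returns (exit code, elapsed wall-clock seconds),
# as an alternative to bracketing the run with two `date` calls.
sub time_command {
    my @cmd = @_;
    my $start   = time;
    my $status  = system(@cmd);          # list form: no shell involved
    my $elapsed = time - $start;
    my $exit = $status == -1 ? -1 : $status >> 8;   # -1: command could not start
    return ($exit, $elapsed);
}

# Demonstration with the current perl binary; for sa-compile one would pass
# e.g. ('/usr/local/spamassassin/bin/sa-compile', '--sudo', '-D').
my ($rc, $secs) = time_command($^X, '-e', 'exit 0');
printf "exit %d after %.1f minutes\n", $rc, $secs / 60;
```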


now, a few question(s):

(1) how do we verify that the compiled rules are working, and how do we _measure_
the improved (hopefully) performance?

if we ARE using sa-compile,

(2) do compiled rules automatically take precedence over uncompiled rules? or,
must we remove the uncompiled rules from the rule path?

(3) should (must?) we run sa-compile at every rule update?  e.g., if we're
sa-update'ing once/hr, running sa-compile hourly is a bit cpu intensive.

(4) does sa-compile somehow detect changes and only compile those? or does it
recompile ALL available rules each time?

thanks.
Comment 29 Justin Mason 2007-03-12 06:39:59 UTC
'on an 800MHz G4, the compile now completes without error !!'

hooray! marking this bug closed ;)