SA Bugzilla – Bug 5354
sa-compile fatally hangs on execute
Last modified: 2007-03-12 06:39:59 UTC
installed, spamassassin v320/trunk, r511659 on osx 10.4.8.
sa-compile --sudo -D
hangs at,
[24174] dbg: zoom: NO /[-<!\s]\w{0,10}kn[bcdfhjklmnqsvwxz]\w{0,10}[->\s]/
[24174] dbg: zoom: NO /[-<!\s]\w{0,10}eb[jqx]\w{0,10}[->\s]/
[24174] dbg: zoom: NO /V[ij]+AGRA/i
[24174] dbg: zoom: NO /[-<!\s]\w{0,10}hr[cjqx]\w{0,10}[->\s]/
[24174] dbg: zoom: NO /[\s-][a-z01]{1,10}1cr[a-z01]{1,10}[,.?!]*[\s-]/i
[24174] dbg: zoom: NO /\doo[o0] d[o0][l|][l|]ars/i
[24174] dbg: zoom: NO /[\s-][a-z01]{1,10}[1|]ve[a-z01]{0,10}[,.?!]*[\s-]/i
[24174] dbg: generic: giving up on that direction: brace mismatch in '((d|t|e|e|i|r|u|x|a|b)($|' at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 516.
[24174] dbg: zoom: NO /\b(?:(?:d|t|e|e|I|R|U|x|a|b)(?:$|\s{1,30})){10}/i
[24174] dbg: generic: giving up on that direction: brace mismatch in 'mill(i|)on)' at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 516.
[24174] dbg: zoom: NO /\bU\.?S\.?(?:D\.?)?\s*(?:\$\s*)?(?:\d+,\d+,\d+|\d+\.\d+\.\d+|\d+(?:\.\d+)?\s*milli?on)/i
[24174] dbg: generic: giving up on that direction: brace mismatch in 'f(' at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 516.
[24174] dbg: zoom: NO /(?!\bfun?ck)\bf.?u.?c.?k/i
[24174] dbg: zoom: NO /Pri{1,2}c[a-z]?e/i
[24174] dbg: zoom: NO /(?!guarantee)[gk6]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[uv\xb5\xd9\xda\xdb\xdc\xfc\xfb\xfa\xf9\xfd]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[gra\@\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xe4\xe3\xe2\xe0\xe1\xe2\xe3\xe4\xe5\xe60o]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?r{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[gra\@\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xe4\xe3\xe2\xe0\xe1\xe2\xe3\xe4\xe5\xe60o]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[n\xd1\xf1]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[t|]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[e3\xc8\xc9\xca\xcb\xe8\xe9\xea\xeb\xa4]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[e3\xc8\xc9\xca\xcb\xe8\xe9\xea\xeb\xa4]{1,2}/i
[24174] dbg: zoom: NO /\b(?!investor)[ilt|!1y?\xcc\xcd\xce\xcf\xec\xed\xee\xef]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[n\xd1\xf1]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?(?:[vu]|\\\/){1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[e3\xc8\xc9\xca\xcb\xe8\xe9\xea\xeb\xa4]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[sz5\xa6\xa7]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[t|]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[go0\xd2\xd3\xd4\xd5\xd6\xd8\xf0\xf2\xf3\xf4\xf5\xf6\xf8]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?r{1,2}/i
requiring force/quit to exit.
in the same sa-compile exec output, i also see numerous,
[24192] dbg: generic: giving up on that direction: brace mismatch in '|internet|candidate|sir(s|)|madam|investor|travel(l|)er|car shopper|web)' at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 516.
seemingly non-fatal errors @ 'line 516' BodyRuleBaseExtractor.pm.
how long do you wait? It can take a long time. also, please attach the entire debug output.
admittedly, i'd not waited beyond ~ 5 mins. retrying, and keeping an eye on the relevant PID,
% ps -ax | grep -i sa-compile
2.34M 44.6M 85.7M 24284 p1 R+ 9:18.02 /usr/local/perl5/bin/perl -T -w /usr/local/spamassassin/bin/sa-compile --sudo -D
in,
% top
...
24284 perl 66.3% 22:19.97 1 15 193 36.8M 4.33M 36.9M 71.9M
...
after 22+ mins, it's still working ...
(fwiw, this is compiling on an 'old workhorse' 500MHz PPC-G3 + 800 MB RAM, running OSX 10.4.8 ... so, _not_ the quickest box ...)
in the meantime, here's the console output to this point -- i.e., where it's "hung": http://rafb.net/p/CEpzgx93.html
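The ps/top check above can be repeated mechanically: sample the process's accumulated CPU time twice and see whether it grew. The following is only a minimal sketch, assuming a `ps` that accepts the POSIX-style `-o time= -p PID` options; the helper names `cpu_seconds` and `is_burning_cpu` are made up for illustration. A rising CPU time shows the process is computing, not that it will ever finish.

```shell
# Convert a ps/top-style CPU time ([hh:]mm:ss[.frac]) to whole seconds.
# Day-prefixed forms such as 1-02:03:04 are not handled by this sketch.
cpu_seconds() {
  echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%d\n", s }'
}

# Succeed if process $1 accumulated CPU time during a pause of $2 seconds
# (default 10). Hypothetical helper, not part of SpamAssassin.
is_burning_cpu() {
  pid=$1
  interval=${2:-10}
  before=$(cpu_seconds "$(ps -o time= -p "$pid")")
  sleep "$interval"
  after=$(cpu_seconds "$(ps -o time= -p "$pid")")
  [ "$after" -gt "$before" ]
}
```

For example, `is_burning_cpu 24284 30 && echo "still computing"` would poll the PID from the output above over a 30 second window.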
weird. sounds like it needs testing with these add-on rulesets:
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_adult_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_bayes_poison_nxm_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_evilnum0_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_evilnum1_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_genlsubj_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_header_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_header_eng_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_html_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_obfu_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_oem_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_random_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_specific_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_spoof_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_stocks_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_unsub_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sare_uri_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/70_sc_top200_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/72_sare_bml_post25x_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/72_sare_redirect_post3_0_0_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/99_fvgt_tripwire_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/99_sare_fraud_post25x_cf_sare_sa-update_dostech_net.cf
[24284] dbg: config: read file /var/mail/spamassassin/updates/3.002000/bogus-virus-warnings_cf_sare_sa-update_dostech_net.cf
to that end, is there something you need _me_ to do at this point?
afraid not... I think it's up to me now ;) you could try leaving it running to see if it does complete, btw.
> afraid not... I think it's up to me now ;) ok. thx. > you could try leaving it running to see if it does complete, btw. i'd forgotten to kill it, so it has been. @ 70+ minutes, it's still 'grinding'. i suspect it's getting nowhere ...
it completes here with that ruleset -- stops at that point for a long time, as expected, but after about 1 minute it then carries on to the next stage and to completion. (It produces an error due to misparsing one of the REs unfortunately, so I'll have to fix that... but that's a different bug.) my hardware is a 1.7GHz Centrino laptop fwiw. can you provide me with a tarball of your rules dir in /var/lib/spamassassin , /etc/mail/spamassassin etc.?
yep -- completes in 1 minute 37 secs here with the (mailed separately) tarball of /var/lib/spamassassin . Could you try it on another machine?
this 'hang' is consistently reproducible on four different boxes. admittedly, similarly configured, but nonetheless ... sounds like it's environmental. now the fun part :-/ ideas? thanks.
I'd say it *must* be some additional rules -- could be perl version, alternatively.
hm. well, you should have all my rules in the tarball. as for perl, fyi,
% perl -V
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=darwin, osvers=8.8.0, archname=darwin-thread-multi-2level
    uname='darwin server 8.8.0 darwin kernel version 8.8.0: fri sep 8 17:18:57 pdt 2006; root:xnu-792.12.6.obj~1release_ppc power macintosh powerpc '
    ...
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/bdb/include -I/usr/local/include',
    optimize='-O3',
    cppflags='-no-cpp-precomp -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/bdb/include -I/usr/local/include'
    ccversion='', gccversion='4.0.1 (Apple Computer, Inc. build 5363)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.4 cc', ldflags ='-L/usr/local/bdb/lib -L/usr/local/lib -L/usr/lib'
    libpth=/usr/local/bdb/lib /usr/local/lib /usr/lib
    libs=-ldb -lc -lm -ldl
    perllibs=-lc -lm -ldl
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=true, libperl=libperl.dylib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-L/usr/local/bdb/lib -L/usr/local/lib -L/usr/lib -bundle -undefined dynamic_lookup'
Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP USE_ITHREADS USE_LARGE_FILES USE_PERLIO USE_REENTRANT_API
  Built under darwin
  Compiled at Oct 24 2006 14:50:50
  ...
FWIW, we have indications that SARE_ADULT is *very* unhappy on 3.2, getting hundreds of syntax errors that were never there before. Seems to be related to high-bit characters in the obfuscation REs. I vaguely recall some change that went into SA to deal with a perl bug or the like in that general area, but can't really recall any more than that.
(In reply to comment #8)
> yep -- completes in 1 minute 37 secs here with the (mailed separately) tarball
> of /var/lib/spamassassin . Could you try it on another machine?
noting the 'prerelease' announcements on the list, i did a fresh co & build of 32trunk/r515318. with the compile rule disabled in init.pre, sa --lints w/o error. enabled, the 'hang' is as reported. i've also, now, consistently reproduced this on four different boxes -- though admittedly all osx boxes.
On my MacOS 10.4.8 on an Intel MacBook, without the extra ruleset, when I try sa-compile --sudo -D using svn trunk I get to the end and then see
sudo make install
Installing /var/lib/spamassassin/compiled/3.002000/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.bundle
Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
Installing /tmp/.spamassassin13558foZOGptmp/ignored/man/man3/Mail::SpamAssassin::CompiledRegexps::body_0.3pm
Writing /var/lib/spamassassin/compiled/3.002000/auto/Mail/SpamAssassin/CompiledRegexps/body_0/.packlist
Appending installation info to /tmp/.spamassassin13558foZOGptmp/ignored/System/Library/Perl/5.8.6/darwin-thread-multi-2level/perllocal.pod
sudo rm -rf /tmp/.spamassassin13558foZOGptmp
rm: fts_read: No such file or directory
command failed! at /usr/bin/sa-compile line 236.
Google found nothing that helped regarding the "rm: fts_read: No such file or directory" message. When I tried on the command line
sudo rm -rf /tmp/.spamassassin13558foZOGptmp
it worked ok.
(In reply to comment #14)
> On my MacOS 10.4.8 on an Intel MacBook, without the extra ruleset, when I try
> sa-compile --sudo -D using svn trunk I get to the end
Which is good -- I suggest you and snowcrash now compare perl/library versions... ;)
You could also try using the additional rulesets: I've tarred them up at http://taint.org/x/2007/bug5354-all-rulesets.tgz to save you downloading each one. Note though that this works fine for me with those sets on linux (aside from the different bug in RE compilation later).
> sudo rm -rf /tmp/.spamassassin13558foZOGptmp
> rm: fts_read: No such file or directory
> command failed! at /usr/bin/sa-compile line 236.
>
> Google found nothing that helped regarding the "rm: fts_read: No such file or
> directory" message. When I tried on the command line sudo rm -rf
> /tmp/.spamassassin13558foZOGptmp it worked ok.
http://developer.apple.com/documentation/Darwin/Reference/Manpages/man3/fts_read.3.html
The functions fts_read() and fts_children() may fail and set errno for any of the errors specified for the library functions chdir(2), malloc(3), opendir(3), readdir(3) and stat(2).
Weird. The only thing I can think of is that something else deleted the dir while the "rm" was proceeding, but you're saying the directory was still there afterwards? Can you trace it using strace/truss/ftrace or whatever the MacOS equivalent is?
(PS: Loren:
> FWIW, we have indications that SARE_ADULT is *very* unhappy on 3.2, getting
> hundreds of syntax errors that were never there before. Seems to be related to
> high-bit characters in the obfuscation REs.
SARE_ADULT1 or SARE_ADULT2? a rule called "SARE_ADULT" isn't showing up in the rules from the updates channel. Sounds like a UTF-8 problem though... but this is a separate issue and should not be followed up in this bug.)
(In reply to comment #14)
> On my MacOS 10.4.8 on an Intel MacBook, without the extra ruleset, when I try
> sa-compile --sudo -D using svn trunk I get to the end and then see
"... extra ruleset ..." which ruleset are you referring to here? SARE_ADULT?
(in reply to comment #15)
sa-compile --sudo -D --siteconfigpath=~/tmp/bug5354/tstrules -C ~/tmp/bug5354/tstrules
completed in a bit more than a minute, ending with the same strange error in the sudo rm -rf of the temporary directory. Here is the output of perl -V, which does show the results of having some things installed under fink:
Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
  Platform:
    osname=darwin, osvers=8.0, archname=darwin-thread-multi-2level
    uname='darwin b01.apple.com 8.0 darwin kernel version 8.3.0: mon oct 3 20:04:04 pdt 2005; root:xnu-792.6.22.obj~2release_ppc power macintosh powerpc '
    config_args='-ds -e -Dprefix=/usr -Dccflags=-g -pipe -Dldflags=-Dman3ext=3pm -Duseithreads -Duseshrplib'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-g -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include',
    optimize='-O3',
    cppflags='-no-cpp-precomp -g -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='4.0.1 (Apple Computer, Inc. build 5363)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags ='-L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-ldbm -ldl -lm -lc
    perllibs=-ldl -lm -lc
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=true, libperl=libperl.dylib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-bundle -undefined dynamic_lookup -L/usr/local/lib'
Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
  Locally applied patches:
    23953 - fix for File::Path::rmtree CAN-2004-0452 security issue
    33990 - fix for setuid perl security issues
    SPRINTF0 - fixes for sprintf formatting issues - CVE-2005-3962
  Built under darwin
  Compiled at Oct 16 2006 22:54:34
  %ENV:
    PERL5LIB="/sw/lib/perl5:/sw/lib/perl5/darwin"
  @INC:
    /sw/lib/perl5/5.8.6/darwin-thread-multi-2level
    /sw/lib/perl5/5.8.6
    /sw/lib/perl5/darwin-thread-multi-2level
    /sw/lib/perl5
    /sw/lib/perl5/darwin
    /System/Library/Perl/5.8.6/darwin-thread-multi-2level
    /System/Library/Perl/5.8.6
    /Library/Perl/5.8.6/darwin-thread-multi-2level
    /Library/Perl/5.8.6
    /Library/Perl
    /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
    /Network/Library/Perl/5.8.6
    /Network/Library/Perl
    /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
    /System/Library/Perl/Extras/5.8.6
    /Library/Perl/5.8.1
    .
'the same strange error in the sudo rm -rf of the temporary directory' btw, I think this may be fixed as of r516072...
(In reply to comment #18)
fyi, after building a fresh co of r516120,
% /usr/local/spamassassin/bin/sa-compile --sudo -D
gets somewhat further, but, eventually hangs @ (a different point ...), (complete detailed output -> http://rafb.net/p/1LVkgU42.html )
...
[1074] dbg: zoom: NO /\b(?!investor)[ilt|!1y?\xcc\xcd\xce\xcf\xec\xed\xee\xef]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[n\xd1\xf1]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?(?:[vu]|\\\/){1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[e3\xc8\xc9\xca\xcb\xe8\xe9\xea\xeb\xa4]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[sz5\xa6\xa7]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[t|]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[go0\xd2\xd3\xd4\xd5\xd6\xd8\xf0\xf2\xf3\xf4\xf5\xf6\xf8]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?r{1,2}/i
[1074] dbg: zoom: extracted 867 bases, skipped 804
with cpu still chugging away at ~ 40 mins,
% ps -ax | grep sa-compile
1074 p1 R+ 4:47.68 /usr/local/perl5/bin/perl -T -w /usr/local/spamassassin/bin/sa-compile --sudo -D
% top
PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE
...
1074 perl 77.9% 39:19.63 1 15 193 40.5M 5.93M 24.6M 71.9M
...
> I think this may be fixed as of r516072 Yes, that took care of that error. I still don't see the hang that this bug is about.
(In reply to comment #20)
> I still don't see the hang that this bug is about.
in the output i'd referenced, i see lots of,
"generic: giving up on regexp ..."
could that be an issue? i'm just noting.
what info can i provide, if any, to help track this down & fix it? it's clear it "works for some", but, again, i see the same problem on multiple boxes, so, i'd guess it's 'environmental'.
nothing about these boxes is particularly atypical -- the src-built perl env is widely used with a number of opensource & commercial apps with no trouble. as such, i'm not convinced that anything we have here is 'broken', rather, just different.
thanks.
> in the output i'd referenced, i see lots of, > > "generic: giving up on regexp ..." > > could that be an issue? i'm just noting. no, that's fine. > what info can i provide, if any, to help track this down & fix it? it's clear > it "works for some", but, again, i see the same problem on multiple boxes, so, > i'd guess it's 'environmental'. > > nothing about these boxes is particularly atypical -- the src-built perl env is > widely used with a number of opensource & commercial apps with no trouble. as > such, i'm not convinced that anything we have here is 'broken', rather, just > different. this is the thing -- we need to figure out how it differs ;) Have you tried on non-MacOS-X boxes there?
> > "generic: giving up on regexp ..." > > > > could that be an issue? i'm just noting. > > no, that's fine. ok. > this is the thing -- we need to figure out how it differs ;) of course :-) which is why i'm offering what info i can -- as _i_ am most certainly not able to decipher this by myself :-/ > Have you tried on non-MacOS-X boxes there? nope. unfortunately -- well, actually, fortunately, imho -- all our servers are osx. we could arguably cobble up a *nix box, but as that's not an env we serve/dev in, that's likely to introduce as many problems as it may solve. another approach, perhaps? since it's hanging reproducibly in the same place, on multiple boxes, i'll propose that there's specific code/rule/etc/ it's choking on. b4 id'ing _what_ the problem is, can you determine _where_ the problem occurs? if you need more/specific info to do so, i can provide it.
I'd suggest doing a binary search of the lines between
dbg ("zoom: extracted $yes bases, skipped $no");
and
info ("zoom: base extraction complete for $ruletype: yes=$yes no=$no\n");
peppering statements like
warn "JMD";
all over that area should produce some kind of indication of where it's spending all the time.
(In reply to comment #24) > I'd suggest doing a binary search of the lines between per suggestion, here's a 1st stab @ where the hang happens ... ... ?^#~\xa1`'+-]?r{1,2}/i at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 529. [20780] dbg: zoom: NO /\b(?!investor)[ilt|!1y?\xcc\xcd\xce\xcf\xec\xed\xee\xef]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[n\xd1\xf1]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?(?:[vu]|\\\/){1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[e3\xc8\xc9\xca\xcb\xe8\xe9\xea\xeb\xa4]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[sz5\xa6\xa7]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[t|]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?[go0\xd2\xd3\xd4\xd5\xd6\xd8\xf0\xf2\xf3\xf4\xf5\xf6\xf8]{1,2}[\s\d_*\$\%(),.:;?!}{\[\]|\/?^#~\xa1`'+-]?r{1,2}/i [20780] dbg: zoom: extracted 868 bases, skipped 804 WARN 167 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 167. WARN 203 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 203. WARN 209 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 209. WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214. WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224. WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214. WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224. WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214. WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224. WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214. WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224. 
WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214. WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224. WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214. WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224. WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214. WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224. WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214. WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224. WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214. WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224. WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214. WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224. WARN 228 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 228. WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214. WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224. ... etc etc etc ... 
where, in "BodyRuleBaseExtractor.pm",
======================================
203   warn "WARN 203";
      foreach my $set1 (@good_bases) {
        my $base1 = $set1->{base};
        my $orig1 = $set1->{orig};
        my $key1 = $set1->{name};
        next if ($base1 eq '' or $key1 eq '');
209     warn "WARN 209";
        $conf->{base_orig}->{$ruletype}->{$key1} = $orig1;
        foreach my $set2 (@good_bases) {
          next if ($set1 == $set2);
214       warn "WARN 214";
          # clobber exact dups; this can happen if a regexp outputs the
          # same base string multiple times
          if ($set1->{name} eq $set2->{name} &&
              $set1->{base} eq $set2->{base} &&
              $set1->{orig} eq $set2->{orig})
          {
            $set2->{name} = ''; # clobber
            $set2->{base} = '';
          }
224       warn "WARN 224";
          # skip if either already contains the other rule's name
          next if ($set1->{name} =~ /\b\Q$set2->{name}\E\b/);
          next if ($set2->{name} =~ /\b\Q$set1->{name}\E\b/);
228       warn "WARN 228";
          my $base2 = $set2->{base};
          next if ($base2 eq '');
          next if (length $base1 < length $base2);
          next if ($base1 !~ /\Q$base2\E/);
234       warn "WARN 234";
          $set1->{name} .= " ".$set2->{name};
          # base2 is just a subset of base1
          # dbg("zoom: subsuming '$base2' into '$base1': $set1->{name}");
        }
      }
241   warn "WARN 241";
======================================
i.e., cycling between line#'s 214, 224 & 228
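The nested foreach over @good_bases in the listing above visits every ordered pair of entries, so the inner body runs on the order of n*(n-1) times. A rough upper bound, assuming @good_bases holds one entry per extracted base (868 in the debug output; the real array may be larger or smaller, and entries skipped by `next` reduce the count). `pair_count` is a hypothetical helper, not anything from SpamAssassin:

```shell
# Number of ordered pairs (set1, set2) with set1 != set2 among n entries.
pair_count() {
  echo $(( $1 * ($1 - 1) ))
}

pair_count 868   # prints 752556
```

Each of those passes runs up to two name regexes and one base regex, which may go some way towards explaining why this stage dominates the run on slower hardware.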
and, with a little 'finer grain' ...
...
[22433] dbg: zoom: extracted 868 bases, skipped 804
WARN 167 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 167.
WARN 203 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 203.
WARN 209 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 209.
WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
(snip)
WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
WARN 228 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 228.
WARN 228A at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 230.
WARN 228B at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 232.
WARN 228C at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 234.
WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
WARN 228 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 228.
WARN 228A at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 230.
WARN 228B at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 232.
WARN 228C at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 234.
WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
WARN 224 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 224.
WARN 228 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 228.
WARN 228A at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 230.
WARN 228B at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 232.
WARN 228C at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 234.
WARN 214 at /usr/local/lib/perl/sitelib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 214.
... etc etc etc ...
where
======================================================
228   warn "SNOWCRASH 228";
      my $base2 = $set2->{base};
      warn "SNOWCRASH 228A";
      next if ($base2 eq '');
      warn "SNOWCRASH 228B";
      next if (length $base1 < length $base2);
      warn "SNOWCRASH 228C";
      next if ($base1 !~ /\Q$base2\E/);
      warn "SNOWCRASH 234";
      $set1->{name} .= " ".$set2->{name};
======================================================
leads me to suspect, iiuc, the "where" is;
      warn "SNOWCRASH 228C";
-->   next if ($base1 !~ /\Q$base2\E/);
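With this many markers in the output, a tally per marker can be easier to read than the raw log. A sketch, assuming the warn lines start with the literal word WARN followed by the marker, as in the output above; `hot_markers` is a hypothetical helper that reads a captured log on stdin. Note that it counts how often each marker was hit, which is not the same as measuring where the time goes.

```shell
# Count occurrences of each "WARN <marker>" line on stdin, most frequent first.
hot_markers() {
  awk '$1 == "WARN" { count[$2]++ } END { for (m in count) print count[m], m }' | sort -rn
}
```

Because the tally is only printed at end of input, it is meant for a log saved to a file (for example `hot_markers < sa-compile.log`, with the file name being an assumption) rather than for a live pipe from a process that never exits.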
could you try with current svn trunk? it has a new progress-display system...
up'ing to sa-trunk/r516950
build as usual. then,
date > compile_time.txt
/usr/local/spamassassin/bin/sa-compile --sudo -D
date >> compile_time.txt
note the new progress display. helpful, thanks.
on an 800MHz G4, the compile now completes without error !!
and,
cat compile_time.txt
reports,
Sun Mar 11 19:10:19 PDT 2007
Sun Mar 11 19:41:10 PDT 2007
i.e., an ~ 31 minute compile time.
now, a few question(s):
(1) how do we verify that the compiled rules are working, and how 2 _measure_ the improved (hopefully) performance?
if we ARE using sa-compile,
(2) do compiled rules automatically take precedence over uncompiled rules? or, must we remove the uncompiled rules from the rule path?
(3) should (must?) we run sa-compile at every rule update? e.g., if we're sa-update'ing once/hr, running sa-compile hourly is a bit cpu intensive.
(4) does sa-compile somehow detect changes and only compile those? or does it recompile ALL available rules each time?
thanks.
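The two `date` stamps above can be turned into an exact duration. A small sketch, assuming both stamps fall on the same day; the helper names `hms_to_seconds` and `elapsed_seconds` are invented for illustration:

```shell
# Convert hh:mm:ss to seconds since midnight.
hms_to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds between two same-day hh:mm:ss stamps (start, end).
elapsed_seconds() {
  echo $(( $(hms_to_seconds "$2") - $(hms_to_seconds "$1") ))
}

elapsed_seconds 19:10:19 19:41:10   # prints 1851, i.e. 30 min 51 s
```

Wrapping the run in the shell's `time` keyword (`time /usr/local/spamassassin/bin/sa-compile --sudo -D`) would report the same interval without the temporary file.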
'on an 800MHz G4, the compile now completes without error !!' hooray! marking this bug closed ;)