SA Bugzilla – Bug 6060
Perl5.8.9 crashes while compiling long code from generic rules
Last modified: 2009-04-03 09:06:51 UTC
After upgrading perl to 5.8.9 (FreeBSD, from ports), the interpreter crashes when it tries to compile (eval) code autogenerated from rules as soon as some module with threading like BerkeleyDB is brought in (which apparently switches from a large normal stack to a smaller per-thread stack size under FreeBSD). This can be demonstrated by writing to a file what is intended to be 'eval'-ed, then compiling such autogenerated code standalone: perl -MBerkeleyDB body_0.pm Bus error: 10 (core dumped) I filed a perl bug report #63054: http://rt.perl.org/rt3/Public/Bug/Display.html?id=63054 Perl_peep recursion exceeds stack in 5.8.9 eval-ing a long program but I'm not yet sure how to proceed. Looks like I need to downgrade to 5.8.8 or upgrade to 5.10.0, which is a bit of a pain when most Perl modules are installed from FreeBSD ports. A workaround could be if SA produced smaller code sections, the 83600 lines of code for body rules at prio=0 is a lot (using some SARE and ZDF rules), perl5.8.9 can survive about two thirds of that. Here are line counts (wc -l) of code autogenerated from rules, when written to a file: 19 meta_899.pm 57 full_500.pm 112 uri_500.pm 255 head_500.pm 256 rawbody_500.pm 343 full_0.pm 2251 meta_500.pm 4720 uri_0.pm 6912 rawbody_0.pm 24340 head_0.pm 83600 body_0.pm Compiling the head_0.pm is still below a stack size, the body_0.pm is above it and causes a Bus error. Btw, I experimented with changes based on Bug 5876 (storing code to a file and letting 'do' deal with it). It is promising, it does indeed reduce memory footprint, although is does not provide a solution for this 5.8.9 problem. Perhaps a solution could take both aspects into considerations (I have available fresh patches along the lines of Bug 5876, separate for eval and for generic rules, I can make them available).
> Btw, I experimented with changes based on Bug 5876 (storing code to a > file and letting 'do' deal with it). It is promising, it does indeed > reduce memory footprint, although is does not provide a solution > for this 5.8.9 problem. Perhaps a solution could take both aspects > into considerations (I have available fresh patches along the lines > of Bug 5876, separate for eval and for generic rules, I can make them > available). +1 to that idea! btw it should be no problem to split up the large subs. even if each one was broken into 100 separate subroutines, I don't think there would be a noticeable performance hit. It's analogous to loop unrolling, what's going on here, basically.
> btw it should be no problem to split up the large subs. even if each one was > broken into 100 separate subroutines, I don't think there would be a noticeable > performance hit. It's analogous to loop unrolling, what's going on here, > basically. There is a complication in Plugin::Check involving code segments being accumulated into evalstr2, which (because of the order of insertion into the final result) for a file-based approach seems to require the use of two files and a final copying to one. On a closer examination the evalstr2 is currently never used, because all code additions to evalstr2 (through add_temporary_method) seem to be prefixed by a test for $self->{main}->{use_rule_subs}, which is currently always false, as the statement in PerMsgStatus is commented out: #$self->{main}->{use_rule_subs} = 1; Was this just an experiment which didn't work out and can be removed, or can it be useful and should be preserved?
use_rule_subs is very useful for profiling, as it exposes each rule as an individual subroutine. I'm surprised it's not still available... It's possible there's a mass-check switch that turns it on.
> use_rule_subs is very useful for profiling, as it exposes each rule as an > individual subroutine. Here is a count of calls to got_hit in the eval-ed code on our installation, by type and priority (as given as arguments to eval in the current code): full_500.pm 3 uri_500.pm 6 rawbody_500.pm 15 head_500.pm 18 full_0.pm 25 uri_0.pm 294 rawbody_0.pm 431 meta_500.pm 736 head_0.pm 1855 body_0.pm 5224 Now that I think of it, a combined approach similar to use_rule_subs would satisfy both needs: not give Perl too large chunks of code to compile, and save on memory footprint in a parent process (inherited by child processes) for source code, as Bug 5876 is trying to solve. But instead of one rule per subroutine (as with use_rule_subs) or all rules in one subroutine (as without use_rule_subs), perhaps 100 rules per sub would be a good compromise, cleanly satisfying both needs, and without going into complications with temporary files. (btw, only generic rules are of issue here, the run_eval_tests is not problematic). Justin (as you are more intimately familiar with this code), can you think of a change which could achieve this?
So far I survived by avoiding Perl 5.8.9 crashes (which was the current version in FreeBSD ports), and was running SpamAssassin under a hand-compiled 5.10.0. Few days ago the FreeBSD ports was updated to perl 5.10.0, and after upgrading, this same old 'Bus error' reared its ugly head again. So for the last two days I had to disable any use of berkeley db to survive. Seems the difference between my hand-compiled Perl 5.10.0 and the official one is that mine used 32-bit integers, while the one from ports used 64-bit integers, which causes the program to occupy somewhat more memory, and the compilation hits the stack limit sooner. So it seems the much heavier stack usage by a Perl compiler happened between 5.8.8 and 5.8.9, and the 5.10.0 is no better than 5.8.9 in this respect, as I initially thought. > Here is a count of calls to got_hit in the eval-ed code on our installation, > by type and priority (as given as arguments to eval in the current code): > [...] > rawbody_0.pm 431 > meta_500.pm 736 > head_0.pm 1855 > body_0.pm 5224 I repeated the exercise with body_0.pm, prunning it down to a size which the compiler still managed to compile without a crash, the limit is: body_0.pm 3268 rules. > Now that I think of it, a combined approach similar to use_rule_subs > would satisfy both needs: not give Perl too large chunks of code > to compile, and save on memory footprint in a parent process > (inherited by child processes) for source code, as Bug 5876 is > trying to solve. > > But instead of one rule per subroutine (as with use_rule_subs) > or all rules in one subroutine (as without use_rule_subs), > perhaps 100 rules per sub would be a good compromise, cleanly > satisfying both needs, and without going into complications > with temporary files. Not having much of a choice, I embarked on modifying the Check.pm plugin to implement the above idea: compile not more than about 60kB of source code at a time, and if necessary produce multiple subroutines, one for each chunk of code, and provide a master subroutine which calls each of the chunk subroutines in turn. A side effect is a noticable reduction in memory footprint. I modified the 'spamassassin' command to sleep just before finishing, and checked the process memory size by a 'ps' command. The set of rules is what I normally use on a production mailer (updates.spamassassin.org, sought.rules.yerp.org, some SARE rules and a handful of local ones). original 3.3 trunk: VSZ RSS 99692 91752 3.3 trunk with my modified Check.pm: 82080 79584 The reduction is 17.2 MB in virtual memory size, and 11.9 MB of resident memory size. As an experiment I also eliminated compiling of eval rules (substituted by direct calls), as it seems little point in optimizing the outer loop. I keep this diffs separate. It yields some additional memory reduction: 80780 78468 i.e. 1.3 MB VSZ and 1.1MB in RSS. It is not much, but makes code much simpler. I doubt there is any noticable performance penalty by not optimizing an outer loop, it is hard to measure with all the timing noise. Anyway, I'd like to bring in at least my chunking changes in Check.pm. It does pass the tests, and it does produce identical results on a couple of messages that I tried manually with old and new code (each with many and varied hits), and it does make our production mailer+SA run again with berkeley db enabled under 5.10.0. I would appreciate a second opinion on the approach and code, and any feedback in case I broke some corner case which I didn't try (user rules?, use_rule_subs?, mass checks?). Bug 6060: let the Check.pm plugin produce smaller chunks of source code (60 kB) to avoid Perl compiler crashing on exceeding stack size, and to reduce memory footprint of SpamAssassin. Sending lib/Mail/SpamAssassin/Plugin/Check.pm Committed revision 760568. I can also commit the other half (direct execution of eval rules) - it can be reverted later if it doesn't feel right.
great memory reduction figures! I'd like to get a benchmark of the speed of mass-check on a 50% ham/50% spam corpus before and after. that's the best way to judge speed changes on this code, I've found.
(In reply to comment #6) > great memory reduction figures! > > I'd like to get a benchmark of the speed of mass-check on a 50% ham/50% spam > corpus before and after. that's the best way to judge speed changes on this > code, I've found. ok, I've benched it -- pre: real 0m33.159s user 0m32.586s sys 0m0.392s real 0m31.770s user 0m31.282s sys 0m0.460s real 0m32.189s user 0m31.462s sys 0m0.548s peak memory usage: Tue Mar 31 21:26:51 2009: 0 1000 16007 14415 20 0 63956 59896 - R+ pts/6 0:30 /usr/bin/perl -w ./mass-check -n -o ham:mbox:/home/jm/ftp/tstcor/ham.mbox spam:mbox:/home/jm/ftp/tstcor/spam.mbox post: real 0m31.101s user 0m30.774s sys 0m0.320s real 0m31.150s user 0m30.850s sys 0m0.300s real 0m32.166s user 0m31.710s sys 0m0.300s peak memory usage: Tue Mar 31 21:23:35 2009: 0 1000 15424 14415 20 0 58428 54356 - R+ pts/6 0:29 /usr/bin/perl -w ./mass-check -n -o ham:mbox:/home/jm/ftp/tstcor/ham.mbox spam:mbox:/home/jm/ftp/tstcor/spam.mbox It's an improvement across the board there. +1 can you attach the other changes as patches and I'll test them too?
> ok, I've benched it -- Thanks! > It's an improvement across the board there. +1 Whatever speedup there was must have been due to different memory organization of compiled code, I can't see other reason. > can you attach the other changes as patches and I'll test them too? Coming. Appreciated. Bug6060, part 2: replace compiled eval rules with direct execution Sending lib/Mail/SpamAssassin/Conf/Parser.pm Sending lib/Mail/SpamAssassin/Plugin/Check.pm Transmitting file data .. Committed revision 760705.
> Whatever speedup there was must have been due to different > memory organization of compiled code, I can't see other reason. yeah. I have a feeling that as CPUs have gotten so fast compared to the memory pipeline, the effects of memory usage reduction become more important for execution speed. > > can you attach the other changes as patches and I'll test them too? they don't look so great: BEFORE time ./mass-check -n -o ham:mbox:/home/jmason/ftp/tstcor/ham.mbox spam:mbox:/home/jmason/ftp/tstcor/spam.mbox > /dev/null real 0m32.587s user 0m32.166s sys 0m0.284s real 0m32.711s user 0m32.266s sys 0m0.264s real 0m32.416s user 0m32.002s sys 0m0.340s real 0m32.450s user 0m32.094s sys 0m0.276s real avg=33.956s min=33.537s max=34.401s count=5 user avg=33.304s min=32.986s max=33.834s count=5 sys avg=0.282s min=0.236s max=0.312s count=5 AFTER time ./mass-check -n -o ham:mbox:/home/jmason/ftp/tstcor/ham.mbox spam:mbox:/home/jmason/ftp/tstcor/spam.mbox > /dev/null real 0m33.833s user 0m33.078s sys 0m0.308s real 0m34.401s user 0m33.834s sys 0m0.296s real 0m33.913s user 0m33.266s sys 0m0.312s real 0m34.098s user 0m33.358s sys 0m0.236s real avg=32.501s min=32.269s max=32.711s count=6 user avg=32.111s min=31.902s max=32.266s count=6 sys avg=0.282s min=0.224s max=0.340s count=6 So that's 4.4% slower, it seems. (now this wasn't on an unloaded machine, but I think those numbers are mostly accurate.) I'm afraid I'm -1 on that with that performance reduction; sorry!
> > > can you attach the other changes as patches and I'll test them too? > they don't look so great: > So that's 4.4% slower, it seems. (now this wasn't on an unloaded machine, > but I think those numbers are mostly accurate.) > I'm afraid I'm -1 on that with that performance reduction; sorry! Ok, reverting. A bit of a slowdown was expected, but I guess it doesn't pay off for code reduction. reverting r760705, directly executed eval rules are about 4% slower (but saves about 1.2 MB), compared to compiled eval rules. Bug 6060 Sending lib/Mail/SpamAssassin/Conf/Parser.pm Sending lib/Mail/SpamAssassin/Plugin/Check.pm Committed revision 760896.
Ok, so I guess this is fixed now for 3.3, closing. Please reopen if you notice any problems introduced by the change.