6060 – Perl5.8.9 crashes while compiling long code from generic rules

Bug 6060 - Perl5.8.9 crashes while compiling long code from generic rules

Summary: Perl5.8.9 crashes while compiling long code from generic rules

Status:	RESOLVED FIXED

Alias:	None

Product:	Spamassassin
Classification:	Unclassified
Component:	Libraries (show other bugs)
Version:	SVN Trunk (Latest Devel Version)
Hardware:	Other All

Importance:	P5 normal
Target Milestone:	3.3.0
Assignee:	SpamAssassin Developer Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2009-02-06 05:21 UTC by Mark Martinec
Modified:	2009-04-03 09:06 UTC (History)
CC List:	0 users

Attachment	Type	Modified	Status	Actions	Submitter/CLA Status
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Mark Martinec 2009-02-06 05:21:10 UTC

After upgrading perl to 5.8.9 (FreeBSD, from ports), the interpreter
crashes when it tries to compile (eval) code autogenerated from rules
as soon as some module with threading like BerkeleyDB is brought in
(which apparently switches from a large normal stack to a smaller
per-thread stack size under FreeBSD).

This can be demonstrated by writing to a file what is intended to
be 'eval'-ed, then compiling such autogenerated code standalone:

perl -MBerkeleyDB body_0.pm
Bus error: 10 (core dumped)

I filed a perl bug report #63054:

  http://rt.perl.org/rt3/Public/Bug/Display.html?id=63054
  Perl_peep recursion exceeds stack in 5.8.9 eval-ing a long program

but I'm not yet sure how to proceed. Looks like I need to
downgrade to 5.8.8 or upgrade to 5.10.0, which is a bit of a
pain when most Perl modules are installed from FreeBSD ports.

A workaround could be if SA produced smaller code sections,
the 83600 lines of code for body rules at prio=0 is a lot
(using some SARE and ZDF rules), perl5.8.9 can survive about
two thirds of that.

Here are line counts (wc -l) of code autogenerated from rules,
when written to a file:

      19 meta_899.pm
      57 full_500.pm
     112 uri_500.pm
     255 head_500.pm
     256 rawbody_500.pm
     343 full_0.pm
    2251 meta_500.pm
    4720 uri_0.pm
    6912 rawbody_0.pm
   24340 head_0.pm
   83600 body_0.pm

Compiling the head_0.pm is still below a stack size, the body_0.pm
is above it and causes a Bus error.

Btw, I experimented with changes based on Bug 5876 (storing code to a
file and letting 'do' deal with it). It is promising, it does indeed
reduce memory footprint, although is does not provide a solution
for this 5.8.9 problem. Perhaps a solution could take both aspects
into considerations (I have available fresh patches along the lines
of Bug 5876, separate for eval and for generic rules, I can make them
available).

Comment 1 Justin Mason 2009-02-06 06:15:34 UTC

> Btw, I experimented with changes based on Bug 5876 (storing code to a
> file and letting 'do' deal with it). It is promising, it does indeed
> reduce memory footprint, although is does not provide a solution
> for this 5.8.9 problem. Perhaps a solution could take both aspects
> into considerations (I have available fresh patches along the lines
> of Bug 5876, separate for eval and for generic rules, I can make them
> available).

+1 to that idea!

btw it should be no problem to split up the large subs.  even if each one was broken into 100 separate subroutines, I don't think there would be a noticeable performance hit.  It's analogous to loop unrolling, what's going on here, basically.

Comment 2 Mark Martinec 2009-02-06 09:40:29 UTC

> btw it should be no problem to split up the large subs.  even if each one was
> broken into 100 separate subroutines, I don't think there would be a noticeable
> performance hit.  It's analogous to loop unrolling, what's going on here,
> basically.

There is a complication in Plugin::Check involving code segments being
accumulated into evalstr2, which (because of the order of insertion into
the final result) for a file-based approach seems to require the use of
two files and a final copying to one.

On a closer examination the evalstr2 is currently never used,
because all code additions to evalstr2 (through add_temporary_method)
seem to be prefixed by a test for $self->{main}->{use_rule_subs},
which is currently always false, as the statement in PerMsgStatus
is commented out:

  #$self->{main}->{use_rule_subs} = 1;

Was this just an experiment which didn't work out and can be removed,
or can it be useful and should be preserved?

Comment 3 Justin Mason 2009-02-06 09:43:51 UTC

use_rule_subs is very useful for profiling, as it exposes each rule as an individual subroutine.  I'm surprised it's not still available...
It's possible there's a mass-check switch that turns it on.

Comment 4 Mark Martinec 2009-02-06 11:00:41 UTC

> use_rule_subs is very useful for profiling, as it exposes each rule as an
> individual subroutine.

Here is a count of calls to got_hit in the eval-ed code on our installation,
by type and priority (as given as arguments to eval in the current code):

full_500.pm      3
uri_500.pm       6
rawbody_500.pm  15
head_500.pm     18
full_0.pm       25
uri_0.pm       294
rawbody_0.pm   431
meta_500.pm    736
head_0.pm     1855
body_0.pm     5224

Now that I think of it, a combined approach similar to use_rule_subs
would satisfy both needs: not give Perl too large chunks of code
to compile, and save on memory footprint in a parent process
(inherited by child processes) for source code, as Bug 5876 is
trying to solve.

But instead of one rule per subroutine (as with use_rule_subs)
or all rules in one subroutine (as without use_rule_subs),
perhaps 100 rules per sub would be a good compromise, cleanly
satisfying both needs, and without going into complications
with temporary files. (btw, only generic rules are of issue
here, the run_eval_tests is not problematic).

Justin (as you are more intimately familiar with this code),
can you think of a change which could achieve this?

Comment 5 Mark Martinec 2009-03-31 12:02:14 UTC

So far I survived by avoiding Perl 5.8.9 crashes (which was the
current version in FreeBSD ports), and was running SpamAssassin under
a hand-compiled 5.10.0.

Few days ago the FreeBSD ports was updated to perl 5.10.0, and after
upgrading, this same old 'Bus error' reared its ugly head again. So
for the last two days I had to disable any use of berkeley db to survive.

Seems the difference between my hand-compiled Perl 5.10.0 and the
official one is that mine used 32-bit integers, while the one from ports
used 64-bit integers, which causes the program to occupy somewhat more
memory, and the compilation hits the stack limit sooner.

So it seems the much heavier stack usage by a Perl compiler happened
between 5.8.8 and 5.8.9, and the 5.10.0 is no better than 5.8.9
in this respect, as I initially thought.

> Here is a count of calls to got_hit in the eval-ed code on our installation,
> by type and priority (as given as arguments to eval in the current code):
> [...]
> rawbody_0.pm   431
> meta_500.pm    736
> head_0.pm     1855
> body_0.pm     5224

I repeated the exercise with body_0.pm, prunning it down to a size
which the compiler still managed to compile without a crash, the
limit is:
  body_0.pm     3268
rules.

> Now that I think of it, a combined approach similar to use_rule_subs
> would satisfy both needs: not give Perl too large chunks of code
> to compile, and save on memory footprint in a parent process
> (inherited by child processes) for source code, as Bug 5876 is
> trying to solve.
> 
> But instead of one rule per subroutine (as with use_rule_subs)
> or all rules in one subroutine (as without use_rule_subs),
> perhaps 100 rules per sub would be a good compromise, cleanly
> satisfying both needs, and without going into complications
> with temporary files.

Not having much of a choice, I embarked on modifying the Check.pm
plugin to implement the above idea: compile not more than about 60kB
of source code at a time, and if necessary produce multiple subroutines,
one for each chunk of code, and provide a master subroutine which calls
each of the chunk subroutines in turn.

A side effect is a noticable reduction in memory footprint.
I modified the 'spamassassin' command to sleep just before finishing,
and checked the process memory size by a 'ps' command. The set of rules
is what I normally use on a production mailer (updates.spamassassin.org,
sought.rules.yerp.org, some SARE rules and a handful of local ones).

original 3.3 trunk:
   VSZ   RSS
 99692 91752

3.3 trunk with my modified Check.pm:
 82080 79584

The reduction is 17.2 MB in virtual memory size,
and 11.9 MB of resident memory size.  

As an experiment I also eliminated compiling of eval rules (substituted
by direct calls), as it seems little point in optimizing the outer loop.
I keep this diffs separate. It yields some additional memory reduction:
  80780 78468
i.e. 1.3 MB VSZ and 1.1MB in RSS. It is not much, but makes code
much simpler. I doubt there is any noticable performance penalty
by not optimizing an outer loop, it is hard to measure with all the
timing noise.

Anyway, I'd like to bring in at least my chunking changes in Check.pm.
It does pass the tests, and it does produce identical results on a
couple of messages that I tried manually with old and new code (each
with many and varied hits), and it does make our production mailer+SA
run again with berkeley db enabled under 5.10.0.

I would appreciate a second opinion on the approach and code, and any
feedback in case I broke some corner case which I didn't try (user rules?,
use_rule_subs?, mass checks?).

  Bug 6060: let the Check.pm plugin produce smaller chunks
  of source code (60 kB) to avoid Perl compiler crashing
  on exceeding stack size, and to reduce memory footprint
  of SpamAssassin.
  Sending        lib/Mail/SpamAssassin/Plugin/Check.pm
  Committed revision 760568.

I can also commit the other half (direct execution of eval rules)
- it can be reverted later if it doesn't feel right.

Comment 6 Justin Mason 2009-03-31 13:17:59 UTC

great memory reduction figures!

I'd like to get a benchmark of  the speed of mass-check on a 50% ham/50% spam corpus before and after.  that's the best way to judge speed changes on this code, I've found.

Comment 7 Justin Mason 2009-03-31 14:28:14 UTC

(In reply to comment #6)
> great memory reduction figures!
> 
> I'd like to get a benchmark of  the speed of mass-check on a 50% ham/50% spam
> corpus before and after.  that's the best way to judge speed changes on this
> code, I've found.

ok, I've benched it --

pre:
real    0m33.159s user    0m32.586s sys     0m0.392s
real    0m31.770s user    0m31.282s sys     0m0.460s
real    0m32.189s user    0m31.462s sys     0m0.548s
peak memory usage:
Tue Mar 31 21:26:51 2009: 0  1000 16007 14415  20   0  63956 59896 -      R+   pts/6      0:30 /usr/bin/perl -w ./mass-check -n -o ham:mbox:/home/jm/ftp/tstcor/ham.mbox spam:mbox:/home/jm/ftp/tstcor/spam.mbox

post:
real    0m31.101s user    0m30.774s sys     0m0.320s
real    0m31.150s user    0m30.850s sys     0m0.300s
real    0m32.166s user    0m31.710s sys     0m0.300s
peak memory usage:
Tue Mar 31 21:23:35 2009: 0  1000 15424 14415  20   0  58428 54356 -      R+   pts/6      0:29 /usr/bin/perl -w ./mass-check -n -o ham:mbox:/home/jm/ftp/tstcor/ham.mbox spam:mbox:/home/jm/ftp/tstcor/spam.mbox


It's an improvement across the board there. +1

can you attach the other changes as patches and I'll test them too?

Comment 8 Mark Martinec 2009-03-31 15:55:41 UTC

> ok, I've benched it --

Thanks!
 
> It's an improvement across the board there. +1

Whatever speedup there was must have been due to different
memory organization of compiled code, I can't see other reason.

> can you attach the other changes as patches and I'll test them too?

Coming. Appreciated.

  Bug6060, part 2: replace compiled eval rules with direct execution
  Sending        lib/Mail/SpamAssassin/Conf/Parser.pm
  Sending        lib/Mail/SpamAssassin/Plugin/Check.pm
  Transmitting file data ..
  Committed revision 760705.

Comment 9 Justin Mason 2009-04-01 03:01:56 UTC

> Whatever speedup there was must have been due to different
> memory organization of compiled code, I can't see other reason.

yeah.  I have a feeling that as CPUs have gotten so fast compared to
the memory pipeline, the effects of memory usage reduction become 
more important for execution speed.

> > can you attach the other changes as patches and I'll test them too?

they don't look so great:

BEFORE
time ./mass-check -n -o ham:mbox:/home/jmason/ftp/tstcor/ham.mbox  spam:mbox:/home/jmason/ftp/tstcor/spam.mbox > /dev/null
real    0m32.587s user    0m32.166s sys     0m0.284s
real    0m32.711s user    0m32.266s sys     0m0.264s
real    0m32.416s user    0m32.002s sys     0m0.340s
real    0m32.450s user    0m32.094s sys     0m0.276s

real   avg=33.956s min=33.537s max=34.401s count=5
user   avg=33.304s min=32.986s max=33.834s count=5
sys    avg=0.282s min=0.236s max=0.312s count=5


AFTER
time ./mass-check -n -o ham:mbox:/home/jmason/ftp/tstcor/ham.mbox  spam:mbox:/home/jmason/ftp/tstcor/spam.mbox > /dev/null
real    0m33.833s user    0m33.078s sys     0m0.308s
real    0m34.401s user    0m33.834s sys     0m0.296s
real    0m33.913s user    0m33.266s sys     0m0.312s
real    0m34.098s user    0m33.358s sys     0m0.236s

real   avg=32.501s min=32.269s max=32.711s count=6
user   avg=32.111s min=31.902s max=32.266s count=6
sys    avg=0.282s min=0.224s max=0.340s count=6

So that's 4.4% slower, it seems.  (now this wasn't on an unloaded machine, but
I think those numbers are mostly accurate.)

I'm afraid I'm -1 on that with that performance reduction; sorry!

Comment 10 Mark Martinec 2009-04-01 07:05:02 UTC

> > > can you attach the other changes as patches and I'll test them too?
> they don't look so great:
> So that's 4.4% slower, it seems.  (now this wasn't on an unloaded machine,
> but I think those numbers are mostly accurate.) 
> I'm afraid I'm -1 on that with that performance reduction; sorry!

Ok, reverting. A bit of a slowdown was expected, but I guess it doesn't
pay off for code reduction.

  reverting r760705, directly executed eval rules are about 4% slower
  (but saves about 1.2 MB), compared to compiled eval rules. Bug 6060
  Sending        lib/Mail/SpamAssassin/Conf/Parser.pm
  Sending        lib/Mail/SpamAssassin/Plugin/Check.pm
  Committed revision 760896.

Comment 11 Mark Martinec 2009-04-03 09:06:51 UTC

Ok, so I guess this is fixed now for 3.3, closing.
Please reopen if you notice any problems introduced by the change.