Bug 4570 - Mail with lots of To addresses in header triggers Bus error in Perl [CVE-2005-3351]
Summary: Mail with lots of To addresses in header triggers Bus error in Perl [CVE-2005...
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries (show other bugs)
Version: 3.1.0
Hardware: Other All
: P3 normal
Target Milestone: 3.0.5
Assignee: SpamAssassin Developer Mailing List
URL: http://ezri.ijs.si/~mark/sa-perl-buse...
: 4171 (view as bug list)
Depends on:
Reported: 2005-09-05 16:21 UTC by Mark Martinec
Modified: 2005-11-08 19:59 UTC (History)
1 user (show)

Attachment Type Modified Status Actions Submitter/CLA Status
New regexp that can handle large headers without segfaulting patch None Sidney Markowitz [HasCLA]
Mark's test case mail text/plain None Justin Mason [HasCLA]
spamassassin-3.0.4-4570-avoid-segfault-large-headers.patch patch None Warren Togami [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Martinec 2005-09-05 16:21:25 UTC
I just came across a similar spam message (URL attached) with 
insanely lots of recipients listed in its header (1600+), and it 
causes 'spamassassin -t -D' to trigger Perl crash with a Bus error. 
This is the same bug as discussed on the SA mailing list these days 
(reported by Irina <irina <at> nas.net> on 2005-08-18, 
subject: "Too many recipients"). 
Unlike Irina, I run Perl 5.8.7, SA 3.1.0-rc2, and if it matters, this 
is a FreeBSD 5.4 (it may matter because Perl memory limits may be different 
than on a Linux system). The system is otherwise stable and is processsing 
regular company mail in quantities through this same SA/Perl. 
The tail of the output from "truss -f spamassassin -t -D <0.lis" is: 
[12661] dbg: plugin: 
  implements 'autolearn_discriminator' 
12661: write(2,0xa08f804,121)                    = 121 (0x79) 
[12661] dbg: learn: auto-learn: currently using scoreset 3, 
  recomputing score based on scoreset 1 
12661: write(2,0xa08f804,98)                     = 98 (0x62) 
[12661] dbg: learn: auto-learn: message score: 10.0202857142857, 
  computed score for autolearn: 7.219 
12661: write(2,0xa08f804,101)                    = 101 (0x65) 
[12661] dbg: learn: auto-learn? ham=0.1, spam=12, body-points=4.719, 
  head-points=7.219, learned-points=3.5 
12661: write(2,0xa08f804,107)                    = 107 (0x6b) 
[12661] dbg: learn: auto-learn? no: inside auto-learn thresholds, 
  not considered ham or spam 
12661: write(2,0xa08f804,93)                     = 93 (0x5d) 
12661: unlink("/tmp/.spamassassin12661f3JCzttmp") = 0 (0x0) 
[12661] dbg: check: is spam? score=10.02 required=6.8 
12661: write(2,0xa08f804,54)                     = 54 (0x36) 
[12661] dbg: check: 
12661: write(2,0xa08f804,137)                    = 137 (0x89) 
[12661] dbg: check: 
12661: write(2,0xa08f804,446)                    = 446 (0x1be) 
12661: __sysctl(0xbfbfe878,0x2,0xbfbfe8c0,0xbfbfe894,0x0,0x0) = 0 (0x0) 
Process stopped because of:  16 
process exit, rval = 10 
Bus error 
My wild guess is that it happens within some regular expression 
evaluation, having to deal with such a long folder header line. 
It is probably not a showstopper for 3.1.0, as I believe this has 
been seen also in previous versions. But it may get interesting 
if bad guys become aware of it. 
Comment 1 Mark Martinec 2005-09-05 17:41:47 UTC
Here are the last dying breaths of when Perl debugger is enabled 
on "spamassassin -L -t -D <0.lis" : 
370:      return $self->{pristine_headers} unless $hdr; 
371:      my(@ret) = $self->{pristine_headers} =~ 
  /^(?:$hdr:[ \t]+(.*\n(?:\s+\S.*\n)*))/mig; 
372:      if (@ret) { 
373:        return wantarray ? @ret : $ret[-1]; 
719:      my $to = $self->{msg}->get_pristine_header("To"); 
368:      my ($self, $hdr) = @_; 
370:      return $self->{pristine_headers} unless $hdr; 
371:      my(@ret) = $self->{pristine_headers} =~ 
  /^(?:$hdr:[ \t]+(.*\n(?:\s+\S.*\n)*))/mig; 
Illegal instruction 
Comment 2 Mark Martinec 2005-09-06 00:33:14 UTC
The problem is easy to reproduce in a minimalistic one-line Perl form, 
both on FreeBSD and on Linux (Perl v5.8.7 in all cases): 
$ perl -e '$h="To:"; $h.=" bla\@example.com\n" for (1..6020); 
  @a=$h=~/^(?:To:[ \t]+(.*\n(?:\s+\S.*\n)*))/mig' 
Segmentation fault 
$ perl -e '$h="To:"; $h.=" bla\@example.com\n" for (1..571); 
  @a=$h=~/^(?:To:[ \t]+(.*\n(?:\s+\S.*\n)*))/mig' 
Bus error 
The number of addresses is the minimal that causes a Perl crash on 
a given system. If it doesn't go down on your system, increase the number. 
For some reason the number is much lower with a default Perl installation 
on FreeBSD than on Linux, but it sinks anyway. 
Looks like the regexp should better be rewritten to a more stable form. 
Comment 3 Loren Wilton 2005-09-06 00:53:56 UTC
Subject: Re:  Mail with lots of To addresses in header triggers Bus error in Perl

BTW, there was another bus error perl bug reported a month or two back, I
believe on BSD, that also had something to do with large something.  I don't
recall if it was large body, long URI, or many recipients.  I htink that was
eventually closed as WFM, but perhaps should be re-evaluated.

This is definitely similar in many respects to that bug.


Comment 4 Justin Mason 2005-09-06 10:30:03 UTC
it does sound a *lot* like that bug -- I believe there's some platform-specific
issue on FreeBSD to do with pthreads:

Comment 5 Mark Martinec 2005-09-06 16:12:09 UTC
> it does sound a *lot* like that bug -- I believe there's some  
> platform-specific issue on FreeBSD to do with pthreads:...  
This is part of the problem - it explains why the stack is smaller  
on a default Perl 5.8.7 installation on FreeBSD compared to Linux,  
but this doesn't address the core problem with the regular expression 
used to parse long (un)folded header line.  
Using nested subexpressions with * and + is evil and should be  
carefully contemplated. Improper use can easily cause amazingly slow  
backtracking or deep recursion potentially resulting in stack overflow. 
While it is debatable if Perl's inability to catch stack overflow 
is a bug or an implementational limitation, we have no choice but 
to live with it.   
Keeping the original test case in a further simplified form in mind:  
  $ perl -e '$h.=" bla\n" for (1..6500); $h=~/^(\s+\S.*\n)*/'  
  Segmentation fault (on Linux)  
please read the following section from perlretut man page:  
> Most of the time, all this moving forward and backtracking happens  
> quickly and searching is fast.   There are some pathological regexps,  
> however, whose execution time exponentially grows with the size of the  
> string.  A typical structure that blows up in your face is of the form  
>     /(a|b+)*/;  
> The problem is the nested indeterminate quantifiers.  There are many  
> different ways of partitioning a string of length n between the "+" and  
> "*": one repetition with "b+" of length n, two repetitions with the  
> first "b+" length k and the second with length n-k, m repetitions whose  
> bits add up to length n, etc.  In fact there are an exponential number  
> of ways to partition a string as a function of length.  A regexp may  
> get lucky and match early in the process, but if there is no match,  
> perl will try every possibility before giving up.  So be careful with  
> nested "*"'s, "{n,m}"'s, and "+"'s. 
See also the section on "Independent subexpressions" in the same man page. 
Comment 6 Justin Mason 2005-09-07 18:27:44 UTC
changing milestone -- I agree this isn't a 3.1.0 blocker, but possibly for 3.1.1.
Comment 7 Mark Martinec 2005-09-08 02:14:22 UTC
I propose the following change: 
old:  /^(?:$hdr:[ \t]+(.*\n(?:\s+\S.*\n)*))/mig;   
new:  /^\Q$hdr\E:[ \t]+(.*?\n(?![ \t]))/smgi; 
As far as I can tell it produces the same results, but extends  
the number of handled lines from 500 (or 6500 on Linux) to a million.  
Comment 8 Sidney Markowitz 2005-09-08 12:19:33 UTC
That does look a lot better. Not being much of a perl expert I didn't know about
the negated lookahead operator (?! and I was working on splitting the headers
into lines and scanning them in a loop. Can someone see how many lines the
regexp can handle on a FreeBSD system that has the lower limit? If it can handle
something that is long enough that it would be too long for SpamAssassin anyway
(i.e., over 256k bytes or so) then I think we could go with the regexp instead
of the much longer loop code.
Comment 9 Justin Mason 2005-09-08 12:31:26 UTC
thanks for the regexp -- it looks interesting.
why does "smgi" use *both* s and m?  which one wins?
Comment 10 Sidney Markowitz 2005-09-08 12:57:36 UTC
Both s and m are needed for ^ and .* to each have the correct meanings in that
expression. From man perlretut:

"both s and m modifiers (//sm): Treat string as a single long line, but detect
multiple lines.  ’.’ matches any character, even "\n".  "^" and "$", however,
are able to match at the start or end of any line within the string."

The ^ in the expression finds header strings at the beginning of lines within
the multi-line string, but .* will match the embedded \n characters with the
negated lookahead stoppping before the start of the next header.
Comment 11 Mark Martinec 2005-09-08 15:09:44 UTC
> Can someone see how many lines the regexp can handle 
> on a FreeBSD system that has the lower limit? 
Almost 4 million lines (possibly more) on a FreeBSD 5.4 with a 
default Perl install of 5.8.7, and 30 million on a Linux. 
[talking about the following test case: 
  perl -e '$hdr="to"; $h="To:"; $h.=" bla\@example.com\n" for (1..4000000/3); 
  $h=$h.$h."From: bla to: me\n".$h; 
  @a=$h=~/^\Q$hdr\E:[ \t]+(.*?\n(?![ \t]))/smgi; printf("%d\n",scalar(@a))' 
Btw, I'm assuming the $hdr is not supposed to be a regular expression, 
but a plain string, for which \Q...\E quoting is more appropriate. 
Comment 12 Sidney Markowitz 2005-09-08 15:48:27 UTC
Created attachment 3121 [details]
New regexp that can handle large headers without segfaulting

4 million is enough for me. I think we could put this in to 3.1.0 since it is
1) a simple change; and 2) closes up a potential DoS vulnerability

The \Q \E is not strictly necessary as a security measure because $hdr is set
by the caller of the function, which has either hardcoded strings such as "To"
or else is from the user's configuration file for the report_safe_headers
option. However, it is a good idea to use \Q \E as $hdr is used as a plain
string. Good catch.

I have created a patch with the proposed regexp and I'm marking this for review
for 3.1.0. Committers, please vote.

I'll also check this into trunk.
Comment 13 Sidney Markowitz 2005-09-08 15:52:32 UTC
Committed to trunk revision 279666.

Ready for voting for 3.1.0 branch.
Comment 14 Justin Mason 2005-09-08 16:02:38 UTC
Comment 15 Daryl C. W. O'Shea 2005-09-09 21:17:05 UTC
Comment 16 Sidney Markowitz 2005-09-09 21:37:28 UTC
Committed to branch 3.1 revision 279946
Comment 17 Sidney Markowitz 2005-09-10 21:39:07 UTC
*** Bug 4171 has been marked as a duplicate of this bug. ***
Comment 18 Warren Togami 2005-09-11 03:00:04 UTC
Retargeting for 3.0.5.
Comment 19 Mark J Cox 2005-11-01 11:42:22 UTC
Use identifier CVE-2005-3351 for this issue.
Comment 20 Justin Mason 2005-11-01 21:36:29 UTC
Created attachment 3226 [details]
Mark's test case mail

fwiw, here's a copy of Mark's test case mail from the URL he posted, as an
Comment 21 Justin Mason 2005-11-01 21:45:40 UTC
thanks for sorting the CVE for us, Mark.

I'm not sure if it's possible to actually use this to cause a practical DoS,
btw.  it would be possible to get a message passed as nonspam (through scanner
failure), but the scanner should recover the dead child process immediately for
later scans; spamd is resilient in the face of the Mail::SpamAssassin classes
blowing up.
Comment 22 Warren Togami 2005-11-07 21:51:01 UTC
Created attachment 3238 [details]

Testing of this patch in the past month has been fine.
Comment 23 Justin Mason 2005-11-08 23:57:12 UTC
Comment 24 Daryl C. W. O'Shea 2005-11-09 03:56:39 UTC
Comment 25 Sidney Markowitz 2005-11-09 04:59:02 UTC

Committed revision 331942.