SA Bugzilla – Bug 4570
Mail with lots of To addresses in header triggers Bus error in Perl [CVE-2005-3351]
Last modified: 2005-11-08 19:59:02 UTC
I just came across a similar spam message (URL attached) with insanely lots of recipients listed in its header (1600+), and it causes 'spamassassin -t -D' to trigger Perl crash with a Bus error. This is the same bug as discussed on the SA mailing list these days (reported by Irina <irina <at> nas.net> on 2005-08-18, subject: "Too many recipients"). Unlike Irina, I run Perl 5.8.7, SA 3.1.0-rc2, and if it matters, this is a FreeBSD 5.4 (it may matter because Perl memory limits may be different than on a Linux system). The system is otherwise stable and is processsing regular company mail in quantities through this same SA/Perl. The tail of the output from "truss -f spamassassin -t -D <0.lis" is: [12661] dbg: plugin: Mail::SpamAssassin::Plugin::AutoLearnThreshold=HASH(0x9533a60) implements 'autolearn_discriminator' 12661: write(2,0xa08f804,121) = 121 (0x79) [12661] dbg: learn: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1 12661: write(2,0xa08f804,98) = 98 (0x62) [12661] dbg: learn: auto-learn: message score: 10.0202857142857, computed score for autolearn: 7.219 12661: write(2,0xa08f804,101) = 101 (0x65) [12661] dbg: learn: auto-learn? ham=0.1, spam=12, body-points=4.719, head-points=7.219, learned-points=3.5 12661: write(2,0xa08f804,107) = 107 (0x6b) [12661] dbg: learn: auto-learn? no: inside auto-learn thresholds, not considered ham or spam 12661: write(2,0xa08f804,93) = 93 (0x5d) 12661: unlink("/tmp/.spamassassin12661f3JCzttmp") = 0 (0x0) [12661] dbg: check: is spam? score=10.02 required=6.8 12661: write(2,0xa08f804,54) = 54 (0x36) [12661] dbg: check: tests=AWL,BAYES_99,DCC_CHECK,DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST, HEAD_LONG,SPF_HELO_PASS,SPF_PASS,UNPARSEABLE_RELAY 12661: write(2,0xa08f804,137) = 137 (0x89) [12661] dbg: check: subtests=__CT,__CTE,__CT_TEXT_PLAIN,__ENV_AND_HDR_FROM_MATCH, __FROM_MSN_COM,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HIGHBITS, __MIME_QP,__MIME_VERSION,__MSGID_OK_DIGITS,__MSGID_OK_HEX, __MSGID_OK_HOST,__NONEMPTY_BODY,__RFC_IGNORANT_ENVFROM,__SANE_MSGID, __SARE_BODY_BLNK_5_100,__SARE_HEAD_HDR_MIMEV,__SARE_HEAD_HDR_RCVD, __SARE_HEAD_MIME_VALID,__SARE_HTML_HAS_MSG,__SARE_HTML_HAS_TO, __SARE_META_MURTY3,__SARE_URI_ANY,__TOCC_EXISTS 12661: write(2,0xa08f804,446) = 446 (0x1be) 12661: __sysctl(0xbfbfe878,0x2,0xbfbfe8c0,0xbfbfe894,0x0,0x0) = 0 (0x0) SIGNAL 10 (SIGBUS) SIGNAL 10 (SIGBUS) Process stopped because of: 16 process exit, rval = 10 Bus error My wild guess is that it happens within some regular expression evaluation, having to deal with such a long folder header line. It is probably not a showstopper for 3.1.0, as I believe this has been seen also in previous versions. But it may get interesting if bad guys become aware of it. Mark
Here are the last dying breaths of when Perl debugger is enabled on "spamassassin -L -t -D <0.lis" : 370: return $self->{pristine_headers} unless $hdr; Mail::SpamAssassin::Message::get_pristine_header( /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/Message.pm:371): 371: my(@ret) = $self->{pristine_headers} =~ /^(?:$hdr:[ \t]+(.*\n(?:\s+\S.*\n)*))/mig; Mail::SpamAssassin::Message::get_pristine_header( /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/Message.pm:372): 372: if (@ret) { Mail::SpamAssassin::Message::get_pristine_header( /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/Message.pm:373): 373: return wantarray ? @ret : $ret[-1]; Mail::SpamAssassin::PerMsgStatus::rewrite_report_safe( /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/PerMsgStatus.pm:719): 719: my $to = $self->{msg}->get_pristine_header("To"); Mail::SpamAssassin::Message::get_pristine_header( /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/Message.pm:368): 368: my ($self, $hdr) = @_; Mail::SpamAssassin::Message::get_pristine_header( /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/Message.pm:370): 370: return $self->{pristine_headers} unless $hdr; Mail::SpamAssassin::Message::get_pristine_header( /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/Message.pm:371): 371: my(@ret) = $self->{pristine_headers} =~ /^(?:$hdr:[ \t]+(.*\n(?:\s+\S.*\n)*))/mig; Illegal instruction $
The problem is easy to reproduce in a minimalistic one-line Perl form, both on FreeBSD and on Linux (Perl v5.8.7 in all cases): Linux: $ perl -e '$h="To:"; $h.=" bla\@example.com\n" for (1..6020); @a=$h=~/^(?:To:[ \t]+(.*\n(?:\s+\S.*\n)*))/mig' Segmentation fault FreeBSD: $ perl -e '$h="To:"; $h.=" bla\@example.com\n" for (1..571); @a=$h=~/^(?:To:[ \t]+(.*\n(?:\s+\S.*\n)*))/mig' Bus error The number of addresses is the minimal that causes a Perl crash on a given system. If it doesn't go down on your system, increase the number. For some reason the number is much lower with a default Perl installation on FreeBSD than on Linux, but it sinks anyway. Looks like the regexp should better be rewritten to a more stable form. Mark
Subject: Re: Mail with lots of To addresses in header triggers Bus error in Perl BTW, there was another bus error perl bug reported a month or two back, I believe on BSD, that also had something to do with large something. I don't recall if it was large body, long URI, or many recipients. I htink that was eventually closed as WFM, but perhaps should be re-evaluated. This is definitely similar in many respects to that bug. Loren
it does sound a *lot* like that bug -- I believe there's some platform-specific issue on FreeBSD to do with pthreads: http://www.nntp.perl.org/group/perl.perl5.porters/103500 http://rt.perl.org/rt3/Ticket/Display.html?id=36667
> it does sound a *lot* like that bug -- I believe there's some > platform-specific issue on FreeBSD to do with pthreads:... This is part of the problem - it explains why the stack is smaller on a default Perl 5.8.7 installation on FreeBSD compared to Linux, but this doesn't address the core problem with the regular expression used to parse long (un)folded header line. Using nested subexpressions with * and + is evil and should be carefully contemplated. Improper use can easily cause amazingly slow backtracking or deep recursion potentially resulting in stack overflow. While it is debatable if Perl's inability to catch stack overflow is a bug or an implementational limitation, we have no choice but to live with it. Keeping the original test case in a further simplified form in mind: $ perl -e '$h.=" bla\n" for (1..6500); $h=~/^(\s+\S.*\n)*/' Segmentation fault (on Linux) please read the following section from perlretut man page: > Most of the time, all this moving forward and backtracking happens > quickly and searching is fast. There are some pathological regexps, > however, whose execution time exponentially grows with the size of the > string. A typical structure that blows up in your face is of the form > /(a|b+)*/; > The problem is the nested indeterminate quantifiers. There are many > different ways of partitioning a string of length n between the "+" and > "*": one repetition with "b+" of length n, two repetitions with the > first "b+" length k and the second with length n-k, m repetitions whose > bits add up to length n, etc. In fact there are an exponential number > of ways to partition a string as a function of length. A regexp may > get lucky and match early in the process, but if there is no match, > perl will try every possibility before giving up. So be careful with > nested "*"'s, "{n,m}"'s, and "+"'s. See also the section on "Independent subexpressions" in the same man page. Mark
changing milestone -- I agree this isn't a 3.1.0 blocker, but possibly for 3.1.1.
I propose the following change: old: /^(?:$hdr:[ \t]+(.*\n(?:\s+\S.*\n)*))/mig; new: /^\Q$hdr\E:[ \t]+(.*?\n(?![ \t]))/smgi; As far as I can tell it produces the same results, but extends the number of handled lines from 500 (or 6500 on Linux) to a million.
That does look a lot better. Not being much of a perl expert I didn't know about the negated lookahead operator (?! and I was working on splitting the headers into lines and scanning them in a loop. Can someone see how many lines the regexp can handle on a FreeBSD system that has the lower limit? If it can handle something that is long enough that it would be too long for SpamAssassin anyway (i.e., over 256k bytes or so) then I think we could go with the regexp instead of the much longer loop code.
thanks for the regexp -- it looks interesting. why does "smgi" use *both* s and m? which one wins?
Both s and m are needed for ^ and .* to each have the correct meanings in that expression. From man perlretut: "both s and m modifiers (//sm): Treat string as a single long line, but detect multiple lines. ’.’ matches any character, even "\n". "^" and "$", however, are able to match at the start or end of any line within the string." The ^ in the expression finds header strings at the beginning of lines within the multi-line string, but .* will match the embedded \n characters with the negated lookahead stoppping before the start of the next header.
> Can someone see how many lines the regexp can handle > on a FreeBSD system that has the lower limit? Almost 4 million lines (possibly more) on a FreeBSD 5.4 with a default Perl install of 5.8.7, and 30 million on a Linux. [talking about the following test case: perl -e '$hdr="to"; $h="To:"; $h.=" bla\@example.com\n" for (1..4000000/3); $h=$h.$h."From: bla to: me\n".$h; @a=$h=~/^\Q$hdr\E:[ \t]+(.*?\n(?![ \t]))/smgi; printf("%d\n",scalar(@a))' ] Btw, I'm assuming the $hdr is not supposed to be a regular expression, but a plain string, for which \Q...\E quoting is more appropriate.
Created attachment 3121 [details] New regexp that can handle large headers without segfaulting 4 million is enough for me. I think we could put this in to 3.1.0 since it is 1) a simple change; and 2) closes up a potential DoS vulnerability The \Q \E is not strictly necessary as a security measure because $hdr is set by the caller of the function, which has either hardcoded strings such as "To" or else is from the user's configuration file for the report_safe_headers option. However, it is a good idea to use \Q \E as $hdr is used as a plain string. Good catch. I have created a patch with the proposed regexp and I'm marking this for review for 3.1.0. Committers, please vote. I'll also check this into trunk.
Committed to trunk revision 279666. Ready for voting for 3.1.0 branch.
+1
Committed to branch 3.1 revision 279946
*** Bug 4171 has been marked as a duplicate of this bug. ***
Retargeting for 3.0.5.
Use identifier CVE-2005-3351 for this issue.
Created attachment 3226 [details] Mark's test case mail fwiw, here's a copy of Mark's test case mail from the URL he posted, as an attachment.
thanks for sorting the CVE for us, Mark. I'm not sure if it's possible to actually use this to cause a practical DoS, btw. it would be possible to get a message passed as nonspam (through scanner failure), but the scanner should recover the dead child process immediately for later scans; spamd is resilient in the face of the Mail::SpamAssassin classes blowing up.
Created attachment 3238 [details] spamassassin-3.0.4-4570-avoid-segfault-large-headers.patch Testing of this patch in the past month has been fine.
+1 Committed revision 331942.