Bug 5082 - Multi nested mime parts with large attatchments
Summary: Multi nested mime parts with large attatchments
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Security (show other bugs)
Version: 3.1.3
Hardware: PC Linux
: P2 major
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-09-04 19:43 UTC by Joao Gouveia
Modified: 2006-09-08 11:07 UTC (History)
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
Sample message text/plain None Joao Gouveia [NoCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Joao Gouveia 2006-09-04 19:43:51 UTC
Our system got DoS'ed today (by accident) with some messages that had dozens of
nested mime parts and large attatchments included.
On a further analisys, the problem was spotted on the $mail->find_parts() sub
routine and it happens before any check is done on the message.

This causes a message of 6Mb in size, with dozens of nested multi part, to
consume up to 1Gb of RAM on a single SA thread.

I'll be attacthing a sample file to test this, but with the attatchments removed.
(on each filename=foo part add attatchment data).

To reproduce, either run it through spamassassin or simply as:

use Mail::SpamAssassin;
my $sa=Mail::SpamAssassin->new();
$sa->compile_now(1,undef);
open(T,$ARGV[0]);
my $buf;
while(<T>) {
        $buf.=$_;
}
close T;

my $mail=$sa->parse($buf);
foreach my $p ($mail->find_parts(qr/foo/)) {
}
Comment 1 Joao Gouveia 2006-09-04 19:47:50 UTC
Created attachment 3685 [details]
Sample message

Replace "PUT SOME DATA HERE" with real attatchment data.
Comment 2 Sidney Markowitz 2006-09-04 20:05:39 UTC
Closed the barn door after the horse ran off (again) before reading any further...

We really ought to see about fixing whatever defaults in Bugzilla makes it easy
to create a Security bug that is public.
Comment 3 Theo Van Dinter 2006-09-04 20:15:05 UTC
(In reply to comment #0)
> On a further analisys, the problem was spotted on the $mail->find_parts() sub
> routine and it happens before any check is done on the message.
> 
> This causes a message of 6Mb in size, with dozens of nested multi part, to
> consume up to 1Gb of RAM on a single SA thread.

A few things:

1) a 6MB message is *way* larger than the recommended max message scan size of 250KB

2) the problem is not find_parts() which doesn't actually store a lot of data
(array of mime part pointers) and in your example wouldn't store anything.  the
problem is that the way you're calling things, find_parts() causes the message
to parsed, ie: generate the message tree from the input message.


Having SA use less memory is something we always work to do, but scanning a
message 24x the recommended max size is a bad idea.


BTW: since the problem is the message size, which is a well known and documented
piece of information, I don't consider this ticket a "security" issue.
Comment 4 Theo Van Dinter 2006-09-04 20:22:43 UTC
oh, it may be worth noting that in a quick test w/ a ~6MB input file, SA 3.1
maxes out ~950MB, SA 3.2 maxes out ~400MB.
Comment 5 Sidney Markowitz 2006-09-04 22:40:38 UTC
> BTW: since the problem is the message size, which is a well known
> and documented piece of information, I don't consider this ticket
> a "security" issue.

I agree. I set the flag to restrict it as soon as I saw there was a new security
category bug that was public, before even reading the details carefully. If
nobody objects I'll make it public again and remove the security category.
Actually I would close this as a WONTFIX, especially since we have improved
memory utilisation processing MIME parts for 3.2.

If nobody responds otherwise or closes it themselves within the next bunch of
hours, I'll go ahead with that.
Comment 6 Joao Gouveia 2006-09-04 22:48:49 UTC
(In reply to comment #3)
> (In reply to comment #0)
> > On a further analisys, the problem was spotted on the $mail->find_parts() sub
> > routine and it happens before any check is done on the message.
> > 
> > This causes a message of 6Mb in size, with dozens of nested multi part, to
> > consume up to 1Gb of RAM on a single SA thread.
> 
> A few things:
> 
> 1) a 6MB message is *way* larger than the recommended max message scan size of
250KB

Yes, I am aware of that. But i'm only calling scan() if message is not greater
than 128Kb. The problem is that I'm using parse() and then $msg->find_parts() to
deal with attatchments before any real spamchecks are made.
 
> 2) the problem is not find_parts() which doesn't actually store a lot of data
> (array of mime part pointers) and in your example wouldn't store anything.  the
> problem is that the way you're calling things, find_parts() causes the message
> to parsed, ie: generate the message tree from the input message.

Does that mean that Plugin::AntiVirus and Plugin::MIMEHeader may cause problems
too? I'm asking that because I used find_parts() based on the code within
AntiVirus.pm

If that's the case, can you confirm that using SA for message parsing (even if
not check()ing ) is a bad idea for large messages? I think I'm not the only one
using this method. I belive other spam check software SA based does the same
(maybe amavis, mailscanner and others?).

> 
> Having SA use less memory is something we always work to do, but scanning a
> message 24x the recommended max size is a bad idea.
> 

I understand and agree. But I still think that a 6mb message causing 1Gb RAM
usage is pretty weird. Specially when that happens only on multi nested mime
part messages (wich seams to indicate a bug somewere)
 
> BTW: since the problem is the message size, which is a well known and documented
> piece of information, I don't consider this ticket a "security" issue.

Ok. I filled in as security related because that was requested if the issue was
a denial of service and others.
Anyway, it still is a security issue, at least in my code. Can you provide some
pointers on how to do something like find_parts() without this behaviour?

Thanks for the prompt reply.
Comment 7 Theo Van Dinter 2006-09-05 00:32:07 UTC
(In reply to comment #6)
> Yes, I am aware of that. But i'm only calling scan() if message is not greater
> than 128Kb. The problem is that I'm using parse() and then $msg->find_parts() to
> deal with attatchments before any real spamchecks are made.

Aha.  The memory usage comes from the internal message format via parse, and not
the scan -- the scan can take a lot of time, depending on the input, but the
parse is what generates the internal tree which is the space suck.

> > 2) the problem is not find_parts() which doesn't actually store a lot of data
> 
> Does that mean that Plugin::AntiVirus and Plugin::MIMEHeader may cause problems
> too? I'm asking that because I used find_parts() based on the code within
> AntiVirus.pm

Of course.

> If that's the case, can you confirm that using SA for message parsing (even if
> not check()ing ) is a bad idea for large messages? I think I'm not the only one
> using this method. I belive other spam check software SA based does the same
> (maybe amavis, mailscanner and others?).

Yes, the suggestion is specifically to not send to SA any message over 250k. 
check() is part of SA, but it's not the only part.  I believe the other tools
bypass SA completely based on the size, which would also bypass the parse step.

> Anyway, it still is a security issue, at least in my code. Can you provide some
> pointers on how to do something like find_parts() without this behaviour?

You can't get that with SA.  find_parts() needs to go through the internal
message data structure, which is generated by the parse.  So you can't do one if
you don't want to do the other.  The only thing you can do w/out parse is look
at the message headers.
Comment 8 Joao Gouveia 2006-09-05 00:43:35 UTC
I did some more tests and found out why this happens.
It seams that foreach mime part SA buffers all part contents. So if you have a
100Kb attatchment and 200 nested parts, you'll end up with 200*100kb memory usage.
Something like:
part1
 part2
  part3
   (...)
   message content
   (...)
  part3
 part2
part1

I'm guessing that this behaviour can also cause problems with more stuff (bayes
tokens? heavy body rules?), because SA will parse each part separetly. Just a guess.

Anyway, I fully understand your point and will try to use something else to do
the mime parsing before even calling the SA object.

Thanks.

(In reply to comment #7)
> (In reply to comment #6)
> > Yes, I am aware of that. But i'm only calling scan() if message is not greater
> > than 128Kb. The problem is that I'm using parse() and then $msg->find_parts() to
> > deal with attatchments before any real spamchecks are made.
> 
> Aha.  The memory usage comes from the internal message format via parse, and not
> the scan -- the scan can take a lot of time, depending on the input, but the
> parse is what generates the internal tree which is the space suck.
> 
> > > 2) the problem is not find_parts() which doesn't actually store a lot of data
> > 
> > Does that mean that Plugin::AntiVirus and Plugin::MIMEHeader may cause problems
> > too? I'm asking that because I used find_parts() based on the code within
> > AntiVirus.pm
> 
> Of course.
> 
> > If that's the case, can you confirm that using SA for message parsing (even if
> > not check()ing ) is a bad idea for large messages? I think I'm not the only one
> > using this method. I belive other spam check software SA based does the same
> > (maybe amavis, mailscanner and others?).
> 
> Yes, the suggestion is specifically to not send to SA any message over 250k. 
> check() is part of SA, but it's not the only part.  I believe the other tools
> bypass SA completely based on the size, which would also bypass the parse step.
> 
> > Anyway, it still is a security issue, at least in my code. Can you provide some
> > pointers on how to do something like find_parts() without this behaviour?
> 
> You can't get that with SA.  find_parts() needs to go through the internal
> message data structure, which is generated by the parse.  So you can't do one if
> you don't want to do the other.  The only thing you can do w/out parse is look
> at the message headers.

Comment 9 Theo Van Dinter 2006-09-08 18:06:47 UTC
Not a security issue.
Comment 10 Theo Van Dinter 2006-09-08 18:07:57 UTC
I think we've all agreed this isn't a bug, so closing as wfm. :)