SA Bugzilla – Bug 5082
Multi nested mime parts with large attatchments
Last modified: 2006-09-08 11:07:57 UTC
Our system got DoS'ed today (by accident) with some messages that had dozens of nested mime parts and large attatchments included. On a further analisys, the problem was spotted on the $mail->find_parts() sub routine and it happens before any check is done on the message. This causes a message of 6Mb in size, with dozens of nested multi part, to consume up to 1Gb of RAM on a single SA thread. I'll be attacthing a sample file to test this, but with the attatchments removed. (on each filename=foo part add attatchment data). To reproduce, either run it through spamassassin or simply as: use Mail::SpamAssassin; my $sa=Mail::SpamAssassin->new(); $sa->compile_now(1,undef); open(T,$ARGV[0]); my $buf; while(<T>) { $buf.=$_; } close T; my $mail=$sa->parse($buf); foreach my $p ($mail->find_parts(qr/foo/)) { }
Created attachment 3685 [details] Sample message Replace "PUT SOME DATA HERE" with real attatchment data.
Closed the barn door after the horse ran off (again) before reading any further... We really ought to see about fixing whatever defaults in Bugzilla makes it easy to create a Security bug that is public.
(In reply to comment #0) > On a further analisys, the problem was spotted on the $mail->find_parts() sub > routine and it happens before any check is done on the message. > > This causes a message of 6Mb in size, with dozens of nested multi part, to > consume up to 1Gb of RAM on a single SA thread. A few things: 1) a 6MB message is *way* larger than the recommended max message scan size of 250KB 2) the problem is not find_parts() which doesn't actually store a lot of data (array of mime part pointers) and in your example wouldn't store anything. the problem is that the way you're calling things, find_parts() causes the message to parsed, ie: generate the message tree from the input message. Having SA use less memory is something we always work to do, but scanning a message 24x the recommended max size is a bad idea. BTW: since the problem is the message size, which is a well known and documented piece of information, I don't consider this ticket a "security" issue.
oh, it may be worth noting that in a quick test w/ a ~6MB input file, SA 3.1 maxes out ~950MB, SA 3.2 maxes out ~400MB.
> BTW: since the problem is the message size, which is a well known > and documented piece of information, I don't consider this ticket > a "security" issue. I agree. I set the flag to restrict it as soon as I saw there was a new security category bug that was public, before even reading the details carefully. If nobody objects I'll make it public again and remove the security category. Actually I would close this as a WONTFIX, especially since we have improved memory utilisation processing MIME parts for 3.2. If nobody responds otherwise or closes it themselves within the next bunch of hours, I'll go ahead with that.
(In reply to comment #3) > (In reply to comment #0) > > On a further analisys, the problem was spotted on the $mail->find_parts() sub > > routine and it happens before any check is done on the message. > > > > This causes a message of 6Mb in size, with dozens of nested multi part, to > > consume up to 1Gb of RAM on a single SA thread. > > A few things: > > 1) a 6MB message is *way* larger than the recommended max message scan size of 250KB Yes, I am aware of that. But i'm only calling scan() if message is not greater than 128Kb. The problem is that I'm using parse() and then $msg->find_parts() to deal with attatchments before any real spamchecks are made. > 2) the problem is not find_parts() which doesn't actually store a lot of data > (array of mime part pointers) and in your example wouldn't store anything. the > problem is that the way you're calling things, find_parts() causes the message > to parsed, ie: generate the message tree from the input message. Does that mean that Plugin::AntiVirus and Plugin::MIMEHeader may cause problems too? I'm asking that because I used find_parts() based on the code within AntiVirus.pm If that's the case, can you confirm that using SA for message parsing (even if not check()ing ) is a bad idea for large messages? I think I'm not the only one using this method. I belive other spam check software SA based does the same (maybe amavis, mailscanner and others?). > > Having SA use less memory is something we always work to do, but scanning a > message 24x the recommended max size is a bad idea. > I understand and agree. But I still think that a 6mb message causing 1Gb RAM usage is pretty weird. Specially when that happens only on multi nested mime part messages (wich seams to indicate a bug somewere) > BTW: since the problem is the message size, which is a well known and documented > piece of information, I don't consider this ticket a "security" issue. Ok. I filled in as security related because that was requested if the issue was a denial of service and others. Anyway, it still is a security issue, at least in my code. Can you provide some pointers on how to do something like find_parts() without this behaviour? Thanks for the prompt reply.
(In reply to comment #6) > Yes, I am aware of that. But i'm only calling scan() if message is not greater > than 128Kb. The problem is that I'm using parse() and then $msg->find_parts() to > deal with attatchments before any real spamchecks are made. Aha. The memory usage comes from the internal message format via parse, and not the scan -- the scan can take a lot of time, depending on the input, but the parse is what generates the internal tree which is the space suck. > > 2) the problem is not find_parts() which doesn't actually store a lot of data > > Does that mean that Plugin::AntiVirus and Plugin::MIMEHeader may cause problems > too? I'm asking that because I used find_parts() based on the code within > AntiVirus.pm Of course. > If that's the case, can you confirm that using SA for message parsing (even if > not check()ing ) is a bad idea for large messages? I think I'm not the only one > using this method. I belive other spam check software SA based does the same > (maybe amavis, mailscanner and others?). Yes, the suggestion is specifically to not send to SA any message over 250k. check() is part of SA, but it's not the only part. I believe the other tools bypass SA completely based on the size, which would also bypass the parse step. > Anyway, it still is a security issue, at least in my code. Can you provide some > pointers on how to do something like find_parts() without this behaviour? You can't get that with SA. find_parts() needs to go through the internal message data structure, which is generated by the parse. So you can't do one if you don't want to do the other. The only thing you can do w/out parse is look at the message headers.
I did some more tests and found out why this happens. It seams that foreach mime part SA buffers all part contents. So if you have a 100Kb attatchment and 200 nested parts, you'll end up with 200*100kb memory usage. Something like: part1 part2 part3 (...) message content (...) part3 part2 part1 I'm guessing that this behaviour can also cause problems with more stuff (bayes tokens? heavy body rules?), because SA will parse each part separetly. Just a guess. Anyway, I fully understand your point and will try to use something else to do the mime parsing before even calling the SA object. Thanks. (In reply to comment #7) > (In reply to comment #6) > > Yes, I am aware of that. But i'm only calling scan() if message is not greater > > than 128Kb. The problem is that I'm using parse() and then $msg->find_parts() to > > deal with attatchments before any real spamchecks are made. > > Aha. The memory usage comes from the internal message format via parse, and not > the scan -- the scan can take a lot of time, depending on the input, but the > parse is what generates the internal tree which is the space suck. > > > > 2) the problem is not find_parts() which doesn't actually store a lot of data > > > > Does that mean that Plugin::AntiVirus and Plugin::MIMEHeader may cause problems > > too? I'm asking that because I used find_parts() based on the code within > > AntiVirus.pm > > Of course. > > > If that's the case, can you confirm that using SA for message parsing (even if > > not check()ing ) is a bad idea for large messages? I think I'm not the only one > > using this method. I belive other spam check software SA based does the same > > (maybe amavis, mailscanner and others?). > > Yes, the suggestion is specifically to not send to SA any message over 250k. > check() is part of SA, but it's not the only part. I believe the other tools > bypass SA completely based on the size, which would also bypass the parse step. > > > Anyway, it still is a security issue, at least in my code. Can you provide some > > pointers on how to do something like find_parts() without this behaviour? > > You can't get that with SA. find_parts() needs to go through the internal > message data structure, which is generated by the parse. So you can't do one if > you don't want to do the other. The only thing you can do w/out parse is look > at the message headers.
Not a security issue.
I think we've all agreed this isn't a bug, so closing as wfm. :)