Bug 6854 - Optimizations, profiling
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries
Version: 3.4 SVN branch
Hardware: All All
Importance: P2 enhancement
Target Milestone: 3.4.0
Assignee: SpamAssassin Developer Mailing List
Reported: 2012-10-23 19:11 UTC by Mark Martinec
Modified: 2013-01-17 00:23 UTC

Attachment: The low-hanging fruit (application/octet-stream), submitted by Mark Martinec [HasCLA]

Description Mark Martinec 2012-10-23 19:11:41 UTC
Created attachment 5102
The low-hanging fruit

Spent a day with the NYTProf 4.08 Perl profiler trying to cut down
some of the inefficiencies of SpamAssassin when dealing with large mail
messages (which are usually large thanks to some Base64-encoded
attachments). Using Perl 5.16 on a FreeBSD 9.1 platform.

Picking just the low-hanging fruit, the most outstanding hotspots in
each iteration, I managed to shave off about 100 ms of CPU time
(local tests only) in a command-line spamassassin run
(with a 3 MB message containing a large PDF).

Depending on what is being measured (like aggregate mail throughput),
and what proportion of large messages are being passed to SpamAssassin
(like passing only the first 420 kB from amavisd to SpamAssassin),
this amounts to a speedup of between 3 and 7 % for large messages.
Not too bad, since every bit adds up.
Comment 1 Mark Martinec 2012-10-23 19:13:25 UTC
trunk:
$ svn ci -m 'Bug 6854: Optimizations, profiling'
  Sending lib/Mail/SpamAssassin/Conf/Parser.pm
  Sending lib/Mail/SpamAssassin/Message.pm
  Sending lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
  Sending lib/Mail/SpamAssassin/Plugin/VBounce.pm
  Sending lib/Mail/SpamAssassin/Util.pm
Committed revision 1401393.
Comment 2 AXB 2012-10-23 19:38:46 UTC
(In reply to comment #0)

I'm not sure I understand:
Does amavisd send chunks of the raw message to SA instead of only the txt/html parts and leave "attachments" unscanned?
Comment 3 Mark Martinec 2012-10-23 19:48:15 UTC
> I'm not sure I understand:
> Does amavisd send chunks of the raw message to SA instead of only the txt/html
> parts and leave "attachments" unscanned?

A message larger than a certain configured size is truncated
at the configured size and that is what SpamAssassin sees.
No other content processing happens in this data path, just
blunt truncation of the raw mail message. Works quite well,
certainly much better than not scanning large messages at all.
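
A minimal sketch of the blunt truncation described above, written in Python
purely for illustration (amavisd itself is written in Perl, and its actual
code is not shown here). The function name, the SIZE_LIMIT constant and the
sample messages are all assumptions made up for this example; whether
"420 kB" means 420 * 1024 or 420000 bytes is also an assumption.

```python
def truncate_raw_message(raw: bytes, max_bytes: int) -> bytes:
    """Return at most the first max_bytes bytes of a raw mail message."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    # Blunt cut on the raw bytes: no MIME parsing, so the cut may land
    # in the middle of a header line or a Base64-encoded attachment.
    return raw[:max_bytes]


# Hypothetical limit, echoing the 420 kB figure mentioned in the description.
SIZE_LIMIT = 420 * 1024

# A small message passes through unchanged.
small = b"Subject: hi\r\n\r\nshort body\r\n"
# A large message (Base64-like filler, roughly 800 kB) is cut at the limit.
large = b"Subject: big\r\n\r\n" + b"QUJD" * 200_000

assert truncate_raw_message(small, SIZE_LIMIT) == small
assert len(truncate_raw_message(large, SIZE_LIMIT)) == SIZE_LIMIT
```

The scanner then sees only the leading part of a large message: the headers
and whatever body content fits below the limit.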
Comment 4 Mark Martinec 2013-01-17 00:23:38 UTC
Ok, that's it for now, more profiling some time in the future...