Bug 6854

Summary: Optimizations, profiling
Product: Spamassassin
Reporter: Mark Martinec <Mark.Martinec>
Component: Libraries
Assignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED FIXED    
Severity: enhancement    
Priority: P2    
Version: 3.4 SVN branch   
Target Milestone: 3.4.0   
Hardware: All   
OS: All   
Whiteboard:
Attachments: The low-hanging fruit

Description Mark Martinec 2012-10-23 19:11:41 UTC
Created attachment 5102 [details]
The low-hanging fruit

Spent a day with the NYTProf 4.08 Perl profiler, trying to cut down
some of the inefficiencies of SpamAssassin when dealing with large mail
messages (which are usually large thanks to Base64-encoded
attachments). Using Perl 5.16 on a FreeBSD 9.1 platform.

Picking just the low-hanging fruit, i.e. the most prominent hotspots in
each iteration, I managed to shave about 100 ms off CPU-intensive
hotspots (local tests only) in a command-line spamassassin run
(with a 3 MB message containing a large PDF).

Depending on what is being measured (like aggregate mail throughput),
and what proportion of large messages are being passed to SpamAssassin
(like passing only the first 420 kB from amavisd to SpamAssassin),
this amounts to a speedup of between 3 and 7 % for large messages.
Not too bad, given that every bit adds up.
Comment 1 Mark Martinec 2012-10-23 19:13:25 UTC
trunk:
$ svn ci -m 'Bug 6854: Optimizations, profiling'      
  Sending lib/Mail/SpamAssassin/Conf/Parser.pm
  Sending lib/Mail/SpamAssassin/Message.pm
  Sending lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
  Sending lib/Mail/SpamAssassin/Plugin/VBounce.pm
  Sending lib/Mail/SpamAssassin/Util.pm
Committed revision 1401393.
Comment 2 AXB 2012-10-23 19:38:46 UTC
(In reply to comment #0)

I'm not sure I understand:
Does amavisd send chunks of the raw message to SA instead of only the text/HTML parts, and leave "attachments" unscanned?
Comment 3 Mark Martinec 2012-10-23 19:48:15 UTC
> I'm not sure I understand:
> Does Amavisd send chunks of raw message to SA instead of only the txt/html
> parts and leave "attachments" unscanned?

A message larger than a certain configured size is truncated
at the configured size, and that is what SpamAssassin sees.
There is no other content processing in this data path, just
blunt truncation of the raw mail message. It works quite well,
certainly much better than not scanning large messages at all.
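
To illustrate the truncation described above, here is a minimal sketch in Python. It is not amavisd or SpamAssassin code: the function name truncate_for_scan and the constant MAX_SCAN_SIZE are hypothetical, and reading "420 kB" as 420 * 1024 bytes is an assumption.

```python
# Hypothetical sketch: blunt truncation of a raw mail message before scanning.
# Names and the size constant are illustrative assumptions, not amavisd code.

MAX_SCAN_SIZE = 420 * 1024  # assumed limit, modelled on the "420 kB" example


def truncate_for_scan(raw_message: bytes, max_size: int = MAX_SCAN_SIZE) -> bytes:
    """Return at most max_size leading bytes of the raw message.

    No MIME parsing happens here: the cut may land in the middle of a
    Base64-encoded attachment, and the scanner sees a message that ends early.
    """
    if max_size < 0:
        raise ValueError("max_size must be non-negative")
    return raw_message[:max_size]


if __name__ == "__main__":
    small = b"Subject: hi\r\n\r\nshort body\r\n"
    large = b"Subject: big\r\n\r\n" + b"A" * (3 * 1024 * 1024)
    # A small message passes through unchanged; a large one is cut at the limit.
    print(len(truncate_for_scan(small)), len(truncate_for_scan(large)))
```

Since the headers and the leading text parts normally sit at the start of the raw message, they tend to survive such a cut, which is presumably why the approach works well in practice.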
Comment 4 Mark Martinec 2013-01-17 00:23:38 UTC
Ok, that's it for now, more profiling some time in the future...