Bug 6956 - Bayes/Redis/Lua buglet and speedup
Summary: Bayes/Redis/Lua buglet and speedup
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
Importance: P2 minor
Target Milestone: 3.4.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-09 12:34 UTC by Mark Martinec
Modified: 2013-07-25 08:11 UTC



Attachment: the Redis/Lua patch (type: patch) - submitted by Mark Martinec [HasCLA]

Description Mark Martinec 2013-07-09 12:34:28 UTC
While reviewing code in BayesStore/Redis.pm I noticed one bug
and one speedup opportunity there.

The bug is in the multi_hincrby Lua code, which tests for s and h
being nonzero in order to skip some unnecessary Redis calls:
  if s ~= 0 then

The problem is that all arguments are passed from
Redis to Lua as strings (not numbers), and the relational
operators in Lua do not perform type coercion, so any string
always compares unequal to a numeric zero.

Luckily no real harm is done; the redundant calls to increment
counts by zero are simply never skipped, wasting a little time.
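A minimal sketch of the issue and its fix (variable names follow the description above; this is an illustration, not the actual script from the patch):

```lua
-- Redis passes all ARGV values to Lua scripts as strings, and Lua's
-- relational operators do not coerce types, so this guard never fires:
local s = ARGV[1]        -- e.g. the string "0", not the number 0
if s ~= 0 then           -- always true: a string never equals a number
  -- the supposedly skippable HINCRBY happens even for a zero increment
end

-- Fixed: convert explicitly before comparing
local s = tonumber(ARGV[1])
if s ~= 0 then
  -- now correctly skipped when the increment is zero
end
```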


The jackpot optimization opportunity lurks in the Lua
procedure used by tok_get_all. The CPAN module Redis parses
multivalued results inefficiently, so changing the result
returned by the multi_hmget_script Lua code from a multivalued
reply (one spam/ham pair for each bayes token) into a single
string of concatenated pairs, to be split later in perl
on spaces, offers a nice boost.
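The shape of the change can be sketched roughly as follows; the key layout, field names, and loop bounds here are assumptions for illustration, not the actual multi_hmget_script code:

```lua
-- Instead of returning one (spam, ham) pair per token as a
-- multi-bulk reply (a Lua table), build one space-separated
-- string and return that, so the client parses a single bulk string.
local r = {}
for i = 1, #KEYS do
  -- HMGET returns false for missing fields; treat those as 0
  local v = redis.call('HMGET', KEYS[i], 'spam', 'ham')
  r[#r + 1] = tostring(v[1] or 0) .. ' ' .. tostring(v[2] or 0)
end
return table.concat(r, ' ')
```

On the Perl side a single split on whitespace then recovers all the pairs in one pass, avoiding the per-element overhead of the multi-bulk reply parsing.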

On our production server, comparing the median time to execute
tok_get_all over an hour's worth of mail before and after
the change, the median dropped from 10 ms to 6 ms.
A noteworthy speedup for such a seemingly insignificant change
in how results are passed back.

Since the Lua virtual machine is register-based (I also checked
the compiled code), I took the opportunity to use local variables
where they make sense, as accessing them (e.g. in a loop) is
cheaper than accessing globals.
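A sketch of that optimization (hypothetical loop, not taken from the patch): a local compiles to a direct register access, while a global lookup is a table access on the environment, so caching a global such as redis.call in a local pays off inside loops.

```lua
-- Cache the global lookup once, outside the loop
local rcall = redis.call
local total = 0
for i = 1, #KEYS do
  -- each iteration now uses a register access instead of
  -- re-resolving the global name 'redis' and its 'call' field
  total = total + (tonumber(rcall('HGET', KEYS[i], 'spam')) or 0)
end
return total
```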

I also avoided one unnecessary call to keys() in Plugin/Bayes.pm
and removed an unnecessary block entry/exit in a map() call
there. Probably insignificant, but worthwhile anyway.
Comment 1 Mark Martinec 2013-07-09 12:35:45 UTC
Created attachment 5157 [details]
the Redis/Lua patch
Comment 2 Mark Martinec 2013-07-09 12:41:44 UTC
trunk:
Bug 6956: Bayes/Redis/Lua buglet and speedup
  Sending lib/Mail/SpamAssassin/BayesStore/Redis.pm
  Sending lib/Mail/SpamAssassin/Plugin/Bayes.pm
Committed revision 1501225.
Comment 3 AXB 2013-07-09 12:44:11 UTC
(In reply to Mark Martinec from comment #2)
> trunk:
> Bug 6956: Bayes/Redis/Lua buglet and speedup
>   Sending lib/Mail/SpamAssassin/BayesStore/Redis.pm
>   Sending lib/Mail/SpamAssassin/Plugin/Bayes.pm
> Committed revision 1501225.

Thanks Mark - will schedule an update
Comment 4 Mark Martinec 2013-07-25 08:11:47 UTC
Works fine and fast, closing.

Btw, also tested with Redis 2.8-rc, no surprises there.
Just a redis server restart was needed; it picked up the existing
database from its disk copy and resumed work.

Same speed with 2.8 as with 2.6.14 (or maybe a trifle faster,
though that may be imagined - no statistical significance,
possibly due to a fresh server start).

For fun I re-enabled our PostgreSQL bayes backend and compared timings.

The tok_get_all (i.e. lookup on all tokens) takes about 6 ms
on average with Redis+Lua, while SQL shows two timing clusters,
one around 10 ms, the other around 28 ms.

The tok_touch_all (i.e. updating atime / expiration time of
significant tokens, not autolearning) shows a much larger difference:
Redis+Lua takes about 1 ms, while an SQL update takes about 20 ms
on average.