SA Bugzilla – Bug 6956
Bayes/Redis/Lua buglet and speedup
Last modified: 2013-07-25 08:11:47 UTC
While reviewing code in BayesStore/Redis.pm I noticed one bug and one speedup opportunity there. The bug is in multi_hincrby Lua code, which test for s and h being nonzero in order to skip some unnecessary redis calls: if s ~= 0 then The problem is in the fact that all arguments are passed from redis to Lua as strings (not numbers), but the relational operators in Lua do not perform type coertion, so any string is always different from a numeric zero. Luckily no real harm is done, it's just that redundant calls to increment counts by a zero are not skipped, wasting some (little) time. The jackpot optimization opportunity lurks in the Lua procedure used by tok_get_all. Remembering the inefficient implementation of parsing multivalued result in a CPAN module Redis, changing the result as returned by the multi_hmget_script Lua code from a multivalued result (one spam/ham pair for each bayes token) into a single string of concatenated pairs, to be later split in perl on spaces, offers a nice boost. On our production server, comparing a median time to execute tok_get_all for an hour's worth of mail, before and after the change, dropped the median time from 10 ms to 6 ms. A noteworthy speedup for such a seemingly insignificant change in passing back results. Taking into account that Lua virtual machine is a register-based machine (and checking the compiled code), also took the opportunity to use local variables where they make sense, as accessing them (e.g. in a loop) is cheaper than accessing globals. Also avoided one unnecessary call to keys() in Plugin/Bayes.pm and avoided unnecessary entering/exiting a block in a map() call there. Probably insignificant, but anyway...
Created attachment 5157 [details] the Redis/Lua patch
trunk: Bug 6956: Bayes/Redis/Lua buglet and speedup Sending lib/Mail/SpamAssassin/BayesStore/Redis.pm Sending lib/Mail/SpamAssassin/Plugin/Bayes.pm Committed revision 1501225.
(In reply to Mark Martinec from comment #2) > trunk: > Bug 6956: Bayes/Redis/Lua buglet and speedup > Sending lib/Mail/SpamAssassin/BayesStore/Redis.pm > Sending lib/Mail/SpamAssassin/Plugin/Bayes.pm > Committed revision 1501225. Thanks Marc - will schedule update
Works fine and fast, closing. Btw, tested also with Redis 2.8-rc, no surprises there. Just a redis server restart was needed, it picked the existing database from its disk copy and resumed work. Same speed with 2.8 as with 2.6.14 (or maybe a trifle faster or am I imagining, no statistical significance - possibly due to a fresh server start). For fun I re-enabled our PostgreSQL bayes backend and compared timings. The tok_get_all (i.e. lookup on all tokens) takes about 6 ms on average with Redis+Lua, and SQL shows two timing clusters, one around 10 ms, the other around 28 ms. The tok_touch_all (i.e. updating atime / expiration time of significant tokens, not autolearning) shows much higher difference: redis+Lua takes about 1 ms, while an SQL update takes about 20 ms for an average update.