SA Bugzilla – Full Text Bug Listing

Summary:          BayesStore/Redis.pm: please add support for per user database
Product:          Spamassassin
Component:        Learner
Status:           NEW
Severity:         enhancement
Priority:         P2
Version:          unspecified
Target Milestone: Future
Hardware:         PC
OS:               Linux
Reporter:         Marcin M <issues.apache.org>
Assignee:         SpamAssassin Developer Mailing List <dev>
CC:               apache, billcole, issues.apache.org, kmcgrail
Whiteboard:
Description
Marcin M
2015-01-13 16:45:49 UTC
Because of the way Redis works as a hash lookup, would that be one database with an additional key value for username, or are you envisioning tons of separate Redis DBs? You may want to look into MySQL with the memcached engine.

I prefer not to use MySQL :) May I expect that Redis storage will support per-user Bayes in, e.g., this year? Or should I rather choose another backend?

Kevin, if you are asking me, I can only say that I don't know Redis well enough to answer. I'm also not sure whether Redis can work with e.g. 5k databases. Personally I prefer to have everything in one database, but maybe I'm still thinking in the SQL way? :)

Can you afford to keep in memory all the Bayes token sets and seen sets for each of your users? Is it worth the cost of memory? How long will it take for a Redis server to reload or make a periodic dump?

Good suggestion, Mark.

$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        639          0  non-token data: nspam
0.000          0       7577          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
0.000          0          0          0  non-token data: oldest atime
0.000          0          0          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

and redis has 169688 kB of RSS. So it gives:
169688 kB / (639+7577) tokens
= 20.65 kB per token. It's a rather high value.

(In reply to Marcin M from comment #5)
> and redis has 169688 kB of RSS. So it gives:
> 169688 kB / (639+7577) tokens
> = 20.65 kB per token. It's a rather high value.

nspam/nham are learned message counts, not tokens.

$ redis-cli info | egrep '(_rss|db0:keys)'
used_memory_rss:21688320
db0:keys=198733,expires=198729,avg_ttl=2079826604

... about 109 bytes per token (key).

This is just a few hundred messages a day with a ~1 month token TTL, so perhaps 10-20 MB of memory per user could be a low ballpark.
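
To make the corrected per-token arithmetic above explicit (assuming the Redis RSS is dominated by the Bayes keys rather than overhead):

$ echo $(( 21688320 / 198733 ))     # used_memory_rss bytes / db0 keys
109                                  # roughly 109 bytes per key (token)

At that rate a per-user store holding on the order of 100k-200k live tokens (a few hundred messages a day with a ~1 month TTL) works out to roughly 10-20 MB, matching the ballpark quoted above.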
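
A hypothetical sketch of the two layouts discussed at the top of the thread; the key names and values here are made up for illustration and are not the backend's actual schema:

# Option 1: one shared DB, with the username folded into every key:
$ redis-cli set  "bayes:user1:token:09d5b1db5c6a" "3 1 1421163949"
$ redis-cli mget "bayes:user1:nspam" "bayes:user1:nham"

# Option 2: one numbered Redis database per user, selected with -n / SELECT.
# Stock Redis only configures 16 numbered databases ('databases' in redis.conf),
# so thousands of users would need that raised or separate instances:
$ redis-cli -n 5 get "bayes:token:09d5b1db5c6a"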
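
For context, the Redis backend is currently configured as a single site-wide store; a minimal configuration sketch along the lines of the Mail::SpamAssassin::BayesStore::Redis documentation (values illustrative) looks like this, and this report is essentially asking for a supported way to scope such a store per user:

bayes_store_module  Mail::SpamAssassin::BayesStore::Redis
bayes_sql_dsn       server=127.0.0.1:6379;database=2
bayes_token_ttl     21d
bayes_seen_ttl      8d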