SA Bugzilla – Bug 7819
bayes is using usernames case-sensitive
Last modified: 2024-01-22 01:15:24 UTC
So foo is not the same as fOo. Since email is case-insensitive, this can create two Bayes learning sets for the same user.
Domain names are not case-sensitive; for the local part of an address, case sensitivity is optional.
Is the problem only with PostgreSQL, then?
Could this get a higher priority? While it's fixed in fuglu, it's not yet fixed in spamd/spamc.
I'm a bit confused by the "Component" change. If this is a Bayes issue as stated in the bug title, the "Component" should stay "Plugins" because that is how Bayes is implemented. The only way I can see that "spamc/spamd" would be correct is if you're not referring to Bayes at all, but to the per-user configuration (including Bayes DB) support in spamc/spamd. In either case, this is NOT A BUG but rather an enhancement request, and I think it should be optional and non-default, because usernames (and virtual usernames) can be case-sensitive.

(In reply to Benny Pedersen from comment #2)
> Is the problem only with PostgreSQL, then?

I would think that if you are using an RDBMS you could fix this on the DB side by making the relevant field case-insensitive. In MySQL that's the default; in PostgreSQL it requires that the column type be CITEXT rather than TEXT. See https://www.postgresql.org/docs/current/citext.html for details.

(In reply to Benny Pedersen from comment #3)
> Could this get a higher priority?

I do not expect so. It would require substantial effort to "fix" and the behavior is a non-bug. As RW says, local parts can be case-sensitive (with the exception of "postmaster"), so it isn't formally wrong to treat 'fOo' and 'Foo' as different tokens. In fact it would be formally *wrong* to arbitrarily case-squash tokens just because they happen to be usernames. Not having examined the code for Bayes tokenization I cannot be certain, but I would expect that detection of usernames per se is not done. It would break an assumption of the "Naive Bayes" model of using simple classifiers that are stripped of context.

Patches welcome, of course. In my opinion, adding the behavior change for Bayes local-part tokens or per-user config usernames should be optional and non-default.
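
To illustrate the DB-side approach (a case-insensitive username column), here is a minimal Python sketch. It uses SQLite's `COLLATE NOCASE` only as a stand-in for PostgreSQL's CITEXT so that it runs without a server; the table `demo_users` and the helper names are hypothetical and are not the SpamAssassin schema or code. Note that SQLite's NOCASE folds ASCII letters only.

```python
import sqlite3


def make_db(case_insensitive: bool) -> sqlite3.Connection:
    """Create an in-memory DB with a hypothetical per-user table."""
    conn = sqlite3.connect(":memory:")
    # NOCASE on the column affects both the UNIQUE index and '=' comparisons.
    collation = "COLLATE NOCASE" if case_insensitive else ""
    conn.execute(
        "CREATE TABLE demo_users ("
        "id INTEGER PRIMARY KEY, "
        f"username TEXT {collation} UNIQUE)"
    )
    return conn


def get_or_create_user(conn: sqlite3.Connection, username: str) -> int:
    """Return the row id for username, inserting a new row if none matches."""
    row = conn.execute(
        "SELECT id FROM demo_users WHERE username = ?", (username,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO demo_users (username) VALUES (?)", (username,)
    )
    return cur.lastrowid


def count_users_after(names, case_insensitive: bool) -> int:
    """Look up each name in turn and report how many rows were created."""
    conn = make_db(case_insensitive)
    for name in names:
        get_or_create_user(conn, name)
    return conn.execute("SELECT COUNT(*) FROM demo_users").fetchone()[0]
```

With a case-sensitive column, looking up `foo` and then `fOo` creates two rows; with the case-insensitive column, both names resolve to the same row and the application code is unchanged.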
Created attachment 5711: Untested patch to have case-insensitive usernames
Untested patch to have case-insensitive usernames on PostgreSQL as well.
Thanks for the patch for PostgreSQL. Should the same be done for MySQL, LDAP, Berkeley DB, and SQLite? If it were done in the SpamAssassin core, it would not need to be database-specific.
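
A rough sketch of what backend-independent handling could look like, written in Python purely for illustration. Everything here is hypothetical: `normalize_username`, the `fold_case` option, and the in-memory `UserStore` are not SpamAssassin APIs. Case folding is off by default, in line with the earlier comment that this should be optional.

```python
def normalize_username(username: str, fold_case: bool = False) -> str:
    """Return the key used to store per-user data.

    With fold_case=False (the default) the name is returned unchanged,
    so 'foo' and 'fOo' stay distinct. With fold_case=True the whole
    name is lowercased before it reaches any storage backend.
    """
    return username.lower() if fold_case else username


class UserStore:
    """Toy stand-in for any backend (SQL, LDAP, DB file, ...)."""

    def __init__(self, fold_case: bool = False) -> None:
        self.fold_case = fold_case
        self._ids = {}

    def user_id(self, username: str) -> int:
        """Return a stable id for the (possibly normalized) username."""
        key = normalize_username(username, self.fold_case)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]
```

Because the normalization happens before the backend is called, no backend needs its own case-insensitive comparison.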
Created attachment 5732: Insert lowered usernames
Different patch that inserts usernames into Bayes as lowercase. The SELECT statements are left untouched because I do not know how compatible the LOWER() SQL function is across different databases.
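
One way to sidestep the LOWER() compatibility question is to lowercase the bound parameter in application code, so the SQL itself stays plain on both INSERT and SELECT. The following Python/SQLite sketch shows that idea only; it is not the attached patch, and the table `demo_users` and the function names are made up for the example.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with a hypothetical username table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo_users (id INTEGER PRIMARY KEY, username TEXT UNIQUE)"
    )
    return conn


def insert_user(conn: sqlite3.Connection, username: str) -> int:
    """Store the username lowercased; no SQL-side LOWER() is used."""
    cur = conn.execute(
        "INSERT INTO demo_users (username) VALUES (?)", (username.lower(),)
    )
    return cur.lastrowid


def find_user(conn: sqlite3.Connection, username: str):
    """Look up by the lowercased name; returns the id or None."""
    row = conn.execute(
        "SELECT id FROM demo_users WHERE username = ?", (username.lower(),)
    ).fetchone()
    return row[0] if row else None
```

Rows that were stored with mixed case before such a change would still need a one-time migration, which this sketch does not cover.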
Is this solved in trunk? Gentoo still does not have 4.x.x.