SA Bugzilla – Bug 6435
locale decimal point not used by SA for required_hits
Last modified: 2010-05-26 09:59:36 UTC
Hello Devs,

my SA install, used in conjunction with MySQL for the userpref store, does not use the locale settings. In particular, the locale decimal point is not respected.

My MySQL backend table contains the following entry:

Nico Prenzel/pn-systeme required_hits 5,4 1520110

but a simple test with my user

/usr/bin/spamc -R -d 192.168.253.5 --username "Nico Prenzel/pn-systeme" < sample-spam.txt

results in a required score of 5.0:

1002.5/5.0
Spam detection software, running on the system "dema1m040.bb.bbmsg", has
identified this incoming email as possible spam. The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email. If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview: This is the GTUBE, the Generic Test for Unsolicited Bulk
Email If your spam filter supports it, the GTUBE provides a test by which
you can verify that the filter is installed correctly and is detecting
incoming spam. You can send yourself a test mail containing the following
string of characters (in upper case and with no white spaces and line
breaks): [...]

Content analysis details: (1002.5 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
1000 GTUBE                  BODY: Generic Test for Unsolicited Bulk Email
 2.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.7682]
-0.0 NO_RECEIVED            Informational: message has no Received headers

If I change my userpref to the following

Nico Prenzel/pn-systeme required_hits 5.4 1520110

then my test outputs a required score of 5.4:

1002.5/5.4
[...]
Content analysis details: (1002.5 points, 5.4 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
1000 GTUBE                  BODY: Generic Test for Unsolicited Bulk Email
 2.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.7682]
-0.0 NO_RECEIVED            Informational: message has no Received headers

System locale:

$ LANG=de_DE.UTF-8 locale -k LC_NUMERIC LC_MONETARY | grep decimal_point
decimal_point=","
mon_decimal_point=","

Perl's locale:

#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(locale_h);

# Get a reference to a hash of locale-dependent info
my $locale_values = localeconv();

# Output a sorted list of the values
for (sort keys %$locale_values) {
    printf "%-20s = %s\n", $_, $locale_values->{$_};
}

decimal_point        = .

Is this the intended behaviour, or a misconfigured Perl? Is SA always using the "minimum C locale"?

Thanks.
NicoP.
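The Perl check above printed "." although the shell locale uses ",". Many runtimes start with LC_NUMERIC in the "C" locale and only follow the environment after an explicit setlocale() call: the C standard requires this for C programs, and Python documents the same behaviour. Whether Perl behaves the same way here is not verified in this report, but if it does, that would explain the "." output. The Python analog below is a sketch under that assumption; decimal_points() is a hypothetical helper that reads the decimal point first as the process starts, then after opting into the environment's LC_NUMERIC.

```python
import locale


def decimal_points() -> tuple[str, str]:
    """Hypothetical helper: decimal point before and after adopting the environment's LC_NUMERIC."""
    # Unless something in this process already called setlocale(),
    # LC_NUMERIC is still "C" here, whatever LANG or LC_NUMERIC say.
    before = locale.localeconv()["decimal_point"]
    try:
        # An empty string asks the C library to take LC_NUMERIC from the environment.
        locale.setlocale(locale.LC_NUMERIC, "")
        after = locale.localeconv()["decimal_point"]
    except locale.Error:
        # The locale named in the environment is not installed on this system.
        after = before
    finally:
        # Put LC_NUMERIC back to "C" so the rest of the process is unaffected.
        locale.setlocale(locale.LC_NUMERIC, "C")
    return before, after


if __name__ == "__main__":
    print(decimal_points())
```

Run with LANG=de_DE.UTF-8 on a system where that locale is installed, the second value would be expected to be ","; the first stays ".".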
Any comments? Perhaps this could also go into 3.3.2?

NicoP.
(In reply to comment #0)
> system locale:
> $ LANG=de_DE.UTF-8 locale -k LC_NUMERIC LC_MONETARY | grep decimal_point

Setting LANG here might influence the result, in particular if LC_NUMERIC is unset.

What does a plain 'locale' return for LC_NUMERIC, LC_ALL and LANG?

Also, any chance your spamd init script sets or changes locale settings? Likewise, any specific locale settings for MySQL?
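The question about LC_NUMERIC, LC_ALL and LANG comes down to precedence: under the POSIX rules, a non-empty LC_ALL wins, then LC_NUMERIC, then LANG, and an empty value counts as unset. A minimal Python sketch of that lookup follows; effective_lc_numeric() is a hypothetical name, and falling back to "C" when nothing is set is an assumption, since POSIX leaves that default implementation-defined. For this bug, the environment that counts is the one spamd itself is started with, not the interactive shell's.

```python
def effective_lc_numeric(env: dict[str, str]) -> str:
    """Hypothetical helper: which locale governs LC_NUMERIC for a given environment."""
    # Check in POSIX precedence order; an empty value (such as the
    # "LC_ALL=" line printed by `locale`) is treated as unset.
    for var in ("LC_ALL", "LC_NUMERIC", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    # Assumption: the implementation default is the "C" locale.
    return "C"


if __name__ == "__main__":
    import os

    print(effective_lc_numeric(dict(os.environ)))
```

For example, {"LANG": "de_DE.UTF-8", "LC_ALL": ""} resolves to "de_DE.UTF-8", while adding "LC_NUMERIC": "C" would make the decimal point "." again.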
(In reply to comment #2)
> (In reply to comment #0)
> > system locale:
> > $ LANG=de_DE.UTF-8 locale -k LC_NUMERIC LC_MONETARY | grep decimal_point
> Setting LANG here might influence the result, in particular if LC_NUMERIC is
> unset.

I pasted the wrong command here. The following 'locale' output is the original one.

> What does a plain 'locale' return for LC_NUMERIC, LC_ALL and LANG?

~# locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

> Also, any chance your spamd init script sets or changes locale settings?

I searched for this but did not find any locale-dependent settings. I use the init scripts provided with Debian.

> Likewise, any specific locale settings for MySQL?

I don't think the locale setting could influence the behaviour, as the MySQL column is defined as text. But this depends on where the text is converted to a number, I think. My SpamAssassin local.cf also lists the following SELECT statement, so I think the required_hits value (the value column holds the number) is treated as text and then converted by Perl to a floating-point number:

user_scores_sql_custom_query SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '@GLOBAL' ORDER BY username ASC

Any more hints?

Thanks
NicoP.
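Wherever the text-to-number conversion happens, a C-locale parser has no way to read "5,4" as 5.4. The Python sketch below illustrates this with hypothetical helper names; it is not SpamAssassin's code. parse_pref_number() accepts only the "." form (Python's float() ignores the locale entirely), while perl_style_numify() roughly imitates Perl's numeric conversion, which uses the longest leading numeric prefix, so "5,4" becomes 5. Since 5 is also SpamAssassin's default required_score, the 5.0 in the report is consistent either with the value being rejected or with it being truncated; this thread does not settle which.

```python
import re

# C-locale decimal syntax: optional sign, digits, optional "." fraction.
_C_NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")
# Rough approximation of the leading part Perl uses when numifying a string
# (exponents, "inf" and similar cases are deliberately ignored).
_LEADING_NUMBER = re.compile(r"\s*[+-]?(?:\d+(?:\.\d*)?|\.\d+)")


def parse_pref_number(text: str) -> float:
    """Hypothetical strict parser for a numeric userpref value stored as text."""
    value = text.strip()
    if not _C_NUMBER.fullmatch(value):
        # "5,4" ends up here: a comma is never a decimal separator in the C locale.
        raise ValueError(f"not a C-locale number: {text!r}")
    # float() is locale-independent and always expects ".".
    return float(value)


def perl_style_numify(text: str) -> float:
    """Rough imitation of Perl's string-to-number conversion (leading prefix only)."""
    match = _LEADING_NUMBER.match(text)
    return float(match.group()) if match else 0.0
```

Here parse_pref_number("5.4") gives 5.4 and parse_pref_number("5,4") raises ValueError, whereas perl_style_numify("5,4") silently yields 5.0.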
> I don't think the locale setting could influence the behaviour as the MySQL
> column is defined as text. But this depends on where the text is converted to a
> number, I think.

Ah, I thought it was a floating point number, not text.

Anyway, according to the docs, there is very little locali[sz]ation in SA (see that section), and I don't recall any hint that required_score (and, by extension, score) would allow anything but the C locale. So I guess this is intended behavior (as per your original question).

Moreover, your report template is in English, so your spamd does not appear to be running in a German locale anyway.
Thinking about it... It would be a support nightmare to honor a localized decimal point. This would require distributing localized cf files for scores.

WONTFIX, IMHO.
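Given that only the C-locale form is supported, one workaround is to normalize values before they reach the userpref table. A minimal sketch, assuming whatever tool writes to that table can run Python; to_c_locale_score() is a hypothetical name, and it only handles a single comma used as a decimal separator.

```python
import re

# Hypothetical input shape: a German-style decimal such as "5,4" or "-0,5".
_COMMA_DECIMAL = re.compile(r"([+-]?\d+),(\d+)")


def to_c_locale_score(text: str) -> str:
    """Rewrite a comma decimal ("5,4") as the "." form SA accepts ("5.4")."""
    value = text.strip()
    match = _COMMA_DECIMAL.fullmatch(value)
    if match:
        value = f"{match.group(1)}.{match.group(2)}"
    # Raises ValueError if the result is still not a plain number
    # (float() is locale-independent).
    float(value)
    return value
```

Caveat: in a German locale "1.000" means one thousand, but this helper leaves it unchanged and SA would read it as 1.0; for values such as required_hits that case is unlikely. Mixed input like "1.000,5" is rejected rather than guessed.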