Bug 6435 - locale decimal point not used by SA for required_hits
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin
Version: 3.3.1
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2010-05-14 07:47 UTC by Nico Prenzel
Modified: 2010-05-26 09:59 UTC
Description Nico Prenzel 2010-05-14 07:47:43 UTC
Hello Devs,

My SA install, used in conjunction with MySQL for the userpref store, does not use the locale settings. In particular, the locale's decimal point is not honored.

My MySQL backend table contains the following entry:
 Nico Prenzel/pn-systeme	required_hits	5,4	1520110

but a simple test with my user 
 /usr/bin/spamc -R -d 192.168.253.5 --username "Nico Prenzel/pn-systeme" < sample-spam.txt

results in a required score of 5.0:

1002.5/5.0
Spam detection software, running on the system "dema1m040.bb.bbmsg", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview:  This is the GTUBE, the Generic Test for Unsolicited Bulk Email
   If your spam filter supports it, the GTUBE provides a test by which you can
   verify that the filter is installed correctly and is detecting incoming spam.
   You can send yourself a test mail containing the following string of characters
   (in upper case and with no white spaces and line breaks): [...]

Content analysis details:   (1002.5 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
1000 GTUBE                  BODY: Generic Test for Unsolicited Bulk Email
 2.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.7682]
-0.0 NO_RECEIVED            Informational: message has no Received headers





If I change my userpref to the following
 Nico Prenzel/pn-systeme	required_hits	5.4	1520110

then my test outputs a required score of 5.4:
1002.5/5.4
Spam detection software, running on the system "dema1m040.bb.bbmsg", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview:  This is the GTUBE, the Generic Test for Unsolicited Bulk Email
   If your spam filter supports it, the GTUBE provides a test by which you can
   verify that the filter is installed correctly and is detecting incoming spam.
   You can send yourself a test mail containing the following string of characters
   (in upper case and with no white spaces and line breaks): [...]

Content analysis details:   (1002.5 points, 5.4 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
1000 GTUBE                  BODY: Generic Test for Unsolicited Bulk Email
 2.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.7682]
-0.0 NO_RECEIVED            Informational: message has no Received headers


system locale:
$ LANG=de_DE.UTF-8 locale -k LC_NUMERIC LC_MONETARY | grep decimal_point
decimal_point=","
mon_decimal_point=","

Perl's locale:
#!/usr/bin/perl
use strict;
use warnings;

use POSIX qw(locale_h);

# Get a reference to a hash of locale-dependent info
my $locale_values = localeconv();

# Output sorted list of the values
for (sort keys %$locale_values) {
    printf "%-20s = %s\n", $_, $locale_values->{$_};
}

decimal_point        = .


Is this the intended behaviour, or a misconfigured perl?
Is SA always using the "minimum C locale"?


Thanks.

NicoP.
Comment 1 Nico Prenzel 2010-05-26 04:50:04 UTC
Any comments?

Perhaps this could also be addressed in 3.3.2?


NicoP.
Comment 2 Karsten Bräckelmann 2010-05-26 07:27:37 UTC
(In reply to comment #0)
> system locale:
> $ LANG=de_DE.UTF-8 locale -k LC_NUMERIC LC_MONETARY | grep decimal_point

Setting LANG here might influence the result, in particular if LC_NUMERIC is unset. What does a plain 'locale' return for LC_NUMERIC, LC_ALL and LANG?

Also, any chance your spamd init script sets or changes locale settings? Likewise, any specific locale settings for MySQL?
Comment 3 Nico Prenzel 2010-05-26 09:02:05 UTC
(In reply to comment #2)
> (In reply to comment #0)
> > system locale:
> > $ LANG=de_DE.UTF-8 locale -k LC_NUMERIC LC_MONETARY | grep decimal_point
> Setting LANG here might influence the result, in particular if LC_NUMERIC is
> unset.
I pasted the wrong command here.
The following 'locale' output is the original one.
> What does a plain 'locale' return for LC_NUMERIC, LC_ALL and LANG?
~# locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=


> Also, any chance your spamd init script sets or changes locale settings?
I've searched for this but have not found any locale-dependent settings. I use the init scripts provided with Debian.

> Likewise, any specific locale settings for MySQL?
I don't think the locale setting could influence the behaviour, as the MySQL column is defined as text. But this depends on where the text is converted to a number, I think.
My SpamAssassin local.cf also lists the following select statement. So I think the required_hits value (the value column holds the number) is treated as text and is then converted by Perl to a floating point number:
user_scores_sql_custom_query     SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '@GLOBAL' ORDER BY username ASC
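
To illustrate the text-to-number step described above, here is a minimal Perl sketch of what happens when a text value containing a comma is used as a number. The helper name numify is hypothetical and this is not SpamAssassin's own parsing code. The report does not establish whether SA reads the leading 5 or rejects the value and falls back to its default; both paths would be consistent with the 5.0 seen in the first test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not SpamAssassin code): force a text value into
# numeric context the way plain Perl does when "use locale" is not in effect.
sub numify {
    my ($text) = @_;
    # Without this, Perl warns: Argument "5,4" isn't numeric in addition
    no warnings 'numeric';
    return $text + 0;
}

# Perl stops parsing at the first character that cannot belong to a number,
# so the comma and everything after it are dropped.
printf "%s -> %.1f\n", $_, numify($_) for ('5,4', '5.4');
# prints:
# 5,4 -> 5.0
# 5.4 -> 5.4
```

Only the dot is treated as a decimal separator here, which matches the observation that the stored value 5.4 works while 5,4 does not.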

Any more hints?

Thanks
NicoP.
Comment 4 Karsten Bräckelmann 2010-05-26 09:45:20 UTC
> I don't think the locale setting could influence the behaviour as the MySQL
> column is defined as text. But this depends on where the text is converted to a
> number, I think.

Ah, I thought it was a floating point number, not text.

Anyway, according to the docs, there is very little locali[sz]ation in SA (see that section), and I don't recall any hint that required_score (and then score...) would allow anything but C locale. So I guess this is intended behavior (as per your original question).

Moreover, your report template is in English, so your spamd does not appear to be running in a German locale anyway.
Comment 5 Karsten Bräckelmann 2010-05-26 09:59:36 UTC
Thinking about it... It would be a support nightmare to honor a localized decimal point. This would require distributing localized cf files for scores.

WONTFIX, IMHO.
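
Given the WONTFIX outcome, one possible workaround is to normalize values on the writing side, before they reach the userpref table. The sketch below assumes a hypothetical helper, normalize_score, living in whatever front end writes the preferences; it is not part of SpamAssassin. It accepts a single comma as the decimal separator, converts it to a dot, and rejects anything else that is not a plain decimal number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of SpamAssassin): turn a user-entered score
# such as "5,4" into the dot-decimal form "5.4" before it is stored.
# Returns undef for anything that is not a plain decimal number.
sub normalize_score {
    my ($raw) = @_;
    return unless defined $raw;

    (my $value = $raw) =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    # Accept exactly one comma as the decimal separator, e.g. "5,4".
    # Grouped forms such as "1.234,5" are deliberately left alone and
    # then rejected by the check below.
    $value =~ s/^([-+]?\d+),(\d+)$/$1.$2/;

    return $value =~ /^[-+]?\d+(?:\.\d+)?$/ ? $value : undef;
}

for my $input ('5,4', '5.4', '1.234,5') {
    my $clean = normalize_score($input);
    printf "%-8s => %s\n", $input, defined $clean ? $clean : '(rejected)';
}
```

Rejecting ambiguous input rather than guessing keeps the stored value in the one form that the tests in this report show to work.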