Bug 4794 - is_charset_ok_for_locales() may be too generic
Summary: is_charset_ok_for_locales() may be too generic
Status: RESOLVED DUPLICATE of bug 4078
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules (Eval Tests)
Version: 3.1.0
Hardware: All Linux
Importance: P5 enhancement
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-14 18:51 UTC by Philip Prindeville
Modified: 2007-02-17 02:45 UTC
Description Philip Prindeville 2006-02-14 18:51:47 UTC
I've configured:

ok_locales en fr

(or even just "en") and I notice that messages written in Turkish, Cyrillic,
Greek, etc. all get through just fine, even though my locales are English, or
English and French.  Apparently the sieve for language tests is too coarse.

I'm thinking that in "en", the rule that should apply is the following:

* the USASCII charset is fine;

* all 7-bit characters are fine;

* the 8-bit characters in ISO8859-1 should be fine (if we want to be extra liberal);

* the non-accented characters in ISO8859-[2-4] should be fine (section sign,
non-breaking space, etc.);

Anything else should either fail the test outright, or pass only if accented
characters from these "borderline" character sets make up a small percentage
(say, less than 0.5%) of the message, since someone might write a message in
English but put their name or signature in Greek, Russian, or whatever.
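
To make the proposal concrete, here is a rough sketch of the rule in Python.
It is only an illustration of the idea, not SpamAssassin code: the function
name is_text_ok_for_en and the max_foreign parameter are hypothetical, and it
assumes the "small percentage" variant with the 0.5% threshold suggested
above. Treating undecodable or mislabeled text as a failure is also an
assumption.

```python
import codecs
import unicodedata

# Canonical codec names as reported by Python's codec registry.
LATIN1 = codecs.lookup("iso-8859-1").name
BORDERLINE = {codecs.lookup(f"iso-8859-{n}").name for n in (2, 3, 4)}


def is_text_ok_for_en(raw: bytes, charset: str, max_foreign: float = 0.005) -> bool:
    """Hypothetical check: is this text acceptable for an "en" locale?"""
    try:
        name = codecs.lookup(charset).name
        text = raw.decode(name)
    except (LookupError, UnicodeDecodeError):
        # Assumption: unknown charsets and undecodable bytes fail the test.
        return False
    if not text:
        return True

    foreign = 0
    for ch in text:
        if ord(ch) < 0x80:
            continue  # all 7-bit characters are fine
        if name == LATIN1:
            continue  # 8-bit ISO8859-1 characters are fine (the liberal reading)
        if name in BORDERLINE and not unicodedata.category(ch).startswith("L"):
            continue  # non-letters in ISO8859-[2-4]: section sign, NBSP, etc.
        foreign += 1  # accented letter or character from another charset

    # Pass only if foreign characters are a small fraction of the message.
    return foreign / len(text) < max_foreign
```

With this sketch, a plain ASCII or ISO8859-1 message passes, a message that is
mostly Greek or Polish fails, and a long English message with a single
non-Latin character in the signature still passes.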
Comment 1 Sidney Markowitz 2007-02-17 02:45:17 UTC
This RFE would be taken care of by fixing bug 4078, so closing as a dupe.


*** This bug has been marked as a duplicate of 4078 ***