Bug 6364 - Russian UTF-8 in TextCat
Summary: Russian UTF-8 in TextCat
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins
Version: 3.3.0
Hardware: All All
Importance: P5 major
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
 
Reported: 2010-03-04 10:17 UTC by Mike Stupalov
Modified: 2013-01-20 17:21 UTC
Description Mike Stupalov 2010-03-04 10:17:19 UTC
Please add support for UTF-8 encoded Russian text in TextCat. Without it, detection of the Russian language and "normalize_charset" do not work properly.

To solve the problem, a file ru.utf-8.lm needs to be added to the source.
This file, named ru-utf8.lm there, is available at:
http://trac.greenstone.org/browser/main/trunk/greenstone2/perllib/textcat

(The same location also has updated files for some other languages.)
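
As a rough illustration of why a separate UTF-8 model file is needed: n-gram language models of the kind TextCat uses are built from the bytes of the training text, so a model built from KOI8-R text shares no n-grams with the same text encoded as UTF-8. The sketch below is only an assumption-laden demonstration in Python, not SpamAssassin code; the helper `byte_ngrams` and the sample string are hypothetical.

```python
def byte_ngrams(data: bytes, n: int = 3) -> set:
    """Return the set of all byte n-grams in data (hypothetical helper)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

# The same Russian sample text in two encodings.
text = "привет привет привет"
koi8 = text.encode("koi8-r")   # one byte per Cyrillic letter
utf8 = text.encode("utf-8")    # two bytes per Cyrillic letter

koi8_grams = byte_ngrams(koi8)
utf8_grams = byte_ngrams(utf8)

# No trigram is common to both encodings, so a profile built from
# KOI8-R text cannot match UTF-8 input.
shared = koi8_grams & utf8_grams
print(len(koi8), len(utf8), len(shared))
```

Since the two encodings have no byte trigrams in common for this sample, a model trained on one encoding gives no useful signal for text in the other, which is consistent with the request for a dedicated ru.utf-8.lm file.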

P.S. This problem with the Russian language is very old and unpleasant. I think the whole Russian-speaking community will be grateful for this small file :)
Comment 1 Mike Stupalov 2010-03-04 11:05:32 UTC
UTF-8 TextCat models for Russian, French, Spanish, Italian and Chinese have been added to the repository mentioned above.
Comment 2 Henrik Krohns 2010-03-04 11:13:54 UTC
Also see my bug:

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229

TextCat is currently broken with respect to both case and encoding handling. It should be completely revamped.