Bug 5640 - UTF8 is missing from CHARSETS_LIKELY_TO_FP_AS_CAPS
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.2.3
Hardware: Other FreeBSD
Importance: P5 minor
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2007-09-04 00:09 UTC by Oleg Gawriloff
Modified: 2019-07-30 18:01 UTC
Description Oleg Gawriloff 2007-09-04 00:09:33 UTC
We have the following subject:
Subject: =?utf-8?B?0JDQutGCIOKEliAwMDAwMDA5OTA1INC+0YIgMDEuMDguMjAwNw==?=
 =?utf-8?B?INC00LvRjyAi0JfQkNCeINCR0LXQu9Cw0J/QkNCdIiDQvtGCINCQ0YLQu9Cw0L3RgiDQ
 =?utf-8?B?0LXQu9C10LrQvtC8?=

It's a legal base64-encoded UTF-8 subject, but because utf-8 is missing from this constant in Constants.pm:
use constant CHARSETS_LIKELY_TO_FP_AS_CAPS => qr{[-_a-z0-9]*(?:
          koi|jp|jis|euc|gb|big5|isoir|cp1251|georgianps|pt154|tis
        )[-_a-z0-9]*}ix;
SUBJ_ALL_CAPS is always triggered for it.
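
The mismatch can be checked outside SpamAssassin. The following is a minimal Python sketch, not SpamAssassin code: it transliterates the Perl pattern quoted above into Python's `re` syntax, and it assumes the constant is compared against the whole charset token of an encoded-word (the part between `=?` and the next `?`). The helper name `charset_is_exempt` is hypothetical.

```python
import re

# Python transliteration of the Perl qr{...}ix pattern quoted above.
# re.VERBOSE mirrors /x (whitespace in the pattern is ignored) and
# re.IGNORECASE mirrors /i.
CHARSETS_LIKELY_TO_FP_AS_CAPS = re.compile(
    r"""[-_a-z0-9]*(?:
          koi|jp|jis|euc|gb|big5|isoir|cp1251|georgianps|pt154|tis
        )[-_a-z0-9]*""",
    re.IGNORECASE | re.VERBOSE,
)


def charset_is_exempt(charset: str) -> bool:
    """Hypothetical helper: True if the whole charset token matches."""
    return CHARSETS_LIKELY_TO_FP_AS_CAPS.fullmatch(charset) is not None


for name in ("koi8-r", "iso-2022-jp", "cp1251", "utf-8"):
    print(f"{name}: {charset_is_exempt(name)}")
```

Under these assumptions only `utf-8` is reported as not exempt, since none of the listed alternatives occurs in that name.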
Comment 1 Oleg Gawriloff 2007-09-04 01:23:43 UTC
It seems that this regexp in Constants.pm is totally wrong, because even after
changing the subject encoding to cp1251 (i.e.: Subject:
=?cp1251?B?wOryILkgMDAwMDAwOTI3MyDu8iAwMS4wOC4yMDA3?=
 =?cp1251?B?IOTr/yAiT09PIN309OXq8uji7fvlIO/w7uPw4Ozs?=
 =?cp1251?B?+yIg7vIgwPLr4O3yINLl6+Xq7uw=?=)
it still triggers SUBJ_ALL_CAPS. After rewriting the regexp to:

use constant CHARSETS_LIKELY_TO_FP_AS_CAPS => qr{[-_a-z0-9?]*(
          koi|jp|jis|euc|gb|big5|isoir|cp1251|georgianps|pt154|tis
        )[-_a-z0-9?]*}ix;
everything works well.
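
One possible reason this particular subject reads as all caps, offered as a sketch rather than a confirmed diagnosis: decoding the three base64 payloads above as cp1251 leaves only a single run of ASCII letters. The Python below is illustrative only and assumes the rule considers just the ASCII letters of the decoded subject; that assumption is not confirmed by this report.

```python
import base64

# Base64 payloads of the three cp1251 encoded-words quoted above.
payloads = [
    "wOryILkgMDAwMDAwOTI3MyDu8iAwMS4wOC4yMDA3",
    "IOTr/yAiT09PIN309OXq8uji7fvlIO/w7uPw4Ozs",
    "+yIg7vIgwPLr4O3yINLl6+Xq7uw=",
]

# Concatenate the raw bytes first, then decode once as cp1251.
raw = b"".join(base64.b64decode(p) for p in payloads)
subject = raw.decode("cp1251")

# Assumption: only ASCII letters count towards the all-caps check.
ascii_letters = "".join(ch for ch in subject if ch.isascii() and ch.isalpha())

print(subject)
print(ascii_letters, ascii_letters.isupper())
```

The only ASCII letters in the decoded text are the Latin "OOO" inside the quoted company name, so a check restricted to ASCII letters sees an entirely uppercase subject even though the Cyrillic text is mixed case.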
Comment 2 Michael Scheidell 2009-01-19 09:40:42 UTC
one more, likely to FP as all caps:

(running SA 3.2.5):

Subject: =?windows-1255?B?Rlc6IOHp9+X4+iD08Onu6fo=?=\


Comment 3 George L. Yermulnik 2009-04-17 06:38:33 UTC
(In reply to comment #2)
> one more, likely to FP as all caps:
> 
> (running SA 3.2.5):
> 
> Subject: =?windows-1255?B?Rlc6IOHp9+X4+iD08Onu6fo=?=\
> 

More charsets likely to FP as all caps, but not present in CHARSETS_LIKELY_TO_FP_AS_CAPS (Constants.pm): windows-1251 and win-1251
Both are synonyms for cp1251, but some MUAs prefer to use them in place of cp1251.
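
To illustrate the charsets reported in this thread, here is a hedged Python sketch (a transliteration of the Perl pattern, not SpamAssassin code). It shows that the quoted pattern rejects these names when matched against the whole charset token, and that a hypothetical extended alternation accepts them. The added alternatives `utf-?8`, `windows-125[15]` and `win-?1251`, and the names `BASE`, `EXTRA`, `build`, `ORIGINAL` and `EXTENDED`, are assumptions made for illustration; they are not taken from any SpamAssassin release.

```python
import re

# Alternatives from the constant quoted in the description.
BASE = "koi|jp|jis|euc|gb|big5|isoir|cp1251|georgianps|pt154|tis"
# Hypothetical additions covering the charsets named in this thread.
EXTRA = "utf-?8|windows-125[15]|win-?1251"


def build(alternatives: str):
    """Wrap an alternation the same way the quoted Perl constant does."""
    return re.compile(
        rf"[-_a-z0-9]*(?:{alternatives})[-_a-z0-9]*", re.IGNORECASE
    )


ORIGINAL = build(BASE)
EXTENDED = build(f"{BASE}|{EXTRA}")

reported = ["utf-8", "windows-1251", "win-1251", "windows-1255"]
for name in reported:
    # Columns: charset, matched by original, matched by extended sketch.
    print(name, bool(ORIGINAL.fullmatch(name)), bool(EXTENDED.fullmatch(name)))
```

Whether exempting utf-8 wholesale is desirable is a separate tuning question; the sketch only shows how the pattern behaves for the reported names.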
Comment 4 Henrik Krohns 2019-07-30 18:01:55 UTC
Closing old stale bug. Most of this seems to be fixed in the current version. More recent samples are required for further tuning.