SA Bugzilla – Bug 5640
UTF8 is missing from CHARSETS_LIKELY_TO_FP_AS_CAPS
Last modified: 2019-07-30 18:01:55 UTC
We have the following subject:

Subject: =?utf-8?B?0JDQutGCIOKEliAwMDAwMDA5OTA1INC+0YIgMDEuMDguMjAwNw==?= =?utf-8?B?INC00LvRjyAi0JfQkNCeINCR0LXQu9Cw0J/QkNCdIiDQvtGCINCQ0YLQu9Cw0L3RgiDQ =?utf-8?B?0LXQu9C10LrQvtC8?=

It is a legal base64-encoded UTF-8 subject, but because utf-8 is missing from this constant in Constants.pm:

use constant CHARSETS_LIKELY_TO_FP_AS_CAPS => qr{[-_a-z0-9]*(?:
    koi|jp|jis|euc|gb|big5|isoir|cp1251|georgianps|pt154|tis
  )[-_a-z0-9]*}ix;

SUBJ_ALL_CAPS always triggers on it.
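To make the report easier to reproduce outside SpamAssassin, here is a minimal sketch in Python that transliterates the quoted Perl pattern and checks the charset label of the first encoded-word from the subject above. The helper names (charsets_in, is_exempt) and the way the charset label is extracted are assumptions for illustration only; how SpamAssassin itself applies the constant is not shown in this report.

```python
import re

# Python transliteration of the Perl qr{...}ix constant quoted in the report.
# re.VERBOSE mirrors /x (pattern whitespace is ignored), re.IGNORECASE mirrors /i.
CHARSETS_LIKELY_TO_FP_AS_CAPS = re.compile(
    r"[-_a-z0-9]*(?: koi|jp|jis|euc|gb|big5|isoir|cp1251|georgianps|pt154|tis )[-_a-z0-9]*",
    re.IGNORECASE | re.VERBOSE,
)

# Hypothetical helper pattern: captures the charset label of each
# RFC 2047 encoded-word ("=?charset?B?...?=" or "=?charset?Q?...?=").
ENCODED_WORD_CHARSET = re.compile(r"=\?([^?\s]+)\?[BbQq]\?")


def charsets_in(raw_subject):
    """Return the charset labels found in a raw Subject header value."""
    return [m.group(1) for m in ENCODED_WORD_CHARSET.finditer(raw_subject)]


def is_exempt(charset):
    """True if the charset label is matched by the quoted pattern."""
    return CHARSETS_LIKELY_TO_FP_AS_CAPS.search(charset) is not None


# First encoded-word of the subject from the report.
raw = "=?utf-8?B?0JDQutGCIOKEliAwMDAwMDA5OTA1INC+0YIgMDEuMDguMjAwNw==?="

for label in charsets_in(raw):
    print(label, "exempt" if is_exempt(label) else "not exempt")
```

Under these assumptions, utf-8 is reported as not exempt, while a label such as koi8-r or cp1251 is.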
It seems that this regexp in Constants.pm is totally wrong, because even after changing the subject encoding to cp1251, i.e.:

Subject: =?cp1251?B?wOryILkgMDAwMDAwOTI3MyDu8iAwMS4wOC4yMDA3?= =?cp1251?B?IOTr/yAiT09PIN309OXq8uji7fvlIO/w7uPw4Ozs?= =?cp1251?B?+yIg7vIgwPLr4O3yINLl6+Xq7uw=?=

it still triggers SUBJ_ALL_CAPS. After rewriting the regexp to:

use constant CHARSETS_LIKELY_TO_FP_AS_CAPS => qr{[-_a-z0-9?]*(
    koi|jp|jis|euc|gb|big5|isoir|cp1251|georgianps|pt154|tis
  )[-_a-z0-9?]*}ix;

everything works well.
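For anyone comparing the two patterns, the sketch below runs Python transliterations of both against the first encoded-word of the cp1251 subject. It only shows what each pattern matches on that string; it does not reproduce the SUBJ_ALL_CAPS rule, whose code is not quoted in this report. The names ORIGINAL and PROPOSED are mine.

```python
import re

FLAGS = re.IGNORECASE | re.VERBOSE  # mirrors the /ix modifiers of the Perl constant

# Pattern as quoted from Constants.pm.
ORIGINAL = re.compile(
    r"[-_a-z0-9]*(?: koi|jp|jis|euc|gb|big5|isoir|cp1251|georgianps|pt154|tis )[-_a-z0-9]*",
    FLAGS,
)

# Pattern as proposed in this comment: "?" added to both character classes,
# and the group made capturing.
PROPOSED = re.compile(
    r"[-_a-z0-9?]*( koi|jp|jis|euc|gb|big5|isoir|cp1251|georgianps|pt154|tis )[-_a-z0-9?]*",
    FLAGS,
)

# First encoded-word of the cp1251 subject from the report.
raw = "=?cp1251?B?wOryILkgMDAwMDAwOTI3MyDu8iAwMS4wOC4yMDA3?="

orig_match = ORIGINAL.search(raw)
prop_match = PROPOSED.search(raw)

# The original pattern stops at the "?" delimiters, so it matches only the label.
print(repr(orig_match.group(0)))
# The proposed pattern can run across the "?" delimiters of the encoded-word
# and exposes the charset label as capture group 1.
print(repr(prop_match.group(0)))
print(repr(prop_match.group(1)))
```

Both patterns find cp1251 in an unanchored search of the raw header. The visible differences are the span of the match and the capture group, so whether the rewrite helps depends on how the calling code anchors or interpolates the constant.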
One more that is likely to FP as all caps (running SA 3.2.5):

Subject: =?windows-1255?B?Rlc6IOHp9+X4+iD08Onu6fo=?=\
(In reply to comment #2)
> one more, likely to FP as all caps:
>
> (running SA 3.2.5):
>
> Subject: =?windows-1255?B?Rlc6IOHp9+X4+iD08Onu6fo=?=\
>

More charsets that are likely to FP as all caps but are not present in CHARSETS_LIKELY_TO_FP_AS_CAPS (Constants.pm): windows-1251 and win-1251. Both are synonyms for cp1251, but some MUAs prefer to use them in place of cp1251.
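A quick way to check which of the labels mentioned in this bug are covered is sketched below, again as a Python transliteration of the quoted pattern. EXTENDED is a purely hypothetical variant written for this illustration; it is not the change that was committed to SpamAssassin, and the extra alternatives in it are my assumption.

```python
import re

FLAGS = re.IGNORECASE | re.VERBOSE  # mirrors the /ix modifiers of the Perl constant

# Pattern as quoted from Constants.pm earlier in this bug.
QUOTED = re.compile(
    r"[-_a-z0-9]*(?: koi|jp|jis|euc|gb|big5|isoir|cp1251|georgianps|pt154|tis )[-_a-z0-9]*",
    FLAGS,
)

# Hypothetical extension (an assumption, not the shipped fix): also accept
# utf-8 / utf8 and the windows-125x / win-125x spellings named in the comments.
EXTENDED = re.compile(
    r"""[-_a-z0-9]*(?:
          koi|jp|jis|euc|gb|big5|isoir|cp1251|georgianps|pt154|tis
        | utf-?8
        | win(?:dows)?-?125[15]
        )[-_a-z0-9]*""",
    FLAGS,
)

# Charset labels mentioned in this bug report.
labels = ["cp1251", "windows-1251", "win-1251", "windows-1255", "utf-8"]

covered_by_quoted = [label for label in labels if QUOTED.search(label)]
covered_by_extended = [label for label in labels if EXTENDED.search(label)]

print("quoted pattern covers:  ", covered_by_quoted)
print("extended pattern covers:", covered_by_extended)
```

With the quoted pattern only cp1251 is covered, which matches what the comments above describe for the windows-1251, win-1251 and windows-1255 labels.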
Closing old stale bug. Most of this stuff seems to be fixed in the current version. More recent samples are required for further tuning.