Summary: CP932 conversion table is different from that of Windows.
Product: APR
Reporter: Nayuta Taga <ganaware+issues.apache.org>
Component: APR-iconv
Assignee: Apache Portable Runtime bugs mailinglist <bugs>
Description Nayuta Taga 2010-02-14 10:32:53 UTC
Created attachment 24981 [details] apr-iconv-1.2.1-cp932-patch.txt
The CP932 conversion table in apr-iconv-1.2.1 differs from the one Windows uses. My patch (apr-iconv-1.2.1-cp932-patch.txt) corrects the problem. My other patch (apr-iconv-1.2.1-cp932-patch2.txt) also corrects the problem and adds some conversions for compatibility with Java and glibc. The cp932_roundtrip.html file in cp932_roundtrip.tgz describes which conversions differ from Windows'. (It also describes the conversion tables of other libraries and languages.)
Comment 1 Nayuta Taga 2010-02-14 10:33:41 UTC
Created attachment 24982 [details] apr-iconv-1.2.1-cp932-patch2.txt
Comment 2 Nayuta Taga 2010-02-14 10:43:31 UTC
Created attachment 24983 [details] cp932_roundtrip.tar.bz2
cp932_roundtrip.tgz was too large for this Bugzilla, so I recompressed it with bzip2 and renamed it.
Comment 3 Nayuta Taga 2010-02-20 03:12:36 UTC
Is there anyone who is interested in this problem?
Comment 4 Wim Lewis 2011-08-18 07:05:10 UTC
This seems like work which should be incorporated into apr-iconv, but it is confusing since the extensions to cp932 are not mentioned in either Microsoft's or Unicode.org's descriptions of cp932. Microsoft has a long history of calling different character sets by the same name, though. Is there an unambiguous name for this extension of the old cp932?
Comment 5 Nayuta Taga 2011-08-19 16:13:09 UTC
What does "the extensions to cp932" mean? (apr-iconv-1.2.1-cp932-patch.txt or apr-iconv-1.2.1-cp932-patch2.txt?) Does "the old cp932" mean the current implementation in apr-iconv?
----
apr-iconv-1.2.1-cp932-patch.txt is the same as the Windows table.
Unicode.org's CP932-to-Unicode table is http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT . It is almost the same as the Windows table. (Only the Windows table contains 0x80 and the characters in the private use area.)
Unicode.org's Unicode-to-CP932 table is the reverse of CP932.TXT, but the Unicode-to-CP932 mappings of some characters are ambiguous, because more than one CP932 code maps to the same Unicode character. To remove the ambiguity, see http://support.microsoft.com/default.aspx?scid=kb;en-us;Q170559 . After removing the ambiguity, we get a Unicode-to-CP932 table. It is almost the same as the Windows table. (Only the Windows table contains U+0080 and the characters in the private use area.)
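The ambiguity described above can be made visible with a small script. This is only a sketch: instead of parsing CP932.TXT or calling apr-iconv, it uses Python's built-in `cp932` codec as a stand-in, on the assumption that the codec follows the Microsoft table. The function name `find_ambiguous` and the byte ranges scanned are choices made for this illustration, not part of either patch.

```python
from collections import defaultdict


def find_ambiguous(codec="cp932"):
    """Return {character: [byte sequences]} for every character that
    more than one byte sequence decodes to under the given codec."""
    sources = defaultdict(list)

    # Single-byte candidates.
    candidates = [bytes([b]) for b in range(0x100)]
    # Double-byte candidates: lead bytes 0x81-0x9F and 0xE0-0xFC,
    # trail bytes 0x40-0xFC except 0x7F (the usual Shift_JIS layout).
    for lead in list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD)):
        for trail in range(0x40, 0xFD):
            if trail != 0x7F:
                candidates.append(bytes([lead, trail]))

    for raw in candidates:
        try:
            text = raw.decode(codec)
        except UnicodeDecodeError:
            continue  # unassigned or incomplete sequence
        if len(text) == 1:
            sources[text].append(raw)

    # Keep only characters reachable from two or more byte sequences.
    return {ch: seqs for ch, seqs in sources.items() if len(seqs) > 1}


if __name__ == "__main__":
    ambiguous = find_ambiguous()
    print(len(ambiguous), "characters have more than one CP932 source")
    for ch, seqs in sorted(ambiguous.items())[:10]:
        chosen = ch.encode("cp932").hex()
        print(f"U+{ord(ch):04X}", [s.hex() for s in seqs], "->", chosen)
```

For each ambiguous character, the last column shows which byte sequence this particular codec picks when encoding. Comparing that choice against the rules in the Microsoft KB article, or against the tables in either patch, is left to the reader, since other libraries may resolve the same duplicates differently.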