Summary: CP932 conversion table is different from that of Windows.
Description Nayuta Taga 2010-02-14 10:32:53 UTC
CP932 conversion table of apr-iconv-1.2.1 is different from that of Windows.

My patch (apr-iconv-1.2.1-cp932-patch.txt) corrects the problem.

My other patch (apr-iconv-1.2.1-cp932-patch2.txt) also corrects the problem,
and adds some conversions for compatibility with Java and glibc.

The cp932_roundtrip.html in cp932_roundtrip.tgz describes which
conversions differ from Windows'.
(It also describes the conversion tables of other libraries and languages.)
Comment 1 Nayuta Taga 2010-02-14 10:33:41 UTC
Comment 2 Nayuta Taga 2010-02-14 10:43:31 UTC
cp932_roundtrip.tgz is too large for this bugzilla,
so I recompressed it with bzip2 and renamed it.
Is there anyone who is interested in this problem?
Comment 4 Wim Lewis 2011-08-18 07:05:10 UTC
This seems like work which should be incorporated into apr-iconv, but it is confusing since the extensions to cp932 are not mentioned in either Microsoft's or Unicode.org's descriptions of cp932. Microsoft has a long history of calling different character sets by the same name, though. Is there an unambiguous name for this extension of the old cp932?
Comment 5 Nayuta Taga 2011-08-19 16:13:09 UTC
What does "the extensions to cp932" mean?
(apr-iconv-1.2.1-cp932-patch.txt or apr-iconv-1.2.1-cp932-patch2.txt?)

Does "the old cp932" mean the current implementation of apr-iconv?


With apr-iconv-1.2.1-cp932-patch.txt, the table is the same as the Windows table.

The Unicode.org CP932-to-Unicode table is
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT .
It is almost the same as the Windows table.
(Only the Windows table contains 0x80 and the characters in the private use area.)
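
For anyone who wants to compare tables, here is a minimal Python sketch of reading a mapping file into a dictionary. It assumes each entry looks like `0x8140<TAB>0x3000<TAB>#COMMENT`, and that a line with no Unicode column (a lead byte or an undefined byte) should be skipped. The function name `parse_mapping` and the `SAMPLE` text are only illustrations, not part of apr-iconv or of the real file.

```python
def parse_mapping(text):
    """Return {cp932_code: unicode_code_point} for a mapping-table text."""
    table = {}
    for line in text.splitlines():
        # Drop the trailing comment and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            # Assumed meaning: no Unicode column, so lead byte or undefined.
            continue
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


# Illustrative sample in the assumed format (not the full table).
SAMPLE = (
    "# sample lines\n"
    "0x41\t0x0041\t#LATIN CAPITAL LETTER A\n"
    "0x80\t\t#UNDEFINED\n"
    "0x81\t\t#DBCS LEAD BYTE\n"
    "0x8140\t0x3000\t#IDEOGRAPHIC SPACE\n"
)

forward = parse_mapping(SAMPLE)
```

The resulting dictionary can then be compared entry by entry against a table dumped from another implementation.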

The Unicode.org Unicode-to-CP932 table is the reverse of CP932.TXT.
But the Unicode-to-CP932 mappings of some characters are ambiguous,
because more than one CP932 code maps to the same Unicode character.
To remove the ambiguity,
consider http://support.microsoft.com/default.aspx?scid=kb;en-us;Q170559 .
After removing the ambiguity, we get a Unicode-to-CP932 table.
It is almost the same as the Windows table.
(Only the Windows table contains U+0080 and the characters in the private use area.)
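
The reversal step can be sketched in Python as follows. This is only an illustration under assumptions: the byte ranges and their priority order in `RANGES` are my reading of the Microsoft article (NEC row 13 before the IBM extension, and the IBM extension before the NEC selected IBM extension) and should be checked against it. The names `rank` and `invert` are hypothetical, and `FORWARD` holds a few sample entries rather than the whole table.

```python
from collections import defaultdict

# Assumed priority classes (lower rank wins); ranges are approximate.
RANGES = [
    (0x8740, 0x879C, 1),  # NEC row 13
    (0xFA40, 0xFC4B, 2),  # IBM extension
    (0xED40, 0xEEFC, 3),  # NEC selected IBM extension
]


def rank(code):
    """Return the assumed priority of a CP932 code (0 is the most preferred)."""
    for low, high, priority in RANGES:
        if low <= code <= high:
            return priority
    return 0  # everything else, e.g. ASCII and JIS X 0208


def invert(forward):
    """Reverse a CP932-to-Unicode table and report the ambiguous entries."""
    candidates = defaultdict(list)
    for code, uni in forward.items():
        candidates[uni].append(code)
    # Pick the candidate with the lowest rank for each Unicode code point.
    reverse = {u: min(codes, key=rank) for u, codes in candidates.items()}
    ambiguous = {u: sorted(c) for u, c in candidates.items() if len(c) > 1}
    return reverse, ambiguous


# Sample entries: three CP932 codes that share U+FFE2, plus one unique entry.
FORWARD = {0x81CA: 0xFFE2, 0xEEF9: 0xFFE2, 0xFA54: 0xFFE2, 0x8140: 0x3000}
reverse, ambiguous = invert(FORWARD)
```

With these sample entries, U+FFE2 has three candidates and the sketch picks 0x81CA, because the other two fall in the lower-priority extension ranges.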