Bug 48740 - CP932 conversion table is different from that of Windows.
Summary: CP932 conversion table is different from that of Windows.
Status: NEW
Alias: None
Product: APR
Classification: Unclassified
Component: APR-iconv
Version: 1.2.1
Hardware: All
OS: All
Importance: P2 normal
Target Milestone: ---
Assignee: Apache Portable Runtime bugs mailinglist
Depends on:
Reported: 2010-02-14 10:32 UTC by Nayuta Taga
Modified: 2011-08-19 16:13 UTC

apr-iconv-1.2.1-cp932-patch.txt (48.06 KB, patch)
2010-02-14 10:32 UTC, Nayuta Taga
apr-iconv-1.2.1-cp932-patch2.txt (50.88 KB, patch)
2010-02-14 10:33 UTC, Nayuta Taga
cp932_roundtrip.tar.bz2 (667.58 KB, application/octet-stream)
2010-02-14 10:43 UTC, Nayuta Taga

Description Nayuta Taga 2010-02-14 10:32:53 UTC
Created attachment 24981

The CP932 conversion table of apr-iconv-1.2.1 is different from that of Windows.

My patch (apr-iconv-1.2.1-cp932-patch.txt) corrects the problem.

My other patch (apr-iconv-1.2.1-cp932-patch2.txt) also corrects the problem,
and adds some conversions for compatibility with Java and glibc.

cp932_roundtrip.html in cp932_roundtrip.tgz describes which conversions
differ from Windows'. It also describes the conversion tables of other
libraries and languages.
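Below is a minimal Python sketch of a round-trip check. It assumes that "round trip" here means converting CP932 -> Unicode -> CP932 and comparing the result with the original code. The function name roundtrip_failures and the sample tables are hypothetical. They are not taken from the attached patches or from cp932_roundtrip.html.

```python
def roundtrip_failures(to_unicode, from_unicode):
    """Return {cp932_code: code_after_round_trip} for every code that does not
    survive CP932 -> Unicode -> CP932. A value of None means the Unicode
    character has no reverse mapping at all.

    to_unicode:   {cp932_code: unicode_code_point}
    from_unicode: {unicode_code_point: cp932_code}
    """
    failures = {}
    for code, u in sorted(to_unicode.items()):
        back = from_unicode.get(u)
        if back != code:
            failures[code] = back
    return failures


# Hypothetical sample: 0x8754 and 0xFA4A both decode to U+2160
# (ROMAN NUMERAL ONE), but the reverse table can hold only one of them.
to_unicode = {0x82A0: 0x3042, 0x8754: 0x2160, 0xFA4A: 0x2160}
from_unicode = {0x3042: 0x82A0, 0x2160: 0x8754}

for code, back in roundtrip_failures(to_unicode, from_unicode).items():
    shown = "no mapping" if back is None else f"0x{back:04X}"
    print(f"0x{code:04X} -> {shown}")
```

Codes that share a Unicode character can never all survive a round trip. For compatibility, what matters is that the reverse table makes the same choice Windows does.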
Comment 1 Nayuta Taga 2010-02-14 10:33:41 UTC
Created attachment 24982
Comment 2 Nayuta Taga 2010-02-14 10:43:31 UTC
Created attachment 24983

cp932_roundtrip.tgz is too large for this Bugzilla,
so I recompressed it with bzip2 and renamed it (cp932_roundtrip.tar.bz2).
Comment 3 Nayuta Taga 2010-02-20 03:12:36 UTC
Is anyone interested in this problem?
Comment 4 Wim Lewis 2011-08-18 07:05:10 UTC
This seems like work which should be incorporated into apr-iconv, but it is confusing since the extensions to cp932 are not mentioned in either Microsoft's or Unicode.org's descriptions of cp932. Microsoft has a long history of calling different character sets by the same name, though. Is there an unambiguous name for this extension of the old cp932?
Comment 5 Nayuta Taga 2011-08-19 16:13:09 UTC
What do you mean by "the extensions to cp932"?
(apr-iconv-1.2.1-cp932-patch.txt or apr-iconv-1.2.1-cp932-patch2.txt?)

Does "the old cp932" mean the current implementation in apr-iconv?


apr-iconv-1.2.1-cp932-patch.txt is identical to Windows' table.

Unicode.org's CP932-to-Unicode table is available at
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
and is almost identical to Windows' table.
The only differences: 0x80 and the characters in the private use area
appear only in Windows' table.
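The following Python sketch shows one way to compare the two CP932-to-Unicode tables. It assumes the CP932.TXT line layout "0xXXXX<TAB>0xYYYY<TAB>#comment", in which undefined codes such as 0x80 are listed without a Unicode value. The functions parse_mapping and only_in are hypothetical helpers. The "windows" dictionary is a hand-written, illustrative excerpt; a real comparison would build it on Windows, for example by calling MultiByteToWideChar for each code.

```python
def parse_mapping(text):
    """Parse CP932.TXT-style lines into {cp932_code: unicode_code_point}.

    Everything after '#' is a comment. Lines without a Unicode value
    (such as the '#UNDEFINED' line for 0x80) are skipped.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 2:
            table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def only_in(a, b):
    """CP932 codes that are mapped in table a but not in table b."""
    return sorted(set(a) - set(b))


unicode_org = parse_mapping(
    "0x80\t\t#UNDEFINED\n"
    "0x82A0\t0x3042\t#HIRAGANA LETTER A\n"
)
# Hypothetical excerpt of Windows' table: 0x80 -> U+0080 and the
# user-defined code 0xF040 -> U+E000 (private use area).
windows = {0x80: 0x0080, 0x82A0: 0x3042, 0xF040: 0xE000}

print([hex(c) for c in only_in(windows, unicode_org)])
```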

Unicode.org's Unicode-to-CP932 table is the reverse of CP932.TXT.
However, the Unicode-to-CP932 mapping is ambiguous for some characters,
because CP932.TXT maps more than one CP932 code to the same Unicode character.
To resolve the ambiguity, see Microsoft KB article Q170559:
http://support.microsoft.com/default.aspx?scid=kb;en-us;Q170559
Once the ambiguity is resolved, the resulting Unicode-to-CP932 table
is almost identical to Windows' table.
The only differences: U+0080 and the characters in the private use area
appear only in Windows' table.
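Here is a Python sketch that builds a Unicode-to-CP932 table by reversing a CP932-to-Unicode table and resolving duplicates. The tie-break rule is an assumption. It is a simplified reading of how Windows resolves duplicates and should be checked against KB Q170559. Under this rule, the sketch avoids the NEC-selected IBM extensions (lead bytes 0xED and 0xEE). Among the remaining candidates it takes the lowest code. The function names and the sample entries are illustrative only.

```python
from collections import defaultdict


def is_nec_selected_ibm(code):
    """True for the NEC-selected IBM extension rows (lead bytes 0xED, 0xEE)."""
    return (code >> 8) in (0xED, 0xEE)


def group_by_unicode(to_unicode):
    """{unicode_code_point: [cp932 codes that decode to it]}"""
    sources = defaultdict(list)
    for code, u in to_unicode.items():
        sources[u].append(code)
    return sources


def ambiguous(to_unicode):
    """Unicode code points reached from more than one CP932 code."""
    return {u: sorted(codes)
            for u, codes in group_by_unicode(to_unicode).items()
            if len(codes) > 1}


def reverse_table(to_unicode):
    """Build {unicode_code_point: cp932_code}. Duplicates are resolved with the
    assumed rule: avoid NEC-selected IBM extensions, then take the lowest code."""
    return {u: min(codes, key=lambda c: (is_nec_selected_ibm(c), c))
            for u, codes in group_by_unicode(to_unicode).items()}


# Sample CP932 -> Unicode entries that share a Unicode character.
to_unicode = {
    0x81CA: 0xFFE2, 0xEEF9: 0xFFE2, 0xFA54: 0xFFE2,  # FULLWIDTH NOT SIGN
    0x8754: 0x2160, 0xFA4A: 0x2160,                  # ROMAN NUMERAL ONE
    0xEEEF: 0x2170, 0xFA40: 0x2170,                  # SMALL ROMAN NUMERAL ONE
    0xED40: 0x7E8A, 0xFA5C: 0x7E8A,                  # CJK ideograph U+7E8A
}

for u, code in sorted(reverse_table(to_unicode).items()):
    print(f"U+{u:04X} -> 0x{code:04X}")
```

With these sample entries, the sketch picks 0x81CA for U+FFE2, 0x8754 for U+2160, 0xFA40 for U+2170 and 0xFA5C for U+7E8A. Whether the rule matches Windows for every duplicate should be verified against the KB article and Windows' own table.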