Bug 6082 - Many encodings are broken in Xerces
Status: NEW
Alias: None
Product: Xerces-J
Classification: Unclassified
Component: Serialization
Version: 1.4.4
Hardware: All other
Importance: P3 critical
Target Milestone: ---
Assignee: Xerces-J Developers Mailing List
Reported: 2002-01-29 01:00 UTC by Paul Prescod
Modified: 2004-11-16 19:05 UTC

Attachments
A diff file, from a patched version (1.64 KB, patch)
2002-02-05 07:00 UTC, Steve Fossen

Description Paul Prescod 2002-01-29 01:00:20 UTC
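The ISO-8859-n (n>1) encodings are broken because the "lastPrintable" character 
is set to 0xFF when it should be set to 0x7F (see Encodings.java, lines 110-118). 
Only ISO-8859-1 agrees with Unicode for every code point up to 0xFF; the other 
parts assign different characters to some positions above 0x7F.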
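
To illustrate why 0xFF is too high a threshold, here is a minimal sketch that counts how many code points between 0x20 and a given "lastPrintable" value a charset cannot actually encode. The class and method names (LastPrintableCheck, countUnencodable) are hypothetical and not part of Xerces; the check relies on the JDK's CharsetEncoder.canEncode rather than on the serializer's own logic, and it assumes ISO-8859-2 is available in the running JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Xerces code.
class LastPrintableCheck {

    // Counts the code points in [0x20, lastPrintable] that the charset
    // cannot encode. A non-zero result means "lastPrintable" is too high
    // to be used as a "safe to write unescaped" threshold.
    static int countUnencodable(Charset cs, int lastPrintable) {
        CharsetEncoder encoder = cs.newEncoder();
        int count = 0;
        for (int c = 0x20; c <= lastPrintable; c++) {
            if (!encoder.canEncode((char) c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps bytes 0x00-0xFF straight onto U+0000-U+00FF.
        System.out.println("ISO-8859-1 up to 0xFF: "
                + countUnencodable(StandardCharsets.ISO_8859_1, 0xFF));

        // ISO-8859-2 is assumed to be present; skip it if the JVM lacks it.
        String name = "ISO-8859-2";
        if (Charset.isSupported(name)) {
            Charset latin2 = Charset.forName(name);
            System.out.println(name + " up to 0x7F: "
                    + countUnencodable(latin2, 0x7F));
            System.out.println(name + " up to 0xFF: "
                    + countUnencodable(latin2, 0xFF));
        }
    }
}
```

Any code point counted here would have to be written as a numeric character reference instead of being passed through as-is.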

The Windows-31J encoding is broken because the Java encoder is broken. It 
cannot correctly round-trip the following characters:

0xa2
0xa3
0xa5
0xab
0xac
0xaf
0xb5
0xb7
0xb8
0xbb
0x203e
0x3094

The reason is that each of these code points shares its encoded byte pattern 
with another code point (bytes are shown as signed decimal values, so -127 is 
0x81):

The byte pattern (92,) is used by: 5c, a5
The byte pattern (126,) is used by: 7e, 203e
The byte pattern (-127, -111) is used by: a2, ffe0
The byte pattern (-127, -110) is used by: a3, ffe1
The byte pattern (-127, -31) is used by: ab, 226a
The byte pattern (-127, -54) is used by: ac, ffe2
The byte pattern (-127, 80) is used by: af, ffe3
The byte pattern (-125, -54) is used by: b5, 3bc
The byte pattern (-127, 69) is used by: b7, 30fb
The byte pattern (-127, 67) is used by: b8, ff0c
The byte pattern (-127, -30) is used by: bb, 226b
The byte pattern (-125, -108) is used by: 3094, 30f4
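
A collision like the ones listed above can be detected with a simple encode/decode round trip. The sketch below is an assumption-laden illustration, not the test file used for this report: RoundTripCheck and roundTrips are hypothetical names, and the result for Windows-31J depends on the JDK version and on whether that charset is installed, so no output is claimed here.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not Xerces code.
class RoundTripCheck {

    // Encodes a single code point with the charset, decodes the bytes
    // again, and reports whether the original character came back.
    static boolean roundTrips(Charset cs, int codePoint) {
        String original = new String(Character.toChars(codePoint));
        byte[] encoded = original.getBytes(cs);
        String decoded = new String(encoded, cs);
        return original.equals(decoded);
    }

    public static void main(String[] args) {
        // Code points taken from the list in this report.
        int[] suspects = {
            0xa2, 0xa3, 0xa5, 0xab, 0xac, 0xaf,
            0xb5, 0xb7, 0xb8, 0xbb, 0x203e, 0x3094
        };

        String name = "windows-31j";
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this JVM");
            return;
        }
        Charset cs = Charset.forName(name);
        for (int cp : suspects) {
            System.out.printf("U+%04X round-trips: %b%n", cp, roundTrips(cs, cp));
        }
    }
}
```

Any code point for which roundTrips returns false cannot safely be written as raw bytes in that encoding.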

You can fix this by adding these characters to "JIS_DANGER_CHARS" 
(Encodings.java, line 99) or by creating a new list of danger characters just 
for Windows-31J.
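
As a rough sketch of the second option, a separate danger list could be consulted before a character is written, with any listed character emitted as a numeric character reference instead. Everything here is hypothetical: DangerCharEscaper, WINDOWS_31J_DANGER_CHARS and escape are illustrative names, and this does not reproduce how Encodings.java actually stores or checks JIS_DANGER_CHARS.

```java
// Hypothetical helper, not Xerces code.
class DangerCharEscaper {

    // The twelve characters from this report that do not round-trip
    // in Windows-31J.
    static final String WINDOWS_31J_DANGER_CHARS =
            "\u00a2\u00a3\u00a5\u00ab\u00ac\u00af"
          + "\u00b5\u00b7\u00b8\u00bb\u203e\u3094";

    // Replaces every danger character with a hexadecimal character
    // reference such as &#xa5; and leaves all other characters alone.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (WINDOWS_31J_DANGER_CHARS.indexOf(c) >= 0) {
                out.append("&#x").append(Integer.toHexString(c)).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("price: \u00a5100"));
    }
}
```

A real serializer would still need its usual escaping for markup characters; this sketch covers only the danger list.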
Comment 1 Steve Fossen 2002-02-05 07:00:20 UTC
Created attachment 1133
A diff file, from a patched version
Comment 2 Steve Fossen 2002-02-05 07:02:07 UTC
I was in touch with the bug reporter and used their test file. 
  The attached patch appears to fix the reported problems.
   Steve Fossen
   sfossen@yahoo.com