Bug 9368

Summary: IBM-1047 EBCDIC codepage not supported?
Product: Xerces-J
Reporter: bauman
Component: SAX
Assignee: Xerces-J Developers Mailing List <xerces-j-dev>
Status: NEW ---    
Severity: blocker    
Priority: P3    
Version: 1.4.4   
Target Milestone: ---   
Hardware: Other   
OS: other   

Description bauman 2002-05-23 21:54:12 UTC
There doesn't seem to be a supported encoding for the Latin-1 EBCDIC codepage 
for OS/390 (IBM-1047) in Xerces-J.  Xerces-C allows "ibm-1047-s390", but why not 
a supported encoding for Xerces-J?  I would use "ebcdic-cp-us", but there are 
some basic differences in the codepages (like the "[" and "]" characters) that 
make this a showstopper.
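
The bracket difference can be checked from plain Java, independently of Xerces. The following is a minimal sketch, assuming the JVM in use ships the extended charsets "IBM037" (the IANA registry lists "ebcdic-cp-us" as an alias of IBM037) and "IBM1047"; not every runtime includes them, so the code checks before using them. The class name CodepageDiff and the helper encodeHex are made up for this illustration.

```java
import java.nio.charset.Charset;

public class CodepageDiff {
    // Encode text with the named charset and return the bytes as upper-case hex.
    static String encodeHex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Assumption: these are the Java charset names for the two codepages.
        String[] names = {"IBM037", "IBM1047"};
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JVM");
                continue;
            }
            System.out.println(name + ": [ -> " + encodeHex("[", name)
                    + ", ] -> " + encodeHex("]", name));
        }
    }
}
```

If both charsets are available, the two lines printed should show different byte values for "[" and "]", which is why a document written in one codepage cannot simply be labelled with the other.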
Comment 1 Glenn Marcy 2002-05-23 22:45:34 UTC
I would guess that the main reason that IBM-1047 is not supported in Xerces
is that there is no mention of that codepage in the IANA character set list.
Since there is no standard interoperable name that one could use for documents 
with that encoding, it is not listed in the encoding name table.  Without a
registered standard name, it is difficult to envision that all XML processors
could process such a document, even if they supported the codepage.
Comment 2 Matthew Sykes 2003-06-02 21:26:02 UTC
It looks like IBM-1047 has made it to the IANA list.

http://www.iana.org/assignments/character-sets contains the following:

Name: IBM1047                                                [Robrigado]
MIBenum: 2102
Source: IBM1047 (EBCDIC Latin 1/Open Systems)
http://www-1.ibm.com/servers/eserver/iseries/software/globalization/pdf/cp01047z.pdf
Alias: IBM-1047
Comment 3 Michael Glavassevich 2003-06-02 21:36:12 UTC
Support for this encoding is in Xerces-J2.
Comment 4 bauman 2003-06-09 17:57:21 UTC
Yes, I got that.  My manager and I petitioned the IANA people for it and they 
added it in.