29086 – DocumentSummary ignores codepage settings

Summary: DocumentSummary ignores codepage settings

Status:	CLOSED WONTFIX

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	HPSF
Version:	unspecified
Hardware:	PC All

Importance:	P3 normal
Target Milestone:	---
Assignee:	POI Developers List

Reported:	2004-05-19 13:02 UTC by michael.gesmann
Modified:	2004-11-16 19:05 UTC
CC List:	0 users

Description michael.gesmann 2004-05-19 13:02:50 UTC

Problem:
I have an Excel input file generated on a German PC, i.e. the file was written
with ISO-8859-1 encoding. The file properties as well as the content (sheet
name and cell content) contain German umlauts.

I then read this file with a Java VM started with -Dfile.encoding=ISO646-US.
I'm doing this in a debugger (CodeGuide). When I read the document's
SummaryInformation with HPSF, the returned strings (Java Unicode) contain "?"
instead of the umlauts.
When I read the sheet name and cell content, I see the umlauts as expected.

No example output:
Unfortunately, I can see this only in the debugger. I do not know how to show
this with a short example. If I use the property -Dfile.encoding=ISO-8859-1,
I get the correct result with umlauts. If I use another encoding (in my
case ISO646-US), System.out.print() converts all umlauts into "?".
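
One way to make the difference visible without relying on the console encoding
might be to print the strings with every non-ASCII character escaped. A minimal
sketch, assuming a more recent JDK than the 1.4.2 used here; the class and
method names are hypothetical, and the hard-coded string only stands in for a
value returned by SummaryInformation:

```java
public class UnicodeEscape {

    /** Returns s with every non-ASCII char written as a backslash-u escape. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);                       // plain ASCII passes through
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a property string; replace with the value under test.
        String title = "M\u00fcller";
        // The output is pure ASCII, so file.encoding cannot alter it.
        System.out.println(escape(title));
    }
}
```

A correctly decoded umlaut shows up as its code point (for example 00fc for
u-umlaut), while a string that already contains a literal "?" stays "?".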

System environment:
I have downloaded poi-bin-2.5-final-20040302.zip from
http://ftp.uni-erlangen.de/pub/mirrors/apache/jakarta/poi/release/bin.
So I expect this to be version 2.5 (not in the list above).
I'm compiling and running everything with JDK 1.4.2_02.

Relevance:
The problem occurs not only when the file.encoding property is set explicitly
but also when the file is read on a machine with a different default encoding.
We are only interested in the Java Unicode String, not in any output device.
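
The symptom would be consistent with single-byte property text being decoded
with the platform default charset, although that is an assumption about the
2.5 internals and is not confirmed here. A small stand-alone sketch (hypothetical
class name, no POI involved) of how the same bytes decode differently:

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // "Mueller" spelled with u-umlaut as single-byte Latin-1 text (0xFC).
    static final byte[] RAW = {0x4D, (byte) 0xFC, 0x6C, 0x6C, 0x65, 0x72};

    /** Decodes with the charset the file was actually written in. */
    static String decodeLatin1(byte[] b) {
        return new String(b, StandardCharsets.ISO_8859_1);
    }

    /** Decodes as US-ASCII, standing in for a mismatched default charset. */
    static String decodeAscii(byte[] b) {
        return new String(b, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Print code points in hex so the console encoding does not matter.
        System.out.println(Integer.toHexString(decodeLatin1(RAW).charAt(1)));
        // The unmappable byte is replaced by U+FFFD, the replacement character.
        System.out.println(Integer.toHexString(decodeAscii(RAW).charAt(1)));
    }
}
```

Once the byte has been replaced during decoding, the original character cannot
be recovered from the Java String, whatever output device is used afterwards.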

Further info:
The current HPSF sources in CVS contain a class "VariantSupport.java" which
seems to implement codepage support for the SummaryInformation. This source is
not contained in the downloaded 2.5 version.
I can provide an example if needed, but I have no idea how to attach it here.

Best regards,

   Michael Gesmann

Comment 1 Rainer Klute 2004-06-02 18:09:23 UTC

Codepage support is implemented in the CVS HEAD but not in the 2.5 release.
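
For readers unfamiliar with the idea, codepage-aware decoding roughly means
choosing the charset from the codepage number stored in the property set
instead of from file.encoding. The following is only an illustrative sketch
under that assumption; the class and both methods are hypothetical and are not
the code in CVS HEAD, and the table covers just a few well-known Windows
codepage identifiers:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodepageSketch {

    /** Hypothetical helper: maps a Windows codepage id to a Java charset. */
    static Charset charsetFor(int codepage) {
        switch (codepage) {
            case 1200:  return StandardCharsets.UTF_16LE;
            case 1252:  return Charset.forName("windows-1252");
            case 20127: return StandardCharsets.US_ASCII;
            case 28591: return StandardCharsets.ISO_8859_1;
            case 65001: return StandardCharsets.UTF_8;
            default:
                throw new IllegalArgumentException(
                        "Unsupported codepage: " + codepage);
        }
    }

    /** Decodes raw property bytes using the codepage, not the JVM default. */
    static String decode(byte[] raw, int codepage) {
        return new String(raw, charsetFor(codepage));
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE4};              // a-umlaut in ISO-8859-1
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA4};   // a-umlaut in UTF-8
        // Both calls yield the same Java string regardless of file.encoding.
        System.out.println(decode(latin1, 28591).equals(decode(utf8, 65001)));
    }
}
```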