Bug 24570 - POI don't extract right properties in Chinese characters
Summary: POI don't extract right properties in Chinese characters
Alias: None
Product: POI
Classification: Unclassified
Component: HPSF
Version: unspecified
Hardware: PC All
Importance: P3 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2003-11-10 17:40 UTC by Yanjun Liao
Modified: 2004-11-16 19:05 UTC
CC List: 0 users

Attachments:
A file of properties in Chinese characters (31.00 KB, application/ms-word)
2003-11-10 17:41 UTC, Yanjun Liao

Description Yanjun Liao 2003-11-10 17:40:42 UTC
I am using the jakarta-poi-1.10.0-dev version. I also tried 2.0-pre3, and it
seems to have the same problem. I have a file whose properties are in Chinese
characters. POI can extract the properties (DocumentSummaryInformation and
SummaryInformation), but the characters are all garbled after extraction. I
believe the TypeReader class does not handle the right encoding. My operating
system is English, but that should not be the reason POI is not working
correctly: I have a similar property-extraction program written in C++, and it
handles the same file correctly. BTW, I tried Japanese characters and got the
same problem. I will attach the file. I hope this problem can be addressed
soon, because it is now on the critical path of my project. Thanks
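The symptom described above is typical of string bytes decoded with the wrong codepage. Below is a minimal JDK-only sketch of that effect, assuming (as the later comments suggest) that the string properties were written in the Simplified Chinese codepage 936 (GBK). The class name MojibakeDemo and the sample title are hypothetical, and the code does not use POI. The two GBK characters of 中文 occupy four bytes, which a Western codepage such as windows-1252 turns into the four unrelated characters ÖÐÎÄ.

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    // Decodes raw property bytes with the given charset name.
    public static String decodeAs(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Hypothetical two-character title (U+4E2D U+6587) as written by a system
        // using Simplified Chinese codepage 936 (GBK): bytes D6 D0 CE C4.
        byte[] stored = "\u4E2D\u6587".getBytes(Charset.forName("GBK"));

        // A Western codepage maps each byte to its own character, so the
        // result is four Latin letters (U+00D6 U+00D0 U+00CE U+00C4).
        String wrong = decodeAs(stored, "windows-1252");

        // Decoding with the codepage the bytes were written in restores the text.
        String right = decodeAs(stored, "GBK");

        System.out.println("windows-1252: " + wrong);
        System.out.println("GBK:          " + right);
    }
}
```

The same bytes produce both results; only the charset chosen at decode time differs, which is why the operating system's language can matter even though the file itself is unchanged.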
Comment 1 Yanjun Liao 2003-11-10 17:41:34 UTC
Created attachment 9024 [details]
A file of properties in Chinese characters
Comment 2 Rainer Klute 2003-11-10 21:49:05 UTC
I am willing to work on this. However, I'd need some general information about
code pages. If someone has a hint (link), please let me know!
Comment 3 Rainer Klute 2003-12-02 17:50:19 UTC
I just added codepage support to the CVS HEAD. Your sample document looks okay
to me (which doesn't necessarily mean anything). Please get the HEAD from the
CVS repository and cross-check!
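Comment 3 does not say how the codepage support was implemented. The following is a hypothetical JDK-only sketch of the idea, not POI's code: an OLE property set records its codepage in property ID 1, and 8-bit string properties have to be decoded with the matching charset rather than a fixed Western one. The class CodepageDecoder, its method names, and its deliberately partial codepage table are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.util.Map;

public class CodepageDecoder {
    // Illustrative, partial table: Windows codepage number -> Java charset name.
    // UTF-16 (1200) is omitted because the NUL stripping below assumes a
    // byte-oriented encoding.
    private static final Map<Integer, String> CODEPAGES = Map.of(
            932, "windows-31j",   // Japanese (Microsoft's Shift_JIS variant)
            936, "GBK",           // Simplified Chinese
            950, "x-windows-950", // Traditional Chinese (Microsoft's Big5 variant)
            1252, "windows-1252", // Western European
            65001, "UTF-8");

    // Assumption: the codepage property is a signed 16-bit value, so codepages
    // above 32767 (e.g. 65001) may arrive negative; masking restores them.
    public static Charset charsetFor(int codepage) {
        String name = CODEPAGES.get(codepage & 0xFFFF);
        if (name == null) {
            throw new IllegalArgumentException("unmapped codepage: " + codepage);
        }
        return Charset.forName(name);
    }

    // Decodes the raw bytes of an 8-bit string property, dropping any
    // trailing NUL terminator bytes first.
    public static String decodeString(byte[] raw, int codepage) {
        int end = raw.length;
        while (end > 0 && raw[end - 1] == 0) {
            end--;
        }
        return new String(raw, 0, end, charsetFor(codepage));
    }

    public static void main(String[] args) {
        // Hypothetical property value: U+4E2D U+6587 in codepage 936, NUL-terminated.
        byte[] raw = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4, 0};
        String title = decodeString(raw, 936);
        System.out.println(title.length() + " characters decoded");
    }
}
```

Keeping the table explicit makes an unsupported codepage fail loudly instead of silently producing garbled text like the output reported in this bug.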
Comment 4 Yanjun Liao 2004-02-02 16:48:57 UTC
Hi, I just downloaded poi-bin-2.0-final-20040126.zip and
poi-bin-2.0-RC2-20040102.zip. Neither of them seems to solve the problem. Has
your fix gone into these two releases yet? If not, how can I get this fix
without using CVS? I am behind a company firewall. Thanks a lot.
Comment 5 Rainer Klute 2004-02-02 20:56:10 UTC
Sorry, but the codepage support is in the CVS HEAD only, not in the 2.0 release.
However, I don't know what you could do to get it through a firewall.
Comment 6 Yanjun Liao 2004-02-03 19:31:17 UTC
Hi, I got the snapshot from http://cvs.apache.org/snapshots/jakarta-poi/. It
works fine. Thanks.