Bug 30303

Summary: Cannot read word documents properties for docs created on Mac
Product: POI
Reporter: Angsuman Chakraborty <angsuman>
Component: HPSF
Assignee: POI Developers List <dev>
Status: CLOSED FIXED    
Severity: blocker    
Priority: P3    
Version: unspecified   
Target Milestone: ---   
Hardware: Macintosh   
OS: All   
Attachments: Sample Word document which crashes any attempt to read document properties
Modified source which has a fix/workaround for the defect

Description Angsuman Chakraborty 2004-07-24 11:49:58 UTC
Several Word documents (probably all) created on a Mac and opened on Windows to
read their properties caused an UnsupportedEncodingException for cp10000 at the line:
value = new String(src, (int) first, l, codepageToEncoding(codepage));

I think this is caused by a parsing error when extracting the encoding.
However, the actual text looked OK, so my workaround was:

value = new String(src, (int) first, l);
if (codepage != -1) {
    try {
        value = new String(src, (int) first, l, codepageToEncoding(codepage));
    } catch (UnsupportedEncodingException ignore) {
        // The previous assignment is acceptable when the encoding is not supported
        // Want to throw a warning message here, but how?
    }
}
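
For reference, the same fallback idea can be sketched as a standalone class. This is only an illustration under stated assumptions, not the HPSF code: the class name CodepageFallback and the helper codepageToEncodingName are hypothetical stand-ins for the real codepageToEncoding, and the mapping table is deliberately tiny. Windows codepage 10000 is Mac Roman, which Java exposes as "x-MacRoman" when the extended charsets are installed. The sketch checks Charset.isSupported first, so no exception has to be swallowed, and the warning goes to stderr as one possible answer to the "but how?" question above.

```java
import java.nio.charset.Charset;

public class CodepageFallback {

    // Hypothetical mapping for illustration; NOT the HPSF codepageToEncoding method.
    static String codepageToEncodingName(int codepage) {
        switch (codepage) {
            case 10000: return "x-MacRoman";   // Mac Roman
            case 1252:  return "windows-1252"; // Western European (Windows)
            case 65001: return "UTF-8";
            default:    return "cp" + codepage;
        }
    }

    // Decodes length bytes of src starting at first. Falls back to the
    // platform default charset when the codepage is unknown (-1) or the
    // running JVM does not support the mapped charset.
    static String decode(byte[] src, int first, int length, int codepage) {
        if (codepage != -1) {
            String name = codepageToEncodingName(codepage);
            if (Charset.isSupported(name)) {
                return new String(src, first, length, Charset.forName(name));
            }
            System.err.println("Warning: unsupported codepage " + codepage
                    + " (" + name + "), using platform default charset");
        }
        return new String(src, first, length);
    }

    public static void main(String[] args) {
        byte[] raw = {'T', 'i', 't', 'l', 'e'};
        // ASCII bytes decode the same way in Mac Roman and in the usual defaults.
        System.out.println(decode(raw, 0, raw.length, 10000));
    }
}
```

Checking support up front keeps the fallback path explicit. It only hides the symptom, though: if the codepage value itself is being parsed incorrectly, that still needs fixing at the source.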

Let me know if you need any sample documents to demonstrate the error. I have 
quite a few of them.
Comment 1 Angsuman Chakraborty 2004-07-24 11:51:37 UTC
Created attachment 12207 [details]
Sample Word document which crashes any attempt to read document properties
Comment 2 Angsuman Chakraborty 2004-07-24 11:54:02 UTC
Created attachment 12208 [details]
Modified source which has a fix/workaround for the defect
Comment 3 Piers 2004-07-25 15:03:53 UTC
Hi Angsuman,

I've run the Mac document you supplied as a test case
through the HWPF code which has had patch 30235 applied.
http://issues.apache.org/bugzilla/show_bug.cgi?id=30235

This works fine, creating a working copy of the document.
I know that some of the work that was done in creating
this patch fixed a number of bugs. As you rightly point out
the error is most probably due to a parsing error.

Can you test your code with the above patch applied?
If it still doesn't work, can you please post an example
of the calling line in your application code that will cause 
the error so that I can take a look at it?

Many thanks,
             Piers

Comment 4 Piers Taylor 2004-07-25 15:13:30 UTC
Apologies everyone - I was logged in on my old account for which the email 
address is invalid. Please use the email address associated with this account 
if you need to contact me.

Piers
Comment 5 Angsuman Chakraborty 2004-08-10 15:35:39 UTC
The new version from CVS has the problem fixed (please see attachment below for
details). Thanks Piers for the prompt response and resolution. Your immediate
response was much, much better than I get from big companies where I pay tons
for the product :)

