Bug 54877 - ArrayIndexOutOfBoundsException when try to read ms doc
Summary: ArrayIndexOutOfBoundsException when try to read ms doc
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.10-dev
Hardware: PC All
Importance: P2 blocker with 1 vote
Target Milestone: ---
Assignee: POI Developers List
Reported: 2013-04-23 09:19 UTC by celine
Modified: 2016-07-24 11:23 UTC

Attachments
Java file to test POI (5.59 KB, application/octet-stream)
2013-04-23 09:19 UTC, celine
the word document (2.50 KB, application/msword)
2013-04-23 09:19 UTC, celine

Description celine 2013-04-23 09:19:19 UTC
Created attachment 30222 [details]
Java file to test POI

Hi,

I am testing both POI 3.9 and POI-4.0-beta1-20130403 for reading and writing MS Word documents (.doc). Reading back a .doc file that was written with POI fails with the following exception:
java.lang.ArrayIndexOutOfBoundsException: 12
	at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:224)
	at org.apache.poi.hwpf.model.types.FibBaseAbstractType.fillFields(FibBaseAbstractType.java:96)
	at org.apache.poi.hwpf.model.FibBase.<init>(FibBase.java:43)
	at org.apache.poi.hwpf.model.FileInformationBlock.<init>(FileInformationBlock.java:71)
	at org.apache.poi.hwpf.HWPFDocumentCore.<init>(HWPFDocumentCore.java:155)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:218)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:186)
	at wordwriter.POI_Tester.readMyDocument(POI_Tester.java:100)
done
	at wordwriter.POI_Tester.readDoc(POI_Tester.java:78)
	at wordwriter.POI_Tester.main(POI_Tester.java:63)
Comment 1 celine 2013-04-23 09:19:57 UTC
Created attachment 30223 [details]
the word document
Comment 2 sasrar87 2013-04-24 16:58:57 UTC
I just tried this code on my machine and got the same result: ArrayIndexOutOfBoundsException.

Reading a Word document that was not written by a Java program using the POI library works without any exceptions, though.

Separately, I keep getting two "end of line" characters whenever the program parses text from a Word document. Is this a bug, or is it normal for a .doc file to end with two end-of-line characters?
Comment 3 Dominik Stadler 2016-07-24 11:23:44 UTC
It seems the file-information-block in this file is too small. LibreOffice also refuses to open the file, so I don't think we can do much here unless someone comes up with a way to safely read such broken files.
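
For illustration only, here is a minimal sketch of the kind of length check that would turn the raw ArrayIndexOutOfBoundsException into a clearer error when the header is too short. It does not use POI: the class ShortHeaderGuard, its method names, and the 32-byte minimum (the FibBase size as I understand it from the [MS-DOC] format description) are assumptions made for this example, not POI code.

```java
// Standalone sketch, not POI code. All names here are hypothetical.
final class ShortHeaderGuard {

    // Assumption: the fixed-size start of the file-information-block is 32 bytes.
    static final int ASSUMED_MIN_HEADER_BYTES = 32;

    private ShortHeaderGuard() {
    }

    // True if the stream is long enough to hold the assumed fixed-size header.
    static boolean hasCompleteHeader(byte[] stream) {
        return stream != null && stream.length >= ASSUMED_MIN_HEADER_BYTES;
    }

    // Reads a little-endian 16-bit value, checking bounds before touching the array.
    static short getShortLE(byte[] data, int offset) {
        if (offset < 0 || offset > data.length - 2) {
            throw new IllegalArgumentException(
                    "Need 2 bytes at offset " + offset
                            + " but buffer has only " + data.length + " bytes");
        }
        int low = data[offset] & 0xFF;
        int high = data[offset + 1] & 0xFF;
        return (short) (low | (high << 8));
    }

    public static void main(String[] args) {
        // A 12-byte buffer stands in for a truncated header.
        byte[] truncated = new byte[12];
        System.out.println("complete header: " + hasCompleteHeader(truncated));
        try {
            getShortLE(truncated, 12);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A caller could test hasCompleteHeader first and report the file as truncated or corrupt instead of letting the array access fail. This only improves the error message; it does not make a broken file readable.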