Bug 54877

Summary: ArrayIndexOutOfBoundsException when trying to read an MS Word .doc file
Product: POI
Reporter: celine <swku2801>
Component: HWPF
Assignee: POI Developers List <dev>
Status: RESOLVED WONTFIX
Severity: blocker
CC: mastropos
Priority: P2    
Version: 3.10-dev   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments:
  30222: Java file to test POI
  30223: the word document

Description celine 2013-04-23 09:19:19 UTC
Created attachment 30222 [details]
Java file to test POI

Hi,

I am testing both POI 3.9 and POI-4.0-beta1-20130403 for reading and writing MS Word (.doc) files. Reading back a .doc file that was written with POI fails with:
java.lang.ArrayIndexOutOfBoundsException: 12
	at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:224)
	at org.apache.poi.hwpf.model.types.FibBaseAbstractType.fillFields(FibBaseAbstractType.java:96)
	at org.apache.poi.hwpf.model.FibBase.<init>(FibBase.java:43)
	at org.apache.poi.hwpf.model.FileInformationBlock.<init>(FileInformationBlock.java:71)
	at org.apache.poi.hwpf.HWPFDocumentCore.<init>(HWPFDocumentCore.java:155)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:218)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:186)
	at wordwriter.POI_Tester.readMyDocument(POI_Tester.java:100)
done
	at wordwriter.POI_Tester.readDoc(POI_Tester.java:78)
	at wordwriter.POI_Tester.main(POI_Tester.java:63)
Comment 1 celine 2013-04-23 09:19:57 UTC
Created attachment 30223 [details]
the word document
Comment 2 sasrar87 2013-04-24 16:58:57 UTC
I just tried this code on my machine and I got the same result: ArrayIndexOutOfBoundsException.

However, reading from a Word document that was not written to by a Java program using the POI library works without any exceptions popping up. 

Separately, I keep getting two "end of line" characters whenever the program parses text from a Word document. Is this a bug, or is it normal for a .doc file to end with two end-of-line characters?
Comment 3 Dominik Stadler 2016-07-24 11:23:44 UTC
It seems the file information block (FIB) in this file is too small. LibreOffice also refuses to open the file, so I don't think we can do much here unless someone comes up with a way to safely read such broken files.
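
To illustrate the failure mode, here is a minimal, stdlib-only sketch. It is not POI's code: the class FibSizeCheck and its methods are hypothetical names. It assumes the fixed FibBase header is 32 bytes, as described in the [MS-DOC] format documentation, and that the "12" in the exception message is the array index that was out of range. A caller could run a check like this before parsing, so that a truncated header produces a clear error instead of an ArrayIndexOutOfBoundsException.

```java
public class FibSizeCheck {
    // Assumed size of the fixed FibBase header, per the [MS-DOC] format description.
    public static final int FIB_BASE_SIZE = 32;

    /**
     * Reads a little-endian 16-bit value, but reports a descriptive error
     * instead of letting the array access run past the end of the buffer.
     */
    public static short getShortChecked(byte[] data, int offset) {
        if (offset < 0 || offset + 2 > data.length) {
            throw new IllegalArgumentException(
                "Need 2 bytes at offset " + offset
                + " but the buffer has only " + data.length + " bytes");
        }
        int b0 = data[offset] & 0xFF;      // low byte comes first
        int b1 = data[offset + 1] & 0xFF;  // high byte comes second
        return (short) ((b1 << 8) | b0);
    }

    /** Returns true if the buffer is long enough to hold a complete FibBase. */
    public static boolean hasCompleteFibBase(byte[] mainStream) {
        return mainStream != null && mainStream.length >= FIB_BASE_SIZE;
    }

    public static void main(String[] args) {
        // Hypothetical truncated stream: any read at offset 12 must fail.
        byte[] truncated = new byte[12];
        System.out.println("truncated ok? " + hasCompleteFibBase(truncated));

        // A buffer of the assumed minimum size, starting with the
        // Word magic number 0xA5EC stored in little-endian order.
        byte[] plausible = new byte[FIB_BASE_SIZE];
        plausible[0] = (byte) 0xEC;
        plausible[1] = (byte) 0xA5;
        System.out.println("plausible ok? " + hasCompleteFibBase(plausible));
        System.out.println("wIdent = 0x"
            + Integer.toHexString(getShortChecked(plausible, 0) & 0xFFFF));

        try {
            getShortChecked(truncated, 12);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A check like this only detects the problem early. It does not recover any content from a file whose header is incomplete.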