Bug 60160

Summary: ArrayIndexOutOfBoundsException thrown when trying to extract text from a .doc file.
Product: POI
Reporter: Akash Sudhakar <akki.1607>
Component: HWPF
Assignee: POI Developers List <dev>
Status: RESOLVED WONTFIX    
Severity: major    
Priority: P2    
Version: 3.15-dev   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Akash Sudhakar 2016-09-21 07:15:13 UTC
Code -     
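    // imports: java.io.FileInputStream, java.io.ByteArrayInputStream,
    //          org.apache.poi.hwpf.HWPFDocument, org.apache.poi.util.IOUtils
    byte[] bytes;
    try (FileInputStream in = new FileInputStream(file)) { // close the input file
        bytes = IOUtils.toByteArray(in);
    }
    HWPFDocument doc = new HWPFDocument(new ByteArrayInputStream(bytes));
    System.out.println(doc.getDocumentText()); // HWPF text extraction (binary .doc)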

Exception stack trace - 

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:342)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:186)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
	at com.test.DocExtractor.main(DocExtractor.java:12)
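
A note on reading this trace: the failure surfaces in System.arraycopy, which refuses a copy whenever an offset or the length is negative, or the requested range runs past the end of either array (the JDK documents IndexOutOfBoundsException; HotSpot throws the ArrayIndexOutOfBoundsException subclass seen above). Since SectionTable reads section data from the document's streams, the offending offset or length is presumably taken from the file itself. The sketch below is not POI code: BoundsCheckedCopy and copyRange are hypothetical names for a copy helper that applies the same source-range check but reports the values that were out of range, the kind of detail that helps when the file cannot be shared.

```java
// Hypothetical helper, not part of Apache POI.
final class BoundsCheckedCopy {
    private BoundsCheckedCopy() {}

    // Copies src[srcPos, srcPos + length) into a new array. Rejects the same
    // source ranges that System.arraycopy rejects, but the exception message
    // names the offset, length and buffer size involved.
    static byte[] copyRange(byte[] src, int srcPos, int length) {
        // Written as srcPos > src.length - length to avoid int overflow
        // when srcPos + length would exceed Integer.MAX_VALUE.
        if (srcPos < 0 || length < 0 || srcPos > src.length - length) {
            throw new IndexOutOfBoundsException(String.format(
                "invalid range: offset %d, length %d, buffer size %d",
                srcPos, length, src.length));
        }
        byte[] dest = new byte[length];
        System.arraycopy(src, srcPos, dest, 0, length);
        return dest;
    }
}
```

For example, copyRange(buf, 4, 2) on a 5-byte buffer fails with "invalid range: offset 4, length 2, buffer size 5" instead of a bare ArrayIndexOutOfBoundsException.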

If we could somehow ignore this exception, we could still get the other parts of the document.
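
Ignoring the exception cannot recover the rest of this document through HWPFDocument: it is thrown from inside the HWPFDocument constructor, so no document object is ever returned. What a caller can do is contain the failure and move on, for example in a batch job over many files. A minimal sketch, where SafeParse and tryParse are hypothetical helper names, not POI API:

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Apache POI.
final class SafeParse {
    private SafeParse() {}

    // Runs the parser and returns its result, or Optional.empty() if it throws
    // any Exception (including RuntimeExceptions such as
    // ArrayIndexOutOfBoundsException). Errors are not caught.
    static <T> Optional<T> tryParse(String label, Callable<T> parser) {
        try {
            return Optional.ofNullable(parser.call());
        } catch (Exception e) {
            // Report the failure so it is not silently lost.
            System.err.println("Skipping " + label + ": " + e);
            return Optional.empty();
        }
    }
}
```

Usage, assuming fs is an already-open POIFSFileSystem: `Optional<HWPFDocument> doc = SafeParse.tryParse(file.getName(), () -> new HWPFDocument(fs));`. Catching Exception rather than only ArrayIndexOutOfBoundsException is a deliberate choice in this sketch, since a damaged file can fail in other ways too; Errors such as OutOfMemoryError still propagate.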
Comment 1 Javen O'Neal 2016-09-21 08:29:54 UTC
Could you include the file that caused this problem?

FYI, it is simpler to open the document via a POIFSFileSystem.
POIFSFileSystem fs = new POIFSFileSystem(file); // opens the existing file (create(File) would make a new, empty one)
HWPFDocument doc = new HWPFDocument(fs);
doc.getDocumentText();
...
doc.close();
fs.close();
Comment 2 Akash Sudhakar 2016-09-21 08:53:13 UTC
The file is classified, so I cannot share it.
If we re-save the file as a .doc file, the issue no longer occurs.
Comment 3 Javen O'Neal 2016-09-21 09:11:37 UTC
Do you know what software was used to generate the original file?

Without a way to reproduce the problem, there's not much that we can do.
You could run the file through POIFSDump, BiffViewer or other developer tools (Microsoft publishes some validators), but with such limited information and nothing to test against, it is unlikely that a developer will spend much effort on such a minor problem. Changes made without a reproducing file are more likely to introduce new bugs.

https://poi.apache.org/apidocs/org/apache/poi/poifs/dev/POIFSDump.html
Comment 5 Akash Sudhakar 2016-09-21 09:24:01 UTC
Is this issue caused by an invalid file, or is it a bug?
Let me check if I can share the document.
Comment 6 Dominik Stadler 2019-04-23 20:43:07 UTC
No further information has been received for a long time, and this is probably a corrupt file created by some other tool. Therefore we do not plan to fix anything until we receive more information and/or a sample document here.