Bug 60160 - ArrayIndexOutOfBoundsException coming when trying to extract text from doc file.
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.15-dev
Hardware: PC Linux
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
Reported: 2016-09-21 07:15 UTC by Akash Sudhakar
Modified: 2019-04-23 20:43 UTC

Description Akash Sudhakar 2016-09-21 07:15:13 UTC
Code:
    byte[] bytes = IOUtils.toByteArray(new FileInputStream(file));
    HWPFDocument doc = new HWPFDocument(new ByteArrayInputStream(bytes));
    // text is read through HWPFDocument (the .doc class), not XWPFWordExtractor (which is for .docx)
    System.out.println(doc.getDocumentText());



Exception stack trace - 

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:342)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:186)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
	at com.test.DocExtractor.main(DocExtractor.java:12)

If we could somehow ignore this exception, we could still extract the other parts of the document.
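
The trace above ends in System.arraycopy, called from the SectionTable constructor. Below is a minimal sketch of that failure mode, assuming (the report does not confirm this) that the offset and length handed to the copy were read from the file and point past the end of the buffer. BoundedCopy and copyRange are hypothetical names for illustration only; they are not part of POI and do not reflect how SectionTable is actually written.

```java
import java.util.Arrays;

final class BoundedCopy {
    /**
     * Copies length bytes starting at offset, or returns null when the
     * requested range does not fit inside the source buffer.
     */
    static byte[] copyRange(byte[] source, int offset, int length) {
        // Compare against (source.length - offset) so offset + length cannot overflow.
        if (offset < 0 || length < 0
                || offset > source.length
                || length > source.length - offset) {
            return null;
        }
        byte[] target = new byte[length];
        System.arraycopy(source, offset, target, 0, length);
        return target;
    }

    public static void main(String[] args) {
        byte[] stream = new byte[16];

        // A range that fits inside the buffer is copied normally.
        System.out.println(Arrays.toString(copyRange(stream, 4, 8)));

        // A range that runs past the end is rejected instead of throwing.
        System.out.println(copyRange(stream, 12, 8));

        // The same range without the check fails inside System.arraycopy.
        try {
            System.arraycopy(stream, 12, new byte[8], 0, 8);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked copy failed: " + e);
        }
    }
}
```

A check of this kind lets a caller skip one unreadable structure, which is the "ignore and continue" behaviour asked for above. Whether the rest of the document is still usable after such a skip depends on the file.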
Comment 1 Javen O'Neal 2016-09-21 08:29:54 UTC
Could you include the file that caused this problem?

FYI, it is simpler to open the document via a POIFSFileSystem.
// the File constructor opens an existing file; create(File) would start a new, empty filesystem
POIFSFileSystem fs = new POIFSFileSystem(file);
HWPFDocument doc = new HWPFDocument(fs);
doc.getDocumentText();
...
doc.close();
fs.close();
Comment 2 Akash Sudhakar 2016-09-21 08:53:13 UTC
The file is classified, so I cannot share it.
If we open the file and save it again as a .doc file, the issue no longer occurs.
Comment 3 Javen O'Neal 2016-09-21 09:11:37 UTC
Do you know what software was used to generate the original file?

Without a way to reproduce the problem, there's not much that we can do.
You could run the file through POIFSDump, BiffViewer or other developer tools (Microsoft publishes some validators), but it is unlikely that a developer will spend much effort with such limited information, nothing to test, for such a minor problem. They're more likely to introduce bugs by making changes.

https://poi.apache.org/apidocs/org/apache/poi/poifs/dev/POIFSDump.html
Comment 5 Akash Sudhakar 2016-09-21 09:24:01 UTC
Is this issue caused by an invalid file, or is it a bug in POI?
Let me check whether I can share the document.
Comment 6 Dominik Stadler 2019-04-23 20:43:07 UTC
No further information has been received for a long time, and the file was probably corrupt and created with some other tool. We therefore do not plan to fix anything until we receive more information and/or a sample document here.