Code:

    byte[] bytes = IOUtils.toByteArray(new FileInputStream(file));
    HWPFDocument doc = new HWPFDocument(new ByteArrayInputStream(bytes));
    // print the document's plain text via HWPF (binary .doc API)
    System.out.println(doc.getDocumentText());

Exception stack trace:

    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
        at java.lang.System.arraycopy(Native Method)
        at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
        at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:342)
        at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:186)
        at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
        at com.test.DocExtractor.main(DocExtractor.java:12)

If we could somehow ignore this exception, we would still be able to get the other parts of the document.
Could you attach the file that caused this problem?

FYI, it is simpler to open the document via a POIFSFileSystem:

    POIFSFileSystem fs = new POIFSFileSystem(file, true); // open the existing file read-only
    HWPFDocument doc = new HWPFDocument(fs);
    doc.getDocumentText();
    ...
    doc.close();
    fs.close();
The file is classified, so I cannot share it. If we save the file again as a .doc file, the issue no longer occurs.
Do you know what software was used to generate the original file? Without a way to reproduce the problem, there's not much that we can do. You could run the file through POIFSDump, BiffViewer or other developer tools (Microsoft publishes some validators), but with so little information and nothing to test against, it is unlikely that a developer will spend much effort on such a minor problem; changes made blind are more likely to introduce new bugs.
https://poi.apache.org/apidocs/org/apache/poi/poifs/dev/POIFSDump.html
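One quick check that can be run locally, without sharing the file, is whether it starts with the OLE2 (Compound File Binary) header that binary .doc files use. The sketch below is a hypothetical helper, not a POI tool: it compares the 8-byte signature D0 CF 11 E0 A1 B1 1A E1 and reads the header's major version (3 or 4 in well-formed files). A passing check only shows that the container header looks sane; it says nothing about the Word structures inside, such as the section table.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic sketch (not part of POI): checks whether a file begins with the
 * OLE2 / Compound File Binary signature and reports the header's major version.
 */
public class Ole2HeaderCheck {

    // The 8-byte signature found at offset 0 of every OLE2 compound file.
    static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns true if the header starts with the OLE2 signature. */
    static boolean hasOle2Signature(byte[] header) {
        if (header.length < SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < SIGNATURE.length; i++) {
            if (header[i] != SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    /** Major version: a little-endian 16-bit value at offset 0x1A of the header. */
    static int majorVersion(byte[] header) {
        return (header[0x1A] & 0xFF) | ((header[0x1B] & 0xFF) << 8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Ole2HeaderCheck <file>");
            return;
        }
        byte[] header = new byte[512]; // the OLE2 header occupies the first 512 bytes
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.readNBytes(header, 0, header.length); // Java 9+
        }
        if (n < 0x1C || !hasOle2Signature(header)) {
            System.out.println("Not an OLE2 compound file (or header truncated)");
            return;
        }
        System.out.println("OLE2 signature OK, major version " + majorVersion(header));
    }
}
```

Usage, assuming the class is compiled on its own: `java Ole2HeaderCheck suspect.doc`. If the header is fine, POIFSDump is the next step for looking at the individual streams.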
http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/model/SectionTable.java?view=markup#l80 fileOffset or sepxSize is likely -1.
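To illustrate the suspected failure mode, here is a minimal, self-contained sketch that is not POI's code: `SepxBoundsCheck` and `readRecord` are hypothetical names, and the record layout (a 2-byte little-endian size followed by that many bytes) is an assumption modeled loosely on how a SEPX is read. It shows that a bad offset passed straight to `System.arraycopy` fails with a bare index exception, while validating the offset and size first yields a descriptive error.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration (not POI's implementation): copy a length-prefixed record out of
 * a byte stream, validating offset and size before calling System.arraycopy.
 */
public class SepxBoundsCheck {

    /** Reads an unsigned little-endian 16-bit value at the given offset. */
    static int readUShortLE(byte[] data, int offset) {
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    /**
     * Returns the bytes that follow the 2-byte size prefix at fileOffset.
     * Throws IllegalArgumentException if the prefix or the record lies outside data.
     */
    static byte[] readRecord(byte[] data, int fileOffset) {
        if (fileOffset < 0 || fileOffset > data.length - 2) {
            throw new IllegalArgumentException("record offset out of range: " + fileOffset);
        }
        int size = readUShortLE(data, fileOffset);
        int start = fileOffset + 2;
        if (size > data.length - start) {
            throw new IllegalArgumentException("record of " + size + " bytes at offset "
                    + start + " exceeds stream length " + data.length);
        }
        byte[] buf = new byte[size];
        System.arraycopy(data, start, buf, 0, size);
        return buf;
    }

    public static void main(String[] args) {
        byte[] stream = {3, 0, 10, 20, 30, 99};
        System.out.println(Arrays.toString(readRecord(stream, 0))); // [10, 20, 30]

        // Unchecked copy from offset -1: the JVM rejects it with an IndexOutOfBoundsException
        // (ArrayIndexOutOfBoundsException on HotSpot, as in the stack trace in this report).
        try {
            System.arraycopy(stream, -1, new byte[1], 0, 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("arraycopy failed: " + e.getClass().getName());
        }

        // Checked copy: the same bad offset produces a descriptive message instead.
        try {
            readRecord(stream, -1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // record offset out of range: -1
        }
    }
}
```

Whether a parser should skip such a section or stop with a clearer message is a design decision for the maintainers; the sketch only shows where such a check would sit.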
Is this issue caused by a corrupt file, or is it a bug? Let me check whether I can share the document.
No further information has been received for a long time, and this is probably a corrupt file created by some other tool, so we do not plan to fix anything until we receive more information and/or a sample document here.