Bug 45713 - java.io.IOException: The text piece table is corrupted ( While processing a Word Document )
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.0-FINAL
Hardware: Sun Solaris
Importance: P1 blocker
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-29 12:46 UTC by Durga Deep Tirunagari
Modified: 2015-12-28 17:56 UTC (History)

Attachments
This is the document we are trying to glean text from. (42.00 KB, application/msword)
2008-08-29 12:46 UTC, Durga Deep Tirunagari
To remove the previous Attachment (465 bytes, application/octet-stream)
2008-08-29 14:13 UTC, Durga Deep Tirunagari

Description Durga Deep Tirunagari 2008-08-29 12:46:20 UTC
Created attachment 22499
This is the document we are trying to glean text from.

java.io.IOException: The text piece table is corrupted
        at org.apache.poi.hwpf.model.ComplexFileTable.<init>(ComplexFileTable.java:53)
        at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:219)
        at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:152)
        at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:57)

Code snippet:

        POIFSFileSystem mswordpoifs = null;
        try {
            mswordpoifs = HWPFDocument.verifyAndBuildPOIFS(isr);
        } catch (IllegalArgumentException iae) {
            // Log the failure; mswordpoifs stays null.
            iae.printStackTrace();
        } catch (IOException ioe) {
            // Log the failure; mswordpoifs stays null.
            ioe.printStackTrace();
        }
        WordExtractor we = null;
        try {
            // This constructor call throws the IOException shown above.
            we = new WordExtractor(mswordpoifs);
        } catch (IOException cie) {
            cie.printStackTrace();
        }
Comment 1 Durga Deep Tirunagari 2008-08-29 14:13:24 UTC
Created attachment 22500
To remove the previous Attachment
Comment 2 Nick Burch 2008-09-02 14:02:02 UTC
Are you sure the POI exception isn't actually correct? Are you sure that the text piece table isn't in fact corrupted in your document?
Comment 3 Durga Deep Tirunagari 2008-09-04 10:46:47 UTC
(In reply to comment #2)
> Are you sure the POI exception isn't actually correct? Are you sure that the
> text piece table isn't in fact corrupted in your document?
> 

We are trying to index content from a Word document attachment. Even though the text piece table is not correctly formatted, we still want to glean the text from such a document. Please let me know if you need more info.
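For indexing purposes, one possible fallback when the piece table cannot be parsed is to bypass POI and scan the raw file bytes for runs of printable characters, much like the Unix `strings` tool. The sketch below is not a POI API and not a real Word parser: the class `RawTextScraper` and the method `extractRuns` are hypothetical names, and the code assumes the text is plain ASCII stored either as single bytes or as UTF-16LE (ASCII byte followed by 0x00). It will also pick up non-content strings such as font names and metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI: a crude "strings"-style scraper.
class RawTextScraper {

    // Printable ASCII range; bytes >= 0x80 are negative in Java and are rejected.
    private static boolean isPrintable(byte b) {
        return b >= 0x20 && b <= 0x7E;
    }

    /**
     * Returns every run of at least minLen printable ASCII characters,
     * whether stored one byte per character or as UTF-16LE.
     */
    static List<String> extractRuns(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            if (!isPrintable(data[i])) {
                i++;
                continue;
            }
            // A printable byte followed by 0x00 is treated as a UTF-16LE character.
            int width = (i + 1 < data.length && data[i + 1] == 0) ? 2 : 1;
            StringBuilder sb = new StringBuilder();
            while (i < data.length && isPrintable(data[i])
                    && (width == 1 || (i + 1 < data.length && data[i + 1] == 0))) {
                sb.append((char) data[i]);
                i += width;
            }
            // Short runs are usually binary noise, so drop them.
            if (sb.length() >= minLen) {
                runs.add(sb.toString());
            }
        }
        return runs;
    }
}
```

A caller could read the attachment with `java.nio.file.Files.readAllBytes(...)` and pass the result to `RawTextScraper.extractRuns(bytes, 4)`, using this path only after `WordExtractor` has thrown. The minimum run length of 4 is an arbitrary choice that trades noise against lost short words.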

Comment 4 Dominik Stadler 2015-12-28 17:56:43 UTC
Resolving some very old bugs that have not had any updates for years. Please reopen if this is still an issue for you.