Summary: | StringIndexOutOfBoundsException when extracting text from a Word document. | ||
---|---|---|---|
Product: | POI | Reporter: | Bj <bjorn.wang> |
Component: | POI Overall | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | critical | ||
Priority: | P1 | ||
Version: | 3.0-dev | ||
Target Milestone: | --- | ||
Hardware: | Other | ||
OS: | other | ||
URL: | http://marc.theaimsgroup.com/?l=poi-user&m=110183472231615&w=2 | ||
Attachments:
Simplest possible testcase showing the StringIndexOutOfBoundsException
Here is a proposed fix to this issue.
A proposed fix which rewrites the loops
One file that trigger a StringIndexOutOfBoundsException with POI 3.2 Final
Description
Bj
2006-11-29 05:44:46 UTC
Created attachment 19200 [details]
Simplest possible testcase showing the StringIndexOutOfBoundsException
Is this fixed in poi-bin-3.0-alpha3-20061212.zip? I just applied these jars and I still see the same problem.
Created attachment 19768 [details]
Here is a proposed fix to this issue.
It simply catches the index out of bounds exception on the substring method
call and returns an empty string in that scenario.
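As a rough illustration of the approach described in this comment (not the attached patch itself), the guard could look like the sketch below. The class and method names are hypothetical, and the real call site in the extractor is not shown in this report.

```java
final class SafeSubstringSketch {

    /**
     * Hypothetical helper: returns text.substring(start, end), or an empty
     * string when the requested range falls outside the text.
     */
    static String substringOrEmpty(String text, int start, int end) {
        try {
            return text.substring(start, end);
        } catch (IndexOutOfBoundsException e) {
            // StringIndexOutOfBoundsException is a subclass of this type.
            // The text for this range is silently dropped.
            return "";
        }
    }

    public static void main(String[] args) {
        // Range inside the text: the substring is returned unchanged.
        System.out.println(substringOrEmpty("Hello, world", 7, 12));
        // Range past the end: an empty string instead of an exception.
        System.out.println(substringOrEmpty("Hello, world", 7, 50).isEmpty());
    }
}
```

A guard like this stops the crash but hides the underlying offset mismatch, so any text in the affected range is lost without warning.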
Created attachment 19798 [details]
A proposed fix which rewrites the loops
The code gets a List of text runs and a List of text pieces. The existing code
fails when the start of one text piece is not the same as the end of the
previous piece. The assumption that pieces are contiguous is made in several
places.
My proposed patch rewrites the loop to make the code smaller and simpler. The
first proposed patch is made obsolete by this patch because the
StringIndexOutOfBoundsException won't happen anymore.
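The attached patch is not reproduced in this report, so the following is only a sketch of the general idea: intersect the requested range with each piece on its own, rather than assuming that one piece starts where the previous one ends. The `Piece` class and `textForRange` method are hypothetical stand-ins, not POI types.

```java
import java.util.Arrays;
import java.util.List;

final class PieceLoopSketch {

    /** Hypothetical text piece covering character positions [start, end). */
    static final class Piece {
        final int start;
        final int end;
        final String text;

        Piece(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** Collects the text that falls inside [rangeStart, rangeEnd). */
    static String textForRange(List<Piece> pieces, int rangeStart, int rangeEnd) {
        StringBuilder out = new StringBuilder();
        for (Piece p : pieces) {
            // Overlap of the requested range with this piece alone.
            int from = Math.max(rangeStart, p.start);
            int to = Math.min(rangeEnd, p.end);
            // Convert to offsets inside the piece, clamped to its actual text.
            int localFrom = from - p.start;
            int localTo = Math.min(to - p.start, p.text.length());
            if (localFrom >= localTo) {
                continue; // no overlap, or the range falls in a gap
            }
            out.append(p.text, localFrom, localTo);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two pieces with a gap between positions 5 and 8.
        List<Piece> pieces = Arrays.asList(
                new Piece(0, 5, "Hello"),
                new Piece(8, 13, "world"));
        System.out.println(textForRange(pieces, 3, 10));
    }
}
```

Because every offset is clamped to the piece it indexes into, a gap between pieces yields no text for that span instead of an out-of-range substring call.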
I might be being stupid, but I can't actually figure out what file the most recent patch applies to... The patch header refers to WordExtractor.java, but the code doesn't look anything like org.apache.poi.hwpf.extractor.WordExtractor
Created attachment 22957 [details]
One file that trigger a StringIndexOutOfBoundsException with POI 3.2 Final
I also use POI through Nutch, and I tried to install POI 3.2 on Nutch 0.9.1.
Although this bug is marked as fixed in POI 3.0, I can still reproduce it on many documents (I attached one of them) with POI 3.2 FINAL.