Bug 45551 - poi-3.5-beta1-20080718.jar - content from a TextBox object in a 2007 xlsx document is not extracted.
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: unspecified
Hardware: PC Windows Server 2003
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-05 05:30 UTC by xtrim
Modified: 2013-08-08 14:27 UTC

Attachments
Contains JUnit test class and documents used for testing. (35.52 KB, application/x-zip-compressed)
2008-08-05 05:30 UTC, xtrim

Description xtrim 2008-08-05 05:30:38 UTC
Created attachment 22374
Contains JUnit test class and documents used for testing.

Text contained in a TextBox inserted into an Excel 2007 (.xlsx) document is not extracted.
The attachment contains the JUnit test class and the documents used for testing.
We expected the extracted text to contain the words "testdoc" and "test phrase".
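
For anyone triaging this, it may help to see where the missing text sits in the package. An .xlsx file is a ZIP archive, and shape text such as TextBox content is stored in DrawingML drawing parts (typically xl/drawings/drawingN.xml) inside <a:t> elements, not in the worksheet cells. The following is only a rough sketch using the JDK alone, not POI and not the approach taken by any patch. The class name DrawingTextSketch, the sampleWorkbook() helper and the regex-based matching are illustrative assumptions; the sketch also assumes the DrawingML namespace uses the conventional "a" prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DrawingTextSketch {

    // Matches DrawingML text runs such as <a:t>testdoc</a:t>.
    // Assumption: the DrawingML namespace is bound to the prefix "a".
    private static final Pattern TEXT_RUN =
            Pattern.compile("<a:t(?:\\s[^>]*)?>([^<]*)</a:t>");

    /** Collects the text runs of every xl/drawings/*.xml part in an .xlsx stream. */
    public static List<String> drawingText(InputStream xlsx) throws IOException {
        List<String> runs = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!name.startsWith("xl/drawings/") || !name.endsWith(".xml")) {
                    continue; // skip worksheets, styles, relationship parts, etc.
                }
                // Read the current entry fully; read() returns -1 at the end of the entry.
                ByteArrayOutputStream part = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    part.write(buffer, 0, n);
                }
                String xml = new String(part.toByteArray(), StandardCharsets.UTF_8);
                Matcher m = TEXT_RUN.matcher(xml);
                while (m.find()) {
                    runs.add(m.group(1)); // XML entities are left undecoded in this sketch
                }
            }
        }
        return runs;
    }

    /** Hypothetical helper: builds a tiny stand-in package with one drawing part (not a complete workbook). */
    public static byte[] sampleWorkbook() throws IOException {
        String drawing = "<xdr:wsDr><xdr:sp><xdr:txBody>"
                + "<a:p><a:r><a:t>testdoc</a:t></a:r></a:p>"
                + "<a:p><a:r><a:t>test phrase</a:t></a:r></a:p>"
                + "</xdr:txBody></xdr:sp></xdr:wsDr>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("xl/drawings/drawing1.xml"));
            zip.write(drawing.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Prints the text runs found in the stand-in package.
        System.out.println(drawingText(new ByteArrayInputStream(sampleWorkbook())));
    }
}
```

To try it against a real file, pass a FileInputStream for the attached workbook to drawingText() in place of the in-memory sample. A regex is used here only to keep the sketch short; a real extractor would parse the XML with a namespace-aware parser.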

Notes on the attached documents:

- the document "classic_ContentInTextBox.xlsx" contains the words "testdoc" and "test phrase" in a TextBox inserted in the document.


"TestUnitPoi35Filter.java" is the JUnit class.
Comment 1 Darren Roberts 2013-08-05 10:15:53 UTC
Tested this bug using this code:

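// Imports assume POI 3.x package names (ExtractorFactory ships in poi-ooxml).
import java.io.File;
import org.apache.poi.POITextExtractor;
import org.apache.poi.extractor.ExtractorFactory;

POITextExtractor extr = null;
String text = null;
try {
    // Let POI pick the extractor that matches the file type.
    extr = ExtractorFactory.createExtractor(new File("classic_ContentInTextBox.xlsx"));
    text = extr.getText();

    System.out.println(text);
    // Both checks should print true once the TextBox content is extracted.
    System.out.println(text.contains("testdoc"));
    System.out.println(text.contains("test phrase"));

} catch (Exception e) {
    e.printStackTrace();
}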

and with the patch from https://issues.apache.org/bugzilla/show_bug.cgi?id=55347 applied. Once that patch is applied, the issue is resolved.
Comment 2 Tim Allison 2013-08-08 14:27:41 UTC
The patch from bug 55347 has been committed. Confirmed fixed in trunk.