Created attachment 22374: JUnit test class and the documents used for testing.

The text contained in a TextBox inserted into an Excel 2007 (.xlsx) document is not extracted. The attachment contains the JUnit test class and the documents used for testing. We expected the extracted text to include the words "testdoc" and "test phrase".

Notes on the attached files:
- "classic_ContentInTextBox.xlsx" contains the words "testdoc" and "test phrase" in a TextBox inserted in the document.
- "TestUnitPoi35Filter.java" is the JUnit test class.
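To check that the words are actually present in the file, independently of POI, the package can be opened as a ZIP archive. The following is a minimal, stdlib-only sketch. It assumes that the TextBox text is stored as DrawingML text runs (`<a:t>` elements) in the `xl/drawings/drawingN.xml` parts rather than in the worksheet XML, which is how Excel normally saves shapes. The class `DrawingTextDump` and the method `drawingTextRuns` are hypothetical names used only for illustration. This is not the attached `TestUnitPoi35Filter.java` and not the fix from bug 55347.

```java
import java.io.File;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the text runs stored in the drawing parts of an .xlsx package.
public class DrawingTextDump {
    // DrawingML main namespace; shape text runs are <a:t> elements in this namespace.
    static final String DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";
    static final String DRAWINGS_DIR = "xl/drawings/";

    // Returns the content of every <a:t> element found in xl/drawings/*.xml, in document order.
    public static List<String> drawingTextRuns(File xlsx) throws Exception {
        List<String> runs = new ArrayList<>();
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for getElementsByTagNameNS
        try (ZipFile zip = new ZipFile(xlsx)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Only top-level drawing parts; skips xl/drawings/_rels/*.rels and other files.
                boolean isDrawingPart = name.startsWith(DRAWINGS_DIR)
                        && name.endsWith(".xml")
                        && name.indexOf('/', DRAWINGS_DIR.length()) < 0;
                if (!isDrawingPart) {
                    continue;
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    Document doc = dbf.newDocumentBuilder().parse(in);
                    NodeList texts = doc.getElementsByTagNameNS(DRAWINGML_NS, "t");
                    for (int i = 0; i < texts.getLength(); i++) {
                        runs.add(texts.item(i).getTextContent());
                    }
                }
            }
        }
        return runs;
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "classic_ContentInTextBox.xlsx";
        for (String run : drawingTextRuns(new File(path))) {
            System.out.println(run);
        }
    }
}
```

If the runs printed by this sketch contain "testdoc" and "test phrase" while `getText()` from the extractor does not, the text is in the file and is being missed by the extractor.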
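Tested this bug with the following code (the imports assume the POI 3.x package layout):

    import java.io.File;
    import org.apache.poi.POITextExtractor;
    import org.apache.poi.extractor.ExtractorFactory;

    POITextExtractor extr = null;
    String text = null;
    try {
        // Let POI choose the extractor for the file type (XSSF for .xlsx)
        extr = ExtractorFactory.createExtractor(new File("classic_ContentInTextBox.xlsx"));
        text = extr.getText();
        System.out.println(text);
        // Both lines should print "true" once the TextBox text is extracted
        System.out.println(text.contains("testdoc"));
        System.out.println(text.contains("test phrase"));
    } catch (Exception e) {
        e.printStackTrace();
    }

With the patch from https://issues.apache.org/bugzilla/show_bug.cgi?id=55347 applied, the issue is resolved.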
The patch from bug 55347 has been committed. Confirmed fixed in trunk.