Bug 45565 - poi-3.5-beta1-20080718.jar - content from a TextBox object of a 2003 xls document is not extracted.
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 3.16-dev
Hardware: PC All
Importance: P2 enhancement with 1 vote
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-05 07:49 UTC by xtrim
Modified: 2016-09-22 03:34 UTC
CC List: 0 users



Attachments
Contains JUnit test class and XLS document used for testing. (2.07 KB, application/x-zip-compressed)
2008-08-05 07:49 UTC, xtrim
Contains JUnit test class and a DOC document used for testing. (6.44 KB, application/x-zip-compressed)
2009-01-29 04:29 UTC, Jose M. Sánchez

Description xtrim 2008-08-05 07:49:11 UTC
Created attachment 22388
Contains JUnit test class and XLS document used for testing.

Text contained in a TextBox inserted into an Excel 2003 document is not extracted.
The attachment contains the JUnit test class and the document used for testing.
We expected the words "testdoc" and "test phrase" to be extracted.

Notes on the attached documents:

- the document "classic.TextInTextBox.xls" contains the words "testdoc" and "test phrase" in a TextBox inserted in the document.


"TestUnitPoi35Filter.java" is the JUnit class.
Comment 1 Jose M. Sánchez 2009-01-29 04:29:55 UTC
Created attachment 23191
Contains JUnit test class and a DOC document used for testing.
Comment 2 Jose M. Sánchez 2009-01-29 04:31:05 UTC
Versions 3.2-FINAL through 3.5-beta1 also fail to extract the contents of text boxes in Word 97 documents.

As in the previous comment, we have uploaded a JUnit test that reproduces the error with WordExtractor and ExtractorFactory.
Comment 3 MaryAubaun 2012-04-20 22:39:45 UTC
I get the same problem with the event-based parsers, for both the 97-2003 formats and the 2007/xlsx formats. If anyone can suggest what code to add, I may be able to put it in, at least into the event-based parser, and post the code.

I would also like to extract hidden text and revision marks as settable options, and can write the code for it if someone can point me in the right direction.
Comment 4 Javen O'Neal 2016-09-22 03:34:23 UTC
This is still failing in POI 3.15 final. Neither the ExcelExtractor nor the WordExtractor currently check TextBox objects for text. Patches to add this functionality are welcome!
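Until the extractors check TextBox objects, a caller could walk a sheet's drawing shapes itself. In the HSSF usermodel the shapes hang off `HSSFSheet.getDrawingPatriarch()` (which may be null when the sheet has no drawing); `HSSFPatriarch` and `HSSFShapeGroup` expose `getChildren()`, and `HSSFTextbox.getString()` returns an `HSSFRichTextString`. The sketch below is a minimal, POI-free model of that traversal, not the fix for this bug: `Shape`, `TextBox` and `ShapeGroup` are hypothetical stand-ins so the logic can be compiled and run on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TextBoxTextCollector {

    // Hypothetical stand-ins for POI's HSSFShape, HSSFTextbox and HSSFShapeGroup.
    interface Shape {}

    static final class TextBox implements Shape {
        final String text; // may be null, e.g. for an empty textbox
        TextBox(String text) { this.text = text; }
    }

    static final class ShapeGroup implements Shape {
        final List<Shape> children;
        ShapeGroup(Shape... children) { this.children = Arrays.asList(children); }
    }

    // Depth-first walk: returns non-empty textbox text in shape order,
    // descending into groups so grouped textboxes are not missed.
    static List<String> collectText(List<? extends Shape> shapes) {
        List<String> out = new ArrayList<>();
        collect(shapes, out);
        return out;
    }

    private static void collect(List<? extends Shape> shapes, List<String> out) {
        for (Shape s : shapes) {
            if (s instanceof ShapeGroup) {
                collect(((ShapeGroup) s).children, out);
            } else if (s instanceof TextBox) {
                String t = ((TextBox) s).text;
                if (t != null && !t.isEmpty()) {
                    out.add(t);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Models a sheet with one top-level textbox, one grouped textbox,
        // an empty group, and an empty textbox.
        List<Shape> sheetShapes = Arrays.asList(
                new TextBox("testdoc"),
                new ShapeGroup(new TextBox("test phrase"), new ShapeGroup()),
                new TextBox(null));
        System.out.println(collectText(sheetShapes)); // prints [testdoc, test phrase]
    }
}
```

The recursion into groups is the design point: a TextBox that was grouped with other shapes is a child of a group rather than of the patriarch, so a flat loop over the top-level shapes would skip it.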

Added failing unit test in r1761841.