Bug 45558 - poi-3.5-beta1-20080718.jar - content from a TextBox object of a 2007 docx document is not extracted.
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: unspecified
Hardware: PC Windows Server 2003
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks: 55390
Reported: 2008-08-05 05:50 UTC by xtrim
Modified: 2016-11-25 21:13 UTC (History)



Attachments
Contains JUnit test class and documents used for testing. (46.58 KB, application/x-zip-compressed)
2008-08-05 05:50 UTC, xtrim

Description xtrim 2008-08-05 05:50:16 UTC
Created attachment 22381
Contains JUnit test class and documents used for testing.

The text contained in a TextBox inserted/created in a Word 2007 document is not extracted.
The attachment contains the JUnit test class and the documents used for testing.
We expected the words "testdoc" and "test phrase" to be extracted.

Notes on the attached documents:

- the documents "classic_TextInTextBox.docx" and "form_TextInTextBox.docx" contain the word "testdoc" in a TextBox inserted in the document.


"TestUnitPoi35Filter.java" is the JUnit class.
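
For readers who want to check by hand which text an extractor should be finding, here is a minimal sketch that uses only the JDK, not POI. It assumes that text box content in the main document part sits inside `w:txbxContent` elements and that the main part is stored at `word/document.xml`, as Word normally writes it. The class name, method names, and the inline sample XML are hypothetical and are not taken from the attached test.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI: lists the text held in text boxes.
class TextBoxTextSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Tiny stand-in for word/document.xml: one body paragraph ("body") and
    // one VML text box whose single paragraph has two runs ("test", "doc").
    static final String SAMPLE_XML =
            "<w:document xmlns:w=\"" + W_NS + "\" xmlns:v=\"urn:schemas-microsoft-com:vml\">"
            + "<w:body><w:p><w:r><w:t>body</w:t></w:r></w:p>"
            + "<w:p><w:r><w:pict><v:shape><v:textbox><w:txbxContent>"
            + "<w:p><w:r><w:t>test</w:t></w:r><w:r><w:t>doc</w:t></w:r></w:p>"
            + "</w:txbxContent></v:textbox></v:shape></w:pict></w:r></w:p>"
            + "</w:body></w:document>";

    /** Returns the text of every paragraph under a w:txbxContent, one per line. */
    static String fromDocumentXml(InputStream documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the NS-based lookups below
            Document doc = factory.newDocumentBuilder().parse(documentXml);
            StringBuilder out = new StringBuilder();
            NodeList boxes = doc.getElementsByTagNameNS(W_NS, "txbxContent");
            for (int b = 0; b < boxes.getLength(); b++) {
                NodeList paragraphs =
                        ((Element) boxes.item(b)).getElementsByTagNameNS(W_NS, "p");
                for (int p = 0; p < paragraphs.getLength(); p++) {
                    NodeList texts =
                            ((Element) paragraphs.item(p)).getElementsByTagNameNS(W_NS, "t");
                    // Runs of one paragraph are concatenated with no separator.
                    for (int t = 0; t < texts.getLength(); t++) {
                        out.append(texts.item(t).getTextContent());
                    }
                    out.append('\n'); // line break only at the paragraph boundary
                }
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not read document XML", e);
        }
    }

    /** Opens a .docx as a zip and scans its main part (assumed word/document.xml). */
    static String fromDocx(File docx) {
        try (ZipFile zip = new ZipFile(docx)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IllegalStateException("no word/document.xml part found");
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return fromDocumentXml(in);
            }
        } catch (IOException e) {
            throw new IllegalStateException("could not open " + docx, e);
        }
    }

    public static void main(String[] args) {
        // With a path argument, scan that .docx; otherwise use the inline sample.
        System.out.print(args.length > 0
                ? fromDocx(new File(args[0]))
                : fromDocumentXml(new ByteArrayInputStream(
                        SAMPLE_XML.getBytes(StandardCharsets.UTF_8))));
    }
}
```

Running it with the path to "classic_TextInTextBox.docx" as the only argument would list whatever text box paragraphs that file contains. The sketch ignores headers, footers, and nested text boxes, and a file that stores both a DrawingML text box and a VML fallback for the same shape would have its text listed twice.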
Comment 1 Jose M. Sánchez 2009-01-29 04:28:27 UTC
Versions 3.2-FINAL through 3.5-beta1 also fail to extract the contents of text boxes in Word 97 documents.

As in the previous comment, we have uploaded a JUnit test that reproduces the error with both WordExtractor and ExtractorFactory.

Comment 2 Tim Allison 2013-08-08 13:14:56 UTC
Just looked into this. The general issue was fixed in 3.9. There is a formatting issue, though, which the test doc brings out (a newline is incorrectly inserted between runs):

testdoc

extracted as

test\ndoc

Closing this issue and opening a new issue for the newline problem.
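
To make the expected behaviour concrete, here is a small sketch of the joining rule that the example above implies: runs within one paragraph are concatenated directly, and a newline appears only between paragraphs. It is an illustration in plain Java, not POI's code and not the eventual fix; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of the expected joining rule; not POI code.
class RunJoinSketch {
    /** Each inner list holds the run texts of one paragraph, in order. */
    static String joinParagraphs(List<List<String>> paragraphs) {
        return paragraphs.stream()
                .map(runs -> String.join("", runs))   // no separator between runs
                .collect(Collectors.joining("\n"));   // newline between paragraphs only
    }

    public static void main(String[] args) {
        // One paragraph split into the runs "test" and "doc",
        // followed by a second paragraph with a single run.
        List<List<String>> paragraphs = Arrays.asList(
                Arrays.asList("test", "doc"),
                Arrays.asList("test phrase"));
        System.out.println(joinParagraphs(paragraphs));
    }
}
```

Under this rule the two runs of the first paragraph come out as "testdoc" rather than "test\ndoc".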