Bug 55966 - Text contents of content controls within paragraphs, not appearing in XWPFWordExtractor.getText()
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: 3.10-dev
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2014-01-07 16:38 UTC by Ben Best
Modified: 2020-03-28 09:27 UTC

Word document with 3 content controls (20.27 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2014-01-07 16:38 UTC, Ben Best

Description Ben Best 2014-01-07 16:38:30 UTC
Created attachment 31177
Word document with 3 content controls

When calling getText(), the contents of a content control are not returned if the content control is within a paragraph that also contains other text.

When the content control is the only item in the paragraph, its text is returned.

This appears to be the exact opposite of the behaviour in 3.9, where text in a content control that is the only item in a paragraph does not appear, while text in a content control within a paragraph containing other text does. (That fix appears to have been made in the onDocumentRead() method of org.apache.poi.xwpf.usermodel.XWPFDocument.)

I've used the following test (and the attached document) to demonstrate the problem.

	public void test_manualDoc() throws FileNotFoundException, IOException  {
		String filepath = "resources/contentcontrol.docx";
		String expected = "Content control within a paragraph is here text content from within a paragraph second control with a new\nline\n\nContent control that is the entire paragraph";

		// Load the attached document and extract its text
		XWPFDocument doc = new XWPFDocument(new FileInputStream(filepath));
		XWPFWordExtractor extractedDoc = new XWPFWordExtractor(doc);

		// Fails on 3.10-dev: text of the inline content controls is missing
		String actual = extractedDoc.getText();
		Assert.assertEquals(expected, actual);
	}

Comment 1 Dominik Stadler 2020-03-28 09:27:24 UTC
Fixed in r1875802 by including runs of type XWPFSDT during text-extraction.
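
To illustrate what "including runs of type XWPFSDT" amounts to at the XML level: an inline content control is a w:sdt element that sits next to the ordinary w:r runs inside a w:p, and its visible text lives under w:sdtContent. The following is a minimal JDK-only sketch of collecting paragraph text with those elements included. It is not POI's implementation and does not use the POI API; the class name SdtTextSketch and the method paragraphText are hypothetical names chosen for this example, and it assumes the paragraph XML uses the standard WordprocessingML namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; this is not how POI is implemented.
class SdtTextSketch {
    // Standard WordprocessingML main namespace
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of a single w:p element, including inline w:sdt content. */
    static String paragraphText(String paragraphXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document dom = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(paragraphXml)));
            StringBuilder out = new StringBuilder();
            collect(dom.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse paragraph XML", e);
        }
    }

    // Walks the children in document order so run text and control text interleave.
    private static void collect(Element parent, StringBuilder out) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE
                    || !W_NS.equals(node.getNamespaceURI())) {
                continue;
            }
            Element el = (Element) node;
            switch (el.getLocalName()) {
                case "t":
                    // Literal run text
                    out.append(el.getTextContent());
                    break;
                case "r":
                case "sdt":
                case "sdtContent":
                    // Descend into runs and into content controls alike
                    collect(el, out);
                    break;
                default:
                    // Properties such as w:rPr and w:sdtPr carry no visible text
                    break;
            }
        }
    }
}
```

Dropping the "sdt" and "sdtContent" cases from the switch reproduces the symptom reported above: text from ordinary runs is returned, but text inside an inline content control is skipped.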