Bug 55966

Summary: Text contents of content controls within paragraphs, not appearing in XWPFWordExtractor.getText()
Product: POI
Reporter: Ben Best <ben+poi>
Component: XWPF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 3.10-dev   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Word document with 3 content controls

Description Ben Best 2014-01-07 16:38:30 UTC
Created attachment 31177
Word document with 3 content controls

When calling getText(), the contents of a content control are not returned if the content control sits within a paragraph alongside other text.

When the content control is the only item in the paragraph, its text is returned.

This appears to be the exact opposite of the behaviour in 3.9, where the text of a content control that is the only item in a paragraph doesn't appear, while the text of a content control in a paragraph with other text does. (That fix appears to have been in the onDocumentRead() method of org.apache.poi.xwpf.XWPFDocument.)

I've used the following test (and the attached document) to demonstrate the problem.


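	// Reproduces the issue: getText() should return the text of content controls
	// both inside a paragraph with other text and as the only item in a paragraph.
	public void test_manualDoc() throws FileNotFoundException, IOException  {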
		String filepath = "resources/contentcontrol.docx";
		String expected = "Content control within a paragraph is here text content from within a paragraph second control with a new\nline\n\nContent control that is the entire paragraph";

		XWPFDocument doc = new XWPFDocument(new FileInputStream(filepath));
		XWPFWordExtractor extractedDoc = new XWPFWordExtractor(doc);

		String actual = extractedDoc.getText();
		
		extractedDoc.close();
		Assert.assertEquals(expected, actual);

	}

Comment 1 Dominik Stadler 2020-03-28 09:27:24 UTC
Fixed in r1875802 by including runs of type XWPFSDT during text extraction.
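
For readers who want to see the shape of the problem, here is a minimal standalone sketch of the idea behind the fix. It is a toy model, not the code from r1875802: RunElement, PlainRun, ContentControl and extract() are hypothetical stand-ins invented for illustration, and none of them are POI classes. The only point is that an extractor which skips content-control elements while walking a paragraph drops their text, whereas one that includes them returns the full paragraph.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: every type here is a hypothetical stand-in, not a POI class.
class SdtExtractionSketch {

    /** Anything that can sit inside a paragraph (stand-in for a run-level element). */
    interface RunElement {
        String text();
    }

    /** Ordinary text run. */
    static final class PlainRun implements RunElement {
        private final String text;
        PlainRun(String text) { this.text = text; }
        public String text() { return text; }
    }

    /** Inline content control (stand-in for a run of type XWPFSDT). */
    static final class ContentControl implements RunElement {
        private final String text;
        ContentControl(String text) { this.text = text; }
        public String text() { return text; }
    }

    /**
     * Concatenates the text of a paragraph's elements.
     * includeControls=false mimics the reported behaviour; true mimics the fixed one.
     */
    static String extract(List<RunElement> paragraph, boolean includeControls) {
        StringBuilder sb = new StringBuilder();
        for (RunElement element : paragraph) {
            if (element instanceof ContentControl && !includeControls) {
                continue; // reported behaviour: content-control text is dropped
            }
            sb.append(element.text());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<RunElement> paragraph = Arrays.asList(
                new PlainRun("Content control within a paragraph is here "),
                new ContentControl("text content"),
                new PlainRun(" from within a paragraph"));
        System.out.println(extract(paragraph, false)); // control text missing
        System.out.println(extract(paragraph, true));  // control text included
    }
}
```

With includeControls set to false the middle segment is missing from the output, which mirrors the symptom in the description; with true the paragraph text comes back whole.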