Bug 64561 - [PATCH] XWPFSDTContent.getText() is empty for nested SDT elements
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-25 14:32 UTC by Christian Sternagel
Modified: 2020-06-26 12:32 UTC

Attachments
patch produced by SVN diff (1.54 KB, patch)
2020-06-25 14:32 UTC, Christian Sternagel
patch (produced by SVN diff) including testcase (2.38 KB, patch)
2020-06-26 07:26 UTC, Christian Sternagel
minimal example with nested SDTs (17.22 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-06-26 07:27 UTC, Christian Sternagel

Description Christian Sternagel 2020-06-25 14:32:31 UTC
Created attachment 37331
patch produced by SVN diff

The current implementation of XWPFSDTContent, more specifically its constructor

        public XWPFSDTContent(CTSdtContentRun sdtRun, IBody part, IRunBody parent)

uses sdtRun.getRArray() to obtain its body elements. Thus only runs (class CTR) are considered as body elements.

However, I am currently working on a project where most documents contain two levels of nested <w:sdt> tags (apparently this is required in order to manage lists of custom controls).

Furthermore, we build a search index for our *.docx files based on getText(), which currently misses the text stored in nested <w:sdt> tags.

The attached patch is a minor modification that improves this situation by also considering CTSdtRuns (in addition to CTRs) when collecting the body elements of an XWPFSDTContent.
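
To illustrate the difference without depending on POI, here is a minimal sketch that uses only the JDK's DOM parser. It is not the attached patch and does not use POI's classes. The class name NestedSdtTextDemo, its methods, and the sample XML fragment are all hypothetical and hand-written for this example. runsOnly() mimics collecting text from direct <w:r> children only, while runsAndNestedSdts() also descends into nested <w:sdt> children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; it models the idea only and is not POI code.
class NestedSdtTextDemo {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hand-written fragment: one direct run plus one nested <w:sdt>.
    static final String SAMPLE =
        "<w:sdtContent xmlns:w='" + W_NS + "'>"
      + "<w:r><w:t>outer </w:t></w:r>"
      + "<w:sdt><w:sdtContent><w:r><w:t>inner</w:t></w:r></w:sdtContent></w:sdt>"
      + "</w:sdtContent>";

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // True if the node is an element named w:<local>.
    static boolean isW(Node n, String local) {
        return n.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(n.getNamespaceURI())
                && local.equals(n.getLocalName());
    }

    // Concatenates the <w:t> children of a single run.
    static String runText(Element run) {
        StringBuilder sb = new StringBuilder();
        for (Node c = run.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (isW(c, "t")) sb.append(c.getTextContent());
        }
        return sb.toString();
    }

    // Looks at direct <w:r> children only, so nested SDT text is missed.
    static String runsOnly(Element sdtContent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = sdtContent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (isW(c, "r")) sb.append(runText((Element) c));
        }
        return sb.toString();
    }

    // Also recurses into the <w:sdtContent> of nested <w:sdt> children.
    static String runsAndNestedSdts(Element sdtContent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = sdtContent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (isW(c, "r")) {
                sb.append(runText((Element) c));
            } else if (isW(c, "sdt")) {
                for (Node g = c.getFirstChild(); g != null; g = g.getNextSibling()) {
                    if (isW(g, "sdtContent")) sb.append(runsAndNestedSdts((Element) g));
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element root = parse(SAMPLE);
        System.out.println("[" + runsOnly(root) + "]");          // prints "[outer ]"
        System.out.println("[" + runsAndNestedSdts(root) + "]"); // prints "[outer inner]"
    }
}
```

In this sketch the nested text only shows up in the second variant, which is the kind of gap our search index runs into.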
Comment 1 PJ Fanning 2020-06-25 14:45:16 UTC
Thanks, but could you provide test coverage, e.g. a file that demonstrates the issue and a unit test that checks the results when using that file?
Comment 2 Christian Sternagel 2020-06-26 07:26:03 UTC
Created attachment 37332
patch (produced by SVN diff) including testcase

I modified the patch to also include a test case. Since I was not sure whether to include the required *.docx file in the patch, I will just add it as a separate attachment.
Comment 3 Christian Sternagel 2020-06-26 07:27:40 UTC
Created attachment 37333
minimal example with nested SDTs
Comment 4 PJ Fanning 2020-06-26 11:26:44 UTC
Thanks. Committed with https://svn.apache.org/repos/asf/poi/trunk@1879223, with 1 bug (on my part) that was later fixed.
Comment 5 Christian Sternagel 2020-06-26 12:32:21 UTC
Great! Thank you.