Bug 66263 - Add support for SDT row in tables
Summary: Add support for SDT row in tables
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: unspecified
Hardware: PC Linux
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2022-09-14 15:21 UTC by Tim Allison
Modified: 2022-12-26 09:45 UTC
Description Tim Allison 2022-09-14 15:21:02 UTC
On https://issues.apache.org/jira/browse/TIKA-3816, Jason Guo posted an example file where Tika is not extracting text from a table element.

The issue is that there's an SDT at the same level as the table row. It looks like we can get a list of SDTs from the table object in the underlying bean.

We should probably add a wrapper for these underlying bits?  At the very least, we should add the example file so that we load the sdtrow into our beans in the general ooxml-schemas.
Comment 1 PJ Fanning 2022-09-14 16:38:55 UTC
I added r1904079, but more work needs to be done to get the XWPFWordExtractor to extract CTSdtRow content.
Comment 2 PJ Fanning 2022-09-14 19:35:55 UTC
I also added r1904081 

This has test code that walks the SDT XML objects, so more classes will end up in poi-ooxml-lite (that jar is populated with the poi-ooxml-full classes that are needed for our unit tests to run).

I don't really use the XWPF code, so I don't really want to spend more time on this. It seems to me that XWPFTable needs to be extended so that it looks at the SDT data as well as the CTRow and related cell data.
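
For illustration only, here is a minimal sketch of the traversal being discussed. It is not POI code and does not use the XWPF or CTSdtRow classes. It assumes the usual WordprocessingML layout, where a row-level SDT appears as w:tbl > w:sdt > w:sdtContent > w:tr, and it walks that structure with the JDK's DOM parser. The class name SdtRowWalker and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI: walks raw WordprocessingML with the JDK DOM API.
public class SdtRowWalker {
    public static final String W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of every row in the table, including rows wrapped in a row-level w:sdt. */
    public static List<String> rowTexts(Element tbl) {
        List<String> rows = new ArrayList<>();
        collectRows(tbl, rows);
        return rows;
    }

    // Visits direct children in document order; descends only into w:sdt and w:sdtContent.
    private static void collectRows(Element parent, List<String> rows) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !W.equals(n.getNamespaceURI())) {
                continue;
            }
            Element e = (Element) n;
            String name = e.getLocalName();
            if ("tr".equals(name)) {
                rows.add(textOf(e));
            } else if ("sdt".equals(name) || "sdtContent".equals(name)) {
                collectRows(e, rows);
            }
        }
    }

    // Concatenates all w:t runs below the row (cell boundaries are ignored in this sketch).
    private static String textOf(Element row) {
        StringBuilder sb = new StringBuilder();
        NodeList ts = row.getElementsByTagNameNS(W, "t");
        for (int i = 0; i < ts.getLength(); i++) {
            sb.append(ts.item(i).getTextContent());
        }
        return sb.toString();
    }

    /** Parses an XML string with namespace awareness and returns its root element. */
    public static Element parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document d = f.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return d.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:tbl xmlns:w=\"" + W + "\">"
            + "<w:tr><w:tc><w:p><w:r><w:t>plain row</w:t></w:r></w:p></w:tc></w:tr>"
            + "<w:sdt><w:sdtPr/><w:sdtContent>"
            + "<w:tr><w:tc><w:p><w:r><w:t>sdt row</w:t></w:r></w:p></w:tc></w:tr>"
            + "</w:sdtContent></w:sdt>"
            + "</w:tbl>";
        System.out.println(rowTexts(parse(xml))); // prints [plain row, sdt row]
    }
}
```

A walker that only looks at the direct w:tr children would miss the second row in this sample, which matches the symptom reported on TIKA-3816. Visiting the children in document order keeps SDT-wrapped rows in their original position relative to the plain rows.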