Summary: | Handle Glossary in XWPFDocument | ||
---|---|---|---|
Product: | POI | Reporter: | Tim Allison <tallison> |
Component: | XWPF | Assignee: | POI Developers List <dev> |
Status: | NEW --- | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | 3.16-dev | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Windows NT |
Description
Tim Allison
2016-10-28 16:24:19 UTC
On further review, and given TIKA-2163, it looks like this is a whole new kettle of worms. The proposed fix is incorrect duct tape over a far larger issue. We aren't currently handling the glossaryDocument as a special relationship type. Does anyone have experience with glossaryDocument? It looks like an entire other document stored within the document...

Does anyone have a recommendation for a more graceful outcome than a ClassCastException for files with a GlossaryDocument? I suspect the actual fix will take a nontrivial amount of work. I don't want to hide or forget the issue, but I would also prefer a different outcome... logging, perhaps?

This issue was recently raised on https://issues.apache.org/jira/browse/TIKA-2769 via an Elasticsearch issue. Our current workaround in Tika is to recommend the SAX-based docx parser. I would opt for handling this more gracefully. Even if POI does not support a feature, it would be nice if it could still handle the document to some degree, so logging would probably be more appropriate for now.

Thank you, Dominik. Unless there are objections, I'll try to add logging as a first step. I'll leave this ticket open for when someone has time to add the new capability.

In r1845517, I added a check+log+skip to avoid a ClassCastException until we have time to implement correct handling of a glossary document.

I shouldn't have skipped "template" types; I should have skipped "glossary" types. This leads to a regression where headers/footers are not extracted from template documents. I will commit a fix and a new unit test once the local build/test/test-integration completes successfully.
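The following is a rough sketch of the check+log+skip idea discussed in this thread. It is a hypothetical, self-contained illustration and not the code committed in r1845517. Relationships are modeled as plain (type, target) pairs instead of POI's own package classes. The names `GlossarySkipSketch`, `Rel`, and `partsToProcess` are invented for this example, and the part names are example values. The relationship type URIs are the standard OOXML ones. The point of the sketch is that the skip matches the glossary relationship type exactly, so header and footer parts are still processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch of "check + log + skip"; NOT the actual POI implementation.
final class GlossarySkipSketch {
    private static final Logger LOG = Logger.getLogger(GlossarySkipSketch.class.getName());

    // Standard OOXML relationship type URIs (ECMA-376).
    static final String GLOSSARY_REL =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/glossaryDocument";
    static final String HEADER_REL =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/header";
    static final String FOOTER_REL =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/footer";

    // Hypothetical stand-in for a package relationship: a type URI plus a target part name.
    static final class Rel {
        final String type;
        final String target;

        Rel(String type, String target) {
            this.type = type;
            this.target = target;
        }
    }

    // Returns the target parts that should be parsed. Glossary parts are logged and
    // skipped instead of being handed to code that expects an ordinary part type.
    static List<String> partsToProcess(List<Rel> relationships) {
        List<String> kept = new ArrayList<>();
        for (Rel rel : relationships) {
            // Match the glossary type exactly; a broader check could also drop
            // parts (e.g. headers/footers) that should still be extracted.
            if (GLOSSARY_REL.equals(rel.type)) {
                LOG.warning("Skipping unsupported glossary document part: " + rel.target);
                continue;
            }
            kept.add(rel.target);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Rel> rels = List.of(
            new Rel(HEADER_REL, "word/header1.xml"),
            new Rel(GLOSSARY_REL, "word/glossary/document.xml"),
            new Rel(FOOTER_REL, "word/footer1.xml"));
        // The glossary part is logged (to stderr by default) and omitted from the result.
        System.out.println(partsToProcess(rels)); // prints [word/header1.xml, word/footer1.xml]
    }
}
```

Logging and skipping keeps the rest of the document extractable, which matches the thread's preference for partial handling over an exception. It does not handle the glossary content itself.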