Bug 54937 - Strange author table structures in Word documents cause text extraction to fail entirely
Status: RESOLVED DUPLICATE of bug 56880
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-08 13:27 UTC by Shu Yang
Modified: 2016-01-19 17:59 UTC



Description Shu Yang 2013-05-08 13:27:31 UTC
Here's the stack trace of the exception.

Caused by: java.lang.UnsupportedOperationException: Non-extended character Pascal strings are not supported right now. Please, contact POI developers for update.
    at org.apache.poi.hwpf.model.SttbUtils.read(SttbUtils.java:66)
    at org.apache.poi.hwpf.model.SttbUtils.readSttbSavedBy(SttbUtils.java:116)
    at org.apache.poi.hwpf.model.SavedByTable.<init>(SavedByTable.java:53)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:360)
    at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:80)

This happens with Tika 1.3.

Thanks!
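
For context, the exception message distinguishes two encodings of a string table (STTB) in the binary .doc format. The following is a minimal sketch of reading both, assuming the STTB layout described in the [MS-DOC] specification: an optional 0xFFFF marker, a 2-byte string count, a 2-byte extra-data size, then length-prefixed strings. It is not POI's code. The names SttbSketch and readSttb are hypothetical, and the sketch assumes a 2-byte count field and a caller-supplied single-byte charset for the non-extended case.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Apache POI.
final class SttbSketch {

    // Reads an STTB starting at 'offset'. 'ansiCharset' is an assumption:
    // the charset used to decode 1-byte characters in the non-extended case.
    static List<String> readSttb(byte[] data, int offset, Charset ansiCharset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(offset);

        int first = buf.getShort() & 0xFFFF;
        // 0xFFFF marks an "extended" table: 2-byte lengths, UTF-16LE characters.
        boolean extended = (first == 0xFFFF);
        // Without the marker, the first two bytes are already the string count.
        int count = extended ? (buf.getShort() & 0xFFFF) : first;
        int cbExtra = buf.getShort() & 0xFFFF; // extra bytes stored after each string

        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // Length prefix: 2 bytes if extended, otherwise 1 byte (Pascal style).
            int cch = extended ? (buf.getShort() & 0xFFFF) : (buf.get() & 0xFF);
            byte[] raw = new byte[extended ? cch * 2 : cch];
            buf.get(raw);
            result.add(new String(raw, extended ? StandardCharsets.UTF_16LE : ansiCharset));
            buf.position(buf.position() + cbExtra); // skip per-string extra data
        }
        return result;
    }

    public static void main(String[] args) {
        // Extended table: marker, count = 1, cbExtra = 0, then "AB" as UTF-16LE.
        byte[] ext = {(byte) 0xFF, (byte) 0xFF, 1, 0, 0, 0, 2, 0, 'A', 0, 'B', 0};
        // Non-extended table: count = 2, cbExtra = 0, then "hi" and "x" as 1-byte chars.
        byte[] ansi = {2, 0, 0, 0, 2, 'h', 'i', 1, 'x'};
        System.out.println(readSttb(ext, 0, StandardCharsets.ISO_8859_1));
        System.out.println(readSttb(ansi, 0, StandardCharsets.ISO_8859_1));
    }
}
```

The second case, with no marker and 1-byte lengths, is the one the message above reports as unsupported.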
Comment 1 Tim Allison 2016-01-19 17:59:15 UTC

*** This bug has been marked as a duplicate of bug 56880 ***