Bug 64132 - Regression in handling UnknownEscherRecord when getting pictures in .doc files
Summary: Regression in handling UnknownEscherRecord when getting pictures in .doc files
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: unspecified
Hardware: PC Linux
Importance: P2 regression
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on: 64036
Blocks:
Reported: 2020-02-10 18:02 UTC by Tim Allison
Modified: 2021-01-04 05:51 UTC (History)

Attachments
Triggering file (231.50 KB, application/msword)
2020-02-10 18:02 UTC, Tim Allison

Description Tim Allison 2020-02-10 18:02:57 UTC
Created attachment 36999
Triggering file

In our recent regression tests, there were six files that had new exceptions with the stacktrace below.

I'm not sure of the best way to handle this.

java.lang.ClassCastException: class org.apache.poi.ddf.UnknownEscherRecord cannot be cast to class org.apache.poi.ddf.EscherBlipRecord (org.apache.poi.ddf.UnknownEscherRecord and org.apache.poi.ddf.EscherBlipRecord are in unnamed module of loader 'app')
	at org.apache.poi.ddf.EscherBSERecord.fillFields(EscherBSERecord.java:100)
	at org.apache.poi.hwpf.model.PICFAndOfficeArtData.<init>(PICFAndOfficeArtData.java:78)
	at org.apache.poi.hwpf.usermodel.Picture.<init>(Picture.java:112)
	at org.apache.poi.hwpf.model.PicturesTable.extractPicture(PicturesTable.java:162)
	at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:233)
	at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:654)
	at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:644)
	at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:173)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
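
The trace above shows an unconditional cast to EscherBlipRecord failing when the embedded record was parsed as an UnknownEscherRecord. One possible way to tolerate such records is to check the type before casting. The following is only a minimal sketch of that idea and does not claim to be what POI does: the nested classes are hypothetical stand-ins for the POI types named in the trace, and BlipCastSketch, asBlipOrNull and collectBlips are illustrative names that do not exist in POI.

```java
import java.util.ArrayList;
import java.util.List;

class BlipCastSketch {
    // Hypothetical stand-ins for the POI record types named in the stack trace.
    static class EscherRecord {}

    static class EscherBlipRecord extends EscherRecord {
        final byte[] pictureData;

        EscherBlipRecord(byte[] pictureData) {
            this.pictureData = pictureData;
        }
    }

    static class UnknownEscherRecord extends EscherRecord {}

    // Returns the record as a blip, or null when it is any other record type,
    // instead of casting unconditionally and risking a ClassCastException.
    static EscherBlipRecord asBlipOrNull(EscherRecord record) {
        return (record instanceof EscherBlipRecord) ? (EscherBlipRecord) record : null;
    }

    // Collects only the records that really are blips; all others are skipped.
    static List<EscherBlipRecord> collectBlips(List<EscherRecord> records) {
        List<EscherBlipRecord> blips = new ArrayList<>();
        for (EscherRecord record : records) {
            EscherBlipRecord blip = asBlipOrNull(record);
            if (blip != null) {
                blips.add(blip);
            }
        }
        return blips;
    }

    public static void main(String[] args) {
        List<EscherRecord> records = new ArrayList<>();
        records.add(new EscherBlipRecord(new byte[] {1, 2, 3}));
        records.add(new UnknownEscherRecord());
        System.out.println(collectBlips(records).size()); // prints 1
    }
}
```

Skipping the unrecognized record trades a hard failure for a possibly missing picture; whether that is acceptable depends on the caller.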
Comment 1 Dominik Stadler 2021-01-03 21:32:18 UTC
Running "git bisect" identifies r1873187 as the change that introduced this regression.
Comment 2 Dominik Stadler 2021-01-04 05:51:53 UTC
Fixed via r1885092; these documents should now be parsed as they were before.