64132 – Regression in handling UnknownEscherRecord when getting pictures in .doc files

Status:	RESOLVED FIXED

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	HWPF
Version:	unspecified
Hardware:	PC Linux

Importance:	P2 regression
Target Milestone:	---
Assignee:	POI Developers List

URL:
Keywords:

Depends on:	64036
Blocks:

Reported:	2020-02-10 18:02 UTC by Tim Allison
Modified:	2021-01-04 05:51 UTC
CC List:	0 users

Attachments
Triggering file (231.50 KB, application/msword) 2020-02-10 18:02 UTC, Tim Allison

Description Tim Allison 2020-02-10 18:02:57 UTC

Created attachment 36999
Triggering file

In our recent regression tests, six files threw new exceptions with the stack trace below.

I'm not sure of the best way to handle this.

java.lang.ClassCastException: class org.apache.poi.ddf.UnknownEscherRecord cannot be cast to class org.apache.poi.ddf.EscherBlipRecord (org.apache.poi.ddf.UnknownEscherRecord and org.apache.poi.ddf.EscherBlipRecord are in unnamed module of loader 'app')
	at org.apache.poi.ddf.EscherBSERecord.fillFields(EscherBSERecord.java:100)
	at org.apache.poi.hwpf.model.PICFAndOfficeArtData.<init>(PICFAndOfficeArtData.java:78)
	at org.apache.poi.hwpf.usermodel.Picture.<init>(Picture.java:112)
	at org.apache.poi.hwpf.model.PicturesTable.extractPicture(PicturesTable.java:162)
	at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:233)
	at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:654)
	at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:644)
	at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:173)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
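
The failure mode in the trace above is an unchecked downcast of a record that turned out to be an UnknownEscherRecord. Here is a minimal sketch of that pattern and of one possible guard. The classes Record, BlipRecord and UnknownRecord and both helper methods are hypothetical stand-ins, not the real org.apache.poi.ddf types, and the guard is only an illustration; it is not necessarily how the issue was resolved in POI.

```java
import java.util.Optional;

// Hypothetical stand-ins for the record types named in the stack trace.
// These are NOT the real org.apache.poi.ddf classes.
class BlipCastSketch {
    static class Record {}
    static class BlipRecord extends Record {}
    static class UnknownRecord extends Record {}

    // Mirrors the failing pattern: an unchecked downcast that throws
    // ClassCastException when the record is not a BlipRecord.
    static BlipRecord uncheckedBlip(Record r) {
        return (BlipRecord) r;
    }

    // One possible guard (an assumption, not the actual fix): check the
    // runtime type first and return an empty Optional for other records.
    static Optional<BlipRecord> checkedBlip(Record r) {
        if (r instanceof BlipRecord) {
            return Optional.of((BlipRecord) r);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Record unknown = new UnknownRecord();
        try {
            uncheckedBlip(unknown);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getClass().getSimpleName());
        }
        System.out.println("checked cast present: " + checkedBlip(unknown).isPresent());
    }
}
```

Whether a caller should skip such a record, keep it as opaque data, or report it is a design decision this sketch leaves open.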

Comment 1 Dominik Stadler 2021-01-03 21:32:18 UTC

Running "git bisect" identifies change r1873187 as related to this regression.

Comment 2 Dominik Stadler 2021-01-04 05:51:53 UTC

Fixed via r1885092; these documents should now be parsed as they were before the regression.