61268 – NegativeArraySizeException on doc file picture

Summary: NegativeArraySizeException on doc file picture

Status:	RESOLVED FIXED

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	POI Overall
Version:	3.16-dev
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	POI Developers List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2017-07-09 19:24 UTC by gaurav.chd3
Modified:	2017-07-09 22:35 UTC
CC List:	0 users

Attachments
2014 doc file (279.15 KB, text/plain) 2017-07-09 20:40 UTC, gaurav.chd3

Description gaurav.chd3 2017-07-09 19:24:44 UTC

Parse Failed for doc file

Comment 1 PJ Fanning 2017-07-09 19:30:27 UTC

gaurav.chd3@gmail.com - can you provide some context on why Apache POI should support all these files?
It seems to me that if you want to read these very old files, you should use MS Word to convert them to newer formats.
Apache POI is a volunteer project and if this support matters to you or your organisation, maybe you can provide patches.

Comment 2 gaurav.chd3 2017-07-09 20:13:43 UTC

Thanks for the response!

This is a new file from 2015, not an old file.

I am just testing it to see how it compares to other options.

Have a good day ahead!

Comment 3 Javen O'Neal 2017-07-09 20:19:36 UTC

Missing attachment, missing error message, missing reproducible test case, missing other helpful information such as POI version.

If you have a set of Microsoft Office files that can't be read, please do some investigation on your end, submit one and only one file for a given issue, and suggest an improvement in the form of a patch for POI to be able to read said file.

Comment 4 gaurav.chd3 2017-07-09 20:31:22 UTC

Sorry for the inconvenience. The file is attached now. The test cases 61265, 61266, 61267, and 61268 are completely different test cases/issues. They will have different root causes and resolutions.

The point regarding improvement suggestions is noted. Thanks!

Comment 5 gaurav.chd3 2017-07-09 20:40:35 UTC

Created attachment 35107
2014 doc file

The file size is 6 MB. It can be downloaded from the link below:

http://www.3gpp.org/ftp/tsg_sa/WG3_Security/TSGS3_76_Sophia/Docs/S3-142235.zip

The relevant file in the zip is "S3-142235 Comments on S3-142030 VF proposal TR 33969-071_rm.doc".

Comment 6 PJ Fanning 2017-07-09 21:14:40 UTC

POI 3.16 / Tika 1.15

S3-142235/S3-142235 Comments on S3-142030 VF proposal TR 33969-071_rm.doc

Caused by: java.lang.NegativeArraySizeException
	at org.apache.poi.ddf.UnknownEscherRecord.fillFields(UnknownEscherRecord.java:71)
	at org.apache.poi.ddf.EscherContainerRecord.fillFields(EscherContainerRecord.java:81)
	at org.apache.poi.hwpf.model.PICFAndOfficeArtData.<init>(PICFAndOfficeArtData.java:61)
	at org.apache.poi.hwpf.usermodel.Picture.<init>(Picture.java:112)
	at org.apache.poi.hwpf.model.PicturesTable.extractPicture(PicturesTable.java:162)
	at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:233)
	at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:710)
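
For readers unfamiliar with this failure mode, the following is a minimal sketch of one way a NegativeArraySizeException can arise while parsing a length-prefixed binary record. It uses only the JDK, not POI. The class and method names are hypothetical, and the 8-byte header layout (4 bytes of options and type, then a 4-byte little-endian length) is an assumption made for illustration. It may not be the actual cause in UnknownEscherRecord.fillFields for this file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only: hypothetical names, not POI code.
public class NegativeLengthDemo {

    // Reads the 4-byte little-endian length stored at offset + 4 of an
    // assumed 8-byte record header. Java ints are signed, so a field with
    // the high bit set comes back negative.
    public static int readDeclaredLength(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset + 4);
    }

    // Naive parser: trusts the declared length without validating it.
    public static byte[] naiveReadPayload(byte[] data, int offset) {
        int length = readDeclaredLength(data, offset);
        byte[] payload = new byte[length]; // throws NegativeArraySizeException if length < 0
        System.arraycopy(data, offset + 8, payload, 0, length);
        return payload;
    }

    public static void main(String[] args) {
        // Header whose length field is 0xFFFFFFFF, i.e. -1 as a signed int.
        byte[] corrupt = {
            0x0F, 0x00, 0x00, (byte) 0xF0,
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF
        };
        try {
            naiveReadPayload(corrupt, 0);
        } catch (NegativeArraySizeException e) {
            System.out.println("Allocation failed for declared length "
                    + readDeclaredLength(corrupt, 0));
        }
    }
}
```

A corrupt or unexpected length field is enough to trigger the exception before any picture data is read, which is consistent with the failure surfacing in a fillFields call.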

Comment 7 PJ Fanning 2017-07-09 22:35:31 UTC

I added a workaround in https://svn.apache.org/viewvc?view=revision&revision=1801395
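
The contents of that revision are not shown in this thread. As a general illustration of the kind of guard that avoids this class of exception, here is a hedged sketch that clamps a declared record length to the range of bytes actually available. It is not the change made in the revision above. The class name, method name, and 8-byte header layout are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustration only: hypothetical names, not the actual POI workaround.
public class GuardedRecordRead {

    // Assumed header size: 4 bytes of options/type plus a 4-byte length.
    static final int HEADER_SIZE = 8;

    // Returns the record payload, clamping the declared length into
    // [0, bytes available after the header] instead of trusting it.
    public static byte[] readPayload(byte[] data, int offset) {
        if (offset < 0 || data.length - offset < HEADER_SIZE) {
            return new byte[0]; // header itself is truncated
        }
        int declared = ByteBuffer.wrap(data)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt(offset + 4);
        int available = data.length - (offset + HEADER_SIZE);
        int length = Math.max(0, Math.min(declared, available));
        int start = offset + HEADER_SIZE;
        return Arrays.copyOfRange(data, start, start + length);
    }

    public static void main(String[] args) {
        // Length field 0xFFFFFFFF (-1 as a signed int) no longer throws.
        byte[] corrupt = {
            0x0F, 0x00, 0x00, (byte) 0xF0,
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF
        };
        System.out.println("payload bytes read: " + readPayload(corrupt, 0).length);
    }
}
```

Clamping lets parsing continue past a damaged record at the cost of silently dropping its data. Whether that trade-off is acceptable depends on the caller; a stricter parser might log or reject the record instead.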