Bug 61268 - NegativeArraySizeException on doc file picture
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.16-dev
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-09 19:24 UTC by gaurav.chd3
Modified: 2017-07-09 22:35 UTC



Attachments
2014 doc file (279.15 KB, text/plain)
2017-07-09 20:40 UTC, gaurav.chd3

Description gaurav.chd3 2017-07-09 19:24:44 UTC
Parsing failed for a .doc file.
Comment 1 PJ Fanning 2017-07-09 19:30:27 UTC
gaurav.chd3@gmail.com - can you provide some context on why Apache POI should support all these files?
It seems to me that if you want to read these very old files, you should use MS Word to convert them to newer formats.
Apache POI is a volunteer project and if this support matters to you or your organisation, maybe you can provide patches.
Comment 2 gaurav.chd3 2017-07-09 20:13:43 UTC
Thanks for the response!

This is a new file from 2015, not an old file.

I am just testing POI to see whether it can be used, in comparison to other options.

Have a good day ahead!
Comment 3 Javen O'Neal 2017-07-09 20:19:36 UTC
Missing attachment, missing error message, missing reproducible test case, missing other helpful information such as POI version.

If you have a set of Microsoft Office files that can't be read, please do some investigation on your end, submit one and only one file for a given issue, and suggest an improvement in the form of a patch for POI to be able to read said file.
Comment 4 gaurav.chd3 2017-07-09 20:31:22 UTC
Sorry for the inconvenience. The file is attached now. The test cases 61265, 61267, 61266, and 61268 are completely different test cases/issues. They will have different root causes and resolutions.

The point regarding improvement suggestions is noted. Thanks!
Comment 5 gaurav.chd3 2017-07-09 20:40:35 UTC
Created attachment 35107
2014 doc file

The file size is 6 MB. It can be downloaded from the link below:

http://www.3gpp.org/ftp/tsg_sa/WG3_Security/TSGS3_76_Sophia/Docs/S3-142235.zip

The relevant file in the zip archive is "S3-142235 Comments on S3-142030 VF proposal TR 33969-071_rm.doc".
Comment 6 PJ Fanning 2017-07-09 21:14:40 UTC
POI 3.16 / Tika 1.15

S3-142235/S3-142235 Comments on S3-142030 VF proposal TR 33969-071_rm.doc

Caused by: java.lang.NegativeArraySizeException
	at org.apache.poi.ddf.UnknownEscherRecord.fillFields(UnknownEscherRecord.java:71)
	at org.apache.poi.ddf.EscherContainerRecord.fillFields(EscherContainerRecord.java:81)
	at org.apache.poi.hwpf.model.PICFAndOfficeArtData.<init>(PICFAndOfficeArtData.java:61)
	at org.apache.poi.hwpf.usermodel.Picture.<init>(Picture.java:112)
	at org.apache.poi.hwpf.model.PicturesTable.extractPicture(PicturesTable.java:162)
	at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:233)
	at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:710)
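
For readers unfamiliar with this failure mode, the sketch below shows one common way a NegativeArraySizeException can arise while parsing binary records like those in the stack trace above: a 4-byte length field read from a malformed file is interpreted as a negative signed int and then used as an array size. This is an illustration only and is not POI's code. The class name, the method names, and the simplified 8-byte header layout (2 bytes options, 2 bytes record id, 4 bytes little-endian length) are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not taken from Apache POI.
class RecordLengthSketch {

    // Assumed header layout: 2 bytes options, 2 bytes record id, 4 bytes length.
    static final int HEADER_SIZE = 8;

    // Reads the 4-byte little-endian length field that follows the first 4 header bytes.
    public static int readLength(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset + 4);
    }

    // Naive reader: trusts the length field and allocates directly.
    public static byte[] readPayloadUnchecked(byte[] data, int offset) {
        int length = readLength(data, offset);
        // Throws NegativeArraySizeException when the stored length is negative.
        byte[] payload = new byte[length];
        System.arraycopy(data, offset + HEADER_SIZE, payload, 0, length);
        return payload;
    }

    // Defensive reader: validates the length against the bytes actually available.
    public static byte[] readPayloadChecked(byte[] data, int offset) {
        int length = readLength(data, offset);
        int available = data.length - offset - HEADER_SIZE;
        if (length < 0 || length > available) {
            throw new IllegalStateException(
                "Invalid record length " + length + " (" + available + " bytes available)");
        }
        byte[] payload = new byte[length];
        System.arraycopy(data, offset + HEADER_SIZE, payload, 0, length);
        return payload;
    }
}
```

Whether a parser should reject such a record, as the checked variant does here, or clamp the length and continue is a design choice. The sketch makes no claim about which approach POI takes.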
Comment 7 PJ Fanning 2017-07-09 22:35:31 UTC
I added a workaround in https://svn.apache.org/viewvc?view=revision&revision=1801395