Bug 61300 - Very slow processing on corrupted file
Summary: Very slow processing on corrupted file
Alias: None
Product: POI
Classification: Unclassified
Component: POIFS
Version: 3.17-dev
Hardware: PC All
Importance: P2 minor
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2017-07-14 12:37 UTC by Tim Allison
Modified: 2017-09-19 12:13 UTC

triggering file (60.50 KB, application/x-ole-storage)
2017-07-14 12:37 UTC, Tim Allison

Description Tim Allison 2017-07-14 12:37:05 UTC
Created attachment 35141
triggering file

I need to figure out whether this is a POIFS bug or a parseSummaries bug. It is triggered by a corrupted file.

Processing spends its time at this location:
	  at org.apache.poi.util.IOUtils.copy(IOUtils.java:296)
	  at org.apache.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:64)
	  at org.apache.poi.hpsf.PropertySet.isPropertySetStream(PropertySet.java:393)
	  at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:191)
	  at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:83)
	  at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73)

        // copy loop in org.apache.poi.util.IOUtils.copy
        while((count = inp.read(buff)) != -1) {
            if(count > 0) {
                out.write(buff, 0, count);
            }
        }

On the first iteration, the pos of inp is 0, but then pos goes negative on each iteration. Because inp.read() does not return -1, the loop keeps going and iterates for a very long time.
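The report does not say why pos goes negative. One plausible mechanism, an assumption that is not confirmed here, is an underlying stream that returns a negative value other than -1 from read(byte[], int, int). In that case the != -1 test never fires, a wrapper that adds every read result to its position drives pos below zero, and a "position >= limit" check never trips. The Java sketch below reproduces that pattern with hypothetical classes (CopyLoopSketch, MisbehavingStream, BoundedStream; none of them are POI classes) and contrasts it with a copy loop that treats any negative return value as end of stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch (not POI code): a copy loop that stops only on read() == -1
// can spin when a wrapped stream reports some other negative value.
public class CopyLoopSketch {

    // Hypothetical stand-in for a corrupted document stream: bulk reads return -2.
    static class MisbehavingStream extends InputStream {
        @Override public int read() { return -1; }
        @Override public int read(byte[] b, int off, int len) {
            return -2; // neither a byte count nor the -1 end-of-stream marker
        }
    }

    // Hypothetical bounded wrapper that adds every read result to its position.
    static class BoundedStream extends InputStream {
        private final InputStream in;
        private final long max;
        long pos = 0;

        BoundedStream(InputStream in, long max) {
            this.in = in;
            this.max = max;
        }

        @Override public int read() throws IOException {
            if (pos >= max) return -1;
            int r = in.read();
            if (r >= 0) pos++;
            return r;
        }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            if (pos >= max) return -1;   // limit check; never true once pos is negative
            int n = in.read(b, off, (int) Math.min(len, max - pos));
            if (n == -1) return -1;      // only -1 is treated as end of stream
            pos += n;                    // a -2 pushes pos further below zero each call
            return n;
        }
    }

    // Same shape as the loop quoted above, plus a demo-only iteration cap.
    static int naiveLoopIterations(InputStream inp, OutputStream out, int cap) throws IOException {
        byte[] buff = new byte[4096];
        int count;
        int iterations = 0;
        while ((count = inp.read(buff)) != -1) {
            iterations++;
            if (count > 0) {
                out.write(buff, 0, count);
            }
            if (iterations >= cap) {
                break; // the quoted loop has no such exit
            }
        }
        return iterations;
    }

    // Defensive variant: any negative return value ends the copy.
    static long defensiveCopy(InputStream inp, OutputStream out) throws IOException {
        byte[] buff = new byte[4096];
        long total = 0;
        int count;
        while ((count = inp.read(buff)) >= 0) {
            if (count > 0) {
                out.write(buff, 0, count);
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        BoundedStream bounded = new BoundedStream(new MisbehavingStream(), 8);
        int iterations = naiveLoopIterations(bounded, new ByteArrayOutputStream(), 1000);
        System.out.println("naive loop iterations (capped): " + iterations + ", pos = " + bounded.pos);

        long copied = defensiveCopy(new BoundedStream(new MisbehavingStream(), 8), new ByteArrayOutputStream());
        System.out.println("defensive copy bytes: " + copied);
    }
}
```

Running main prints "naive loop iterations (capped): 1000, pos = -2000" and "defensive copy bytes: 0". Stopping on any negative count is only a defensive choice under the assumption above; it does not protect against a stream that keeps returning 0, which would need a separate guard.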

The source file that I corrupted is: testEXCEL_embeddedPDF_windows.xls
Comment 1 Dominik Stadler 2017-08-20 09:40:31 UTC
How can we reproduce this with POI alone? How is the document opened in Tika?
Comment 2 Tim Allison 2017-09-19 12:13:51 UTC
I'm sorry for never responding. Yes, it looks like I could reproduce this with POI alone.

Fixed in r1801989.