Bug 53810

Summary: [PATCH] fix for incorrect loop detection in NPOIFS
Product: POI
Reporter: Gary King <gking>
Component: POIFS
Assignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: 3.8-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Mac OS X 10.4   
Attachments: patch fixing cycle detection in NPOI

Description Gary King 2012-09-01 00:19:28 UTC
While upgrading our application from Tika 0.9 to Tika 1.2, a few PowerPoint 97-03 (PPT) files that previously parsed correctly started failing with exceptions in NPOIFS.

The root cause appears to be a difference in how BAT entries are read from XBAT blocks in POIFSFileSystem and NPOIFSFileSystem. In POIFS, the header's getBATCount is used as a hard limit on the number of BATs that are read; in NPOIFS, XBATEntriesPerBlock entries are read from every XBAT block, even if this causes more BAT entries to be read in total than header.getBATCount. In some files, the extraneous BAT entries are all initialized to the same value, which is then detected as a possible cycle.

The attached PPT file demonstrates this problem (it was found via a web-crawler search for test content, so I cannot grant a license to Apache to redistribute it). The attached patch implements behavior in NPOIFS similar to what exists in POIFS, and allows the file to parse without exception.
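
To illustrate the difference described above, here is a minimal, self-contained sketch. It is not the POI code or the attached patch: the class XbatReadSketch, both methods, and the sample offsets are hypothetical, and the duplicate-offset check is only a simplified stand-in for the real loop detection. It also ignores the BAT offsets stored directly in the header and the chain pointer at the end of each XBAT block.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in POI.
class XbatReadSketch {

    // Collects BAT block offsets from the given XBAT blocks.
    // batCountFromXbats plays the role of the limit derived from the
    // header's BAT count (assumed here to cover only XBAT-held entries).
    static List<Integer> readBatOffsets(int[][] xbatBlocks,
                                        int batCountFromXbats,
                                        boolean capAtHeaderCount) {
        List<Integer> offsets = new ArrayList<>();
        int remaining = batCountFromXbats;
        for (int[] xbat : xbatBlocks) {
            // Uncapped: read every slot in the XBAT block.
            // Capped: read only as many slots as the header still allows.
            int toRead = capAtHeaderCount
                    ? Math.min(Math.max(remaining, 0), xbat.length)
                    : xbat.length;
            for (int i = 0; i < toRead; i++) {
                offsets.add(xbat[i]);
            }
            remaining -= toRead;
        }
        return offsets;
    }

    // Simplified stand-in for cycle detection: visiting the same
    // block offset twice is treated as a possible loop.
    static boolean hasRepeatedOffset(List<Integer> offsets) {
        Set<Integer> seen = new HashSet<>();
        for (int offset : offsets) {
            if (!seen.add(offset)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One XBAT block: three real BAT offsets, then unused slots
        // that all happen to hold the same value.
        int[][] xbats = { {10, 11, 12, 0, 0, 0} };
        System.out.println("uncapped repeats: "
                + hasRepeatedOffset(readBatOffsets(xbats, 3, false)));
        System.out.println("capped repeats:   "
                + hasRepeatedOffset(readBatOffsets(xbats, 3, true)));
    }
}
```

With the cap applied, only the three slots that the header accounts for are read, so the identical trailing values never reach the duplicate check.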
Comment 1 Gary King 2012-09-01 00:21:58 UTC
Created attachment 29315 [details]
patch fixing cycle detection in NPOI
Comment 2 Gary King 2012-09-01 00:34:09 UTC
Bugzilla isn't letting me upload the file; however, the file may be downloaded from http://www.slideshare.net/jbrenman/thirst.
Comment 3 Nick Burch 2013-02-04 12:52:56 UTC
Thanks for this; a slightly modified version was committed in r1442095. With that in place, I can now process that slideshare file without problems.