Bug 60345

Summary: Handle corrupt PICT streams
Product: POI
Reporter: Andreas Beeker <kiwiwings>
Component: HSLF
Assignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: 3.16-dev   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: gracefully handle truncated pict streams

Description Andreas Beeker 2016-11-06 00:56:53 UTC
Created attachment 34422
gracefully handle truncated pict streams

Based on the example files in TIKA-2157 and TIKA-2130, I've tried to work around the truncated PICT streams in two steps: first, read as much as possible before the deflated stream becomes corrupt; then, when rendering (via PPTX2PNG) with the help of the twelvemonkeys library, draw the images up to the point of truncation.

As this was only tested with the twelvemonkeys lib, and I haven't tested whether it would also solve TIKA's problem, I'm providing the patch for further discussion.
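
For discussion, the "read as much as possible" step can be illustrated roughly as follows. This is only a minimal sketch using the JDK's java.util.zip classes, not the code from the attached patch; the class name TruncatedInflate and the method inflateLeniently are hypothetical names chosen for this example, and it assumes the PICT payload is a plain zlib/deflate stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, for illustration only (not part of POI's API).
public class TruncatedInflate {

    /**
     * Inflates as many bytes as possible from a zlib stream.
     * If the stream is truncated or corrupt, the bytes recovered
     * so far are returned instead of propagating the exception.
     */
    public static byte[] inflateLeniently(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(chunk)) > 0) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            // EOFException (truncated input) and ZipException (corrupt data)
            // are both IOExceptions: stop reading and keep the partial result.
        }
        return out.toByteArray();
    }
}
```

A caller would pass the raw compressed bytes and hand whatever comes back to the image reader, which then has to cope with an incomplete picture on its own.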
Comment 1 Yegor Kozlov 2016-11-06 13:10:55 UTC
The patch does solve TIKA's problems with extracting image/pict files. I checked 5 files mentioned in TIKA-2164, TIKA-2157 and TIKA-2130, and all of them passed; at least, the extraction of images went without hiccups.
+1 to check it in to svn. It also makes sense to include a couple of the samples in our collection of test files.
Comment 2 Andreas Beeker 2016-11-10 23:05:34 UTC
Patch applied via r1769226
Comment 3 Tim Allison 2016-11-11 01:16:33 UTC
Thank you, Andi!