Bug 62201 - Zip Bomb ratio: Fail fast and/or round the ratio before comparison
Summary: Zip Bomb ratio: Fail fast and/or round the ratio before comparison
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: unspecified
Hardware: All
OS: All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-20 21:51 UTC by William Stephens
Modified: 2018-03-21 06:16 UTC
CC List: 0 users



Attachments
Stack trace from zip bomb failure (2.34 KB, text/plain)
2018-03-20 21:51 UTC, William Stephens

Description William Stephens 2018-03-20 21:51:47 UTC
Created attachment 35793
Stack trace from zip bomb failure

I updated from POI 3.9 to 3.16. My customers are now reporting that they are unable to read Excel files due to the zip bomb fix. In all instances, our software is able to open the file (new XSSFReader(OPCPackage.open(file))) and read the sheets (XSSFReader.getSheetsData.asInstanceOf[XSSFReader.SheetIterator]) without receiving an exception.

The "Zip Bomb" exception appears to occur when we attempt to call XSSFReader.getStylesTable(). 

The ultimate error message: 
>java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data. This may indicate that the file is used to inflate memory usage and thus could pose a security risk. You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit. Counter: 3277797, cis.counter: 32768, ratio: 0.009996958322922377 Limits: MIN_INFLATE_RATIO: 0.01
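
As the message suggests, the threshold can be relaxed via ZipSecureFile.setMinInflateRatio(); a minimal sketch (the 0.001 value is illustrative, and lowering it weakens the zip bomb protection accordingly):

    import org.apache.poi.openxml4j.util.ZipSecureFile;

    // Default MIN_INFLATE_RATIO is 0.01, i.e. data may expand up to 100x
    // its compressed size. 0.001 allows up to 1000x; call this before
    // opening the workbook.
    ZipSecureFile.setMinInflateRatio(0.001);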

Options: 
1. Fail faster? 
It would be beneficial to fail with the Zip Bomb error via a validation step when instantiating the XSSFReader, rather than waiting until we try to access some of the embedded data. This was certainly an unexpected issue after our upgrade to 3.16.

2. Round the calculated compression ratio
In every customer-reported situation we've identified the following case:
ratio: 0.00999.... Limits: MIN_INFLATE_RATIO: 0.01 
Should we round the calculated compression ratio (0.0099...) to 2 decimal places before comparison? Would that reduce the security of the check?
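
To illustrate, rounding the reported ratio to 2 decimal places would let this file pass the check:

    double ratio = 0.009996958322922377;               // value from the exception
    double rounded = Math.round(ratio * 100) / 100.0;  // rounds to 0.01
    boolean passes = rounded >= 0.01;                  // true after rounding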

Exception: attached
Comment 1 Andreas Beeker 2018-03-20 22:13:12 UTC
I wonder where the "duplicated" styles come from, i.e. there must be a reason why the compression ratio is so high - are the styles in that Excel file generated? ... and maybe a new style is created for each cell?

Re option 1: I don't think that eagerly fetching the shared strings and styles tables is a good idea - I would actually prefer the opposite ... something like a lazy-loading mechanism inside the content, e.g. a table which fills successively as its elements are iterated over (see the sketch at the end of this comment).

Re option 2: I don't see the advantage of rounding - how about setting the limit to 0.005?
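
For what it's worth, a hypothetical sketch of the lazy-loading idea from option 1 (the LazyTable name and shape are invented for illustration):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    // Hypothetical sketch: elements are parsed from the underlying part
    // only as the iterator advances, so nothing is inflated until it is
    // actually needed; already-parsed elements are memoized.
    class LazyTable<T> implements Iterable<T> {
        private final Iterator<T> source;  // parses one element at a time
        private final List<T> loaded = new ArrayList<>();

        LazyTable(Iterator<T> source) { this.source = source; }

        public Iterator<T> iterator() {
            return new Iterator<T>() {
                private int pos = 0;
                public boolean hasNext() { return pos < loaded.size() || source.hasNext(); }
                public T next() {
                    if (pos == loaded.size()) loaded.add(source.next()); // fill on demand
                    return loaded.get(pos++);
                }
            };
        }
    }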
Comment 2 Dominik Stadler 2018-03-21 06:16:32 UTC
1. This would cause performance issues, as we would need to read and parse every embedded item; for some files this could be substantial. It is probably best to use a small wrapper function where you do this yourself, if it is useful for you.
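
For example, a small wrapper along these lines (the helper name is hypothetical) moves the failure to open time for callers that want eager validation:

    import java.io.File;
    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.xssf.eventusermodel.XSSFReader;

    // Hypothetical helper: open the package and touch the parts we know
    // we will need, so a zip bomb failure surfaces immediately at open time.
    static XSSFReader openValidated(File file) throws Exception {
        OPCPackage pkg = OPCPackage.open(file);
        XSSFReader reader = new XSSFReader(pkg);
        reader.getStylesTable();        // fails fast on a suspicious styles part
        reader.getSharedStringsTable(); // likewise for the shared strings part
        return reader;
    }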

2. We continually compute the compression ratio while uncompressing the stream and stop as soon as the ratio goes beyond the limit, so you will always see a number close to the limit here. We do not read the whole stream, and thus do not compute the actual ratio of the whole file before failing; the whole point is to avoid reading malicious files completely.
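
For illustration only, a simplified sketch of that streaming check (not POI's actual code): the ratio is re-evaluated after every inflated chunk, so the loop aborts near the limit instead of computing the true ratio of the whole entry:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.zip.Inflater;
    import java.util.zip.InflaterInputStream;

    static final double MIN_INFLATE_RATIO = 0.01;

    // Read the entry chunk by chunk, comparing compressed bytes consumed
    // to uncompressed bytes produced, and abort once the ratio drops
    // below the limit.
    static void drainWithRatioCheck(InputStream rawDeflateData) throws IOException {
        Inflater inf = new Inflater(true); // raw deflate, as used in zip entries
        try (InflaterInputStream in = new InflaterInputStream(rawDeflateData, inf)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                long read = inf.getBytesRead();       // compressed bytes consumed
                long written = inf.getBytesWritten(); // expanded bytes produced
                if (written > 0 && (double) read / written < MIN_INFLATE_RATIO) {
                    throw new IOException("Zip bomb detected! ratio="
                            + ((double) read / written));
                }
            }
        }
    }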

We can try to improve the error message to make this a bit clearer.