Created attachment 35793 [details]
Stack trace from zip bomb failure

I updated from POI 3.9 to 3.16. My customers are now reporting that they are unable to read Excel files due to the zip bomb fix. In all instances, our software is able to open the file (new XSSFReader(OPCPackage.open(file))) and read the sheets (XSSFReader.getSheetsData.asInstanceOf[XSSFReader.SheetIterator]) without receiving an exception. The "zip bomb" exception only appears when we call XSSFReader.getStylesTable(). The ultimate error message:

> java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data. This may indicate that the file is used to inflate memory usage and thus could pose a security risk. You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit.
> Counter: 3277797, cis.counter: 32768, ratio: 0.009996958322922377
> Limits: MIN_INFLATE_RATIO: 0.01

Options:

1. Fail faster? It would be beneficial to fail with the zip bomb error when instantiating the XSSFReader, via a validation step, rather than waiting until we try to access some of the embedded data. This has certainly been an unexpected issue since our upgrade to 3.16.

2. Round the calculated compression ratio. In every customer-reported situation we have seen the following case:

> ratio: 0.00999....
> Limits: MIN_INFLATE_RATIO: 0.01

Should we round the calculated compression ratio (0.0099...) to two decimal places? Would that reduce the security of the check?

Exception: attached
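Incidentally, the counters in the quoted message reproduce the reported ratio exactly, assuming cis.counter is the compressed byte count and Counter the expanded byte count (my reading of the field names, not verified against the POI source):

```java
public class ZipBombRatio {
    // Ratio of compressed bytes to expanded bytes, as in the error message.
    static double ratio(long compressedBytes, long expandedBytes) {
        return (double) compressedBytes / expandedBytes;
    }

    public static void main(String[] args) {
        // cis.counter and Counter from the exception above.
        double r = ratio(32768, 3277797);
        System.out.println(r);        // ~0.009996958..., just under 0.01
        System.out.println(r < 0.01); // the check therefore trips
    }
}
```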
I wonder where the "duplicated" styles come from, i.e. there must be a reason why the compression is so high - are the styles in that Excel file generated? ... and maybe a new style is created for each cell?

Re option 1: I don't think that eagerly fetching the shared strings and styles tables is a good idea - I would actually prefer the opposite, something like a lazy-loading mechanism inside the content, e.g. a table which successively fills as its elements are iterated over.

Re option 2: I don't see the advantage of rounding - how about setting the limit to 0.005 instead?
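The lazy-loading idea could look something like this generic sketch - a hypothetical class, not an existing POI API - where entries are only parsed when they are first touched, so opening the workbook stays cheap:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch of a lazily-filled table: entries are materialized
// only when first accessed or iterated, so a huge styles/shared-strings
// part is never parsed up front.
public class LazyTable<T> implements Iterable<T> {
    private final int size;              // declared entry count
    private final IntFunction<T> loader; // parses entry i on demand
    private final List<T> cache = new ArrayList<>();

    public LazyTable(int size, IntFunction<T> loader) {
        this.size = size;
        this.loader = loader;
    }

    public T get(int i) {
        // Fill successively up to the requested index, caching results.
        while (cache.size() <= i) {
            cache.add(loader.apply(cache.size()));
        }
        return cache.get(i);
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int next = 0;
            public boolean hasNext() { return next < size; }
            public T next() { return get(next++); }
        };
    }

    public static void main(String[] args) {
        // Each "parse" happens only when the element is reached.
        LazyTable<String> styles = new LazyTable<>(3, i -> "style-" + i);
        for (String s : styles) System.out.println(s);
    }
}
```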
1. This would cause performance issues, as we would need to read and parse every embedded item, which for some files could be substantial. It is probably best if you use a small wrapper function that does this eagerly, if that is useful for you.

2. We continually compute the compression ratio while uncompressing the stream and stop as soon as the ratio goes beyond the limit, so you will always see a number close to the limit here. We do not read the whole stream, and thus do not compute the actual ratio of the whole file before failing - the whole point is to avoid reading malicious files completely. We can try to improve the error message to make this clearer.
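A minimal sketch of the behavior described in point 2 (not the actual POI code, which lives in ZipSecureFile and has more details): the running ratio is re-checked as the stream inflates and fails at the first point it drops under the limit, which is why the reported value is always just below MIN_INFLATE_RATIO and the true whole-file ratio is never computed.

```java
import java.io.IOException;

public class InflateRatioCheck {
    static final double MIN_INFLATE_RATIO = 0.01;

    // compressed/expanded bytes read so far; throws at the first point
    // where the running ratio drops under the limit.
    static void check(long compressedBytes, long expandedBytes) throws IOException {
        double ratio = (double) compressedBytes / expandedBytes;
        if (ratio < MIN_INFLATE_RATIO) {
            throw new IOException("Zip bomb detected! ratio: " + ratio
                    + ", limit: " + MIN_INFLATE_RATIO);
        }
    }

    public static void main(String[] args) {
        try {
            // The exact counters from the report above trip the check.
            check(32768, 3277797);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```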