Specific Microsoft Excel files (.xlsx) are getting corrupted when manipulated using the apache-poi-3.10 libraries. Could you please let us know the root cause of this issue?

We have found a workaround: upgrading the POI libraries to a higher version (3.15) resolves the issue, and we are able to open the Excel file successfully. However, after upgrading to 3.15, some other specific Excel files cannot be imported or manipulated at all, and POI throws an exception. The stack trace is below.

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-util.jar:8.0.36]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.poi.openxml4j.exceptions.OpenXML4JRuntimeException: Fail to save: an error occurs while saving the package : The part /xl/pivotCache/pivotCacheRecords1.xml fail to be saved in the stream with marshaller org.apache.poi.openxml4j.opc.internal.marshallers.DefaultMarshaller@2d5fbd7e
    at org.apache.poi.openxml4j.opc.ZipPackage.saveImpl(ZipPackage.java:602) [poi-ooxml-3.15.jar:3.15]
    at org.apache.poi.openxml4j.opc.OPCPackage.save(OPCPackage.java:1557) [poi-ooxml-3.15.jar:3.15]
    at org.apache.poi.openxml4j.opc.OPCPackage.save(OPCPackage.java:1542) [poi-ooxml-3.15.jar:3.15]
    at com.emc.o2.api.poi.POIUtil.modify2007(POIUtil.java:613) [O2-API.jar:na]
    at com.emc.o2.api.config.modules.attribute.O2Processor.applyDctmToOfficeEx(O2Processor.java:539) [O2-API.jar:na]

Could you please help us resolve the above issue? Waiting for your reply. Thanks in advance.

Regards,
Sushmita
Can you please upload a sample file and code to reproduce the issue?
Created attachment 36445: This file is for reproducing the second part of the bug, where the Excel file import/manipulation fails using poi-3.15. https://drive.google.com/open?id=1nuoW96ZdqpsG4WgqLnAVdx45kpE5AWgy
Latest link: https://drive.google.com/open?id=1seYe8W75wM8LWJ4-xUFSoDcpDpnofxkG
The file somehow triggers security-related safeguards in the XML handling. If I run this with the latest version, the following is logged when logging is turned on:

java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data. This may indicate that the file is used to inflate memory usage and thus could pose a security risk. You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit.
Uncompressed size: 819534, Raw/compressed size: 8192, ratio: 0.009996
Limits: MIN_INFLATE_RATIO: 0.010000, Entry: xl/pivotCache/pivotCacheRecords1.xml

You can disable this check with the following call; please try it and report back here whether it made it work:

    ZipSecureFile.setMinInflateRatio(0.0);

BTW, there is a newer version, 3.17, in the 3-series which contains many fixes on top of 3.15. The latest release is 4.0.1; if possible, we suggest upgrading to the latest version to get all new features and bugfixes and support for current technologies.
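To see why this particular file trips the check, the numbers in the log message can be worked through by hand. The sketch below is plain Java with no POI dependency. The class and method names (`InflateRatioCheck`, `ratio`, `rejected`) are hypothetical, and the strict less-than comparison is an assumption about how the limit is applied, not a copy of POI's internal logic, which may apply further conditions.

```java
// Illustrative only: mirrors the arithmetic in the log message, not POI's internal code.
final class InflateRatioCheck {

    /** Ratio of compressed bytes to expanded bytes, as reported in the log message. */
    static double ratio(long compressedBytes, long uncompressedBytes) {
        return (double) compressedBytes / uncompressedBytes;
    }

    /** Assumed rule: an entry is rejected when its ratio falls below the minimum. */
    static boolean rejected(long compressedBytes, long uncompressedBytes, double minInflateRatio) {
        return ratio(compressedBytes, uncompressedBytes) < minInflateRatio;
    }

    public static void main(String[] args) {
        long compressed = 8192;      // "Raw/compressed size" from the log
        long uncompressed = 819534;  // "Uncompressed size" from the log

        System.out.println("ratio = " + ratio(compressed, uncompressed));
        System.out.println("rejected at 0.01:  " + rejected(compressed, uncompressed, 0.01));
        System.out.println("rejected at 0.001: " + rejected(compressed, uncompressed, 0.001));
    }
}
```

With the logged sizes the ratio comes out just under the 0.01 default, so the entry misses the limit by a very narrow margin, while a limit of 0.001 would let it through.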
Hi Dominik Stadler,

The fix you suggested (setting the minimum inflate ratio explicitly) worked, and we are able to import/manipulate the .xlsx successfully using poi-3.15.

However, according to POI's message, such a file may be used to inflate memory usage and thus could pose a security risk. If we explicitly disable the validation by setting the minimum inflate ratio to 0.0, will that expose a security risk from a server safety perspective? Certain files could then bring down the server through excessive memory usage, so we are concerned about this. Could you please advise?

Regards,
Sushmita
If you set the minimum inflation ratio to 0, you disable the protection that we built into Apache POI. However, this only poses a security threat if you process documents whose contents you do not fully control, e.g. if you allow users to upload documents that are then processed. If you do not allow that anywhere, you might be fine with setting it to 0.

If you allow external uploads of documents but would still like to process documents like the one provided, you can try a different value for the minimum inflation ratio. The default is 0.01, so you might need to experiment with smaller values, e.g. 0.001, until you can process your documents while still keeping some protection against documents that expand too much and would use up too much memory.

As this is working as expected from our point of view, I am closing this for now. Please discuss on the mailing list if you have more usage questions, or report new bugs if you find something not working as expected/described.
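One way to narrow down a value between 0 and the default, rather than guessing, is to measure the ratios of known-good sample files first. The following is a minimal sketch using only `java.util.zip`; the class `MinRatioScan` and its methods are hypothetical helpers, not part of POI. It reads the sizes recorded in the zip headers, which a malicious file can falsify, so it is only meant for trusted samples and not as a replacement for POI's check.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: finds the smallest compressed/uncompressed ratio in a trusted .xlsx.
final class MinRatioScan {

    /** Smallest compressed/uncompressed ratio among the entries, capped at 1.0. */
    static double smallestRatio(List<? extends ZipEntry> entries) {
        double smallest = 1.0;
        for (ZipEntry e : entries) {
            long size = e.getSize();
            long compressed = e.getCompressedSize();
            if (size <= 0 || compressed < 0) {
                continue; // directory, empty entry, or sizes not recorded
            }
            smallest = Math.min(smallest, (double) compressed / size);
        }
        return smallest;
    }

    /** Same measurement for all entries of a zip-based file such as .xlsx. */
    static double smallestRatio(Path file) throws IOException {
        try (ZipFile zip = new ZipFile(file.toFile())) {
            return smallestRatio(Collections.list(zip.entries()));
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MinRatioScan sample.xlsx
        System.out.println(smallestRatio(Paths.get(args[0])));
    }
}
```

A value somewhat below the smallest ratio seen across representative files could then be passed to `ZipSecureFile.setMinInflateRatio()`, which keeps some protection in place instead of disabling the check with 0.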