Bug 63188 - Specific Microsoft Excel file (.xlsx) gets corrupted when manipulated using apache-poi-3.10 libraries
Status: RESOLVED WORKSFORME
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-18 14:48 UTC by Sushmita Nag
Modified: 2019-03-10 09:53 UTC
0 users



Attachments
This file is for reproducing the second part of the bug, where the Excel file import/manipulation fails using poi-3.15 (66 bytes, text/plain)
2019-02-20 19:21 UTC, Sushmita Nag

Description Sushmita Nag 2019-02-18 14:48:07 UTC
Specific Microsoft Excel files (.xlsx) are getting corrupted when manipulated using the apache-poi-3.10 libraries. Could you please let us know the root cause of this issue?

We have found a workaround: upgrading the POI libraries to a higher version (3.15). With that upgrade the issue is resolved and we are able to open the Excel file successfully.

However, after upgrading to 3.15, some other specific Excel files cannot be imported/manipulated at all. POI throws the exception shown in the stack trace below.

	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-util.jar:8.0.36]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.poi.openxml4j.exceptions.OpenXML4JRuntimeException: Fail to save: an error occurs while saving the package : The part /xl/pivotCache/pivotCacheRecords1.xml fail to be saved in the stream with marshaller org.apache.poi.openxml4j.opc.internal.marshallers.DefaultMarshaller@2d5fbd7e
	at org.apache.poi.openxml4j.opc.ZipPackage.saveImpl(ZipPackage.java:602) [poi-ooxml-3.15.jar:3.15]
	at org.apache.poi.openxml4j.opc.OPCPackage.save(OPCPackage.java:1557) [poi-ooxml-3.15.jar:3.15]
	at org.apache.poi.openxml4j.opc.OPCPackage.save(OPCPackage.java:1542) [poi-ooxml-3.15.jar:3.15]
	at com.emc.o2.api.poi.POIUtil.modify2007(POIUtil.java:613) [O2-API.jar:na]
	at com.emc.o2.api.config.modules.attribute.O2Processor.applyDctmToOfficeEx(O2Processor.java:539) [O2-API.jar:na]




Could you please help us resolve the above issue? We look forward to your reply. Thanks in advance.


Regards,
Sushmita
Comment 1 Yegor Kozlov 2019-02-18 15:25:40 UTC
Can you please upload a sample file and code to reproduce the issue?
Comment 2 Sushmita Nag 2019-02-20 19:21:53 UTC
Created attachment 36445
This file is for reproducing the second part of the bug, where the Excel file import/manipulation fails using poi-3.15.


https://drive.google.com/open?id=1nuoW96ZdqpsG4WgqLnAVdx45kpE5AWgy
Comment 3 Sushmita Nag 2019-02-20 19:29:19 UTC
Latest link:

https://drive.google.com/open?id=1seYe8W75wM8LWJ4-xUFSoDcpDpnofxkG
Comment 4 Dominik Stadler 2019-02-23 17:49:42 UTC
The file triggers one of the security-related safeguards in POI's handling of the zipped package parts. If I run this with the latest version and logging turned on, the following is logged:


java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data.
This may indicate that the file is used to inflate memory usage and thus could pose a security risk.
You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit.
Uncompressed size: 819534, Raw/compressed size: 8192, ratio: 0.009996
Limits: MIN_INFLATE_RATIO: 0.010000, Entry: xl/pivotCache/pivotCacheRecords1.xml
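
To make the numbers in this log concrete, here is a minimal Java sketch. It assumes that the check compares raw (compressed) bytes with uncompressed bytes, which is how the log message describes it. The class `InflateRatioCheck` and its methods are hypothetical illustrations and not part of Apache POI. POI's real check runs while the entry is being inflated and may apply additional conditions.

```java
import java.util.Locale;

/**
 * Hypothetical illustration of the ratio check reported in the log.
 * Not Apache POI code: it only reproduces the arithmetic
 * ratio = raw (compressed) bytes / uncompressed bytes.
 */
public class InflateRatioCheck {

    /** Limit reported in the log as MIN_INFLATE_RATIO (POI's default). */
    public static final double DEFAULT_MIN_INFLATE_RATIO = 0.01;

    /** Ratio of compressed input bytes to the bytes they expanded into. */
    public static double ratio(long rawSize, long uncompressedSize) {
        return (double) rawSize / (double) uncompressedSize;
    }

    /** True if the entry expands more than the limit allows (ratio below the minimum). */
    public static boolean exceedsLimit(long rawSize, long uncompressedSize, double minInflateRatio) {
        return ratio(rawSize, uncompressedSize) < minInflateRatio;
    }

    public static void main(String[] args) {
        long uncompressed = 819_534L; // "Uncompressed size" from the log
        long raw = 8_192L;            // "Raw/compressed size" from the log
        // prints: ratio: 0.009996
        System.out.println(String.format(Locale.ROOT, "ratio: %f", ratio(raw, uncompressed)));
        // Flagged with the default limit, because 0.009996 < 0.01
        System.out.println("flagged at 0.01: " + exceedsLimit(raw, uncompressed, DEFAULT_MIN_INFLATE_RATIO));
        // A limit of 0.0 disables the check, so nothing is flagged
        System.out.println("flagged at 0.0:  " + exceedsLimit(raw, uncompressed, 0.0));
    }
}
```

The ratio is only about 0.0004 below the default limit. This is why relatively small adjustments to the limit can already let this file through.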


You can disable this check with the following; please try it and report back here whether it works.

ZipSecureFile.setMinInflateRatio(0.0);

BTW, there is a newer version, 3.17, in the 3.x series, which contains many fixes on top of 3.15. The latest release is 4.0.1. If possible, we suggest upgrading to the latest version to get all new features and bug fixes, as well as support for current technologies.
Comment 5 Sushmita Nag 2019-02-28 07:07:41 UTC
Hi Dominik Stadler,

The fix you suggested (explicitly setting the minimum inflate ratio) worked, and we are able to import/manipulate the .xlsx successfully using poi-3.15.

However, according to the POI message, such a file may be used to inflate memory usage and thus could pose a security risk.

Does explicitly disabling the validation by setting the minimum inflate ratio to 0.0 expose any security risk from a server-safety perspective? Without the check, certain files could bring down the server through excessive memory usage, so we are concerned about this.

Could you please advise us on this?


Regards,
Sushmita
Comment 6 Dominik Stadler 2019-03-10 09:53:59 UTC
If you set the minimum inflate ratio to 0, you disable the protection that we built into Apache POI.

However, this only poses a security threat if you process documents whose contents you do not fully control, e.g. if you allow users to upload documents that are then processed.

If you do not allow that anywhere, you might be fine with setting it to 0.

If you allow external uploads of documents but would like to process documents like the one provided, you can try a different value for the minimum inflate ratio. The default is 0.01, so you might need to experiment with smaller values, e.g. 0.001, until you can process your documents. That way you can still keep some protection against documents that expand too much and would use up too much memory.
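
The advice above can be sketched as a small decision helper. This is a hypothetical Java illustration, not POI API, and it rests on the following assumptions:
- The class `InflateRatioPolicy` is invented for this example.
- The candidate list (the default 0.01, then 0.001, then 0.0001) is an assumption.
- A limit is assumed to accept a document when the observed ratio is not below it.

The returned value is what one would then pass to `ZipSecureFile.setMinInflateRatio(...)`. The helper also reports the maximum expansion factor (1 / ratio) that each limit still permits, which shows how much protection remains.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the trusted/untrusted decision described above.
 * Not part of Apache POI; the candidate limits are assumptions.
 */
public class InflateRatioPolicy {

    /** Candidate limits to try, starting at POI's default (0.01) and going down. */
    private static final double[] CANDIDATES = {0.01, 0.001, 0.0001};

    /** Maximum expansion (uncompressed / compressed) a limit still permits. */
    public static double maxExpansionFactor(double minInflateRatio) {
        return minInflateRatio > 0.0 ? 1.0 / minInflateRatio : Double.POSITIVE_INFINITY;
    }

    /**
     * Trusted input only: the check may be disabled (0.0).
     * Untrusted uploads: keep the largest candidate that still accepts the
     * observed ratio, so that as much protection as possible remains.
     * Returns NaN if none of the candidates accepts the document.
     */
    public static double chooseMinInflateRatio(boolean untrustedUploads, double observedRatio) {
        if (!untrustedUploads) {
            return 0.0;
        }
        for (double candidate : CANDIDATES) {
            if (observedRatio >= candidate) {
                return candidate;
            }
        }
        return Double.NaN;
    }

    public static void main(String[] args) {
        double observed = 8192.0 / 819534.0; // ratio from the log in comment 4
        double limit = chooseMinInflateRatio(true, observed);
        // prints: limit 0.0010 allows up to 1000x expansion
        System.out.println(String.format(Locale.ROOT,
                "limit %.4f allows up to %.0fx expansion", limit, maxExpansionFactor(limit)));
    }
}
```

With the ratio from comment 4, the untrusted-upload branch picks 0.001. Entries may then expand up to roughly 1000x their compressed size before being rejected, whereas 0.0 removes the bound entirely.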

As this is working as expected from our point of view, I am closing this for now. Please discuss on the mailing list if you have more usage questions, or report new bugs if you find something not working as expected/described.