Bug 58760 - Unable to read xlsx file (open OPCPackage) after upgrading to version 3.13 from 3.10.1
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.13-FINAL
Hardware: All All
Importance: P2 blocker
Target Milestone: ---
Assignee: POI Developers List
Reported: 2015-12-22 13:02 UTC by shahar
Modified: 2015-12-31 20:28 UTC

Attachments
An empty xlsx file that is working with poi 3.10.1 but not with 3.13 (7.23 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2015-12-22 13:23 UTC, shahar

Description shahar 2015-12-22 13:02:07 UTC
This is basically the code I'm running to open the workbook:

try (WriterOutputStream baos = new WriterOutputStream(output)) {
    try (PrintStream ps = new PrintStream(baos)) {
        // The package open is instantaneous, as it should be.
        try (OPCPackage p = OPCPackage.open(file, PackageAccess.READ)) {

and this is the exception I get:

Exception in thread "main" org.apache.poi.openxml4j.exceptions.InvalidFormatException: The part /xl/sharedStrings.xml does not have any content type ! Rule: Package require content types when retrieving a part from a package. [M.1.14]
	at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:247)
	at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:684)
	at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:254)

This code and file worked fine with POI version 3.10.1.
Comment 1 Nick Burch 2015-12-22 13:03:10 UTC
Can you please attach a sample problematic file?
Comment 2 shahar 2015-12-22 13:07:52 UTC
The file contains sensitive data that I can't expose. However, when I open the file in MS Excel and save it after making some changes, the problem goes away (Excel's save action rewrites the XML files inside the package).
Can you suggest a different way to change the file's content without changing its format?
Comment 3 Nick Burch 2015-12-22 13:10:07 UTC
(In reply to shahar from comment #2)
> The file contains sensitive data which i can't expose, the problem is when i
> open the file in MS excel and save it after some changes, it will fix the
> problem (excel save action changes the file inner xml files).
> Can you offer maybe a different way to change the file's content without
> changing its format?

http://poi.apache.org/faq.html#faq-N1016C
Comment 4 shahar 2015-12-22 13:23:00 UTC
Created attachment 33367
An empty xlsx file that is working with poi 3.10.1 but not with 3.13
Comment 5 shahar 2015-12-22 13:28:07 UTC
Hi,
I've attached a file with no content that I can open from Java code with POI 3.10.1, but with 3.13 I get the exception above.
Comment 6 Dominik Stadler 2015-12-23 13:25:27 UTC
The cause of the parsing problem is that the XML files inside the .xlsx file use a prefixed namespace rather than the default namespace.

Your file has:

<ns0:Types xmlns:ns0="http://schemas.openxmlformats.org/package/2006/content-types">
  <ns0:Override ContentType="application/vnd.openxmlformats-officedocument.theme+xml" PartName="/xl/theme/theme1.xml" />
  <ns0:Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml" PartName="/xl/styles.xml" />

whereas usually files use the default namespace as follows:

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
	<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
	<Default Extension="xml" ContentType="application/xml"/>
	<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>


Note the "ns0:" prefixes. It seems the parsing in POI does not take the namespace into account here the way that Excel does.
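To illustrate the kind of lookup that breaks on such a file, here is a minimal sketch using only the JDK's DOM parser. It is not POI's code, and the class and method names (PrefixedLookupDemo, countByTagName) are made up for this example. It assumes the reader is not namespace-aware and looks elements up by their plain tag name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustration only: hypothetical helper, not part of POI.
class PrefixedLookupDemo {
    static final String CT_NS =
            "http://schemas.openxmlformats.org/package/2006/content-types";

    // The same Override entry, once with the ns0: prefix ...
    static final String PREFIXED =
            "<ns0:Types xmlns:ns0=\"" + CT_NS + "\">"
            + "<ns0:Override ContentType=\"application/vnd.openxmlformats-officedocument.theme+xml\""
            + " PartName=\"/xl/theme/theme1.xml\"/>"
            + "</ns0:Types>";

    // ... and once with the default namespace.
    static final String DEFAULT_NS =
            "<Types xmlns=\"" + CT_NS + "\">"
            + "<Override ContentType=\"application/vnd.openxmlformats-officedocument.theme+xml\""
            + " PartName=\"/xl/theme/theme1.xml\"/>"
            + "</Types>";

    // Parses WITHOUT namespace awareness and counts the elements whose
    // qualified name (prefix included) equals tagName.
    static int countByTagName(String xml, String tagName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tagName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countByTagName(DEFAULT_NS, "Override"));   // 1
        System.out.println(countByTagName(PREFIXED, "Override"));     // 0
        System.out.println(countByTagName(PREFIXED, "ns0:Override")); // 1
    }
}
```

With a lookup like this, the prefixed file appears to contain no Override entries at all, which would match a "does not have any content type" error.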
Comment 7 shahar 2015-12-23 13:28:31 UTC
The thing is that the old POI (3.10.1) did manage to read this xlsx file, so this is a backward-compatibility break for us.
Is there a workaround I can use to make it read this file?

Thanks
Comment 8 Nick Burch 2015-12-23 14:46:03 UTC
IIRC we switched away from dom4j to the JVM's built-in XML parser in 3.11. I wonder if we lost the namespace handling in the process?
Comment 9 Dominik Stadler 2015-12-23 15:27:21 UTC
The only workaround without a code fix that I can think of is to get the Excel file into a format similar to what POI currently expects, i.e. with the default namespace instead of the ns0: prefix that your file uses.
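A rough sketch of what such a conversion could look like for a single XML part, using only JDK classes. The class and method names (DefaultNamespaceRewriter, toDefaultNamespace, dropPrefixes) are hypothetical. The sketch works on one XML string only, does not unpack or repack the .xlsx ZIP container, and I have not verified that a file rewritten this way opens in 3.13.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustration only: hypothetical helper, not part of POI.
class DefaultNamespaceRewriter {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    // Returns the same XML with element prefixes removed; each element
    // keeps its namespace URI, so only the serialized form changes.
    static String toDefaultNamespace(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Element root = (Element) dropPrefixes(doc, doc.getDocumentElement());
            if (root.getNamespaceURI() != null) {
                // Declare the root's namespace as the default namespace.
                // Old prefix declarations (xmlns:ns0) are left in place;
                // they are harmless once no element uses the prefix.
                root.setAttributeNS(XMLNS_NS, "xmlns", root.getNamespaceURI());
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Renames every namespaced element to its local name, recursively,
    // and returns the (possibly replaced) node.
    static Node dropPrefixes(Document doc, Node node) {
        Node current = node;
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNamespaceURI() != null) {
            current = doc.renameNode(node, node.getNamespaceURI(), node.getLocalName());
        }
        for (Node child = current.getFirstChild(); child != null; child = child.getNextSibling()) {
            child = dropPrefixes(doc, child);
        }
        return current;
    }
}
```

Applying this to a real file would mean running it over every affected XML entry in the package and writing a new ZIP, which is left out here.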
Comment 10 shahar 2015-12-23 15:32:06 UTC
We get those files from an external source, which probably generates them automatically. Unfortunately, we have no control over how those files are generated.
Comment 11 Nick Burch 2015-12-29 22:43:20 UTC
Thanks for the test file, and for working with us on this.

In r1722244, I've added a disabled unit test that uses your file to demonstrate the problem.

Now it just needs someone to work out what we're doing wrong in the new secure XML parsing setup, set the appropriate option to fix it, then enable the test...
Comment 12 Dominik Stadler 2015-12-30 07:30:16 UTC
I have a potential patch, which I will post here shortly for review. It enables namespaces when parsing some of the files so that this file can be read.
Comment 13 Dominik Stadler 2015-12-31 10:01:06 UTC
I have hopefully fixed this via r1722433. We now use namespace-aware methods when reading the XML files inside the .xlsx file, and thus should handle such differently built files as well. Please give the upcoming nightly build (https://builds.apache.org/view/POI/job/POI/lastSuccessfulBuild/) a try to make sure it is fixed for you as well.
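For readers unfamiliar with namespace-aware DOM access, a minimal sketch of the general technique follows. It is not the code from r1722433, and the names (NamespaceAwareContentTypes, readOverrides) are hypothetical. It assumes the JDK's built-in parser and matches elements by namespace URI plus local name, so the prefix no longer matters.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: hypothetical helper, not the actual POI change.
class NamespaceAwareContentTypes {
    static final String CT_NS =
            "http://schemas.openxmlformats.org/package/2006/content-types";

    // Maps PartName to ContentType for every Override element in the
    // content-types namespace, whatever prefix the file uses for it.
    static Map<String, String> readOverrides(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList overrides = doc.getElementsByTagNameNS(CT_NS, "Override");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < overrides.getLength(); i++) {
                Element override = (Element) overrides.item(i);
                result.put(override.getAttribute("PartName"),
                        override.getAttribute("ContentType"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Called on a ns0:-prefixed document and on the equivalent default-namespace document, readOverrides returns the same map.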
Comment 14 shahar 2015-12-31 10:06:08 UTC
Thank you very much!
I will test it. In which version is it supposed to be released?
Comment 15 Dominik Stadler 2015-12-31 20:28:04 UTC
It will be contained in the next release, which will be either 3.14-beta2 or 3.14-rc1, whichever comes first. Alternatively, you can take the binaries produced by the CI builds at https://builds.apache.org/view/POI/job/POI/lastSuccessfulBuild/