Bug 56957

Summary: Excel 2007 file is unusable after closing Workbook object
Product: POI
Reporter: Armen Vardanyan <vardarmo>
Component: XSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED
Severity: blocker
CC: pal.ratikanta, rober_20_02, vardarmo
Priority: P1    
Version: 3.11-dev   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: excel files and java source file

Description Armen Vardanyan 2014-09-11 07:39:03 UTC
Created attachment 31999
excel files and java source file

I am reading an Excel 2007 file (xlsx format). The first run reads the file without problems, but after I close the Workbook object, the next run of the Java application throws an exception in Eclipse.

I am using the latest release of POI available in the Maven repositories, i.e. 3.11 beta2. I tested on both Windows and Linux, with Java 7 and Java 8, and the problem persists in all cases.

This happens ONLY on Microsoft Excel 2007 XLSX files. It does not happen when using Microsoft Excel 2013 XLSX files.

///////////////////////////////////////////////////////////////////
The error I am getting in Eclipse console is the following:

Exception in thread "main" org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
	at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:62)
	at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:427)
	at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:162)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:236)
	at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:109)
	at Main.main(Main.java:17)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:60)
	... 5 more
Caused by: java.io.IOException: error: Unexpected end of file after null
	at org.apache.poi.xssf.model.SharedStringsTable.readFrom(SharedStringsTable.java:129)
	at org.apache.poi.xssf.model.SharedStringsTable.<init>(SharedStringsTable.java:106)
	... 10 more
//////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////
The error I am getting from Microsoft Excel 2007 when opening the file after this corruption, is the following:
"Excel found unreadable content in 'file_name.xlsx'. Do you want to recover the contents of this workbook?"
When answering yes, it says:
"Excel was able to open the file by repairing or removing the unreadable content. Removed Part: /xl/sharedStrings.xml part with XML error.  (Strings) A document must contain exactly one root element. Line 1, column 0."
///////////////////////////////////////////////////////////////////////////
I have also attached:
1. the Java source file I used to read the file (Main.java), and
2. the Excel 2007 XLSX file before corruption (Excel_2007_file_before.xlsx) and after corruption (Excel_2007_file_after.xlsx).
The files are in the 'all.zip' file.
Comment 1 Nick Burch 2014-09-11 08:30:42 UTC
I can reproduce the problem with your small test file

No idea why it's happening, hopefully someone else can investigate...
Comment 2 Dominik Stadler 2014-10-12 19:35:18 UTC
The problem is that the original xlsx does not contain a file for the SharedString table in the zip file, so an empty string table is created when the file is first loaded.

During close(), the file is written back and a 0-byte sharedStrings.xml file is created, which later fails during loading the xlsx again.
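
To check whether a given file is affected by the zero-byte part described above, the zip container can be inspected directly. The following is a minimal sketch using only java.util.zip, not POI; the class name EmptyPartCheck and the method findEmptyEntries are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of POI: lists zero-length entries in an xlsx (zip) file.
public class EmptyPartCheck {

    /** Returns the names of all non-directory entries whose uncompressed size is 0. */
    public static List<String> findEmptyEntries(File xlsx) throws IOException {
        List<String> empty = new ArrayList<>();
        try (ZipFile zip = new ZipFile(xlsx)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // ZipFile reads sizes from the central directory, so getSize() is known here
                if (!entry.isDirectory() && entry.getSize() == 0) {
                    empty.add(entry.getName());
                }
            }
        }
        return empty;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: EmptyPartCheck <file.xlsx>");
            return;
        }
        for (String name : findEmptyEntries(new File(args[0]))) {
            System.out.println("empty part: " + name);
        }
    }
}
```

If the explanation above is right, running this against the "after" file from the attachment should list the sharedStrings.xml entry, while the "before" file should not contain that entry at all.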

I tried it with the following change, which makes this test work, however I am not sure if this is the correct way to exclude parts which do not have any size:

--- a/src/ooxml/java/org/apache/poi/openxml4j/opc/internal/marshallers/ZipPartMarshaller.java
+++ b/src/ooxml/java/org/apache/poi/openxml4j/opc/internal/marshallers/ZipPartMarshaller.java
@@ -63,6 +63,11 @@ public final class ZipPartMarshaller implements PartMarshaller {
                        // Normally should happen only in developement phase, so just throw
                        // exception
                }
+
+               // check if there is anything to save
+               if(part.getSize() == 0) {
+                   return true;
+               }
 
                ZipOutputStream zos = (ZipOutputStream) os;
                ZipEntry partEntry = new ZipEntry(ZipHelper
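
The idea behind the guard in the patch above (do not emit a zip entry for a part that has nothing to save) can be illustrated outside of POI. This is only a sketch with plain java.util.zip under the assumption that parts are available as byte arrays; SkipEmptyParts and writeParts are hypothetical names and this is not the ZipPartMarshaller code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration, not POI code: skip parts with no content when writing a zip.
public class SkipEmptyParts {

    /** Writes each part as a zip entry, skipping empty ones; returns the names actually written. */
    public static List<String> writeParts(Map<String, byte[]> parts, OutputStream out)
            throws IOException {
        List<String> written = new ArrayList<>();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> part : parts.entrySet()) {
                byte[] data = part.getValue();
                // check if there is anything to save, mirroring the guard in the patch
                if (data == null || data.length == 0) {
                    continue;
                }
                zos.putNextEntry(new ZipEntry(part.getKey()));
                zos.write(data);
                zos.closeEntry();
                written.add(part.getKey());
            }
        }
        return written;
    }
}
```

Skipping every empty part is the broad form of the check; whether that is safe for all part types is exactly the open question raised with the patch.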
Comment 3 RatiKanata Pal 2015-01-08 08:11:18 UTC
Which XMLBeans version are you using?
Comment 4 Robertiano 2015-02-05 10:52:48 UTC
I can reproduce the problem with POI 3.11 and xmlbeans 2.6.0.
Comment 5 Dominik Stadler 2015-03-01 20:42:19 UTC
BTW, a possible workaround if you are just reading from the file is to open the file in "read-only" mode, then the problem does not happen:

Workbook workbook = WorkbookFactory.create(OPCPackage.open(file, PackageAccess.READ));
Comment 6 Robertiano 2015-03-02 12:04:32 UTC
(In reply to Dominik Stadler from comment #5)
> BTW, a possible workaround if you are just reading from the file is to open
> the file in "read-only" mode, then the problem does not happen:
> 
> Workbook workbook = WorkbookFactory.create(OPCPackage.open(file,
> PackageAccess.READ));

I get the exception below when I open an empty file in read-only mode:

org.apache.poi.POIXMLException: org.apache.poi.openxml4j.exceptions.InvalidOperationException: Operation not allowed, document open in read only mode!
	at org.apache.poi.POIXMLDocumentPart.createRelationship(POIXMLDocumentPart.java:394)
	at org.apache.poi.POIXMLDocumentPart.createRelationship(POIXMLDocumentPart.java:354)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:341)
	at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:166)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:240)
	at com.X.Y.db.process.SourceProcessorA.process(SourceProcessorA.java:42)
	at com.X.Y.db.process.SourceProcessorATest.testProcessEmptySource(SourceProcessorATest.java:59)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException: Operation not allowed, document open in read only mode!
	at org.apache.poi.openxml4j.opc.OPCPackage.throwExceptionIfReadOnly(OPCPackage.java:512)
	at org.apache.poi.openxml4j.opc.OPCPackage.createPart(OPCPackage.java:773)
	at org.apache.poi.openxml4j.opc.OPCPackage.createPart(OPCPackage.java:749)
	at org.apache.poi.POIXMLDocumentPart.createRelationship(POIXMLDocumentPart.java:374)
Comment 7 Nick Burch 2015-03-02 12:49:21 UTC
Can you try with POI 3.12 beta 1? There was a read-only fix in that
Comment 8 Robertiano 2015-03-03 09:29:32 UTC
(In reply to Nick Burch from comment #7)
> Can you try with POI 3.12 beta 1? There was a read-only fix in that

It works (if you don't need to write)!

Update to POI 3.12 beta 1 and open the files in read-only mode.
Comment 9 Dominik Stadler 2015-10-26 07:56:30 UTC
This is now fixed for empty shared string tables via r1710521. For now, the check only avoids writing the XML file for the SharedStringsTable, because doing it for all types of documents likely introduced trouble with existing code and broke unit tests.

Please report new bugs if there are any other XML-parts that cause trouble if written empty.