Bug 61275 - Excel could not open <file> because some content is unreadable
Summary: Excel could not open <file> because some content is unreadable
Status: RESOLVED WORKSFORME
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.15-FINAL
Hardware: Macintosh All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-10 18:05 UTC by Carl Buxbaum
Modified: 2017-09-28 15:45 UTC (History)



Attachments
corrupt xlsx file generated with Websphere IBM JVM 1.8 (6.59 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2017-07-10 18:05 UTC, Carl Buxbaum
working xlsx file, generated with Tomcat Oracle JVM 1.8 (6.55 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2017-07-10 18:06 UTC, Carl Buxbaum
struts action for building excel (4.85 KB, text/plain)
2017-07-10 18:09 UTC, Carl Buxbaum
Helper class for building xlsx (44.04 KB, text/plain)
2017-07-10 18:09 UTC, Carl Buxbaum
content diff (3.36 KB, patch)
2017-07-11 06:16 UTC, Stefan Bodewig
potentially "repaired" Results_bad.xlsx (6.54 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2017-07-11 14:42 UTC, Stefan Bodewig

Description Carl Buxbaum 2017-07-10 18:05:36 UTC
Created attachment 35110 [details]
corrupt xlsx file generated with Websphere IBM JVM 1.8

Hi, if I generate an xlsx file using a war deployed on Tomcat, I do not get an error.  When I generate the same xlsx file using a war deployed on WebSphere I get the above error.  I am attributing the difference to a difference in implementation of the zip functionality between Oracle JVM and IBM JVM (so perhaps this bug should go to IBM).

If I try to open the corrupt version, it gives me the message, and I can choose to fix it, whereupon it opens.  However, the excel log does not give me information about what was fixed.

There are slight differences between a couple of files in the resulting diffs.  
Here are the differences in the output of zipinfo:

Good xlsx:

gloucester-pc:Downloads cbuxbaum$ zipinfo Results\ \(13\).xlsx 
Archive:  Results (13).xlsx   6704 bytes   11 files
-rw----     2.0 fat      598 bl defN 10-Jul-17 13:43 _rels/.rels
-rw----     2.0 fat     1197 bl defN 10-Jul-17 13:43 [Content_Types].xml
-rw----     2.0 fat      184 bl defN 10-Jul-17 13:43 docProps/app.xml
-rw----     2.0 fat      443 bl defN 10-Jul-17 13:43 docProps/core.xml
-rw----     2.0 fat      131 bl defN 10-Jul-17 13:43 xl/drawings/drawing1.xml
-rw----     2.0 fat      138 bl defN 10-Jul-17 13:43 xl/sharedStrings.xml
-rw----     2.0 fat     3601 bl defN 10-Jul-17 13:43 xl/styles.xml
-rw----     2.0 fat      350 bl defN 10-Jul-17 13:43 xl/workbook.xml
-rw----     2.0 fat      576 bl defN 10-Jul-17 13:43 xl/_rels/workbook.xml.rels
-rw----     2.0 fat    22610 bl defN 10-Jul-17 13:43 xl/worksheets/sheet1.xml
-rw----     2.0 fat      305 bl defN 10-Jul-17 13:43 xl/worksheets/_rels/sheet1.xml.rels
11 files, 30133 bytes uncompressed, 5230 bytes compressed:  82.6%

Bad xlsx:

gloucester-pc:Downloads cbuxbaum$ zipinfo Results\ \(15\).xlsx 
Archive:  Results (15).xlsx   6745 bytes   11 files
-rw----     2.0 fat      596 bl defN 10-Jul-17 13:52 _rels/.rels
-rw----     2.0 fat     1195 bl defN 10-Jul-17 13:52 [Content_Types].xml
-rw----     2.0 fat      184 bl defN 10-Jul-17 13:52 docProps/app.xml
-rw----     2.0 fat      441 bl defN 10-Jul-17 13:52 docProps/core.xml
-rw----     2.0 fat      131 bl defN 10-Jul-17 13:52 xl/drawings/drawing1.xml
-rw----     2.0 fat      138 bl defN 10-Jul-17 13:52 xl/sharedStrings.xml
-rw----     2.0 fat     3601 bl defN 10-Jul-17 13:52 xl/styles.xml
-rw----     2.0 fat      350 bl defN 10-Jul-17 13:52 xl/workbook.xml
-rw----     2.0 fat      574 bl defN 10-Jul-17 13:52 xl/_rels/workbook.xml.rels
-rw----     2.0 fat    22610 bl defN 10-Jul-17 13:52 xl/worksheets/sheet1.xml
-rw----     2.0 fat      303 bl defN 10-Jul-17 13:52 xl/worksheets/_rels/sheet1.xml.rels

tail of od command on _rels/.rels:

good version:

0001100    037057  005015  027474  062522  060554  064564  067157  064163
0001120    070151  037163  005015                                        
0001126

bad version:

0001100    037057  005015  027474  062522  060554  064564  067157  064163
0001120    070151  037163                                                
0001124

I am attaching the following:

Our Excel struts action
The Excel Helper class that builds the excel

The bad(corrupt) xlsx
The good xlsx

Thanks!

Carl Buxbaum
Comment 1 Carl Buxbaum 2017-07-10 18:06:57 UTC
Created attachment 35111 [details]
working xlsx file, generated with Tomcat Oracle JVM 1.8
Comment 2 Carl Buxbaum 2017-07-10 18:09:06 UTC
Created attachment 35112 [details]
struts action for building excel
Comment 3 Carl Buxbaum 2017-07-10 18:09:47 UTC
Created attachment 35113 [details]
Helper class for building xlsx
Comment 4 Nick Burch 2017-07-10 18:16:25 UTC
Assuming you have an active IBM support contract for your troublesome WebSphere install, I'd suggest punting it over to them. They'll have everything set up to reproduce it, which I'm not sure any of us do here, and they'll be much more experienced at debugging WebSphere + IBM JVM issues!
Comment 5 Carl Buxbaum 2017-07-10 18:45:32 UTC
Hi Nick,

This issue originated from a customer of ours, so I would say that it's not
a particular WebSphere install that is the problem.  I would reckon that
any xlsx built by poi using the IBM JVM will exhibit the same problem, so
if there is something that poi can do to help create a compatible zip,
then it may be useful for someone in the poi project to take a look.


Thanks,

Carl
Comment 6 Javen O'Neal 2017-07-10 19:16:38 UTC
(In reply to Nick Burch from comment #4)
> I'm not sure any of us [have Websphere IBM Java set up to reproduce this], and
> they'll be much more experienced at debugging websphere + ibm jvm issues!

Here are our automated builds. We do not have an enabled job that runs on an IBM JDK, let alone on WebSphere.
https://builds.apache.org/view/P/view/POI/

It's difficult to fix something without being able to reproduce or test it.
Comment 7 Andreas Beeker 2017-07-10 19:46:47 UTC
Would you mind trying it with the trunk version?
... I've deactivated indenting after the beta came out - if it's a newline problem, this might be solved by that.

Furthermore I'm not sure if that comment is relevant or true, but regarding [1]:
"NOTE: If you are using a unix system, be aware of linebreaks, the OPC uses CRLF not LF."


[1] https://stackoverflow.com/questions/36063375
Comment 8 Carl Buxbaum 2017-07-10 20:11:42 UTC
Hi, I put Windows as hardware because the issue occurs when the document is generated and served by a WebSphere instance running under IBM JVM 1.8, not when served by my local, which happens to be a mac.
Comment 9 Stefan Bodewig 2017-07-11 06:16:18 UTC
Created attachment 35114 [details]
content diff

The only content differences are missing newlines at the end of a couple of XML files. This shouldn't throw off an XML parser, but if the format is sensitive to line ends, it may well become a problem.
Comment 10 Andreas Beeker 2017-07-11 11:26:42 UTC
(In reply to Stefan Bodewig from comment #9)
> The only content differences are missing new lines at the end of a couple of
> XML files, this shouldn't throw off an XML parser but if the format is
> sensitive to line-ends, this may well become a problem.

One piece of information is missing here, which was in Carl's email thread, so the XML is probably not the culprit:

> If I create a "corrupt" excel, unzip it, and then zip it back up (on my
> mac using zip command), the resulting zip file opens without issue.
> If a colleague on Windows generates the same excel, and does the same
> probably using windup, the corruption remains.
Comment 11 Mark Murphy 2017-07-11 12:06:48 UTC
I use the IBM JVM v1.6.26 on IBM i, no Websphere. It has no issues with POI 3.14. I can create valid XLSX documents.

(In reply to Carl Buxbaum from comment #5)
> Hi Nick,
> 
> This issue originated from a customer of ours, so I would say that it's not
> a particular WebSphere install that is the problem.  I would reckon that
> any xlsx built by poi using the IBM JVM will exhibit the same problem, so
> if there is something that poi can do to help create a compatible zip,
> then it may be useful for someone in the poi project to take a look.
> 
> 
> Thanks,
> 
> Carl
Comment 12 Stefan Bodewig 2017-07-11 14:27:53 UTC
I started looking through the archives in a hex-editor and the first local file headers looked fine, so I skipped to the end.

While Results_good.xlsx ends with the "end of central directory" record one would expect, the Results_bad.xlsx contains some garbage after it, namely

Error 500: java.lang.IllegalStateException\r\n

so it seems some log output has made it into the stream.

Usually ZIP archivers will read an archive from the back looking for the EOCD record (or its ZIP64 cousin) and ignore garbage at the end. It is not uncommon to find code for self-extracting archives there. This is why zip and friends won't complain about the archive, while Excel seems to be pickier.
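
For anyone who wants to check other files for the same symptom, here is a minimal Java sketch that reports whatever follows the "end of central directory" record. The class name ZipTrailerCheck and its methods are hypothetical helpers, not part of POI. It assumes a plain (non-ZIP64) archive and that the trailing garbage does not itself contain the EOCD signature.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not part of POI): reports bytes that follow the ZIP EOCD record. */
final class ZipTrailerCheck {
    /** Size of the fixed part of the EOCD record, excluding the optional archive comment. */
    private static final int EOCD_MIN_SIZE = 22;

    /** Returns the offset of the last EOCD signature (bytes 50 4B 05 06), or -1 if none is found. */
    static int findEocd(byte[] zip) {
        for (int i = zip.length - EOCD_MIN_SIZE; i >= 0; i--) {
            if (zip[i] == 0x50 && zip[i + 1] == 0x4B && zip[i + 2] == 0x05 && zip[i + 3] == 0x06) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the bytes after the EOCD record and its comment; empty if the archive ends cleanly. */
    static byte[] trailingBytes(byte[] zip) {
        int eocd = findEocd(zip);
        if (eocd < 0) {
            throw new IllegalArgumentException("no EOCD record found");
        }
        // The archive comment length is a little-endian 16-bit value at offset 20 of the record.
        int commentLen = (zip[eocd + 20] & 0xFF) | ((zip[eocd + 21] & 0xFF) << 8);
        int end = Math.min(zip.length, eocd + EOCD_MIN_SIZE + commentLen);
        return Arrays.copyOfRange(zip, end, zip.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] extra = trailingBytes(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(extra.length + " trailing byte(s): "
                + new String(extra, StandardCharsets.ISO_8859_1));
    }
}
```

Run it with the path of the xlsx file as the only argument. A non-zero count means something was written to the stream after the archive was complete.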
Comment 13 Stefan Bodewig 2017-07-11 14:35:20 UTC
Just to make sure it really is the garbage at the end you could perform

head --bytes=-44 < Results_bad.xlsx > Results.xlsx

and try to feed Results.xlsx to Excel. I've only got Libre Office installed which accepts the "corrupt" sheet without any warning, so I cannot check myself.
Comment 14 Stefan Bodewig 2017-07-11 14:42:18 UTC
Created attachment 35121 [details]
potentially "repaired" Results_bad.xlsx

I realized my command may have been GNU head specific, so uploaded the result directly.
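
For systems without GNU head, the same truncation can be sketched in plain Java. DropTail is a hypothetical throwaway class, not part of POI, and the byte count must be supplied by the caller (44 in this case: the error text plus CRLF).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical throwaway tool: copies a file while dropping its last n bytes. */
final class DropTail {
    /** Returns a copy of data without its last n bytes. */
    static byte[] dropLast(byte[] data, int n) {
        if (n < 0 || n > data.length) {
            throw new IllegalArgumentException("n must be between 0 and " + data.length);
        }
        return Arrays.copyOf(data, data.length - n);
    }

    public static void main(String[] args) throws IOException {
        // Arguments: input file, output file, number of bytes to drop from the end.
        byte[] in = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), dropLast(in, Integer.parseInt(args[2])));
    }
}
```

Example invocation: java DropTail Results_bad.xlsx Results.xlsx 44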
Comment 15 Carl Buxbaum 2017-07-11 16:17:51 UTC
(In reply to Stefan Bodewig from comment #14)
> Created attachment 35121 [details]
> potentially "repaired" Results_bad.xlsx
> 
> I realized my command may have been GNU head specific, so uploaded the
> result directly.

Thank you so much!  I don't know why I did not look directly at the excel file in an editor.

I did edit out the IllegalStateException and it still appears corrupt, but I am mystified as to how that exception would get in there in the first place.  I do not see anything being thrown in the logs.

Thanks,

Carl
Comment 16 Dominik Stadler 2017-09-28 12:59:27 UTC
The latest "repaired Results_bad.xlsx" opens for me in Excel without any corruption warning, so I don't see what we can do here from our end. 

If you still think there is a problem in POI please try to provide a more self-sufficient and minimal unit test which produces the corrupt file. The current code is intertwined with Apache Struts code and other things that are not related to the problem at all.
Comment 17 Carl Buxbaum 2017-09-28 15:45:33 UTC
(In reply to Dominik Stadler from comment #16)
> The latest "repaired Results_bad.xlsx" opens for me in Excel without any
> corruption warning, so I don't see what we can do here from our end. 
> 
> If you still think there is a problem in POI please try to provide a more
> self-sufficient and minimal unit test which produces the corrupt file. The
> current code is intertwined with Apache Struts code and other things that
> are not related to the problem at all.

Hi Dominik et al.,

I did finally discover what was causing this.  Essentially doing a redirect instead of a forward fixes it in our struts application:

The JSP uses JspWriter to create the JSP page, and the struts action uses the OutputStream from the response to stream the excel. It is not permissible for both to be used in the same request, and the resulting error causes a message to be appended to the OutputStream (and therefore to the excel spreadsheet, corrupting it). Therefore, the fix is to do a redirect to the struts action, which creates a new request that only handles the response OutputStream, instead of forwarding to the struts action, which handles the streaming of the excel in the same request as the JSP. I don't know why this only manifests in WebSphere, and Tomcat seems to not have this issue.
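
To illustrate the mechanism, here is a toy model in plain Java. ToyResponse is a hypothetical stand-in, NOT the real servlet API; it only mimics the rule that a response hands out either its writer or its output stream, never both. The order of calls (excel streamed first, writer requested afterwards) and the appended error text are assumptions, chosen to match the 44 bytes found at the end of the bad file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Toy stand-in for a servlet response; hypothetical, not the real servlet API. */
final class ToyResponse {
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private boolean streamUsed;
    private boolean writerUsed;

    OutputStream getOutputStream() {
        if (writerUsed) throw new IllegalStateException("getWriter() already called");
        streamUsed = true;
        return body;
    }

    PrintWriter getWriter() {
        if (streamUsed) throw new IllegalStateException("getOutputStream() already called");
        writerUsed = true;
        return new PrintWriter(new OutputStreamWriter(body, StandardCharsets.ISO_8859_1), true);
    }

    /** Assumed container behaviour: the error text lands after whatever was already written. */
    void appendError(Exception e) {
        byte[] msg = ("Error 500: " + e.getClass().getName() + "\r\n")
                .getBytes(StandardCharsets.ISO_8859_1);
        body.write(msg, 0, msg.length);
    }

    byte[] bytes() {
        return body.toByteArray();
    }
}

final class ForwardVsRedirectDemo {
    /**
     * Simulates one request. With jspSharesRequest = true (the forward case) the JSP asks
     * for its writer on the same response after the action has streamed the workbook.
     */
    static byte[] serve(byte[] workbook, boolean jspSharesRequest) {
        ToyResponse response = new ToyResponse();
        try {
            response.getOutputStream().write(workbook); // the action streams the excel
            if (jspSharesRequest) {
                response.getWriter();                   // the JSP wants its writer: not allowed
            }
        } catch (IllegalStateException e) {
            response.appendError(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return response.bytes();
    }

    public static void main(String[] args) {
        byte[] workbook = new byte[100]; // placeholder bytes standing in for a real xlsx
        System.out.println("redirect: " + serve(workbook, false).length + " bytes");
        System.out.println("forward:  " + serve(workbook, true).length + " bytes");
    }
}
```

In this model the forward case returns the workbook plus the error text, while the redirect case returns the workbook unchanged, since the new request only ever touches the OutputStream.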