Summary: | Excel could not open <file> because some content is unreadable | ||
---|---|---|---|
Product: | POI | Reporter: | Carl Buxbaum <cbuxbaum> |
Component: | XSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | bodewig |
Priority: | P2 | ||
Version: | 3.15-FINAL | ||
Target Milestone: | --- | ||
Hardware: | Macintosh | ||
OS: | All | ||
Attachments: |
corrupt xlsx file generated with Websphere IBM JVM 1.8
working xlsx file, generated with Tomcat Oracle JVM 1.8
struts action for building excel
Helper class for building xlsx
content diff
potentially "repaired" Results_bad.xlsx |
Created attachment 35111 [details]
working xlsx file, generated with Tomcat Oracle JVM 1.8
Created attachment 35112 [details]
struts action for building excel
Created attachment 35113 [details]
Helper class for building xlsx
Assuming you have an active IBM support contract for your troublesome WebSphere install, I'd suggest punting it over to them. They'll have everything set up to reproduce it, which I'm not sure any of us do here, and they'll be much more experienced at debugging WebSphere + IBM JVM issues!

Hi Nick,

This issue originated from a customer of ours, so I would say that it's not a particular WebSphere install that is the problem. I would reckon that any xlsx built by POI using the IBM JVM will exhibit the same problem, so if there is something that POI can do to help create a compatible zip, then it may be useful for someone in the POI project to take a look.

Thanks,
Carl

(In reply to Nick Burch from comment #4)
> I'm not sure any of us [have Websphere IBM Java set up to reproduce this], and
> they'll be much more experienced at debugging websphere + ibm jvm issues!

Here are our automated builds, for which we do not have an enabled job that runs on an IBM JDK, let alone on WebSphere.
https://builds.apache.org/view/P/view/POI/
It's difficult to fix something without being able to reproduce or test it.

Would you mind trying it with the trunk version? ... I've deactivated indenting after the beta came out - if it's a newline problem, this might be solved by that.
Furthermore, I'm not sure if that comment is relevant or true, but regarding [1]: "NOTE: If you are using a unix system, be aware of linebreaks, the OPC uses CRLF not LF."
[1] https://stackoverflow.com/questions/36063375

Hi, I put Windows as hardware because the issue occurs when the document is generated and served by a WebSphere instance running under IBM JVM 1.8, not when served by my local machine, which happens to be a Mac.

Created attachment 35114 [details]
content diff
The only content differences are missing new lines at the end of a couple of XML files, this shouldn't throw off an XML parser but if the format is sensitive to line-ends, this may well become a problem.
(In reply to Stefan Bodewig from comment #9)
> The only content differences are missing new lines at the end of a couple of
> XML files, this shouldn't throw off an XML parser but if the format is
> sensitive to line-ends, this may well become a problem.

One piece of information is missing here, which was in Carl's email thread, so the XML is probably not the culprit:
> If I create a "corrupt" excel, unzip it, and then zip it back up (on my
> mac using zip command), the resulting zip file opens without issue.
> If a colleague on Windows generates the same excel, and does the same
> probably using windup, the corruption remains.

I use the IBM JVM v1.6.26 on IBM i, no WebSphere. It has no issues with POI 3.14. I can create valid XLSX documents.

(In reply to Carl Buxbaum from comment #5)
> Hi Nick,
>
> This issue originated from a customer of ours, so I would say that it's not
> a particular WebSphere install that is the problem. I would reckon that
> any xlsx built by poi using the IBM JVM will exhibit the same problem, so
> if there is something that poi can do to help create a compatible zip,
> then it may be useful for someone in the poi project to take a look.
>
>
> Thanks,
>
> Carl

I started looking through the archives in a hex editor and the first local file headers looked fine, so I skipped to the end. While Results_good.xlsx ends with the "end of central directory" record one would expect, Results_bad.xlsx contains some garbage after it, namely

Error 500: java.lang.IllegalStateException\r\n

so it seems some log output has made it into the stream.

Usually ZIP archivers will read an archive from the back looking for the EOCD record (or its ZIP64 cousin) and ignore garbage at the end. It is not uncommon to find code for self-extracting archives there. This is why zip and friends won't complain about the archive; Excel seems to be more picky.
Just to make sure it really is the garbage at the end, you could run

    head --bytes=-44 < Results_bad.xlsx > Results.xlsx

and try to feed Results.xlsx to Excel. I've only got LibreOffice installed, which accepts the "corrupt" sheet without any warning, so I cannot check myself.

Created attachment 35121 [details]
potentially "repaired" Results_bad.xlsx
I realized my command may have been GNU head specific, so uploaded the result directly.
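
A portable way to run the same check, for readers without GNU head: the sketch below locates the last "end of central directory" (EOCD) record and reports whatever follows it. It is only an illustration of the diagnosis in this thread, not part of POI. The function name `split_trailing_bytes` is made up for this example, and it assumes the EOCD signature does not also occur inside the trailing bytes.

```python
import struct
from typing import Tuple

EOCD_SIGNATURE = b"PK\x05\x06"  # marks the end-of-central-directory record
EOCD_FIXED_SIZE = 22            # record size without the optional comment


def split_trailing_bytes(data: bytes) -> Tuple[bytes, bytes]:
    """Split raw ZIP bytes into (archive, bytes found after the EOCD record).

    Hypothetical helper: it trusts the last EOCD signature in the data,
    which is good enough for a quick check but not a full ZIP parser.
    """
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_FIXED_SIZE > len(data):
        raise ValueError("no complete end-of-central-directory record found")
    # The last two bytes of the fixed part hold the archive comment length.
    (comment_len,) = struct.unpack_from("<H", data, pos + 20)
    end = pos + EOCD_FIXED_SIZE + comment_len
    return data[:end], data[end:]


if __name__ == "__main__":
    import sys

    # Usage: python check_tail.py Results_bad.xlsx Results.xlsx
    with open(sys.argv[1], "rb") as src:
        archive, trailing = split_trailing_bytes(src.read())
    print(f"{len(trailing)} trailing byte(s): {trailing!r}")
    if len(sys.argv) > 2:
        with open(sys.argv[2], "wb") as dst:
            dst.write(archive)  # same archive with the tail cut off
```

Unlike the fixed `-44` offset, this does not need to know the length of the appended message in advance.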
(In reply to Stefan Bodewig from comment #14)
> Created attachment 35121 [details]
> potentially "repaired" Results_bad.xlsx
>
> I realized my command may have been GNU head specific, so uploaded the
> result directly.

Thank you so much! I don't know why I did not look directly at the excel file in an editor. I did edit out the IllegalStateException and it still appears corrupt, but I am mystified as to how that exception would get in there in the first place. I do not see anything being thrown in the logs.

Thanks,
Carl

The latest "repaired Results_bad.xlsx" opens for me in Excel without any corruption warning, so I don't see what we can do here from our end.

If you still think there is a problem in POI, please try to provide a more self-sufficient and minimal unit test which produces the corrupt file. The current code is intertwined with Apache Struts code and other things that are not related to the problem at all.

(In reply to Dominik Stadler from comment #16)
> The latest "repaired Results_bad.xlsx" opens for me in Excel without any
> corruption warning, so I don't see what we can do here from our end.
>
> If you still think there is a problem in POI please try to provide a more
> self-sufficient and minimal unit test which produces the corrupt file. The
> current code is intertwined with Apache Struts code and other things that
> are not related to the problem at all.

Hi Dominik et al.,

I did finally discover what was causing this. Essentially, doing a redirect instead of a forward fixes it in our struts application: the JSP uses JspWriter to create the JSP page, and the struts action uses the OutputStream from the response to stream the excel. It is not permissible for both to be used in the same request, and the error generated results in a message being appended to the OutputStream (and therefore to the excel spreadsheet, corrupting it).
Therefore, the fix is to do a redirect to the struts action, which creates a new request that only handles the response OutputStream, instead of forwarding to the struts action, which handles the streaming of the excel in the same request as the jsp. I don't know why this only manifests in WebSphere, and Tomcat seems to not have this issue. |
Created attachment 35110 [details]
corrupt xlsx file generated with Websphere IBM JVM 1.8

Hi,

if I generate an xlsx file using a war deployed on Tomcat, I do not get an error. When I generate the same xlsx file using a war deployed on WebSphere, I get the above error. I am attributing the difference to a difference in implementation of the zip functionality between Oracle JVM and IBM JVM (so perhaps this bug should go to IBM). If I try to open the corrupt version, it gives me the message, and I can choose to fix it, whereupon it opens. However, the excel log does not give me information about what was fixed. There are slight differences between a couple of files in the resulting diffs. Here are the differences in the output of zipinfo:

Good xlsx:
gloucester-pc:Downloads cbuxbaum$ zipinfo Results\ \(13\).xlsx
Archive: Results (13).xlsx 6704 bytes 11 files
-rw---- 2.0 fat   598 bl defN 10-Jul-17 13:43 _rels/.rels
-rw---- 2.0 fat  1197 bl defN 10-Jul-17 13:43 [Content_Types].xml
-rw---- 2.0 fat   184 bl defN 10-Jul-17 13:43 docProps/app.xml
-rw---- 2.0 fat   443 bl defN 10-Jul-17 13:43 docProps/core.xml
-rw---- 2.0 fat   131 bl defN 10-Jul-17 13:43 xl/drawings/drawing1.xml
-rw---- 2.0 fat   138 bl defN 10-Jul-17 13:43 xl/sharedStrings.xml
-rw---- 2.0 fat  3601 bl defN 10-Jul-17 13:43 xl/styles.xml
-rw---- 2.0 fat   350 bl defN 10-Jul-17 13:43 xl/workbook.xml
-rw---- 2.0 fat   576 bl defN 10-Jul-17 13:43 xl/_rels/workbook.xml.rels
-rw---- 2.0 fat 22610 bl defN 10-Jul-17 13:43 xl/worksheets/sheet1.xml
-rw---- 2.0 fat   305 bl defN 10-Jul-17 13:43 xl/worksheets/_rels/sheet1.xml.rels
11 files, 30133 bytes uncompressed, 5230 bytes compressed: 82.6%

Bad xlsx:
gloucester-pc:Downloads cbuxbaum$ zipinfo Results\ \(15\).xlsx
Archive: Results (15).xlsx 6745 bytes 11 files
-rw---- 2.0 fat   596 bl defN 10-Jul-17 13:52 _rels/.rels
-rw---- 2.0 fat  1195 bl defN 10-Jul-17 13:52 [Content_Types].xml
-rw---- 2.0 fat   184 bl defN 10-Jul-17 13:52 docProps/app.xml
-rw---- 2.0 fat   441 bl defN 10-Jul-17 13:52 docProps/core.xml
-rw---- 2.0 fat   131 bl defN 10-Jul-17 13:52 xl/drawings/drawing1.xml
-rw---- 2.0 fat   138 bl defN 10-Jul-17 13:52 xl/sharedStrings.xml
-rw---- 2.0 fat  3601 bl defN 10-Jul-17 13:52 xl/styles.xml
-rw---- 2.0 fat   350 bl defN 10-Jul-17 13:52 xl/workbook.xml
-rw---- 2.0 fat   574 bl defN 10-Jul-17 13:52 xl/_rels/workbook.xml.rels
-rw---- 2.0 fat 22610 bl defN 10-Jul-17 13:52 xl/worksheets/sheet1.xml
-rw---- 2.0 fat   303 bl defN 10-Jul-17 13:52 xl/worksheets/_rels/sheet1.xml.rels

tail of od command on _rels/.rels:

good version:
0001100 037057 005015 027474 062522 060554 064564 067157 064163
0001120 070151 037163 005015
0001126

bad version:
0001100 037057 005015 027474 062522 060554 064564 067157 064163
0001120 070151 037163
0001124

I am attaching the following:
Our Excel struts action
The Excel Helper class that builds the excel
The bad (corrupt) xlsx
The good xlsx

Thanks!
Carl Buxbaum
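
For readers who do not read od output fluently, the two tails quoted in this report can be decoded with a few lines of Python. This is a sketch only: the helper name `decode_od_words` is made up here, and it assumes od printed 16-bit octal words on a little-endian machine, which is the default `od` format on an Intel Mac.

```python
import struct


def decode_od_words(words: str) -> bytes:
    """Turn whitespace-separated 16-bit octal words from od back into bytes.

    Assumption: the words were printed on a little-endian host, so the
    low byte of each word comes first in the file.
    """
    return b"".join(struct.pack("<H", int(word, 8)) for word in words.split())


# Word columns copied from the report, without the leading offsets.
GOOD_TAIL = ("037057 005015 027474 062522 060554 064564 067157 064163 "
             "070151 037163 005015")
BAD_TAIL = ("037057 005015 027474 062522 060554 064564 067157 064163 "
            "070151 037163")

print(decode_od_words(GOOD_TAIL))  # b'/>\r\n</Relationships>\r\n'
print(decode_od_words(BAD_TAIL))   # b'/>\r\n</Relationships>'
```

Decoded this way, the only difference in the tail of _rels/.rels is the final CRLF, which accounts for the two-byte size difference (598 vs 596) in the zipinfo listings.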