Created attachment 32284 [details] Attached! source Hi, After 7 lakh records file are getting corroupted,when i am writing 1 million recors with 60 columns its working fine but when i am writing 7.5 lakh records with 100 columns file are getting corrupted. I have attached snipnet of code . Kindly suggest i if doing something wrong package com.rati.Dec; import java.io.File; import java.io.FileOutputStream; import java.io.IOException; import org.apache.poi.ss.usermodel.Cell; import org.apache.poi.ss.usermodel.Row; import org.apache.poi.xssf.streaming.SXSSFSheet; import org.apache.poi.xssf.streaming.SXSSFWorkbook; public class MillionTest1 { public static void main(String[] args) { // TODO Auto-generated method stub SXSSFWorkbook workbook = new SXSSFWorkbook(100); workbook.setCompressTempFiles(true); SXSSFSheet sheet = null; Row row = null; Cell cell = null; sheet = (SXSSFSheet) workbook.createSheet("Test"); FileOutputStream fileOut = null; try { System.out.println("Started!"); for (int i = 0; i < 777384; i++) { row = sheet.createRow(i); for (int j = 0; j < 100; j++) { cell = row.createCell(j); cell.setCellValue("Rati : row : "+i +" Col: "+ j); } } fileOut = new FileOutputStream( new File("Z:\\output\\Version\\Version11.xlsx")); workbook.write(fileOut); } catch (Exception exception) { exception.printStackTrace(); } finally { workbook.dispose(); try { fileOut.flush(); fileOut.close(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } System.out.println("Completed!"); } } }
How is the file getting corrupt? How do the files differ between 3.9 and 3.10? Did you try with 3.11 beta 3? Did you try generating a simple file with the same number of rows+columns using SSPerformanceTest (see http://poi.apache.org/faq.html#faq-N10109)? Did that file work fine when generated at the same row+column size?
sir.i have tried all version and compatible dependency jars.still file is getting corrupted. u can also try running this or attached code! :(
(In reply to Nick Burch from comment #1) > How is the file getting corrupt? > > How do the files differ between 3.9 and 3.10? > > Did you try with 3.11 beta 3? > > Did you try generating a simple file with the same number of rows+columns > using SSPerformanceTest (see http://poi.apache.org/faq.html#faq-N10109)? Did > that file work fine when generated at the same row+column size? No Sir, i have tried using SSPerformanceTest args = new String[4]; args[0] = "SXSSF"; args[1] = "1000000";->Rows args[2] = "100";--->Cols args[3] = "1"; but still file is getting corrupt.
We are also using 3.9 and experiencing the same corruption. Ran the performance test with the same results as reporting user.
I think this the problem may happen around the 2G/4G size border, so it may be related to the trouble in some JDK versions with handling zip-files larger than 2G/4G, can you state which exact Java version you are using?
I tried to look up details of support of large zip files in Java, but the bug-database is not very definite about what is supported. However https://en.wikipedia.org/wiki/Zip_(file_format)#ZIP64 states that only Java 7 fully supports files larger than 4G, can you try running your test with the latest patchlevel of this version to verify this?
I tried with java version "1.6.0_26" and could not reproduce the problem there.
Created attachment 32355 [details] Error Shown from Excel Error and compiler version
Created attachment 32356 [details] Jars Used HI, Run the test using Jdk 1.8 and latest poi jars. here the code I used package com.rati.Dec; import java.io.File; import java.io.FileOutputStream; import java.io.IOException; import org.apache.poi.ss.usermodel.Cell; import org.apache.poi.ss.usermodel.Row; import org.apache.poi.xssf.streaming.SXSSFSheet; import org.apache.poi.xssf.streaming.SXSSFWorkbook; public class MillionTest1 { @SuppressWarnings("resource") public static void main(String[] args) { // TODO Auto-generated method stub SXSSFWorkbook workbook = new SXSSFWorkbook(100); workbook.setCompressTempFiles(true); SXSSFSheet sheet = null; Row row = null; Cell cell = null; sheet = (SXSSFSheet) workbook.createSheet("Test"); FileOutputStream fileOut = null; try { System.out.println("Started!"); for (int i = 0; i < 777384; i++) { row = sheet.createRow(i); for (int j = 0; j < 100; j++) { cell = row.createCell(j); cell.setCellValue("Rati : row : "+i +" Col: "+ j); } } fileOut = new FileOutputStream( new File("Z:\\Version11.xlsx")); workbook.write(fileOut); } catch (Exception exception) { exception.printStackTrace(); } finally { workbook.dispose(); try { fileOut.flush(); fileOut.close(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } System.out.println("Completed!"); } } } Same error Is popping up. I have also attached the screen shots in the attachment section. Please check once.
It also been tested multiple system but issue remains the same.
I've just run your test program on this jvm: java version "1.6.0_33" OpenJDK Runtime Environment (IcedTea6 1.13.5) (6b33-1.13.5-1ubuntu0.12.04) The generated file (268mb!) is processed just fine with Apache Tika The generated file takes an absolute epic age to load in Excel then Excel gives a warning about unreadable content, after repairing it reports: "System error: - 2134163449. Line 437452, column 5899". However, it does still show 777384 rows, all with 100 columns The only thing I can suggest is finding the exact point when the file files to load (eg 777384 fails, 777383 works), then unzip both files and compare the XML to see how they differ. (We now have a tool to help with this - OOXMLPrettyPrint)
Using Rati's code with Java 7u76, the file works with a row count of 561564, but breaks at 561565 and higher.
Matt i think this is the range can you provide the file size and column count.
Hi Rati, Both files are reported as 197,511 KB (202,250,805 and 202,251,162 bytes), and have the 100 columns as in the sample code. 7-Zip reports the uncompressed size of the worksheet as 4,294,971,804 bytes and 4,294,979,492 bytes, which is just over the 4GB size for the broken file. I'm also uploading the files that were created, before being opened/repaired by Excel. The files will be available at https://www.dropbox.com/sh/v322ryfnw7zk6ra/AADlEnu9t-25OCKNquJvdDyQa?dl=0
I think bug 54523 reports pretty much the same thing.
I am using version 3.14
Here is additional details: Excel Version : Excel 2010 Apache POI Version: 3.14 & 3.15 Java Version : JDK 8 (1.8.0_101) OS Version : Windows 7 Enterprise 64-bit Excel sheet information: Number of columns : 208 Number of rows : 390,000 or more Average column width: 15 characters Max column width : 40 characters Problem starts when the number of rows exceed 410,000 or so... Error message displayed in Excel: "Excel found unreadable content in [file name]. Do you want to recover the contents of this workbook? If you trust the source of this workbook, click Yes" This ONLY HAPPENS when the number of rows exceeds 410,000 rows or so.
Dear All, Any workaround to this issue? We are also facing the same issue with similar approach. Appreciate your help on this. Thanks Ramesh
Patches are greatly appreciated. Appreciate your help on this.
I ran the test case in https://bz.apache.org/bugzilla/show_bug.cgi?id=54523 using latest poi code and Excel 2016 (Mac) gives the error about corrupt data. I was able to unzip the generated xlsx with p7zip. The default unzip on Macs can't handle zips containing such large files. The generated worksheet is 4Gb but the XML looks fine.
The comment at https://bz.apache.org/bugzilla/show_bug.cgi?id=54523#c6 indicates that a workaround is to unzip the file with 7zip and then zipping it again lead to a file that was usable in Excel.
*** Bug 54523 has been marked as a duplicate of this bug. ***
I did an in depth analysis of this issue. Turns out the problem is not with the OOXML data generated by POI. The problem has to do with the ZIP format. Specifically with ZIP64 extension. That's why it's all OK up until sheet1.xml reaches over 4GB (uncompressed). I have all the details written up in a blog post: https://rzymek.github.io/post/excel-zip64/ Short story: Excel will want to repair the file if uncompressed size of a zip entry exceeds 4GB and ZIP's Local File Header (LFH) does not specify zip spec version 4.5 A minimal fix would go like this: 1. Switch commons-compress ZipArchiveOutputStream to Zip64Mode.Always (Apache POI uses commons-compress not java.util.zip) 2. Modify commons-compress to put 0 in 32bit size fields if size is not known. That is in LFH creation in streaming mode. Currently in Zip64Mode.Always and streaming zip creation commons-compress stores FF FF FF FF in 32bit size field and 00 00 00 00 00 00 00 00 in 64bit in LFH. Excel expects 00 00 00 00 in the 32bit size field only in LFH.
*** Bug 61832 has been marked as a duplicate of this bug. ***
Krzysztof - thanks for the analysis, could you report the issue to commons-compress? I'm going to add a setZip64Mode method on SXSSFWorkbook. https://svn.apache.org/viewvc?view=revision&revision=1848179
Issue submitted to commons-compress at https://issues.apache.org/jira/browse/COMPRESS-474
Method to set 64-bit mode was added via r1848179, unfortunately it cannot be set always currently because LibreOffice cannot open such files, see bug 62872 for details.
I have implemented a drop in replacement for ZipOutputStream based of my previous analysis in https://rzymek.github.io/post/excel-zip64/. It's available at https://github.com/rzymek/opczip and from maven central. It foremost supports Excel compatible zip64 archives. Version 1.0.2 also creates LibreOffice compatible xlsx files. LibreCalc still will not open zip64 archives, but is OK with zip version 4.5 in Local File Header. Just requires max 3.0 in Central Directory. This is also OK with Excel, which requires 4.5 in LHF but ignores version in CEN. This combination allows streaming creation without prior knowledge whether 4GB limit will be exceed or not. opczip is currently used by https://github.com/dhatim/fastexcel
Thanks Krzysztof. With Apache projects, we try to limit the amount of non-Apache dependencies. Would it be feasible for you to submit the implementation of opczip to us and we can have a mode in SXSSF that allows users to choose their preferred zip implementation.
@PJ_Fanning Sure, but I'm not fully clear on what you mean by "submit to us". Do you mean submit to Apache POI source repo, Apache Incubator or via https://projects.apache.org/create.html? Could you point me to what the next steps would be. Here's a least invasive scenario: Allow to tap into SXSSFWorkbook.injectData() [1], to provide own implementation of ZipArchiveOutputStream. For example via introducing protected createZipOutputStream() method. I could provide an OpcOutputStream wrapper extending ZipArchiveOutputStream. Just like I did with ZipOutputStream [2]. Then to create SXSSFWorkbook with >4GB sheet support, you would: SXSSFWorkbook wb = new SXSSFWorkbook() { protected ZipArchiveOutputStream createZipOutputStream(OutputStream out) { return new OpcZipArchiveOutputStream(out); } } [1] https://github.com/apache/poi/blob/cdbda8e615610b960844e58b89f51cf304ce00c6/src/ooxml/java/org/apache/poi/xssf/streaming/SXSSFWorkbook.java#L388 [2] https://github.com/rzymek/opczip/blob/master/src/main/java/com/github/rzymek/opczip/OpcZipOutputStream.java
Krzysztof, I'm asking for permission for us to copy your code and include it in Apache POI - to avoid us having a jar dependency.
(In reply to PJ Fanning from comment #31) > Krzysztof, I'm asking for permission for us to copy your code and include it > in Apache POI - to avoid us having a jar dependency. Sure! I'd just like have my name in the file's header. You only need OpcOutputStream.java and Zip64Impl.java. They both don't have any dependencies. https://github.com/rzymek/opczip/tree/master/src/main/java/com/github/rzymek/opczip
Submitted https://github.com/apache/poi/pull/154
Hi, I have tried to generate large report with the new POI(4.1.0) and I am still facing the issue. Did anyone try and got it worked?
(In reply to Alaap C from comment #34) > I have tried to generate large report with the new POI(4.1.0) and I am still > facing the issue. Did anyone try and got it worked? 6 months ago 4.1.0 was new, but "new" now means 4.1.1. Please update your dependencies.
POI 4.1.1 includes https://github.com/apache/poi/pull/154 You will need to call `void setZip64Mode(Zip64Mode.Always)` on your SXSSFWorkbook to get this behaviour.
We were able to reproduce this with 4.x version, however using 5.0.1 this issue seems to be fixed. The error message by Excel is no longer being displayed upon opening. We did a comparison: The uncompressed size is ~10% bigger. (4.8 vs 5.3 GB) The compressed size is roughly the same. (376 vs 376.7 MB)