Summary: | SXSSFWorkbook, invalid xml characters, corrupted XLSX | ||
---|---|---|---|
Product: | POI | Reporter: | Catalin Z. Alexandru <catalinalexandru.zamfir> |
Component: | SXSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | 3.8-dev | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | All | ||
Attachments: | SXSSFWorkbook generated file |
Excel reports: "Replaced Part: /xl/worksheets/sheet1.xml part with XML error. Illegal xml character. Line 394, column 267." Looking in sheet1.xml at line 394, column 267, I see this around it: "If youâre looking for a palm-sweating". Column 267 is the ";" in "". I tried to decode the entire entity, but it outputs a weird character. SXSSFWorkbook should ignore unknown or invalid XML characters.

I've tracked this issue down, and it seems that the original source of this message contains the same unprintable characters. They do not show up, but they can easily be spotted in the source of the original document. As far as I know, characters below ASCII 32 are control characters. Shouldn't these be ignored rather than encoded? As they're not printable, they don't provide any useful value to anybody. XSSFWorkbook handles them properly; SXSSFWorkbook doesn't.

Should be fixed in r1294657.

Your diagnosis is correct: writing an ISO control character (< 32) resulted in a corrupted workbook. I could easily reproduce it with the following simple code:

```java
Workbook wb = new SXSSFWorkbook();
Sheet sh = wb.createSheet();
Cell cell = sh.createRow(0).createCell(0);
cell.setCellValue("\u0000"); // NUL, an ISO control character (< 32)
```

XSSF delegates writing XML to XmlBeans, and this framework replaces characters below 32 with question marks. I changed SXSSF to do the same.

It turns out there are two more special cases where you can't simply write a char code in XML:

- case 1: low and high Unicode surrogates: DC00-DFFF and D800-DBFF
- case 2: the "not a character" range: FFFE-FFFF

XmlBeans replaces characters from these ranges with question marks, so I fixed SXSSF to be consistent.

Yegor

Yegor, I am facing the same problem. Where can I download the JAR files of this release? Please advise.

Regards,
Sheikh

Hi, I was able to download version 3.8: http://www.apache.org/dyn/closer.cgi/poi/release/bin/poi-bin-3.8-20120326.zip Thanks.

Regards,
Sheikh
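The replacement rule Yegor describes for the fix can be sketched as a standalone Java method. This is a minimal sketch, not the code committed in r1294657: the class name `XmlCharSanitizer` is hypothetical, and keeping tab, LF and CR (which XML 1.0 permits) is an assumption layered on top of the "below 32" rule stated in the comment.

```java
/**
 * Hypothetical sketch of the rule described in the bug comments: characters
 * that cannot be written verbatim into worksheet XML are replaced with '?'.
 */
public final class XmlCharSanitizer {

    private XmlCharSanitizer() {}

    /** Returns true if c should be written as '?' instead of verbatim. */
    static boolean replaceWithQuestionMark(char c) {
        // Assumption: tab, LF and CR are legal XML 1.0 characters, so keep them.
        if (c == '\t' || c == '\n' || c == '\r') {
            return false;
        }
        return c < 0x20                        // ISO control characters (< 32)
                || Character.isSurrogate(c)    // D800-DFFF, high and low surrogates
                || c == 0xFFFE || c == 0xFFFF; // "not a character" code points
    }

    /** Returns a copy of s with every problematic char replaced by '?'. */
    public static String sanitize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(replaceWithQuestionMark(c) ? '?' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a\u0000b")); // prints a?b
    }
}
```

Like the rule in the comment, this sketch also replaces well-formed surrogate pairs, so a supplementary character such as an emoji becomes "??" even though XML 1.0 allows it; a stricter filter could keep valid pairs and replace only unpaired surrogates.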
Created attachment 28395: SXSSFWorkbook generated file

Exporting with SXSSFWorkbook generates a corrupted .xlsx file. I've attached the generated XLSX file. I viewed it with an XML viewer but could not find the problem. Generating the same XLSX from the same data with XSSFWorkbook produces a proper .xlsx file. We're using SXSSFWorkbook because of memory issues. We're now using XSSFWorkbook as a quick fix/workaround, but wish to identify the problem here.
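Because an XML viewer did not reveal the offending characters, it can help to scan the source strings before they are passed to `setCellValue`. The helper below is a hypothetical diagnostic sketch (the class `InvalidXmlCharFinder` is not part of POI); it applies the same assumed rule as above (control characters other than tab/LF/CR, surrogate code units, FFFE/FFFF) and reports each hit's index and code point.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper: lists chars that would corrupt worksheet XML. */
public final class InvalidXmlCharFinder {

    private InvalidXmlCharFinder() {}

    // Assumed rule: control chars other than tab/LF/CR, surrogate code units,
    // and the FFFE/FFFF non-characters.
    static boolean isSuspect(char c) {
        if (c == '\t' || c == '\n' || c == '\r') {
            return false;
        }
        return c < 0x20 || Character.isSurrogate(c) || c == 0xFFFE || c == 0xFFFF;
    }

    /** Returns entries such as "index 2: U+0001" for every suspect char in s. */
    public static List<String> find(String s) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isSuspect(c)) {
                // Cast to int: %X does not accept a Character argument.
                hits.add(String.format("index %d: U+%04X", i, (int) c));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("ok\u0001text")); // prints [index 2: U+0001]
    }
}
```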