Summary: | Incorrect "0" value for largish integers in xlsb files | ||
---|---|---|---|
Product: | POI | Reporter: | Tim Allison <tallison> |
Component: | XSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | All | ||
Attachments: | triggering document |
To fix the "0" error when converting xlsb to CSV or text, use poi-ooxml 4.0.1 together with xmlbeans 3.0.1.

POM:

<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>4.0.1</version>
</dependency>
<dependency>
  <groupId>org.apache.xmlbeans</groupId>
  <artifactId>xmlbeans</artifactId>
  <version>3.0.1</version>
</dependency>

Example:

import org.apache.poi.ooxml.extractor.POIXMLTextExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.extractor.XSSFBEventBasedExcelExtractor;

OPCPackage pkg = OPCPackage.open("\\file.xlsb", PackageAccess.READ);
POIXMLTextExtractor ext = new XSSFBEventBasedExcelExtractor(pkg);
System.out.println(ext.getText());

-------------
PAZ
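For the CSV half of that conversion, the text returned by the extractor still has to be turned into comma-separated rows. The following is a minimal sketch of that step only, assuming the extracted text has one row per line with cells separated by tabs; that layout is an assumption and should be checked against real output. The class `TabTextToCsv` and its methods are hypothetical helpers, not part of POI, and any sheet-name or header/footer lines in the text would pass through as ordinary rows.

```java
import java.util.ArrayList;
import java.util.List;

public class TabTextToCsv {

    // Quote a field only when it contains a comma, a double quote, or a line break.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Convert text with one row per line and tab-separated cells into CSV.
    // ASSUMPTION: the extractor output uses '\t' between cells and '\n' between rows.
    static String toCsv(String extracted) {
        StringBuilder out = new StringBuilder();
        for (String row : extracted.split("\\r?\\n")) {
            List<String> escaped = new ArrayList<>();
            // limit -1 keeps trailing empty cells instead of dropping them
            for (String cell : row.split("\t", -1)) {
                escaped.add(escape(cell));
            }
            out.append(String.join(",", escaped)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for ext.getText(); values taken from the report's sample output.
        String sample = "1880000\t10000000\n1880004\n";
        System.out.print(toCsv(sample));
    }
}
```

In practice the string passed to `toCsv` would be the result of `ext.getText()` from the example above.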
Created attachment 36194 [details]
triggering document

On the user list, Dejan Ikodinovic noted that some large integer values are incorrectly extracted as "0" in xlsb. I can reproduce this with the attached file, which, in Tika, yields:

<table><tbody><tr> <td>1880000</td> <td>10000000</td></tr>
<tr> <td>0</td></tr>
<tr> <td>0</td></tr>
<tr> <td>0</td></tr>
<tr> <td>1880004</td></tr>
<tr> <td>0</td></tr>
<tr> <td>0</td></tr>
<tr> <td>0</td></tr>
<tr> <td>1880008</td></tr>
<tr> <td>0</td></tr>
<tr> <td>0</td></tr>
<tr> <td>0</td></tr>
<tr> <td>1880012</td></tr>

I haven't figured out what the cause of this is. It is possible that the problem is at the Tika level, but my guess is that I botched something at the POI level.

As a side note, if I save the file as xlsx, the numbers are extracted correctly.
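Because the xlsx copy extracts correctly, it can serve as a reference when hunting for affected cells. The sketch below compares two extracted texts cell by cell and lists every position where they differ. It assumes both texts use one row per line and tab-separated cells; `ExtractionDiff` is a hypothetical helper, and the "expected" values in `main` are illustrative only, since the report does not list them.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtractionDiff {

    // Return one message per cell whose text differs between the two extractions.
    // Rows and columns are reported with 0-based indices.
    static List<String> diff(String xlsbText, String xlsxText) {
        List<String> mismatches = new ArrayList<>();
        String[] rowsB = xlsbText.split("\\r?\\n", -1);
        String[] rowsX = xlsxText.split("\\r?\\n", -1);
        int rows = Math.max(rowsB.length, rowsX.length);
        for (int r = 0; r < rows; r++) {
            String[] cellsB = r < rowsB.length ? rowsB[r].split("\t", -1) : new String[0];
            String[] cellsX = r < rowsX.length ? rowsX[r].split("\t", -1) : new String[0];
            int cols = Math.max(cellsB.length, cellsX.length);
            for (int c = 0; c < cols; c++) {
                // A cell missing on one side is treated as empty text.
                String b = c < cellsB.length ? cellsB[c] : "";
                String x = c < cellsX.length ? cellsX[c] : "";
                if (!b.equals(x)) {
                    mismatches.add("row " + r + ", col " + c + ": xlsb=" + b + " xlsx=" + x);
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // xlsb values follow the first rows of the output in the report.
        String xlsb = "1880000\t10000000\n0\n0\n0\n1880004";
        // HYPOTHETICAL reference values; the report does not state them.
        String xlsx = "1880000\t10000000\n1880001\n1880002\n1880003\n1880004";
        for (String line : diff(xlsb, xlsx)) {
            System.out.println(line);
        }
    }
}
```

Feeding it the text from `XSSFBEventBasedExcelExtractor` for the xlsb file and the text from the xlsx extractor for the re-saved copy would give a quick list of cells to inspect.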