Bug 62815

Summary: Incorrect "0" value for largish integers in xlsb files
Product: POI
Reporter: Tim Allison <tallison>
Component: XSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: triggering document

Description Tim Allison 2018-10-10 19:57:53 UTC
Created attachment 36194: triggering document

On the user list, Dejan Ikodinovic noted that some large integer values are incorrectly extracted as "0" in xlsb.

I can reproduce this with the attached file, which, in Tika, yields:

<table><tbody><tr>      <td>1880000</td>        <td>10000000</td></tr>
<tr>    <td>0</td></tr>
<tr>    <td>0</td></tr>
<tr>    <td>0</td></tr>
<tr>    <td>1880004</td></tr>
<tr>    <td>0</td></tr>
<tr>    <td>0</td></tr>
<tr>    <td>0</td></tr>
<tr>    <td>1880008</td></tr>
<tr>    <td>0</td></tr>
<tr>    <td>0</td></tr>
<tr>    <td>0</td></tr>
<tr>    <td>1880012</td></tr>

I haven't figured out what the cause of this is.  It is possible that the problem is at the Tika level, but my guess is that I botched something at the POI level.

As a side note, if I save the file as xlsx, the numbers are extracted correctly.
Comment 1 Tim Allison 2018-10-11 14:15:09 UTC
Fixed in r1843553.
Comment 2 marcelo 2019-05-24 19:20:34 UTC
To fix the "0" error when converting xlsb to CSV or text, use:

poi-ooxml version 4.0.1
xmlbeans version 3.0.1

POM:
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>4.0.1</version>
</dependency>

<dependency>
  <groupId>org.apache.xmlbeans</groupId>
  <artifactId>xmlbeans</artifactId>
  <version>3.0.1</version>
</dependency>

Example:

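import org.apache.poi.ooxml.extractor.POIXMLTextExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.extractor.XSSFBEventBasedExcelExtractor;

// Open the xlsb package read-only and extract the text of all sheets.
OPCPackage pkg = OPCPackage.open("\\file.xlsb", PackageAccess.READ);
POIXMLTextExtractor ext = new XSSFBEventBasedExcelExtractor(pkg);

System.out.println(ext.getText());
ext.close(); // releases the underlying package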

-------------
PAZ
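
For the CSV half of the conversion mentioned in comment 2, here is a minimal sketch of one way to post-process the extracted text. It assumes the string returned by getText() is tab-delimited with one row per line. Under that assumption, a sheet-name line would simply pass through as a one-cell row. TabTextToCsv and its methods are hypothetical helper names, not part of POI, and the sample values are taken from the Tika output in the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of POI): turns tab-delimited text into CSV. */
public class TabTextToCsv {

    /** Quotes a field only when it contains a comma, a quote, or a line break. */
    public static String quote(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Converts one tab-delimited row into one CSV row. */
    public static String rowToCsv(String row) {
        // A limit of -1 keeps trailing empty cells instead of dropping them.
        String[] cells = row.split("\t", -1);
        List<String> out = new ArrayList<>();
        for (String cell : cells) {
            out.add(quote(cell));
        }
        return String.join(",", out);
    }

    /** Converts the whole text, emitting one CSV line per input line. */
    public static String toCsv(String text) {
        StringBuilder sb = new StringBuilder();
        for (String line : text.split("\r?\n")) {
            sb.append(rowToCsv(line)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed shape of the extractor output: tabs between cells, newline per row.
        String text = "1880000\t10000000\n1880004\n";
        System.out.print(toCsv(text));
    }
}
```

In practice the argument to toCsv would be the string returned by ext.getText() in the example from comment 2. Quoting only when needed keeps plain numeric cells such as 1880000 unchanged in the output.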