Bug 60352 - XSSFExcelExtractor extracts "null" as text from empty cells
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.16-dev
Hardware: PC Mac OS X 10.1
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2016-11-08 09:29 UTC by Cosmin Marginean
Modified: 2017-05-09 13:42 UTC

Description Cosmin Marginean 2016-11-08 09:29:27 UTC
We use XSSFExcelExtractor to extract the complete text of an Excel file. However, in certain circumstances the literal text "null" is extracted from an empty cell.

For example:

> Breakdown of data generated by project, technology, submitting centre	null	null	null	null	null	null
> null	null	null	null	null	null	null
> null	Abbreviation Definitions	null	null	null	null	null
> null	Platform	Definition	null	null	null	null
> null	LS454	454 Roche Genome Sequencer FLX System	null	null	null	


The patch is relatively simple (and I'm happy to create a PR for it on GitHub). All we need to do is wrap the last two lines of XSSFExcelExtractor.handleNonStringCell() in a null check:

>         if (contents != null) {
>             checkMaxTextSize(text, contents);
>             text.append(contents);
>         }

With this change, the extractor would behave as expected and produce the following text instead:

> Breakdown of data generated by project, technology, submitting centre						
> 						
> 	Abbreviation Definitions					
> 	Platform	Definition				
> 	LS454	454 Roche Genome Sequencer FLX System

We believe that an empty string is the preferred option here, because the text "null" itself might be used as cell contents in certain cases. In that situation the two occurrences cannot be told apart: is it the literal text "null", or is the cell empty?
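
To illustrate why the literal "null" shows up, here is a minimal standalone sketch. It does not use POI at all: the class NullCellSketch and its two methods are hypothetical stand-ins, and cell contents are modelled as plain strings where null means "empty cell". It relies only on the standard behaviour of StringBuilder.append(String), which appends the four characters "null" when passed a null reference.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not the POI extractor code.
class NullCellSketch {

    // Unguarded: a null String is appended as the literal text "null".
    static String joinRowUnguarded(List<String> cells) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                text.append('\t'); // tab between cells, as in the output above
            }
            text.append(cells.get(i));
        }
        return text.toString();
    }

    // Guarded: the same loop, but null contents are skipped,
    // so an empty cell contributes an empty string.
    static String joinRowGuarded(List<String> cells) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                text.append('\t');
            }
            String contents = cells.get(i);
            if (contents != null) {
                text.append(contents);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList(null, "Platform", "Definition", null);
        // Tabs are shown as '|' to make empty cells visible.
        System.out.println(joinRowUnguarded(row).replace('\t', '|')); // null|Platform|Definition|null
        System.out.println(joinRowGuarded(row).replace('\t', '|'));   // |Platform|Definition|
    }
}
```

In the guarded variant a cell that really contains the text "null" still comes through as "null", while an empty cell yields nothing, which is the distinction we are after.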

Looking forward to hearing your thoughts.
Comment 1 Dominik Stadler 2017-05-08 18:11:48 UTC
Fixed via r1794260; this should be included in release 3.17-beta1. Thanks for the report and the suggested fix.
Comment 2 Cosmin Marginean 2017-05-09 13:42:26 UTC
Great! Many thanks!