Bug 46576

Summary: EXCEL, ability to enable/disable text extraction by Cell Type
Product: POI
Reporter: woody <woody>
Component: POI Overall
Assignee: POI Developers List <dev>
Status: RESOLVED WONTFIX    
Severity: enhancement    
Priority: P2    
Version: 3.5-dev   
Target Milestone: ---   
Hardware: All   
OS: Mac OS X 10.4   

Description woody 2009-01-21 11:42:01 UTC
It would be useful to be able to control which cells' text is extracted, based on the cell type,
e.g. to omit numeric or boolean cells.

The idea would be for ExcelExtractor class(es) to have methods such as:

public void setInclude( int cellType, boolean incl );
public boolean isIncluded( int cellType );

where cellType is one of the org.apache.poi.ss.usermodel.Cell.CELL_TYPE_* constants.

Numeric and boolean cells are my primary concern (they are not important in my use case), but supporting every cell type would be more generic.
A named method per type would also be fine if that fits the project's coding conventions better.
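A minimal sketch of the proposed setInclude/isIncluded pair, written so that it compiles without POI on the classpath. CellTypeFilter, SimpleCell and FilteringExtractor are hypothetical names used only for illustration, not POI classes; the constants are local stand-ins with the same values as Cell.CELL_TYPE_* in POI 3.x. In a real extractor the check would presumably be isIncluded(cell.getCellType()) inside the existing row/cell loop.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical per-cell-type filter. The constants mirror the values of
 * Cell.CELL_TYPE_* in POI 3.x so the sketch builds without POI.
 */
final class CellTypeFilter {
    static final int CELL_TYPE_NUMERIC = 0;
    static final int CELL_TYPE_STRING = 1;
    static final int CELL_TYPE_FORMULA = 2;
    static final int CELL_TYPE_BLANK = 3;
    static final int CELL_TYPE_BOOLEAN = 4;
    static final int CELL_TYPE_ERROR = 5;

    // One flag per cell type, indexed by the constant's value; all types start included.
    private final boolean[] included = new boolean[CELL_TYPE_ERROR + 1];

    CellTypeFilter() {
        Arrays.fill(included, true);
    }

    public void setInclude(int cellType, boolean incl) {
        checkType(cellType);
        included[cellType] = incl;
    }

    public boolean isIncluded(int cellType) {
        checkType(cellType);
        return included[cellType];
    }

    // Reject values outside the known CELL_TYPE_* range instead of failing with an index error.
    private static void checkType(int cellType) {
        if (cellType < 0 || cellType > CELL_TYPE_ERROR) {
            throw new IllegalArgumentException("Unknown cell type: " + cellType);
        }
    }
}

/** Hypothetical minimal cell: a type code plus its already-formatted text. */
final class SimpleCell {
    final int type;
    final String text;

    SimpleCell(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

final class FilteringExtractor {
    // Joins the text of included cells: a tab between cells, a newline after each row.
    static String extract(List<List<SimpleCell>> rows, CellTypeFilter filter) {
        StringBuilder out = new StringBuilder();
        for (List<SimpleCell> row : rows) {
            boolean first = true;
            for (SimpleCell cell : row) {
                if (!filter.isIncluded(cell.type)) {
                    continue; // skip cells whose type has been switched off
                }
                if (!first) {
                    out.append('\t');
                }
                out.append(cell.text);
                first = false;
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Keeping one flag per type behind a single setter gives the generic variant requested here; the named-method-per-type alternative could simply wrap these calls (e.g. a hypothetical setIncludeNumeric(boolean)).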
Comment 1 Nick Burch 2009-01-22 01:40:49 UTC
If you want that degree of fine-grained control, you'll be much better off writing your own HSSF usermodel code that loops over the workbook, pulling out and formatting things as you see fit.

See the documentation on the site for starters, and perhaps also take a look at how the current extractor does it.
Comment 2 woody 2009-01-22 10:33:09 UTC
Because there are multiple implementers of ExcelExtractor, duplicating them along with the factory logic would make it difficult to pick up later bug fixes and enhancements to the extractor code. From my (admittedly narrow) perspective, copying them just to get this minor addition is not a great move.

I wrote the logic into the extractors and also made EventBasedExtractor implement the ExcelExtractor interface. In addition, I cleaned up some StringBuffer code and aligned some constants that were (somewhat) dangerously defined. All of this is in the following patch.

I would appreciate your review:
https://issues.apache.org/bugzilla/show_bug.cgi?id=46581