Bug 46576 - EXCEL, ability to enable/disable text extraction by Cell Type
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.5-dev
Hardware: All Mac OS X 10.4
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-21 11:42 UTC by woody
Modified: 2009-01-22 10:33 UTC



Description woody 2009-01-21 11:42:01 UTC
It would be useful to be able to control which text is extracted based on cell type, e.g. to omit numeric or boolean cells.

The idea would be for ExcelExtractor class(es) to have methods such as:

public void setInclude( int cellType, boolean incl );
public boolean isIncluded( int cellType );

where cellType comes from: org.apache.poi.ss.usermodel.Cell.CELL_TYPE_*

Numeric and boolean cells are my primary concern (they are not important in my use case), but a switch for each cell type would be more generic.
A named method per type would also be sufficient if that better fits the coding standard in use.
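
As a rough illustration of the proposed pair of methods, here is a minimal sketch. CellTypeFilter is a hypothetical helper, not an existing POI class, and the constants are local copies of the Cell.CELL_TYPE_* int values so the snippet compiles without POI on the classpath. An extractor would be assumed to consult isIncluded() before appending a cell's text.

```java
import java.util.BitSet;

/**
 * Hypothetical helper sketching the proposed setInclude/isIncluded pair.
 * Not part of POI. The constants below are local stand-ins that mirror
 * the int values of org.apache.poi.ss.usermodel.Cell.CELL_TYPE_*.
 */
final class CellTypeFilter {
    static final int CELL_TYPE_NUMERIC = 0;
    static final int CELL_TYPE_STRING  = 1;
    static final int CELL_TYPE_FORMULA = 2;
    static final int CELL_TYPE_BLANK   = 3;
    static final int CELL_TYPE_BOOLEAN = 4;
    static final int CELL_TYPE_ERROR   = 5;

    // One bit per cell type. A set bit means "excluded",
    // so every type is included until told otherwise.
    private final BitSet excluded = new BitSet();

    /** Turns extraction on or off for one cell type. */
    public void setInclude(int cellType, boolean incl) {
        checkType(cellType);
        excluded.set(cellType, !incl);
    }

    /** Reports whether text from cells of this type should be extracted. */
    public boolean isIncluded(int cellType) {
        checkType(cellType);
        return !excluded.get(cellType);
    }

    private static void checkType(int cellType) {
        if (cellType < CELL_TYPE_NUMERIC || cellType > CELL_TYPE_ERROR) {
            throw new IllegalArgumentException("Unknown cell type: " + cellType);
        }
    }
}
```

Defaulting to "everything included" keeps existing extractor output unchanged unless a caller opts out of a type.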
Comment 1 Nick Burch 2009-01-22 01:40:49 UTC
If you want that degree of fine-grained control, you'll be much better off just writing your own HSSF Usermodel code to loop over the workbook, pulling out and formatting things as you see fit.

See the documentation on the site for starters, and maybe also take a look at how the current extractor does it.
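
The shape of such a hand-written loop might look roughly like the sketch below. It deliberately uses stand-in types (SketchCell and CustomExtractorSketch are hypothetical, invented for this illustration) so it runs without POI; real code would instead walk HSSFWorkbook, HSSFSheet, HSSFRow and HSSFCell objects and switch on each cell's type.

```java
import java.util.List;
import java.util.Set;

/** Stand-in for a spreadsheet cell; real code would read HSSFCell objects instead. */
final class SketchCell {
    enum Type { NUMERIC, STRING, BOOLEAN, BLANK }

    final Type type;
    final Object value;

    SketchCell(Type type, Object value) {
        this.type = type;
        this.value = value;
    }
}

final class CustomExtractorSketch {
    /**
     * Walks rows and cells in order and returns tab-separated text,
     * one line per row. Blank cells and cells whose type is in
     * {@code skip} contribute nothing.
     */
    static String extract(List<List<SketchCell>> rows, Set<SketchCell.Type> skip) {
        StringBuilder out = new StringBuilder();
        for (List<SketchCell> row : rows) {
            boolean first = true;
            for (SketchCell cell : row) {
                if (cell.type == SketchCell.Type.BLANK || skip.contains(cell.type)) {
                    continue; // caller chose not to extract this kind of cell
                }
                if (!first) {
                    out.append('\t');
                }
                out.append(cell.value);
                first = false;
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Calling extract with a skip set of NUMERIC and BOOLEAN would leave only the string cells in the output, which is the behaviour asked for in the description.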
Comment 2 woody 2009-01-22 10:33:09 UTC
There are multiple implementers of ExcelExtractor, and duplicating them together with the factory logic would make it difficult to incorporate subsequent bug fixes and enhancements from the extractor code lines. Duplicating them simply to get this minor addition is not a great move, at least from my perspective (focused as it is).

I wrote the logic into the extractors, and augmented the EventBasedExtractor to implement the ExcelExtractor interface as well. I also cleaned up some StringBuffer code and aligned some constants that were (somewhat) dangerously defined. This is in the following patch.

I would appreciate your review:
https://issues.apache.org/bugzilla/show_bug.cgi?id=46581

