Bug 46544 - [PATCH] Add main() method to org.apache.poi.hssf.extractor.ExcelExtractor
Summary: [PATCH] Add main() method to org.apache.poi.hssf.extractor.ExcelExtractor
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 3.5-dev
Hardware: PC Windows XP
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2009-01-15 18:24 UTC by Georger Araujo
Modified: 2009-01-23 15:40 UTC

Patch to add main() method to java org.apache.poi.hssf.extractor.ExcelExtractor (1.32 KB, patch)
2009-01-22 16:19 UTC, Georger Araujo
Patch that fixes filename parsing in r737173 (874 bytes, patch)
2009-01-23 14:59 UTC, Georger Araujo

Description Georger Araujo 2009-01-15 18:24:16 UTC
Please add a main() method to org.apache.poi.hssf.extractor.ExcelExtractor so that text can be extracted from an .xls file on the command line, as is already possible with org.apache.poi.hwpf.extractor.WordExtractor.

Comment 1 Georger Araujo 2009-01-22 16:19:52 UTC
Created attachment 23163 [details]
Patch to add main() method to java org.apache.poi.hssf.extractor.ExcelExtractor

Tested successfully against SVN 20090122 (revision 736505).
Comment 2 Josh Micich 2009-01-23 12:24:40 UTC
Applied in svn r737173

I made the new command-line interface capable of reading from stdin (when no input file is provided) and of setting the four flags (includeSheetNames, etc.). Note that you must use "-i" before the file name when supplying it on the command line (your original patch accepted a file name with no preceding option).

Please take a look at the latest version of ExcelExtractor and make sure it does what you need, since there is no JUnit test exercising the new code.
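For readers following the thread, here is a minimal sketch of the argument grammar described above: zero or more `<flag> <value>` pairs taking 'Y' or 'N', plus an optional `-i <filename.xls>`, with stdin used when no file is given. The class `ExtractorArgsSketch` and its fields are hypothetical and are not the code committed in r737173; the flag names and defaults are taken from the usage message quoted in comment 3.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of parsing:
//   [<flag> <value> [<flag> <value> [...]]] [-i <filename.xls>]
// Not POI's actual CommandArgs implementation.
class ExtractorArgsSketch {
    final Map<String, Boolean> flags = new LinkedHashMap<>();
    String inputFile; // null means "read from stdin"

    ExtractorArgsSketch(String[] args) {
        // Defaults copied from the usage message in this bug report
        flags.put("--show-sheet-names", true);
        flags.put("--evaluate-formulas", true);
        flags.put("--show-comments", false);
        flags.put("--show-blanks", true);

        // Every option consumes exactly two tokens: a name and a value
        for (int i = 0; i < args.length; i += 2) {
            String name = args[i];
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for '" + name + "'");
            }
            String value = args[i + 1];
            if (name.equals("-i")) {
                inputFile = value;
            } else if (flags.containsKey(name)) {
                flags.put(name, parseYesNo(name, value));
            } else {
                throw new IllegalArgumentException("Unknown flag '" + name + "'");
            }
        }
    }

    // Accepts only 'Y' or 'N', as the usage message requires
    private static boolean parseYesNo(String name, String value) {
        if (value.equals("Y")) return true;
        if (value.equals("N")) return false;
        throw new IllegalArgumentException(
            "Value for '" + name + "' must be 'Y' or 'N', got '" + value + "'");
    }
}
```

With this sketch, `-i test.xls` sets `inputFile` to `test.xls` and keeps the default flags, while `--show-comments Y` alone leaves `inputFile` null, meaning stdin. Stepping through the arguments two at a time keeps the flag name and its value paired, so a file name can never be mistaken for a flag.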
Comment 3 Georger Araujo 2009-01-23 12:53:47 UTC
Unfortunately, it did not work:

[georger@phoenix excel]# java org.apache.poi.hssf.extractor.ExcelExtractor -i test.xls
Specified input file '-i' does not exist
    org.apache.poi.hssf.extractor.ExcelExtractor [<flag> <value> [<flag> <value> [...]]] [-i <filename.xls>]
       -i <filename.xls> specifies input file (default is to use stdin)
       Flags can be set on or off by using the values 'Y' or 'N'.
       Following are available flags and their default values:
       --show-sheet-names  Y
       --evaluate-formulas Y
       --show-comments     N
       --show-blanks       Y
Comment 4 Georger Araujo 2009-01-23 14:59:13 UTC
Created attachment 23170 [details]
Patch that fixes filename parsing in r737173

This patch fixes r737173 so that it actually works. I tested it with several correct and incorrect command lines (an existing file, a non-existent file, a directory, flag values other than 'Y' or 'N', and missing parameters), and everything looks good.
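The file-related cases in the comment above (existing file, non-existent file, directory) can be checked with `java.io.File` before any parsing of the spreadsheet starts. The following is a minimal sketch, not the content of attachment 23170: `InputSourceSketch`, `checkInputFile` and `open` are hypothetical names, the "does not exist" wording is copied from the output in comment 3, and the directory message is invented for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; not part of POI.
class InputSourceSketch {
    // Returns an error message for an unusable input path, or null if it looks usable.
    // A null path means "no -i given", i.e. read from stdin, which is always accepted.
    static String checkInputFile(String path) {
        if (path == null) {
            return null;
        }
        File f = new File(path);
        if (!f.exists()) {
            return "Specified input file '" + path + "' does not exist";
        }
        if (f.isDirectory()) {
            return "Specified input file '" + path + "' is a directory";
        }
        return null;
    }

    // Opens the chosen source: stdin when no file name was given, otherwise the file.
    static InputStream open(String path) throws IOException {
        return path == null ? System.in : new FileInputStream(path);
    }
}
```

Checking `isDirectory()` separately matters because `new FileInputStream(dir)` would otherwise fail later with a less helpful I/O error.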
Comment 5 Josh Micich 2009-01-23 15:08:23 UTC
(In reply to comment #4)
> Created an attachment (id=23170) [details]
> Patch that fixes filename parsing in r737173

I think it will work with this single-line fix (svn r737238). Please try it out.

Comment 6 Georger Araujo 2009-01-23 15:30:16 UTC
Hi Josh,
I applied your patch and ran into the same issue I got when I first worked on my second patch. This is what happens when I run the command without a filename:

[georger@phoenix excel]# java org.apache.poi.hssf.extractor.ExcelExtractor -i
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.poi.hssf.extractor.ExcelExtractor$CommandArgs.<init>(ExcelExtractor.java:92)
        at org.apache.poi.hssf.extractor.ExcelExtractor.main(ExcelExtractor.java:186)

After hitting this error in my own code, I made my argument parsing more careful.
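An `ArrayIndexOutOfBoundsException: 1` from a command line consisting only of `-i` most likely means index 1 of a one-element argument array was read while looking for the value after `-i`. A minimal sketch of a guard that reports a readable error instead, assuming a hypothetical helper `valueAfter`; this is not necessarily how svn r737248 fixed it.

```java
// Hypothetical helper; not part of POI.
class ArgValueSketch {
    // Returns the value following the option at args[i].
    // Checks the bounds first, so a missing value produces an
    // IllegalArgumentException with a clear message rather than
    // an ArrayIndexOutOfBoundsException from args[i + 1].
    static String valueAfter(String[] args, int i) {
        if (i + 1 >= args.length) {
            throw new IllegalArgumentException(
                "Option '" + args[i] + "' requires a value");
        }
        return args[i + 1];
    }
}
```

A caller could catch the `IllegalArgumentException`, print its message followed by the usage text, and exit with a non-zero status.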
Comment 7 Josh Micich 2009-01-23 15:36:29 UTC
Fixed in svn r737248.

A unit test would be nice.
Comment 8 Georger Araujo 2009-01-23 15:40:44 UTC
Hi Josh,
That did the trick. Thanks!
I'll look into writing a unit test.