Bug 46544

Summary: [PATCH] Add main() method to org.apache.poi.hssf.extractor.ExcelExtractor
Product: POI
Reporter: Georger Araujo <georger_br>
Component: HSSF
Assignee: POI Developers List <dev>
Status: CLOSED FIXED    
Severity: enhancement    
Priority: P2    
Version: 3.5-dev   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Attachments: Patch to add main() method to java org.apache.poi.hssf.extractor.ExcelExtractor
Patch that fixes filename parsing in r737173

Description Georger Araujo 2009-01-15 18:24:16 UTC
Hello,
Please add a main() method to org.apache.poi.hssf.extractor.ExcelExtractor so that text can be extracted from an .xls file on the command line, as is already possible with org.apache.poi.hwpf.extractor.WordExtractor.
Regards,

Georger
Comment 1 Georger Araujo 2009-01-22 16:19:52 UTC
Created attachment 23163
Patch to add main() method to java org.apache.poi.hssf.extractor.ExcelExtractor

Tested successfully against SVN 20090122 (revision 736505).
Comment 2 Josh Micich 2009-01-23 12:24:40 UTC
Applied in svn r737173

I made the new command-line interface able to read from stdin (when no input file is provided) and to set the four flags (includeSheetNames, etc.). Note that you must put "-i" before the file name when supplying it on the command line (your original patch accepted a bare file name with no preceding option).

Please take a look at the latest version of ExcelExtractor and make sure it does what you need, since there is no JUnit test exercising the new code.
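The command-line rules described in this comment (an optional `-i <filename.xls>`, stdin otherwise, and four flags that each take `Y` or `N`) can be sketched as a small stdlib-only parser. This is a hypothetical illustration, not the `CommandArgs` code committed in r737173: the class name `ExtractorArgsSketch`, its fields, and its error messages are assumptions, while the flag names and defaults are taken from the usage text ExcelExtractor prints later in this thread.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch only; NOT POI's ExcelExtractor.CommandArgs.
// Accepted syntax: [<flag> Y|N ...] [-i <filename.xls>]
final class ExtractorArgsSketch {
    // Defaults copied from the usage text printed by ExcelExtractor.
    boolean showSheetNames = true;
    boolean evaluateFormulas = true;
    boolean showComments = false;
    boolean showBlanks = true;
    File inputFile = null; // null means "read from stdin"

    ExtractorArgsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            // Every option takes exactly one value, so make sure one
            // follows before reading args[i + 1].
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value after '" + flag + "'");
            }
            String value = args[++i];
            switch (flag) {
                case "-i":
                    inputFile = new File(value);
                    break;
                case "--show-sheet-names":
                    showSheetNames = parseYesNo(flag, value);
                    break;
                case "--evaluate-formulas":
                    evaluateFormulas = parseYesNo(flag, value);
                    break;
                case "--show-comments":
                    showComments = parseYesNo(flag, value);
                    break;
                case "--show-blanks":
                    showBlanks = parseYesNo(flag, value);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown flag '" + flag + "'");
            }
        }
    }

    // Accepts only 'Y' or 'N' (case-insensitive), as the usage text requires.
    private static boolean parseYesNo(String flag, String value) {
        if ("Y".equalsIgnoreCase(value)) {
            return true;
        }
        if ("N".equalsIgnoreCase(value)) {
            return false;
        }
        throw new IllegalArgumentException(
            "Expected Y or N after '" + flag + "' but got '" + value + "'");
    }

    // Opens the file given with -i, or falls back to stdin when none was given.
    InputStream openInput() throws IOException {
        if (inputFile == null) {
            return System.in;
        }
        if (!inputFile.isFile()) {
            throw new IOException(
                "Input file '" + inputFile + "' does not exist or is not a regular file");
        }
        return new FileInputStream(inputFile);
    }
}
```

A caller would then hand the stream from `openInput()` to the workbook and apply the four booleans to the extractor before printing its text; that wiring is omitted here because it depends on the POI version in use.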
Comment 3 Georger Araujo 2009-01-23 12:53:47 UTC
Hi,
Unfortunately it did not work:

--begin--
[georger@phoenix excel]# java org.apache.poi.hssf.extractor.ExcelExtractor -i test.xls
Specified input file '-i' does not exist
Use:
    org.apache.poi.hssf.extractor.ExcelExtractor [<flag> <value> [<flag> <value> [...]]] [-i <filename.xls>]
       -i <filename.xls> specifies input file (default is to use stdin)
       Flags can be set on or off by using the values 'Y' or 'N'.
       Following are available flags and their default values:
       --show-sheet-names  Y
       --evaluate-formulas Y
       --show-comments     N
       --show-blanks       Y
--end--
Comment 4 Georger Araujo 2009-01-23 14:59:13 UTC
Created attachment 23170
Patch that fixes filename parsing in r737173

This patch fixes the filename parsing in r737173 so that it actually works. I tested it with several valid and invalid command lines (existing file, non-existent file, directory, flag values other than 'Y' or 'N', missing parameters) and everything looks fine.
Comment 5 Josh Micich 2009-01-23 15:08:23 UTC
(In reply to comment #4)
> Created an attachment (id=23170)
> Patch that fixes filename parsing in r737173

I think this single-line fix (svn r737238) will make it work. Please try it out.

Comment 6 Georger Araujo 2009-01-23 15:30:16 UTC
Hi Josh,
I applied your patch and ran into the same issue I hit while working on my second patch. This is what happens when I run the command without a filename:

--begin--
[georger@phoenix excel]# java org.apache.poi.hssf.extractor.ExcelExtractor -i
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.poi.hssf.extractor.ExcelExtractor$CommandArgs.<init>(ExcelExtractor.java:92)
        at org.apache.poi.hssf.extractor.ExcelExtractor.main(ExcelExtractor.java:186)
--end--

After hitting this error in my own code, I made my parsing more careful about missing arguments.
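The stack trace is consistent with the parser reading the value for `-i` as `args[i + 1]` without a bounds check: for the command line shown, `args` is `{"-i"}` (length 1), so index 1 is out of range, which matches the `1` in the exception message. Below is a minimal sketch of the kind of guard that avoids this; `OptionValues.valueAfter` is a hypothetical helper, not POI code, and the fix actually committed may differ.

```java
// Hypothetical helper, not part of POI: fetch the value that must follow an
// option, failing with a descriptive message instead of letting an unguarded
// args[i + 1] throw ArrayIndexOutOfBoundsException.
final class OptionValues {
    private OptionValues() {
    }

    static String valueAfter(String[] args, int optionIndex) {
        int valueIndex = optionIndex + 1;
        if (valueIndex >= args.length) {
            throw new IllegalArgumentException(
                "Option '" + args[optionIndex] + "' requires a value");
        }
        return args[valueIndex];
    }
}
```

With this guard, running the extractor with a trailing `-i` would report which option is missing its value rather than dying with a bare index.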
Comment 7 Josh Micich 2009-01-23 15:36:29 UTC
Fixed in svn r737248.

A unit test would be nice.
Comment 8 Georger Araujo 2009-01-23 15:40:44 UTC
Hi Josh,
That did the trick. Thanks!
I'll look into writing a unit test.