Bug 60405 - AIOOBE: -32725 when loading an Excel file that includes some Excel 4.0 macros
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 3.15-FINAL
Hardware: PC Mac OS X 10.1
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-22 21:36 UTC by Martin Oberhuber
Modified: 2019-01-27 10:08 UTC
0 users



Attachments
MyWB.xls containing Excel 4.0 macro functions (21.50 KB, application/vnd.ms-excel)
2016-11-22 21:36 UTC, Martin Oberhuber

Description Martin Oberhuber 2016-11-22 21:36:40 UTC
Created attachment 34467
MyWB.xls containing Excel 4.0 macro functions

I'm using poi-3.15 from Sourceforge Docfetcher to index Excel documents (among others) for searching. On the attached document, the following exception is thrown, making it impossible to access any data in the document. This is a severe issue for which I don't have any workaround.


java.lang.ArrayIndexOutOfBoundsException: -32725
	at org.apache.poi.ss.formula.function.FunctionMetadataRegistry.getFunctionByIndexInternal(FunctionMetadataRegistry.java:66)
	at org.apache.poi.ss.formula.function.FunctionMetadataRegistry.getFunctionByIndex(FunctionMetadataRegistry.java:62)
	at org.apache.poi.ss.formula.ptg.FuncVarPtg.create(FuncVarPtg.java:56)
	at org.apache.poi.ss.formula.ptg.FuncVarPtg.create(FuncVarPtg.java:45)
	at org.apache.poi.ss.formula.ptg.Ptg.createClassifiedPtg(Ptg.java:103)
	at org.apache.poi.ss.formula.ptg.Ptg.createPtg(Ptg.java:84)
	at org.apache.poi.ss.formula.ptg.Ptg.readTokens(Ptg.java:55)
	at org.apache.poi.ss.formula.Formula.getTokens(Formula.java:82)
	at org.apache.poi.hssf.record.FormulaRecord.getParsedExpression(FormulaRecord.java:314)
	at org.apache.poi.hssf.record.aggregates.FormulaRecordAggregate.getFormulaTokens(FormulaRecordAggregate.java:201)
	at org.apache.poi.hssf.usermodel.HSSFCell.getCellFormula(HSSFCell.java:649)
	at org.apache.poi.hssf.extractor.ExcelExtractor.getText(ExcelExtractor.java:339)
	at net.sourceforge.docfetcher.model.parse.MSExcelParser.renderText(MSExcelParser.java:57)


The issue is severe because the exception leaves no other content from the Excel file available. Docfetcher runs POI like this:

    POIFSFileSystem fs = new POIFSFileSystem(in);
    extractor = new ExcelExtractor(fs);
    extractor.setFormulasNotResults(true);
    return extractor.getText();

Running my test case in the debugger, I found that the exception seems to occur on sheet "Macro1", row 1, column 0, which contains this:
    =ALIGNMENT(2;FALSE;1;0;FALSE)
which appears to relate to this FormulaRecord._byteEncoding:
    [30, 2, 0, 29, 0, 30, 1, 0, 30, 0, 0, 29, 0, 66, 5, 43, -128]
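To make the byte dump above easier to follow, here is a minimal Java sketch that walks it token by token. `TokenDump` is a hypothetical helper and not part of POI. It only handles the three token ids that occur in this formula: tInt (0x1E), tBool (0x1D) and value-class tFuncVar (0x42). It assumes BIFF8 little-endian 16-bit fields. Reading the tFuncVar function index as a signed 16-bit value yields the same -32725 that appears in the stack trace.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not POI code): dumps the tokens of the formula quoted above. */
public class TokenDump {
    static final int T_BOOL = 0x1D;        // tBool: 1-byte payload
    static final int T_INT = 0x1E;         // tInt: 2-byte unsigned little-endian payload
    static final int T_FUNC_VAR_V = 0x42;  // tFuncVar (value class): argc byte + 2-byte index

    static List<String> dump(byte[] b) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < b.length) {
            int id = b[i] & 0xFF;
            switch (id) {
                case T_INT:
                    out.add("tInt " + ((b[i + 1] & 0xFF) | (b[i + 2] & 0xFF) << 8));
                    i += 3;
                    break;
                case T_BOOL:
                    out.add("tBool " + (b[i + 1] != 0 ? "TRUE" : "FALSE"));
                    i += 2;
                    break;
                case T_FUNC_VAR_V: {
                    int argc = b[i + 1] & 0x7F; // low 7 bits: argument count (top bit is fPrompt)
                    // Treating the 2-byte index as a signed short reproduces the -32725 from the trace
                    short index = (short) ((b[i + 2] & 0xFF) | (b[i + 3] & 0xFF) << 8);
                    out.add("tFuncVar argc=" + argc + " index=" + index);
                    i += 4;
                    break;
                }
                default:
                    throw new IllegalArgumentException("unhandled ptg 0x" + Integer.toHexString(id));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] enc = {30, 2, 0, 29, 0, 30, 1, 0, 30, 0, 0, 29, 0, 66, 5, 43, -128};
        dump(enc).forEach(System.out::println);
    }
}
```

Running it prints the five arguments of `ALIGNMENT(2;FALSE;1;0;FALSE)` in RPN order. The function token follows them and shows `index=-32725`.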

Note that the document must be opened with "Macros disabled" in order to make this content visible in Excel. This sheet contains several other "Excel 4.0 macro functions" as well.

The expected behavior would be that POI ignores unknown functions/formulas/macros, so that at least the rest of the document can be indexed for search.
Comment 1 Dominik Stadler 2018-12-31 00:27:49 UTC
I made some initial steps to fix this, but am not sure whether we parse the macro functions correctly.

Can you post the actual macros that are stored in the XLS file?
Comment 2 Dominik Stadler 2018-12-31 00:46:39 UTC
This seems to go quite a bit deeper than a simple parse error: the spec contains a separate list of functions called "cetab", which Apache POI does not support at all. The AIOOBE occurs because the parser does not handle the "fCeFunc" bit. That bit then ends up in the function index and puts it out of bounds:


-----------
tab (15 bits): A structure that specifies the function to be called. If fCeFunc is 1, then this field
specifies a Cetab value. If fCeFunc is 0, then this field specifies a Ftab value.

C - fCeFunc (1 bit): A bit that specifies whether tab specifies a Cetab value or a Ftab value.
-----------
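Applied to the bytes from the report (`43, -128`, i.e. 0x802B), the spec excerpt above splits the field into fCeFunc = 1 and tab = 43. `FuncIndex` is a hypothetical class for illustration, not POI's API. It is a minimal sketch that masks the two fields apart. Because it masks rather than shifts, it gives the same result whether the input is the unsigned value 0x802B or the sign-extended -32725.

```java
/** Hypothetical sketch (not POI code): splits the 16-bit tab/fCeFunc field of tFuncVar. */
public class FuncIndex {
    final boolean fCeFunc; // true: tab is a Cetab value; false: tab is an Ftab value
    final int tab;         // low 15 bits of the field

    FuncIndex(int raw16) {
        this.fCeFunc = (raw16 & 0x8000) != 0;
        this.tab = raw16 & 0x7FFF;
    }

    /** Builds the field from the two little-endian bytes that follow the argument-count byte. */
    static FuncIndex fromBytes(byte lo, byte hi) {
        return new FuncIndex((lo & 0xFF) | (hi & 0xFF) << 8);
    }

    public static void main(String[] args) {
        FuncIndex f = fromBytes((byte) 43, (byte) -128);
        System.out.println("fCeFunc=" + f.fCeFunc + " tab=" + f.tab); // fCeFunc=true tab=43
    }
}
```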


So a fix will require not only reading the fCeFunc bit when parsing FuncVarPtg, but also implementing a second list of known function definitions, which may in turn require implementing new functions later.
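The two-table lookup described above can be sketched as follows. This is also one way to get the behavior the reporter asked for, where unknown functions do not abort the whole workbook. Everything here is hypothetical and not how POI actually implements it: the class name, the tiny table excerpts and the `UNKNOWN_*` placeholder names. The Ftab entries IF=1 and SUM=4 are standard. The Cetab entry 43 → ALIGNMENT is inferred from the Macro1 cell in this report.

```java
import java.util.Map;

/**
 * Hypothetical sketch (not POI's implementation): resolve a tFuncVar index against
 * separate Ftab and Cetab tables, returning a placeholder name instead of throwing.
 */
public class FunctionLookup {
    // Tiny excerpts for illustration only; real tables list every Ftab / Cetab entry.
    static final Map<Integer, String> FTAB = Map.of(1, "IF", 4, "SUM");
    // 43 -> ALIGNMENT is inferred from the Macro1 cell quoted in this report.
    static final Map<Integer, String> CETAB = Map.of(43, "ALIGNMENT");

    static String resolve(int raw16) {
        boolean fCeFunc = (raw16 & 0x8000) != 0; // top bit selects the table
        int tab = raw16 & 0x7FFF;                // low 15 bits index into it
        String name = (fCeFunc ? CETAB : FTAB).get(tab);
        if (name != null) {
            return name;
        }
        // Unknown entry: return a placeholder so the rest of the workbook can still be read.
        return (fCeFunc ? "UNKNOWN_CETAB_" : "UNKNOWN_FTAB_") + tab;
    }

    public static void main(String[] args) {
        System.out.println(resolve(0x802B)); // ALIGNMENT
        System.out.println(resolve(4));      // SUM
        System.out.println(resolve(0x8063)); // UNKNOWN_CETAB_99
    }
}
```

Returning a placeholder rather than throwing trades exactness for robustness. A text extractor such as Docfetcher's only needs a printable formula string, not an evaluable one.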


BTW, I did not find any such exception in our large regression test run (http://people.apache.org/~centic/poi_regression/reportsAll/), which indicates that such files are likely very rare.
Comment 3 Dominik Stadler 2019-01-27 10:08:15 UTC
SVN r1852277 adds initial support for the cetab list of functions from the spec. Parsing the formulas of this document works now.