Bug 56328

Summary: [Patch] FormulaParser is hard-coded to assume a maximum of 65536 rows
Product: POI Reporter: David North <dtn-asfbugs>
Component: XSSFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Bug Depends on:    
Bug Blocks: 58402    
Attachments: Patch to fix FormulaParser using SpreadsheetVersion
Patch to add tests and fix AreaReference

Description David North 2014-03-28 16:28:27 UTC
Summary: FormulaParser is hard-coded to assume a maximum of 65536 rows

Unfortunately, this means that formulae attempting to reference rows at larger numbers (possible in Excel 2007 and greater) will not parse, failing with 

org.apache.poi.ss.formula.FormulaParseException: Cell reference expected after sheet name at index ...

The fix is to change the constant on line 722, which currently looks like:

if (i<1 || i>65536) {

A review for other hard-coding of 65536 might be an idea at this point too. I can see it in several other parts of the codebase including AreaRange.
Comment 1 Nick Burch 2014-03-28 17:00:39 UTC
Any chance you could both:
 * Grep to find some more places with that in? (Check org.apache.poi.ss and .xssf, it's fine to have it in .hssf classes)
 * See if you can work up a patch which checks the kind of workbook it is, and does the range check based on that? (We've got the constants in SpreadsheetVersion)

Bonus marks if you can get it done before the CoreFiling Friday pub trip time ;-)
Comment 2 David North 2014-03-28 17:46:22 UTC
Created attachment 31456 [details]
Patch to fix FormulaParser using SpreadsheetVersion

Problematic references:

./org/apache/poi/ss/formula/FormulaParser.java:			if (i<1 || i>65536) {

./org/apache/poi/ss/util/AreaReference.java:            // Represented internally as x$1 to x$65536
./org/apache/poi/ss/util/AreaReference.java:        return new AreaReference(start + "$1:" + end + "$65536");

The attached patch fixes FormulaParser, which turned out to be trivial as there's already a SpreadsheetVersion in scope.

Fixing AreaReference is a little more involved as the information needs passing through to the right point. So I think I'll save that treat for next week - happy weekend.
Comment 3 Nick Burch 2014-03-28 21:55:34 UTC
Thanks for that David, applied in r1582892.

If you could do a patch for AreaRange, that'd be great. Also, if you could knock up a quick junit test case which tries to parse a formula with a row >65536, and tries it with HSSF + XSSF, to check it's invalid for HSSF but valid for XSSF, that'd be great!
Comment 4 Yaniv Kunda 2014-03-29 19:17:54 UTC
This can be a good reference for writing a complete unit test:
http://superuser.com/questions/366468/what-is-the-maximum-allowed-rows-in-a-microsoft-excel-xls-or-xlsx
Comment 5 David North 2015-05-15 15:36:45 UTC
Created attachment 32737 [details]
Patch to add tests and fix AreaReference

The attached patch adds tests for FormulaParser showing the original bug is fixed.

It also improves the situation with AreaReference and adds tests for that. Unfortunately there are rather more references to AreaReference than I have time to fix (it now needs a SpreadsheetVersion upon construction), but I've improved things considerably and deprecated the old constructor which doesn't supply enough information.
Comment 6 David North 2015-06-12 15:29:35 UTC
Fixed an NPE in my patch (caught by the existing tests). Added some more tests and committed in r1685101, r1685104

Fixed.