Bug 56328 - [Patch] FormulaParser is hard-coded to assume a maximum of 65536 rows
Summary: [Patch] FormulaParser is hard-coded to assume a maximum of 65536 rows
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: unspecified
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks: 58402
  Show dependency tree
 
Reported: 2014-03-28 16:28 UTC by David North
Modified: 2015-09-13 10:31 UTC (History)
0 users



Attachments
Patch to fix FormulaParser using SpreadsheetVersion (462 bytes, patch)
2014-03-28 17:46 UTC, David North
Details | Diff
Patch to add tests and fix AreaReference (9.33 KB, patch)
2015-05-15 15:36 UTC, David North
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David North 2014-03-28 16:28:27 UTC
Summary: FormulaParser is hard-coded to assume a maximum of 65536 rows

Unfortunately, this means that formulae attempting to reference rows at larger numbers (possible in Excel 2007 and greater) will not parse, failing with 

org.apache.poi.ss.formula.FormulaParseException: Cell reference expected after sheet name at index ...

The fix is to change the constant on line 722, which currently looks like:

if (i<1 || i>65536) {

A review for other hard-coding of 65536 might be an idea at this point too. I can see it in several other parts of the codebase including AreaRange.
Comment 1 Nick Burch 2014-03-28 17:00:39 UTC
Any chance you could both:
 * Grep to find some more places with that in? (Check org.apache.poi.ss and .xssf, it's fine to have it in .hssf classes)
 * See if you can work up a patch which checks the kind of workbook it is, and does the range check based on that? (We've got the constants in SpreadsheetVersion)

Bonus marks if you can get it done before the CoreFiling Friday pub trip time ;-)
Comment 2 David North 2014-03-28 17:46:22 UTC
Created attachment 31456 [details]
Patch to fix FormulaParser using SpreadsheetVersion

Problematic references:

./org/apache/poi/ss/formula/FormulaParser.java:			if (i<1 || i>65536) {

./org/apache/poi/ss/util/AreaReference.java:            // Represented internally as x$1 to x$65536
./org/apache/poi/ss/util/AreaReference.java:        return new AreaReference(start + "$1:" + end + "$65536");

The attached patch fixes FormulaParser, which turned out to be trivial as there's already a SpreadsheetVersion in scope.

Fixing AreaReference is a little more involved as the information needs passing through to the right point. So I think I'll save that treat for next week - happy weekend.
Comment 3 Nick Burch 2014-03-28 21:55:34 UTC
Thanks for that David, applied in r1582892.

If you could do a patch for AreaRange, that'd be great. Also, if you could knock up a quick junit test case which tries to parse a formula with a row >65536, and tries it with HSSF + XSSF, to check it's invalid for HSSF but valid for XSSF, that'd be great!
Comment 4 Yaniv Kunda 2014-03-29 19:17:54 UTC
This can be a good reference for writing a complete unit test:
http://superuser.com/questions/366468/what-is-the-maximum-allowed-rows-in-a-microsoft-excel-xls-or-xlsx
Comment 5 David North 2015-05-15 15:36:45 UTC
Created attachment 32737 [details]
Patch to add tests and fix AreaReference

The attached patch adds tests for FormulaParser showing the original bug is fixed.

It also improves the situation with AreaReference and adds tests for that. Unfortunately there are rather more references to AreaReference than I have time to fix (it now needs a SpreadsheetVersion upon construction), but I've improved things considerably and deprecated the old constructor which doesn't supply enough information.
Comment 6 David North 2015-06-12 15:29:35 UTC
Fixed an NPE in my patch (caught by the existing tests). Added some more tests and committed in r1685101, r1685104

Fixed.