Summary: FormulaParser is hard-coded to assume a maximum of 65536 rows Unfortunately, this means that formulae attempting to reference rows at larger numbers (possible in Excel 2007 and greater) will not parse, failing with org.apache.poi.ss.formula.FormulaParseException: Cell reference expected after sheet name at index ... The fix is to change the constant on line 722, which currently looks like: if (i<1 || i>65536) { A review for other hard-coding of 65536 might be an idea at this point too. I can see it in several other parts of the codebase including AreaRange.
Any chance you could both: * Grep to find some more places with that in? (Check org.apache.poi.ss and .xssf, it's fine to have it in .hssf classes) * See if you can work up a patch which checks the kind of workbook it is, and does the range check based on that? (We've got the constants in SpreadsheetVersion) Bonus marks if you can get it done before the CoreFiling Friday pub trip time ;-)
Created attachment 31456 [details] Patch to fix FormulaParser using SpreadsheetVersion Problematic references: ./org/apache/poi/ss/formula/FormulaParser.java: if (i<1 || i>65536) { ./org/apache/poi/ss/util/AreaReference.java: // Represented internally as x$1 to x$65536 ./org/apache/poi/ss/util/AreaReference.java: return new AreaReference(start + "$1:" + end + "$65536"); The attached patch fixes FormulaParser, which turned out to be trivial as there's already a SpreadsheetVersion in scope. Fixing AreaReference is a little more involved as the information needs passing through to the right point. So I think I'll save that treat for next week - happy weekend.
Thanks for that David, applied in r1582892. If you could do a patch for AreaRange, that'd be great. Also, if you could knock up a quick junit test case which tries to parse a formula with a row >65536, and tries it with HSSF + XSSF, to check it's invalid for HSSF but valid for XSSF, that'd be great!
This can be a good reference for writing a complete unit test: http://superuser.com/questions/366468/what-is-the-maximum-allowed-rows-in-a-microsoft-excel-xls-or-xlsx
Created attachment 32737 [details] Patch to add tests and fix AreaReference The attached patch adds tests for FormulaParser showing the original bug is fixed. It also improves the situation with AreaReference and adds tests for that. Unfortunately there are rather more references to AreaReference than I have time to fix (it now needs a SpreadsheetVersion upon construction), but I've improved things considerably and deprecated the old constructor which doesn't supply enough information.
Fixed an NPE in my patch (caught by the existing tests). Added some more tests and committed in r1685101, r1685104 Fixed.