Bug 47339

Summary: Unexpected base token id (24)
Product: POI Reporter: Robert Kish <robert_kish>
Component: HSSFAssignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: 3.2-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Attachments: Cell 1, Row 1 contains error that causes the problem
Formula referencing range's label / name that also fails.

Description Robert Kish 2009-06-09 13:14:11 UTC
I'm working on a project to extract the content of XLS into useable text files. I'm trying to get an understanding of how data types can be used. I created an XLS document with a formula "=a+B". This is invalid. I wanted to see how POI would handle it. It didn't handle it very well. In Excel CSV output, it would produce "#VALUE!". POI generates an exception which is listed below.

org.apache.poi.hssf.record.formula.Ptg, method createBasePtg, does not handle value 24 (0x18). While reviewing the source, I don't see it handle value 0x1b either.

However, the code in createPtg method says "if (id < 0x20) {", then call createBasePtg.

The source in 3.5 is the same as 3.2.

Unfortunately, I don't have a patch or a fix for this issue, as I don't fully understand the correct action that should be taken.

Exception in thread "main" org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
	at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:186)
	at org.apache.poi.hssf.eventusermodel.HSSFRecordStream.getNextRecord(HSSFRecordStream.java:162)
	at org.apache.poi.hssf.eventusermodel.HSSFRecordStream.nextRecord(HSSFRecordStream.java:93)
	at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:141)
	at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:98)
	at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processWorkbookEvents(HSSFEventFactory.java:63)
Caused by: java.lang.RuntimeException: Unexpected base token id (24)
	at org.apache.poi.hssf.record.formula.Ptg.createBasePtg(Ptg.java:161)
	at org.apache.poi.hssf.record.formula.Ptg.createPtg(Ptg.java:82)
	at org.apache.poi.hssf.record.formula.Ptg.readTokens(Ptg.java:54)
	at org.apache.poi.hssf.record.FormulaRecord.<init>(FormulaRecord.java:202)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:184)
	... 7 more
Comment 1 Robert Kish 2009-06-09 13:14:44 UTC
Created attachment 23782 [details]
Cell 1, Row 1 contains error that causes the problem
Comment 2 Robert Kish 2009-06-10 11:36:07 UTC
Created attachment 23788 [details]
Formula referencing range's label / name that also fails.

The formula on line 4 =SUM(Results) also causes the reported error. If I change the formula to =SUM(A2:A3), then all is good. But using the Label's name, it doesn't work.
Comment 3 Josh Micich 2009-06-10 16:56:34 UTC
It looks like you are using a deprecated feature of Excel ('Natural Language References').  From version 2007 onward this style of formula reference is not available (Excel now automatically converts old files to use simple cell references).

Referencing cells/ranges with labels is fragile and vulnerable to ambiguity.  This is probably why MS has decided to steer away from it.  For example - What happens when the label cell is moved?  Is the actual cell below or to the right of the label cell? If the label refers to a range, how are the exact boundaries of the range determined?

A better solution is to use proper defined names (AKA 'named ranges') which have been around for a while.

I've closed this bug off for the moment (hoping that using defined names works for you).

'Natural Language References' are probably not in high demand for POI functionality, so if you want to have this supported, you'll probably need to do a lot of the leg-work.  It might make sense to just have POI convert old tNlr (0x18) to equivalent tRef(0x24) or tArea(0x25) PTGs (like Excel 2007 does).  Unfortunately, tNlr is not well documented.  I found a bit of information in the OOO source code.   It seems like the tNlr field layout is variable and governed by the second byte (first byte is 0x18).  In the OOO source code the values of the second byte have constants called "EXC_TOK_NLR_*" (see below).  With some quick experimentation (on an old machine with  Excel 2003) I was able to produce tNlr tokens with second byte (0x06, 0x07 and 0x0A).  Perhaps these are all that POI would need to support. 

Here is a link to some documentation about xlformula.hxx:

Some Hex Dumps of tNlr tokens
18 07 04 00 00 80   // from the sample file - label 'a' is in A5
18 06 06 00 09 80   // label in I7, value in J7
18 0A 0D 00 03 80   25 0E 00 11 00 03 00 03 00  // range label in D14, values in D15:D18  - note tArea(D15:D18) seems to be correctly encoded here (starting at 0x25).