Bug 7655 - Problem when reading Some xls file (RICH TEXT HANDLING IN SST)
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 1.5.1
Hardware: PC All
Importance: P3 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Duplicates: 9448
Depends on:
Blocks:
 
Reported: 2002-04-01 11:02 UTC by Libin Roman
Modified: 2005-03-20 17:06 UTC

Description Libin Roman 2002-04-01 11:02:40 UTC
Begin - 1017658823492
java.lang.IllegalArgumentException: Cannot store a duplicate value ("???????????????") in this Map
        at net.sourceforge.poi.util.BinaryTree.insertValue(BinaryTree.java:1401)
        at net.sourceforge.poi.util.BinaryTree.put(BinaryTree.java:1586)
        at net.sourceforge.poi.hssf.record.SSTRecord.processString(SSTRecord.java:1003)
        at net.sourceforge.poi.hssf.record.SSTRecord.manufactureStrings(SSTRecord.java:930)
        at net.sourceforge.poi.hssf.record.SSTRecord.processContinueRecord(SSTRecord.java:632)
        at net.sourceforge.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:228)
        at net.sourceforge.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:156)
        at TestFrame.<init>(TestFrame.java:56)
        at TestFrame.main(TestFrame.java:193)
Comment 1 Andy Oliver 2002-04-01 13:13:42 UTC
Please try using a CVS build.  I think this has been fixed.  The
net.sourceforge.x packages indicate this is 1.0.2 or an early development build
from the old sourceforge site.
Comment 2 Libin Roman 2002-04-01 13:33:14 UTC
Same old exception :(

Exception occurred during event dispatching:
java.lang.IllegalArgumentException: Cannot store a duplicate value ("???????????????") in this Map
        at org.apache.poi.util.BinaryTree.insertValue(BinaryTree.java:1395)
        at org.apache.poi.util.BinaryTree.put(BinaryTree.java:1580)
        at org.apache.poi.hssf.record.SSTRecord.processString(SSTRecord.java:1033)
        at org.apache.poi.hssf.record.SSTRecord.manufactureStrings(SSTRecord.java:960)
        at org.apache.poi.hssf.record.SSTRecord.processContinueRecord(SSTRecord.java:660)
        at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:176)
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:140)
        at TestFrame.openFile(TestFrame.java:64)
Comment 3 Glen Stampoultzis 2002-04-23 05:41:18 UTC
Are you able to post the XLS file that causes this problem?  Does it contain 
rich text cells?
Comment 4 Andy Oliver 2002-04-23 11:41:53 UTC
Perhaps for the moment we should kludge SSTRecord to notice duplicates and just
return the original in the event of a Rich Text cell.  I have to say, this is one
feature whose absence I may find irritating enough to sneak it into 1.5.1.
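
A minimal sketch of the kind of kludge described above, not POI's actual SSTRecord code. The class name DedupStringTable and its methods are hypothetical. It assumes the table keeps every string in file order (so later indices stay valid) and that a duplicate value should resolve to the index of its first occurrence instead of raising an exception.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a shared string table; NOT POI's SSTRecord.
class DedupStringTable {
    // Every string in the order it was read, so positions match the file.
    private final List<String> strings = new ArrayList<>();
    // Maps each distinct value to the index of its first occurrence.
    private final Map<String, Integer> firstIndex = new HashMap<>();

    /** Stores the value and returns the index of its first occurrence. */
    int add(String value) {
        int slot = strings.size();
        strings.add(value);
        // putIfAbsent returns the earlier index when the value is a duplicate.
        Integer original = firstIndex.putIfAbsent(value, slot);
        return original == null ? slot : original;
    }

    String get(int index) {
        return strings.get(index);
    }

    int size() {
        return strings.size();
    }

    public static void main(String[] args) {
        DedupStringTable table = new DedupStringTable();
        System.out.println(table.add("alpha")); // first occurrence
        System.out.println(table.add("beta"));
        System.out.println(table.add("alpha")); // duplicate: resolves to the original
    }
}
```

The point of the design is that a repeated value is tolerated rather than rejected, which is the opposite of what a unique-value map such as BinaryTree enforces.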
Comment 5 Andy Oliver 2002-05-27 20:46:32 UTC
*** Bug 9448 has been marked as a duplicate of this bug. ***
Comment 6 Daniel Stephens 2002-05-28 01:45:07 UTC
Curiously enough, although this appears to be a somewhat different problem than
my bug (9448), I discovered that by copying a worksheet with lots of long and
similar strings in it (and invoking Excel's lovely cut-off-at-255-character
logic - how quaint), I get one which will cause the duplicate value problem.

As for my bug, the sheet probably has all manner of fields in it.  It's the
result of several years of accumulated data from a variety of sources, and while
it contains essentially just lots of text, it's been cut and pasted in from all
sorts of things, so who knows what the formatting is.

I've begun to suspect that there's more at work than just a flawed continuation
design though, because some instrumented code running against a 700K or so
spreadsheet came up with this:

manufactureStrings(index=18,size=8224)
manufactureStrings: Loop entry, remaining=8206
HMM: char_count = 39
setupStringParameters()
setupStringParameters: Initial total = 81
Processing string...
manufactureStrings: Loop entry, remaining=8125
HMM: char_count = 101
setupStringParameters()
setupStringParameters: Initial total = 104
setupStringParameters: Formatted run, total now = 103530
setupStringParameters: Extended, total now = 1946285934
HMM: total len bytes 1946285934 remaining is 8125 exp 1946277809 count 101

Here the 'extended' data is actually several orders of magnitude larger than
the file that supposedly contains it, so somewhere the record is either getting
corrupted or mis-parsed; I don't know which 8-(.

While I can't make my test data publicly available, I will send something to
Glen which can hopefully assist in hunting down the glitch(es).
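
For reference, a hedged sketch of how the total size of one SST string could be derived from its header. It assumes the BIFF8 layout: a 2-byte character count, a 1-byte flags field (bit 0 = 16-bit characters, bit 2 = extended data present, bit 3 = rich text runs present), then an optional 2-byte run count and an optional 4-byte extended-data size. The class name SstStringHeader and the choice to count the optional fields themselves in the total are assumptions of this sketch, not POI's code. Under this layout, 3 + 2*39 = 81 and 3 + 101 = 104, which agrees with the two "Initial total" values in the log above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper; NOT POI's setupStringParameters().
class SstStringHeader {
    static final int FLAG_16BIT = 0x01;     // characters are 2 bytes each
    static final int FLAG_EXTENDED = 0x04;  // 4-byte extended-data size follows
    static final int FLAG_RICH_TEXT = 0x08; // 2-byte formatting-run count follows

    /** Returns the total byte length of the string starting at offset. */
    static long totalLength(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(offset);
        int charCount = buf.getShort() & 0xFFFF;
        int flags = buf.get() & 0xFF;
        int bytesPerChar = (flags & FLAG_16BIT) != 0 ? 2 : 1;
        // 3 header bytes plus the character data itself.
        long total = 3 + (long) charCount * bytesPerChar;
        if ((flags & FLAG_RICH_TEXT) != 0) {
            int runs = buf.getShort() & 0xFFFF;
            total += 2 + 4L * runs; // each formatting run is 4 bytes
        }
        if ((flags & FLAG_EXTENDED) != 0) {
            long extSize = buf.getInt() & 0xFFFFFFFFL;
            total += 4 + extSize;
        }
        return total;
    }

    public static void main(String[] args) {
        // 39 characters, 16-bit flag set
        System.out.println(totalLength(new byte[] {39, 0, 0x01}, 0));
        // 101 characters, 8-bit
        System.out.println(totalLength(new byte[] {101, 0, 0x00}, 0));
    }
}
```

If the flags byte or the fields after it are read from the wrong offset, the run count and extended size become arbitrary values, which is one way a total far larger than the file could arise.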
Comment 7 Andy Oliver 2002-05-28 01:55:14 UTC
Oh wow, maybe this one actually uses the ExtSSTRecord!  We never could find a
sheet that actually used/needed this.  Question: This is all a western version of
Excel, right?  (meaning a western European/US version of Excel and/or western
European/US language)
Comment 8 Daniel Stephens 2002-05-28 02:00:26 UTC
Yup. US English everywhere, though there are a goodly number of European accents
in there for good measure (in fact, whether you like it or not, Excel seems to
take perverse pleasure in accenting Creme even when you don't want it to).
Comment 9 Andy Oliver 2002-05-28 02:02:35 UTC
Good, that should work.  Can you run org.apache.poi.hssf.dev.BiffViewer on it
and see if the ExtSSTRecord actually has something in it that looks
meaningful?  It sounds like this stuff may be just the kind of jackpot we
need to track that record down!  (I'm still guessing; this may prove to just be
more Rich Text issues.)
Comment 10 Glen Stampoultzis 2002-05-28 03:29:11 UTC
Oh oh... looks like I didn't know what I would be getting myself in for 
here!  :-)
Comment 11 Glen Stampoultzis 2002-06-10 02:41:09 UTC
Fixed in 1.5 branch and in trunk.  Should now correctly read rich text and
extended text with no problems.  The official release will be 1.5.1.