Bug 23083 - Creating Workbook from existing Excel file containing Rich Text will sometimes fail
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 1.5.1
Hardware: PC All
Importance: P3 major with 4 votes
Target Milestone: ---
Assignee: POI Developers List
Reported: 2003-09-10 20:31 UTC by Wayne Seguin
Modified: 2005-08-21 21:41 UTC



Attachments
Testcase: Excel file 1. Control case. Works. (23.00 KB, application/octet-stream)
2003-09-11 19:04 UTC, Wayne Seguin
Testcase: Excel file 2. File fails on open (all formatting runs are in Continue record) (23.00 KB, application/octet-stream)
2003-09-11 19:05 UTC, Wayne Seguin
Testcase: Excel file 3. File fails to open. Some formatting run data in Continue record. (23.00 KB, application/octet-stream)
2003-09-11 19:07 UTC, Wayne Seguin
This fixes the problem. Basically, it detects when the character part of the string is complete and stores the number of bytes to skip in the next Continue block to reach the start of the next string. (1.89 KB, patch)
2003-09-11 19:48 UTC, Wayne Seguin
[PATCH] Sorry, it just occurred to me that I hadn't put in the "magic word" with the fix. Here's a patch that addresses the crashing. (1.89 KB, patch)
2003-09-15 00:40 UTC, Wayne Seguin

Description Wayne Seguin 2003-09-10 20:31:51 UTC
Rich Text is currently being stripped out, until it is fully supported. As 
such, the trailing formatting runs (immediately after the string chars) are 
skipped over to get to the start of the next string.

However, when the formatting runs are separated, either completely or partially, 
from their preceding string characters by flowing over into a Continue 
record, the code behaves unpredictably. This is because the code expects a 
Continue record following an SST record to begin with a string, either a 
continuation of an unfinished string from the prior record or a new string 
entirely. It does not handle the situation where the record starts with 
formatting runs (and Extended chars would likely cause the same problem).

The earliest release in which I found the problem is 1.5.1 (I didn't look any 
further back). 2.0-pre3 still has the problem as well, since SSTDeserializer 
has hardly changed at all.
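
To make the byte arithmetic concrete, here is a minimal sketch, assuming the usual BIFF8 string layout (option-flag bit 0x08 marks rich text, bit 0x04 marks extended data, and each formatting run occupies 4 bytes). The class and method names are hypothetical and are not part of POI; it only shows how many trailing bytes can land in the following Continue record.

```java
// Hypothetical helper, not POI code: sizes the parts of one SST string
// under the assumed BIFF8 layout described above.
final class SstStringLayout {
    static final int WIDE_CHAR_FLAG = 0x01; // assumed: 2 bytes per character
    static final int EXTENDED_FLAG = 0x04;  // assumed: extended data follows the runs
    static final int RICH_TEXT_FLAG = 0x08; // assumed: formatting runs follow the chars

    /** Bytes occupied by the characters themselves. */
    static int charBytes(int charCount, int optionFlags) {
        return (optionFlags & WIDE_CHAR_FLAG) != 0 ? charCount * 2 : charCount;
    }

    /** Bytes that trail the characters: 4 per formatting run plus any extended data. */
    static int trailingBytes(int optionFlags, int runCount, int extendedLength) {
        int bytes = 0;
        if ((optionFlags & RICH_TEXT_FLAG) != 0) {
            bytes += runCount * 4;
        }
        if ((optionFlags & EXTENDED_FLAG) != 0) {
            bytes += extendedLength;
        }
        return bytes;
    }

    /** Trailing bytes that do not fit in the current record and overflow into the next one. */
    static int spillover(int trailingBytes, int bytesLeftInRecord) {
        return Math.max(0, trailingBytes - bytesLeftInRecord);
    }
}
```

For example, a rich string with three runs has 12 trailing bytes; if only 5 bytes remain in the current record, `spillover(12, 5)` reports that 7 of them begin the next Continue record, which is exactly the case the reader does not expect.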
Comment 1 Wayne Seguin 2003-09-11 19:04:32 UTC
Created attachment 8164
Testcase: Excel file 1. Control case. Works.
Comment 2 Wayne Seguin 2003-09-11 19:05:49 UTC
Created attachment 8165
Testcase: Excel file 2. File fails on open (all formatting runs are in Continue record)
Comment 3 Wayne Seguin 2003-09-11 19:07:27 UTC
Created attachment 8166
Testcase: Excel file 3. File fails to open. Some formatting run data in Continue record.
Comment 4 Avik Sengupta 2003-09-11 19:17:17 UTC
This is quite helpful. I just wanted some more details for the record:

1. Which version of Excel were these xls files created with?
2. What are you trying to do in POI, i.e. does it fail just on new
HSSFWorkbook(..)?

Thanks again
Comment 5 Wayne Seguin 2003-09-11 19:27:55 UTC
Sure. Excel 2000 (9.0.3821 SR-1).

To observe the problem, simply create a POIFSFileSystem from a stream from the 
Excel file and build a workbook with it. The error occurs when reading the 
records.

	POIFSFileSystem fs = new POIFSFileSystem(stream);
	HSSFWorkbook workBook = new HSSFWorkbook(fs);

The first Excel file will work fine (control case). The second doesn't raise an 
error, but if you look at the list of strings, you'll be missing the last one. 
The third Excel file will throw a NegativeArraySizeException.
Comment 6 Andy Oliver 2003-09-11 19:39:48 UTC
This bug has been on the todo list for about 2 years now.  Blame me for it.  I've never gotten a 
customer to fund the work, and it's not much, but SuperLink will pay a $100 (US; you pay any 
banking/shipping/transfer fees, PayPal preferred) reward to the squasher of this 
bug/lack-of-feature.  - Andy
Comment 7 Wayne Seguin 2003-09-11 19:48:45 UTC
Created attachment 8168
This fixes the problem. Basically, it detects when the character part of the string is complete and stores the number of bytes to skip in the next Continue block to reach the start of the next string.
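
The idea of carrying a skip count across records can be modelled roughly as follows. This is a minimal sketch under stated assumptions, not the attached patch and not POI's SSTDeserializer: the class name, method names, and the offset-based interface are all hypothetical.

```java
// Hypothetical model of the workaround: remember how many trailing bytes
// (formatting runs, extended data) are still owed when a record ends, and
// skip them at the start of the next Continue record.
final class ContinueSkipSketch {
    private int bytesToSkip; // trailing bytes carried over from a previous record

    /**
     * Call once a string's characters are complete. Skips as many trailing
     * bytes as the current record still holds and remembers the remainder.
     * Returns the offset just past the bytes skipped in this record.
     */
    int finishString(int offset, int recordLength, int trailingBytes) {
        int available = recordLength - offset;
        int skippedNow = Math.min(available, trailingBytes);
        bytesToSkip = trailingBytes - skippedNow;
        return offset + skippedNow;
    }

    /**
     * Call at the start of each Continue record. Consumes the owed bytes and
     * returns the offset at which the next string starts in this record.
     */
    int startOfNextString(int recordLength) {
        int skippedNow = Math.min(recordLength, bytesToSkip);
        bytesToSkip -= skippedNow;
        return skippedNow;
    }

    /** Trailing bytes still owed after the last call. */
    int pending() {
        return bytesToSkip;
    }
}
```

With a 100-byte record whose last string ends its characters at offset 95 and has 12 trailing bytes, `finishString(95, 100, 12)` returns 100 and leaves 7 bytes pending; `startOfNextString(50)` on the following Continue record then returns 7, the offset of the next string.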
Comment 8 Andy Oliver 2003-09-11 23:10:10 UTC
Someone wrote to ask me whether I meant not crash or actually support RTF.  I meant actually 
support.  Consolation prize to whoever makes it not crash: $25.
Comment 9 Wayne Seguin 2003-09-15 00:40:13 UTC
Created attachment 8222
[PATCH] Sorry, it just occurred to me that I hadn't put in the "magic word" with the fix. Here's a patch that addresses the crashing.
Comment 10 Jason Height 2005-07-29 05:20:11 UTC
Hey Andy,

Still willing to fork out $100 for this? Sounds like a challenge. So far I can
decode a rich string... now to write it out again...

Jason

Comment 11 Andy Oliver 2005-07-29 15:31:48 UTC
You get this working (read/write) -- you betcha.

-andy
Comment 12 Jason Height 2005-08-22 05:41:53 UTC
HEAD now supports rich text, both read and write. It doesn't support Extended text 
bytes, but I have no idea what they are, and I haven't got a spreadsheet file that
uses them.

The patches were not applied because they were just workarounds.

I'm going to close this bug. Looking forward to the prize!

Jason