Bug 49074

Summary: Creating Too Many Rich Strings Causes Excel 2007 Files to Lose All Formatting
Product: POI Reporter: Curtis Browning <brownicu>
Component: HSSFAssignee: POI Developers List <dev>
Status: RESOLVED WORKSFORME    
Severity: major    
Priority: P2    
Version: 3.0-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: See the comments at the top of the attached test driver program
Test Driver to Create the Sample Excel Files
64 rows created by POI code in the attached Driver.java
65 rows created by POI code in the attached Driver.java
64 rows created by driver.vbs script
65 rows created by driver.vbs script

Description Curtis Browning 2010-04-08 16:32:50 UTC
Created attachment 25245 [details]
See the comments at the top of the attached test driver program

Background:
In POI Bug 47543, we discovered that one must apply the base cell font to the cell itself via the cell style and not to the HSSFRichTextString object. Only the second and subsequent fonts used in HSSFRichTextStrings should use the HSSFRichTextString.applyFont() routine.

That approach works well up to a point, after which Excel 2003 and Excel 2007 will fail to read the font data, throwing user errors instead and showing only the raw sheet data without any formatting.

To reproduce, run the attached Driver.java program and try to open the generated file in Excel 2007. Excel throws an error stating that it found "unreadable content", and all cell formatting is then lost.

Note that in the test driver program we are creating 65 rows and 10 columns of cell data that contains footnote markers (a superscript character at the end of each cell text). If you change the row constant to 64 rows or lower, then Excel can read the generated content. If you use 65 rows or higher, the error described here will occur. 

Our customers typically generate large amounts of output containing many tables with footnotes in certain cells. When we reach the rich text limit described here their Excel output becomes useless and we receive technical support compaints.
Comment 1 Nick Burch 2010-05-05 12:40:36 UTC
Is 65x10 the smallest file you can generate that gives the problem, or can you find something smaller?

I'd suggest you create four files:
* One via POI, which is the smallest you can that gives the problem
* One via POI, which is like the above, but the biggest without issues
* One via Excel, containing the same data as the POI one that fails
* One via Excel, containing the same data as the POI one that works

Next, use BiffViewer to try to see how the files differ. Especially interesting will be between the two poi ones, and between the poi one that breaks and the excel equivalent. Use the poi+excel pair that both work to help identify harmless things that are done differently.

Hopefully from this analysis, you can identify what's going wrong with the file that POI generates
Comment 2 Curtis Browning 2010-07-01 15:20:38 UTC
Hi Nick,

In response to your questions:

1. Yes, the 65 row x 10 column file is the smallest file that I can create in POI that will demonstrate the problem.

2. I have attached 4 sample Excel workbooks, two created via POI (test_export_64_rows.xls and test_export_65_rows.xls), and two created via the attached driver.vbs script (vba_64_rows.xls and vba_65_rows.xls). The two 64-row files and the two 65-row files contain the exact same cell content.

3. I tried following the steps you requested I take with the BiffViewer application, but the two files created via the VBA script cause BiffViewer to crash with a General Protection Fault. I also tried to manually create the test file in Excel (slow and cumbersome as it was), but the same GPF occurred when opening the file in BiffViewer.

Please have a look at the attachments and the relevant POI code. Hopefully it will help you to get to the root cause of this problem.

Regards,
Curtis
Comment 3 Curtis Browning 2010-07-01 15:22:06 UTC
Created attachment 25679 [details]
Test Driver to Create the Sample Excel Files
Comment 4 Curtis Browning 2010-07-01 15:28:53 UTC
Created attachment 25680 [details]
64 rows created by POI code in the attached Driver.java
Comment 5 Curtis Browning 2010-07-01 15:29:20 UTC
Created attachment 25681 [details]
65 rows created by POI code in the attached Driver.java
Comment 6 Curtis Browning 2010-07-01 15:30:30 UTC
Created attachment 25682 [details]
64 rows created by driver.vbs script
Comment 7 Curtis Browning 2010-07-01 15:30:56 UTC
Created attachment 25683 [details]
65 rows created by driver.vbs script
Comment 8 Nick Burch 2010-09-21 06:54:33 UTC
I've been comparing the BiffViewer dumps from poi, excel and open office, and I can't spot anything wrong with what POI is doing. There are some differences, but generally when POI and Excel differ, open office does it the same way as POI, and when POI and OpenOffice differ, POI has done it similar to excel...

OpenOffice can read the poi 65 row file without error, so we're not doing anything too crazy

The only thing I can suggest is you ask around all of your clients, until you find one who have a support contract with Microsoft. Then, get the 65 row POI file submitted to Microsoft, and see if they can shed any light on what Excel dislikes about the file

As it stands, I can't figure out what POI is doing wrong to be able to fix it :/
Comment 9 Dominik Stadler 2015-05-31 21:27:25 UTC
Analysis by Nick some time ago indicates that no problem could be spotted in POI itself, thus resolving this issue for now, please reopen with more information if this is still a problem for you.