Bug 49654 - unreadble content in xls
Summary: unreadble content in xls
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 3.6-FINAL
Hardware: PC Windows XP
: P5 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-27 08:19 UTC by Bryan Tong
Modified: 2011-06-25 12:29 UTC (History)
0 users



Attachments
POI output (40.46 KB, application/octet-stream)
2010-07-30 00:38 UTC, Bryan Tong
Details
input the same url link mannually (70.97 KB, application/octet-stream)
2010-07-30 00:39 UTC, Bryan Tong
Details
VBA script to create workbook with hyperlinks (789 bytes, application/octet-stream)
2010-08-03 08:36 UTC, Yegor Kozlov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Bryan Tong 2010-07-27 08:19:43 UTC
The example below comes from the POI website and I make some minor changes. By running it,we can get the error "unreable content in xls"

public class POIExport{
 public static void main(String[] args) throws IOException{
  Workbook wb = new HSSFWorkbook(); //or new HSSFWorkbook();
     CreationHelper createHelper = wb.getCreationHelper();

     //cell style for hyperlinks
     //by default hypelrinks are blue and underlined
     CellStyle hlink_style = wb.createCellStyle();
     Font hlink_font = wb.createFont();
     hlink_font.setUnderline(Font.U_SINGLE);
     hlink_font.setColor(IndexedColors.BLUE.getIndex());
     hlink_style.setFont(hlink_font);

     
     Sheet sheet = wb.createSheet("Hyperlinks");
     //number of rows
     for (int i = 0; i < 14000 ; i++){
      Row row = sheet.createRow(i);
      //number of columns 
      for(int j = 0 ; j < 5; j++){
       Cell cell = row.createCell((short)j);
       cell.setCellValue("URL Link");
       Hyperlink link = createHelper.createHyperlink(Hyperlink.LINK_URL);
       link.setAddress("http://poi.apache.org/");
       cell.setHyperlink(link);
       cell.setCellStyle(hlink_style);
      }
     }
     FileOutputStream out = new FileOutputStream("hyperinks.xls");
     wb.write(out);
     out.close();
 }
}

It seems that if the number of columns with Hyperlink is getting more,we will get this error(check above example).but if we limit the rows and columns to a small number,It does not have problem.
Comment 1 Bryan Tong 2010-07-27 20:45:13 UTC
sorry,It should be the "unreadable" and I wrongly spelled it. 

I have tested poi-3.5-beta5-20090219 and poi-3.6-20091214 and these two version produces the same error.
Comment 2 Yegor Kozlov 2010-07-29 02:38:23 UTC
Please don't abuse Bugzilla. Changing the priority of a bug to Critical will not help us to fix it sooner.

If you are really interested in fixing this issue - you are welcome to do a research. If you need a guidance, please ask, we will be happy to help.

Yegor
Comment 3 Bryan Tong 2010-07-29 03:19:59 UTC
I am sorry for that.
as currently,our project has encountered this issue and we need to solve it quickly,but currently,we find not solution.

we try to use setformula(String) ,but the problem is that we cannot style the cell value(part of the string).1ABCHHHHH,we only want to color the CH,but after research,we found the current POI also doesnot support this.

Thanks for your reply.
Comment 4 Bryan Tong 2010-07-29 04:21:31 UTC
we are trying to use the following way instead.

cell.setCellFormula("HYPERLINK(\"" + "http://www.google.ca"+ "\",\"" + "Goolge" + "\")");   


but we have to color part of the "Goolge" string ,suppose ool,It seems that POI also doesnot support it.By this way ,the "unreadable content in xls" error will disappear, but It cannot color part of a string.

Could you give us some advice on how to solve it.

Thanks a lot.
Comment 5 Nick Burch 2010-07-29 06:56:51 UTC
I would suggest that you create two simple files:
* one in POI, with a single cell containing your problem hyperlink
* one in Excel, with a single cell containing your problem hyperlink

Then, use org.apache.poi.hssf.dev.BiffViewer to figure out what Excel has done differently to POI. Once you know that, you're hopefully most of the way to knowing what the POI fix will be.

If the problem is very very urgent for you, then there are people (not myself!) who can offer you paid-for support for POI. Otherwise, you'll need to do some of the work yourself, sorry...
Comment 6 Bryan Tong 2010-07-29 08:32:32 UTC
I have done what you suggest,but could not understand the output.

Thanks all the same.
Comment 7 Nick Burch 2010-07-29 09:25:38 UTC
You want to figure out how the two files differ

Ignoring the file offsets, how does the BiffViewer output of the two files compare? Which record(s) are in one but not the other? Which record(s) are different between them?
Comment 8 Yegor Kozlov 2010-07-29 14:23:39 UTC
Bryan,

It looks to be a limitation of Excel - the number of hyperlinks in a workbook cannot exceed 65534. I didn't find it in the spec but confirmed it empirically. If your code inserts 65533 hyperlinks then the workbook is OK, if the number of hyperlinks is greater or equal to 65534 then the workbook is unreadable. 

I guess Excel tracks the number of hyperlinks as unsigned short and when this number rolls over the workbook becomes corrupted. 

Yegor
Comment 9 Bryan Tong 2010-07-29 18:39:12 UTC
Yegor ,

From the begging ,we also suppose that maybe the reason .but after research,I try to add over 180000 of Hyperlink in the Excel manually and find no problem ,only by POI , It seems that number of HyperLinks is limited,so we think It 's a bug there.

However,for one week ,we still find not way to solve it.
Comment 10 Bryan Tong 2010-07-29 23:16:18 UTC
I have done research and find that for POI 3.5 and 3.6 version,one sheet of a workbook,It can only have 65530 Hyperlink cells. but for Microsoft Excel, there is not such limitation.
Comment 11 Bryan Tong 2010-07-30 00:32:26 UTC
Hi Nick,

I have used the POI  

cell.setCellValue("URL Link");
    		    Hyperlink link = createHelper.createHyperlink(Hyperlink.LINK_URL);
    		    link.setAddress("http://poi.apache.org/");
Comment 12 Bryan Tong 2010-07-30 00:38:19 UTC
Created attachment 25817 [details]
POI output
Comment 13 Bryan Tong 2010-07-30 00:39:09 UTC
Created attachment 25818 [details]
input the same url link mannually
Comment 14 Bryan Tong 2010-08-03 04:26:47 UTC
Hi Yegor Kozlov,

I have read the source of Hyperlink related files and could not understand where the number of Hyperlinks are limited.Could you please give me some advice so that I can continue.

int eofLoc = _sheet.getSheet().findFirstRecordLocBySid( EOFRecord.sid );
_sheet.getSheet().getRecords().add( eofLoc, link.record );


meanwhile,what doesnot rhe eofLoc exactly mean? Thanks a lot if you can explain more to me.
Comment 15 Yegor Kozlov 2010-08-03 08:29:45 UTC
> 
> I have read the source of Hyperlink related files and could not understand
> where the number of Hyperlinks are limited.Could you please give me some advice
> so that I can continue.
>

The number of hyperlinks is not limited in the Excel binary format, it seems to be a limitation of MS Excel application. I searched the format spec and MSDN and did not find this limitation. 
 
> int eofLoc = _sheet.getSheet().findFirstRecordLocBySid( EOFRecord.sid );
> _sheet.getSheet().getRecords().add( eofLoc, link.record );
> 
> 
> meanwhile,what doesnot rhe eofLoc exactly mean? Thanks a lot if you can explain
> more to me.

Excel workbook consists of records and the order of records is determined by the Excel binary format. Hyperlink records are appended to the sheet records and EOFRecord marks the end of the sheet. 

I think this code can be improved. EOFRecord seems to be always the last record in the collection  and a better version would look as follows:

        List<RecordBase> records = _sheet.getSheet().getRecords();
        int eofLoc = records.size() - 1;
        records.add( eofLoc, link.record );

This should significantly improve performance because findFirstRecordLocBySid takes O(N) operations to find a record in a sheet consisting of N records. 

I'm going commit the fix soon.

Regards,
Yegor
Comment 16 Yegor Kozlov 2010-08-03 08:35:02 UTC
> 
> From the begging ,we also suppose that maybe the reason .but after research,I
> try to add over 180000 of Hyperlink in the Excel manually and find no problem
> ,only by POI , It seems that number of HyperLinks is limited,so we think It 's
> a bug there.
> 
> However,for one week ,we still find not way to solve it.


How did you create that file with over  180000 of hyperlinks? 

I wrote a small VBA script identical to your Java sample. It crashes if the total number of created hyperlinks is greater than 64K. Can you run it on your machine and see if it works for 180000 links?


Regards,
Yegor
Comment 17 Yegor Kozlov 2010-08-03 08:36:16 UTC
Created attachment 25832 [details]
VBA script to create workbook with hyperlinks
Comment 18 Bryan Tong 2010-08-03 21:54:31 UTC
(In reply to comment #17)
> Created an attachment (id=25832) [details]
> VBA script to create workbook with hyperlinks

I have run this script and It also crashed.

for the 180000 hyperlinks, I  input several mannually and then copy paste,then I save the workbook and reopen it again,It has no problem.That means that MS Excel can store many hyperlinks.

It seems only by program which generates the excel file,It has such limitation,I have tries the JXL excel export and It has the same problem.
Comment 19 Yegor Kozlov 2011-06-25 12:29:20 UTC
I'm resolving it as WONTFIX. 

I don't see how we can fix this issue - POI is correct in terms of writing the .xls format. The number of hyperlinks is somehow limited in Excel, if you find any information that may help to overcome this limitation in POI, please post it into this (or new) issue.

I can see two workarounds:
 (a) use the HYPERLINK function instead of the Hyperlink object. 
 (b) try .xlsx format

Yegor