Bug 58775 - Excel reports corrupt workbook if workbook has more than 205 custom DataFormats
Summary: Excel reports corrupt workbook if workbook has more than 205 custom DataFormats
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: unspecified
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-28 07:11 UTC by Javen O'Neal
Modified: 2015-12-29 05:56 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Javen O'Neal 2015-12-28 07:11:35 UTC
Unit test (adapted from [1])

public void createManyDataFormats(int numDataFormats) {
    Workbook wb = new XSSFWorkbook();
    Sheet sh = wb.createSheet("test");
    DataFormat formats = wb.createDataFormat();
    for (int i=0; i<numDataFormats; i++) {
        Row row = sh.createRow(i);
        Cell cell = row.createCell(0);
        CellStyle style = wb.createCellStyle();
        String formatStr = "\"Format" + i + "\"0.0";
        int formatIndex = formats.getFormat(formatStr);
        style.setDataFormat(formatIndex);
        cell.setCellStyle(style);
        cell.setCellValue(i);
    }
    String filename = "/tmp/dataformat-" + numDataFormats +".xlsx";
    OutputStream fos = new FileOutputStream(filename);
    wb.write(fos);
    fos.close();
    wb.close();
}

Excel 2013 (en-US) doesn't report any issues opening dataformat-100.xlsx, but upon opening dataformat-250.xlsx, it reports the following error message:

"We found a problem with some content in 'dataformat-250.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes."
Upon clicking Yes, it looks like all formats beyond 205 were ignored.

According to [2], Microsoft Excel 2007+ supports between 200 and 250 "number formats in a workbook", depending on the language version of Excel. POI should restrict workbooks to 250 data formats, with notes in the documentation that Excel may not be able to read workbooks containing over 200 data formats.

[1] http://poi.apache.org/spreadsheet/quick-guide.html#DataFormats
[2] https://support.office.com/en-us/article/Excel-specifications-and-limits-ca36e2dc-1f09-4620-b726-67c00b05040f#bmworksheetworkbook
Comment 1 Javen O'Neal 2015-12-29 05:51:01 UTC
Added in r1722054.
Updated docs in r1722056.

The default upper limit of 250 data formats per style table can be changed by the user via StylesTable.setMaxNumberOfDataFormats.
Comment 2 Javen O'Neal 2015-12-29 05:56:14 UTC
Backwards-compatibility note: r1722054 changed the signature of some methods to use short instead of int to represent the number format id. The generated ooxml CTNumFmt stores the number format id as a long, though Excel limits the quantity of number formats to 200-250.

This implies either number format ids can be sparsely populated among the domain of longs or they left the format open so that the upper limit may increase in a future version of Excel.