Bug 62444 - Newline character in table header row cell causes damage to Excel file
Summary: Newline character in table header row cell causes damage to Excel file
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.17-FINAL
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-09 16:27 UTC by David Gauntt
Modified: 2019-05-09 23:35 UTC (History)
0 users



Attachments
Testcase for bug 62444 (6.86 KB, text/plain)
2019-05-07 12:55 UTC, David Gauntt
Details
Zip archive containing results of testcase (20.39 KB, application/x-zip-compressed)
2019-05-07 13:07 UTC, David Gauntt
Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Gauntt 2018-06-09 16:27:55 UTC
Overview
If there is a newline (or perhaps a carriage return) in the text of a header cell in a table in an XSSFSheet, when the workbook is written out by POI and then read by Excel, Excel reports some unreadable content in the workbook.

Steps to Reproduce

1) Use Excel to prepare a workbook that has a table in the first sheet.
2) Select a header cell in the table, and enter text containing a new-line in
the cell.  In Excel 2011 for Mac, this is done with the keystroke combination cmd-option-return.
3) Save the workbook.
4) Read and then write the workbook using the following POI code:

	final File testFile = new File(srcFile.getParentFile(), "test.xlsx");
	final InputStream is = new FileInputStream(srcFile);
	final XSSFWorkbook wb = (XSSFWorkbook) WorkbookFactory.create(is);
	is.close();
	try (OutputStream fileOut = new FileOutputStream(testFile)) {
		wb.write(fileOut);
		wb.close();
	}
5) Open the workbook in Excel.  You will get warning message about repairing unreadable content; when you examine the repair log file, you will get the following information:

Repair Result to test 04744.xml
Errors were detected in file ‘[path redacted]:test.xlsx'
Repaired Records: Table from /xl/tables/table1.xml (List)

This has been seen in version 3.17 of POI, on Macintosh running OS X 10.11.6, using Excel 2011 for Mac
Comment 1 Dominik Stadler 2018-06-28 13:52:54 UTC
Can you attach such a sample Excel file that can be used to reproduce the problem?
Comment 2 David Gauntt 2019-05-07 12:55:52 UTC
Created attachment 36573 [details]
Testcase for bug 62444

This JUnit test class illustrates the problem.  It writes 4 files to build/custom-reports-test (JUnit reports 4 successes).  All contain a 4 column, 3 row table, and differ only in the data in the first column.  I am also attaching all 4 file, as well as testLfInHeader.repaired.xlsx, which is a copy of testLfInHeader.xlsx after it has been repaired by Excel 2010.

testNoLfInHeader.xlsx contains no newline characters in any headers.  Excel opens this file silently.

testLfInHeaderCell.xlsx contains a newline character in the first header cell, but no newline character in the XSSFTable column name.  Excel 2010 opens this file silently, but the first header cell shows no sign of a newline.

testLfInHeaderCell.xlsx contains no newline character in the first header cell, but has a newline character in the XSSFTable column name.  Excel 2010 opens this file silently, but the first header cell shows no sign of a newline.

testLfInHeaderCell.xlsx contains a newline character in the first header cell, andhas a newline character in the XSSFTable column name.  Excel 2010 opens this file after warning that the table needs repair, and the first header cell is 2 lines high.
Comment 3 David Gauntt 2019-05-07 13:07:29 UTC
Created attachment 36574 [details]
Zip archive containing results of testcase

This zip archive contains the 4 files described in the test case comment, plus one of the files after it has been repaired by Excel 2010.

The repair changes the text in the file xl/tables/table1.xml inside the xlsx archive.  It adds several xmlns attributes to the table tag, and an xr3:uid tag to each tableColumn tag.  The contents of table1.xml are listed below.

Before the repair, table1.xml has the following content:

<?xml version="1.0" encoding="ISO-8859-1"?>
<table id="1" xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" name="Table1" displayName="Table1" ref="C3:F5">
<tableColumns count="4">
<tableColumn id="1" name="Col 1"/>
<tableColumn id="2" name="Col2"/>
<tableColumn id="3" name="Col3"/>
<tableColumn id="4" name="Col4"/>
</tableColumns>
<tableStyleInfo name="TableStyleMedium2" showLastColumn="false" showFirstColumn="false" showRowStripes="true" showColumnStripes="false"/>
</table>

After the repair, table1.xml contains the following text:

<?xml version="1.0" encoding="ISO-8859-1"?>
<table id="1" ref="C3:F5" displayName="Table1" name="Table1" xr:uid="{00000000-000C-0000-FFFF-FFFF00000000}" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" mc:Ignorable="xr xr3" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<tableColumns count="4">
<tableColumn id="1" name="Col 1" xr3:uid="{00000000-0010-0000-0000-000001000000}"/>
<tableColumn id="2" name="Col2" xr3:uid="{00000000-0010-0000-0000-000002000000}"/>
<tableColumn id="3" name="Col3" xr3:uid="{00000000-0010-0000-0000-000003000000}"/>
<tableColumn id="4" name="Col4" xr3:uid="{00000000-0010-0000-0000-000004000000}"/>
</tableColumns>
<tableStyleInfo name="TableStyleMedium2" showColumnStripes="0" showRowStripes="1" showLastColumn="0" showFirstColumn="0"/>
</table>