Bug 65676 - XSSFSheetXMLHandler inline string bug
Summary: XSSFSheetXMLHandler inline string bug
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: unspecified
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2021-11-10 08:30 UTC by Alexander Herzog
Modified: 2021-11-10 10:35 UTC (History)
0 users


Note You need to log in before you can comment on or make changes to this bug.
Description Alexander Herzog 2021-11-10 08:30:52 UTC
XSSFSheetXMLHandler returns wrong cell contents if an inline string cell follows directly on a number cell.

Minimal example:

try {
  /* Stream writing workbook to byte array stream
  (stream writer will use inline strings) */
  ByteArrayOutputStream output=new ByteArrayOutputStream();
  try(Workbook wb=new SXSSFWorkbook()) {
    Row r=wb.createSheet("Sheet").createRow(0);
    r.createCell(0).setCellValue(1.2); /* A1: Number 1.2 */
    r.createCell(1).setCellValue("ABC"); /* B1: Inline string "ABC" */

  /* Minimal stream reader processor */
  SheetContentsHandler reader=new SheetContentsHandler() {
    @Override public void startRow(int rowNum) {}
    @Override public void endRow(int rowNum) {}
    @Override public void cell(String cellReference,
    String formattedValue, XSSFComment comment) {
      System.out.println(cellReference+": "+formattedValue);
    @Override public void hyperlinkCell(String cellReference,
    String formattedValue, String url, String toolTip,
	XSSFComment comment) {}

  /* Stream reading workbook from byte array */
  try (OPCPackage xlsxPackage=OPCPackage.open(
  new ByteArrayInputStream(output.toByteArray()))) {
    XSSFReader xssfReader=new XSSFReader(xlsxPackage);
    try (InputStream stream=
    ((XSSFReader.SheetIterator)xssfReader.getSheetsData()).next()) {
      XMLReader sheetParser=XMLHelper.newXMLReader();
      sheetParser.setContentHandler(new XSSFSheetXMLHandler(
        new ReadOnlySharedStringsTable(xlsxPackage),
        new DataFormatter(),
      sheetParser.parse(new InputSource(stream));
} catch (IOException | OpenXML4JException | SAXException |
ParserConfigurationException e) {}

Expected output:

A1: 1.2

Actual output:

A1: 1.2
B1: 1.2ABC

Cause (as far as I understand the POI source code):

XSSFSheetXMLHandler is using a StringBuilder to collect chars before invoking SheetContentsHandler.cell(...). This is done in XSSFSheetXMLHandler.value (see line 111). After a cell it output, the StringBuilder has to be cleared (value.setLength(0)). This is not done when an inline string follows on a number cell. The cache is cleared at the beginning of a number cell and after an inline string cell; see lines 208 and 324. So in case A1=number, B1=inline_string, the content of A1 remains in the StringBuilder when adding content of B1.

Quick link to XSSFSheetXMLHandler:

Proposed fix:

Add "value.setLength(0);" after line 317.

I am not so familiar with the POI source code and internal details. So please check this carefully. Thank you very much!
Comment 1 PJ Fanning 2021-11-10 10:35:31 UTC
Thanks for looking into this. I'm surprised nobody has run into this yet. Fix applied using r1894902.