Bug 49653 - xlsx: line break in Excel cell read as '_x000D_' in cell.getStringCellValue
Status: RESOLVED FIXED
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.7-dev
Hardware: PC, OS: All
Importance: P2 normal
Target Milestone: ---
Assigned To: POI Developers List
Depends on:
Blocks:
Reported: 2010-07-27 07:35 UTC by geraldh
Modified: 2010-07-28 01:52 UTC



Attachments
The cell values of F3 and F4 aren't read correctly (8.03 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2010-07-27 07:35 UTC, geraldh

Description geraldh 2010-07-27 07:35:42 UTC
Created attachment 25806 [details]
The cell values of F3 and F4 aren't read correctly

When reading a string cell value from Excel (see the attached .xlsx file), the line break is read as _x000D_.

The code below, run against the attached workbook, reproduces the problem with 3.7 Beta 1:

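import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class UnreadableStrings {
    public static void main(String[] args) throws Exception {
        InputStream xlsInputStream = new FileInputStream("f:\\vfs\\UnReadableStrings.xlsx");
        Workbook wb = new XSSFWorkbook(OPCPackage.open(xlsInputStream));
        Sheet sheet = wb.getSheetAt(wb.getActiveSheetIndex());

        // F3: zero-based row 2, column 5
        Row row = sheet.getRow(2);
        Cell cell = row.getCell(5);
        System.out.println(cell.getStringCellValue());

        // F4: zero-based row 3, column 5
        row = sheet.getRow(3);
        cell = row.getCell(5);
        System.out.println(cell.getStringCellValue());
    }
}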

Output (1行目 and 2行目 are Japanese for "line 1" and "line 2"):
1行目_x000D_2行目
1行目_x000D_
2行目
Comment 1 Yegor Kozlov 2010-07-28 01:52:37 UTC
Fixed in r979952.

According to the OOXML spec, any character that cannot be represented directly in XML is escaped using the Unicode numerical character representation format _xHHHH_, where each H is a hexadecimal digit of the character's code. In your case, _x000D_ is now converted into the carriage-return (\r) character.

Yegor
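The escaping rule described in the comment above can be illustrated with a minimal standalone Java sketch. It is not the code committed in r979952; the class XStringEscape and its decode method are hypothetical names. It assumes every escape has exactly four hex digits (either case accepted) and that, as the spec does for literal text, an underscore that would otherwise start an escape is itself written as _x005F_. A single left-to-right pass handles that case, because once _x005F_ is consumed the remaining text no longer begins with "_x". A likely reason Excel escapes the carriage return at all is that XML parsers normalize a literal CR (or CR LF) in text content to LF, so an unescaped \r would not survive a round trip.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XStringEscape {
    // Matches the _xHHHH_ escape form: exactly four hex digits (assumption: case-insensitive).
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    /** Replaces each _xHHHH_ escape with the UTF-16 code unit it encodes (hypothetical helper). */
    public static String decode(String value) {
        if (value == null || value.indexOf("_x") < 0) {
            return value; // nothing that could be an escape
        }
        Matcher m = ESCAPE.matcher(value);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(value, last, m.start());                // copy text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16)); // decoded character
            last = m.end();                                     // continue after the escape
        }
        out.append(value, last, value.length());               // copy the tail
        return out.toString();
    }

    public static void main(String[] args) {
        // Mirrors cell F3: the line break arrives as the literal text _x000D_
        String decoded = decode("line1_x000D_line2");
        System.out.println(decoded.equals("line1\rline2")); // true
        // _x005F_ is an escaped underscore, so this yields the literal text _x000D_
        System.out.println(decode("_x005F_x000D_")); // _x000D_
    }
}
```

Running main prints true and then _x000D_. The manual find/append loop is used instead of Matcher.appendReplacement(StringBuilder, ...) so the sketch also compiles on Java 8.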