Bug 49653

Summary: xlsx: line break in Excel cell read as '_x000D_' in cell.getStringCellValue
Product: POI
Reporter: geraldh <geraldspam>
Component: XSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 3.7-dev   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: The cell values of F3 and F4 aren't read correctly

Description geraldh 2010-07-27 07:35:42 UTC
Created attachment 25806
The cell values of F3 and F4 aren't read correctly

When reading a string cell value from Excel (see the attached xlsx file), the line break is read as _x000D_.

The code below, together with the attached Excel sheet, reproduces this (3.7 Beta 1):

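import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

InputStream xlsInputStream = new FileInputStream(new File(
        "f:\\vfs\\UnReadableStrings.xlsx"));
Workbook wb = new XSSFWorkbook(OPCPackage.open(xlsInputStream));
Sheet sheet = wb.getSheetAt(wb.getActiveSheetIndex());

// Cell F3 (row index 2, column index 5)
Row row = sheet.getRow(2);
Cell cell = row.getCell(5);
System.out.println(cell.getStringCellValue());

// Cell F4 (row index 3, column index 5)
row = sheet.getRow(3);
cell = row.getCell(5);
System.out.println(cell.getStringCellValue());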

Output: 
1行目_x000D_2行目
1行目_x000D_
2行目

Comment 1 Yegor Kozlov 2010-07-28 01:52:37 UTC
Fixed in r979952.

According to the OOXML spec, characters that cannot be represented in XML are escaped using the format _xHHHH_, where each H is a hexadecimal digit of the character's Unicode value. In your case, _x000D_ is now converted into the carriage-return (\r) character.
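
For anyone stuck on an older build, the decoding described above can be approximated in user code. The following is only a minimal sketch, not the code committed in r979952: the class name OoxmlEscapeDecoder and the method decode are hypothetical, and it assumes every escape has exactly four hexadecimal digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: decodes _xHHHH_ escapes in a string.
class OoxmlEscapeDecoder {
    // Assumption: an escape is "_x", exactly four hex digits, then "_".
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    /** Replaces each _xHHHH_ sequence with the character whose code is HHHH. */
    static String decode(String value) {
        if (value == null) {
            return null;
        }
        Matcher m = ESCAPE.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("line1_x000D_line2");
        // Show the carriage return visibly instead of emitting it raw.
        System.out.println(decoded.replace("\r", "\\r")); // prints line1\rline2
    }
}
```

It could be applied to the value returned by cell.getStringCellValue() until an upgrade to a build containing the fix is possible. Strings without an escape sequence pass through unchanged.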

Yegor