Created attachment 27326: Test case

Hi all,

I'm having a problem opening an .xls file (saved in Excel 97-2003 format by Excel 2007) that contains euro characters. The output shows a "?" (question mark) instead of the euro character "€".

While searching, I found a comment by Douglas Atique at https://issues.apache.org/bugzilla/show_bug.cgi?id=30319#c10 that points to org.apache.poi.util.StringUtil. Its code clearly uses ISO-8859-1 rather than the newer ISO-8859-15, which includes the euro sign. I think it would be better to use the newer encoding.

More info:
* http://en.wikipedia.org/wiki/ISO/IEC_8859-15

Thanks,
Alejandro.
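To illustrate the difference between the two encodings mentioned above, here is a minimal sketch using only the JDK (no POI involved). The class name EuroEncodingDemo and the helper encode are made up for this example, and it assumes the JRE ships the ISO-8859-15 charset, which is not one of the charsets Java guarantees. It shows one way a "?" can appear; it does not prove that this is what happens inside POI.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of POI.
class EuroEncodingDemo {
    // U+20AC EURO SIGN, written as an escape so the source file encoding does not matter.
    static final String EURO = "\u20AC";

    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for characters it cannot represent.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] latin1 = encode(EURO, StandardCharsets.ISO_8859_1);
        // Assumption: ISO-8859-15 is available in this JRE.
        byte[] latin9 = encode(EURO, Charset.forName("ISO-8859-15"));

        // ISO-8859-1 has no euro sign, so the byte is 0x3F, i.e. '?'.
        System.out.printf("ISO-8859-1 : 0x%02X%n", latin1[0] & 0xFF);
        // ISO-8859-15 maps the euro sign to 0xA4.
        System.out.printf("ISO-8859-15: 0x%02X%n", latin9[0] & 0xFF);
    }
}
```

The same byte 0xA4 decodes to "¤" (currency sign) under ISO-8859-1, so simply switching the charset would change how existing 8-bit strings are read.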
Simple Java code to dump the contents:

    private void dumpExcel(InputStream is) throws Exception {
        final HSSFSheet st = new HSSFWorkbook(new POIFSFileSystem(is)).getSheetAt(0);
        for (final Iterator<Row> ri = st.rowIterator(); ri.hasNext();) {
            final Row r = ri.next();
            for (final Iterator<Cell> ci = r.cellIterator(); ci.hasNext();) {
                final Cell c = ci.next();
                c.setCellType(Cell.CELL_TYPE_STRING);
                System.out.print(c.getStringCellValue() + '\t');
            }
            System.out.println();
        }
    }
It's not a question of what would be better, but of what Excel itself does. Normally, a string with a euro symbol in it gets stored as a Unicode string, not an 8-bit one. Could you try creating some files with characters that are in ISO-8859-1 but not in ISO-8859-15, and the other way around? We can then use those to see whether Excel flags in some way which encoding it has decided to use.
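For anyone putting together such test files, the following sketch lists the byte values at which the two encodings disagree, which gives the candidate characters to type into the cells. It uses only the JDK; the class name Latin1VsLatin9 and the method differingBytes are invented for this example, and it again assumes the JRE provides ISO-8859-15.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of POI.
class Latin1VsLatin9 {
    // Maps each byte value that decodes differently to
    // { character in ISO-8859-1, character in ISO-8859-15 }.
    static Map<Integer, String[]> differingBytes() {
        // Assumption: ISO-8859-15 is available in this JRE.
        Charset latin9 = Charset.forName("ISO-8859-15");
        Map<Integer, String[]> diff = new LinkedHashMap<>();
        for (int b = 0x00; b <= 0xFF; b++) {
            byte[] one = { (byte) b };
            String inLatin1 = new String(one, StandardCharsets.ISO_8859_1);
            String inLatin9 = new String(one, latin9);
            if (!inLatin1.equals(inLatin9)) {
                diff.put(b, new String[] { inLatin1, inLatin9 });
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // Print code points rather than glyphs so the console encoding cannot distort the result.
        differingBytes().forEach((b, chars) -> System.out.printf(
                "0x%02X  ISO-8859-1: U+%04X  ISO-8859-15: U+%04X%n",
                b, (int) chars[0].charAt(0), (int) chars[1].charAt(0)));
    }
}
```

The first column of each pair is a character that exists only in ISO-8859-1 (for example "¤"), and the second is one that exists only in ISO-8859-15 (for example "€"), so a test workbook could hold one cell per character from each column.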
We have been waiting for information since 2011, so I am resolving this for now. Please reopen with some more sample files if this is still an issue for you.