Bug 51572 - ISO-8859-15 support in StringUtils
Summary: ISO-8859-15 support in StringUtils
Status: RESOLVED WORKSFORME
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.6-FINAL
Hardware: PC Windows XP
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-28 07:20 UTC by Alejandro Torras
Modified: 2015-03-22 19:35 UTC



Attachments
Test case (16.50 KB, application/vnd.ms-excel)
2011-07-28 07:20 UTC, Alejandro Torras
Description Alejandro Torras 2011-07-28 07:20:49 UTC
Created attachment 27326
Test case

Hi all,

I'm having a problem opening an .XLS file (saved in Excel 97-2003 format by Excel 2007) that contains euro characters.
The output shows a "?" (question mark) instead of the euro character "€".

Searching around, I found a comment by Douglas Atique at https://issues.apache.org/bugzilla/show_bug.cgi?id=30319#c10 that points to org.apache.poi.util.StringUtil.java.
Its code clearly shows that it uses ISO-8859-1 instead of the newer ISO-8859-15.

I think it would be better to use the newer encoding.
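
To illustrate the difference between the two encodings, here is a minimal sketch that uses only the JDK. It is not POI code, and it does not establish what happens inside POI for this file. The class name EuroEncodingDemo and the helper encode are made up for this example, and it assumes the JVM ships the ISO-8859-15 charset (most full JDKs do).

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of POI.
final class EuroEncodingDemo {

    // Encode text in the named charset. String.getBytes(Charset) replaces
    // unmappable characters with the charset's replacement byte, which is '?' (0x3F) here.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign
        for (String name : new String[] {"ISO-8859-1", "ISO-8859-15"}) {
            byte[] bytes = encode(euro, name);
            // ISO-8859-1 has no euro sign, so it yields 0x3F ('?');
            // ISO-8859-15 maps the euro sign to 0xA4.
            System.out.printf("%s -> 0x%02X%n", name, bytes[0] & 0xFF);
        }
    }
}
```

The "?" only shows that the euro sign cannot be represented in ISO-8859-1. It says nothing yet about which encoding Excel actually wrote into the file.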

More info:
* http://en.wikipedia.org/wiki/ISO/IEC_8859-15


Thanks,
Alejandro.
Comment 1 Alejandro Torras 2011-07-28 07:24:40 UTC
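Simple Java code to dump the contents:

import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

	// Prints every cell of the first sheet as text, tab-separated, one row per line.
	private void dumpExcel(InputStream is) throws Exception {

		final HSSFSheet st = new HSSFWorkbook(new POIFSFileSystem(is)).getSheetAt(0);
		for (final Iterator<Row> ri = st.rowIterator(); ri.hasNext();) {
			final Row r = ri.next();
			for (final Iterator<Cell> ci = r.cellIterator(); ci.hasNext();) {
				final Cell c = ci.next();
				// Convert the cell to a string cell so getStringCellValue() does not throw on numeric cells.
				c.setCellType(Cell.CELL_TYPE_STRING);
				System.out.print(c.getStringCellValue() + '\t');
			}
			System.out.println();
		}
	}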
Comment 2 Nick Burch 2011-07-28 10:50:56 UTC
It's not a question of what would be better, but of what Excel itself does...

Normally a string with a euro symbol in it will get stored as a Unicode string, not an 8-bit one.

Could you try creating some files with characters that are in ISO-8859-1 but not in ISO-8859-15, and the other way around? We can then use those to see whether Excel flags in some way which of the two encodings it has decided to use.
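
To help pick characters for such test files, here is a rough JDK-only sketch that lists the byte values where the two charsets decode to different characters. The class name Latin1VsLatin9 and the method differingBytes are hypothetical, and the sketch assumes the JVM provides ISO-8859-15.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of POI.
final class Latin1VsLatin9 {

    // Returns the byte values (0..255) that decode to different
    // characters in ISO-8859-1 and ISO-8859-15.
    static List<Integer> differingBytes() {
        Charset latin1 = Charset.forName("ISO-8859-1");
        Charset latin9 = Charset.forName("ISO-8859-15");
        List<Integer> diffs = new ArrayList<>();
        for (int b = 0; b < 256; b++) {
            byte[] single = {(byte) b};
            String asLatin1 = new String(single, latin1);
            String asLatin9 = new String(single, latin9);
            if (!asLatin1.equals(asLatin9)) {
                diffs.add(b);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        Charset latin9 = Charset.forName("ISO-8859-15");
        for (int b : differingBytes()) {
            byte[] single = {(byte) b};
            // Print the byte value and the Unicode code point it decodes to in each charset.
            System.out.printf("0x%02X: ISO-8859-1 U+%04X, ISO-8859-15 U+%04X%n",
                    b,
                    (int) new String(single, latin1).charAt(0),
                    (int) new String(single, latin9).charAt(0));
        }
    }
}
```

The two charsets differ at eight positions (0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD and 0xBE), so cells containing characters such as "¤" or "¼" on one side and "€" or "Œ" on the other would cover both directions.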
Comment 3 Dominik Stadler 2015-03-22 19:35:04 UTC
This has been waiting for information since 2011, so I am resolving it for now. Please reopen with some more sample files if this is still an issue for you.