100112 – UTF-8 encoded text file opens with system encoding when there is no BOM

Summary: UTF-8 encoded text file opens with system encoding when there is no BOM

Status:	REOPENED

Alias:	None

Product:	General
Classification:	Code
Component:	ui
Version:	current
Hardware:	PC Windows Vista

Importance:	P3 Trivial
Target Milestone:	---
Assignee:	AOO issues mailing list
QA Contact:

URL:
Keywords:

Depends on:
Blocks:

Reported:	2009-03-11 08:28 UTC by jeongkyu.kim
Modified:	2013-01-29 21:43 UTC
CC List:	3 users

See Also:
Issue Type:	DEFECT
Latest Confirmation in:	---
Developer Difficulty:	---

Attachments
A sample UTF-8 encoded text file which has no BOM (45.16 KB, text/plain) 2009-03-11 08:30 UTC, jeongkyu.kim

Description jeongkyu.kim 2009-03-11 08:28:04 UTC

A bug was reported through a Korean community forum: a UTF-8 encoded text
file was displayed as broken characters when opened in OO.o. I checked the file
with a binary editor and found that it has no BOM, which caused OO.o to apply
the system encoding (in this case, MS949) instead of UTF-8. As far as I know, a
BOM is not mandatory for UTF-8 encoded text files, so OO.o should handle this
case better.
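
The binary-editor check described above can be reproduced with a few lines of Python. This is only a minimal sketch of the check, not OO.o code; the helper name `has_utf8_bom` and the sample strings are made up for illustration.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (bytes EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Hypothetical sample data: the same Korean text with and without a BOM.
sample_without_bom = "한글".encode("utf-8")
sample_with_bom = codecs.BOM_UTF8 + sample_without_bom

print(has_utf8_bom(sample_with_bom))
print(has_utf8_bom(sample_without_bom))
```

Both samples are valid UTF-8; only the first three bytes differ.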

I understand that the user can force a specific encoding when opening a
text file in OO.o. However, for ordinary users who have no idea about encodings,
that is rather difficult.

The possible solutions I can think of are:

1. Letting the user choose the encoding when the file is opened. This is already
implemented for text files without an extension, and it would not be bad to
apply the same thing to UTF-8 encoded text files without a BOM.

2. Implementing automatic detection of the encoding, as Firefox does.
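
One simple form of automatic detection is to try a strict UTF-8 decode and fall back to the system encoding only if that fails. The sketch below illustrates that idea under stated assumptions: it is not how Firefox or OO.o detect encodings, the function name `guess_text_encoding` is hypothetical, the whole file content is assumed to be passed in, and `cp949` is used as the fallback only because it matches the system encoding in this report.

```python
import codecs


def guess_text_encoding(data: bytes, fallback: str = "cp949") -> str:
    """Guess a codec name for raw file content.

    Assumption: `data` is the complete file, so a multi-byte sequence
    is never cut off at the end of the buffer.
    """
    if data.startswith(codecs.BOM_UTF8):
        # A BOM is present; "utf-8-sig" strips it while decoding.
        return "utf-8-sig"
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


raw = "한글".encode("utf-8")          # UTF-8 without a BOM
text = raw.decode(guess_text_encoding(raw))
```

Text in a legacy multi-byte encoding rarely happens to be valid UTF-8, so this check alone already covers the case reported here. Pure ASCII content is reported as UTF-8, which is harmless because ASCII decodes identically in both encodings.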

Comment 1 jeongkyu.kim 2009-03-11 08:30:52 UTC

Created attachment 60877
A sample UTF-8 encoded text file which has no BOM

Comment 2 Olaf Felka 2009-03-11 08:45:01 UTC

@sba: Please have a look.

Comment 3 stefan.baltzer 2009-05-18 15:04:11 UTC

SBA: When choosing the file format "Text (encoded)" in the file open dialog, an
encoding selection dialog comes up.
 - In that dialog, choose "UTF-8"
-> File opens fine.

Comment 4 jeongkyu.kim 2009-05-18 17:43:43 UTC

@sba: I already know that it works fine when users explicitly choose the correct
filter. However, in this issue I'd like to address the default behavior for
text files without a BOM.

The points of this issue are:
- A BOM is not mandatory in a UTF-8 encoded text file.
- If there is no BOM in a UTF-8 encoded text file, OO.o applies the system
encoding by default.
- When the system's encoding is not UTF-8 (which is the case for most Windows
XP/Vista systems), the files are decoded incorrectly.
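
The effect of the last point can be shown outside OO.o with a short Python sketch. It assumes Python's `cp949` codec as a stand-in for the MS949 system encoding, and the sample string is made up for illustration.

```python
original = "한글"                      # hypothetical sample text
raw = original.encode("utf-8")        # bytes as stored in a BOM-less UTF-8 file

# Decoding with the legacy system encoding yields broken characters;
# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
as_system_encoding = raw.decode("cp949", errors="replace")

# Decoding with the encoding the file was actually written in restores the text.
as_utf8 = raw.decode("utf-8")

print(repr(as_system_encoding))
print(repr(as_utf8))
```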

I believe the best solution is to figure out the correct encoding from the data.
However, if that is technically hard or not feasible to implement, then please
consider at least a workaround.

For example, the default behavior for a 'text file without extension' is
to ask the user to choose an encoding. What about implementing the same thing
for a 'text file without BOM'? It is certainly better than just displaying
broken characters.