Issue 63077 - Incorrect recognition of text files with BOM (byte order mark)
Summary: Incorrect recognition of text files with BOM (byte order mark)
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: code
Version: OOo 2.0.2
Hardware: PC Windows, all
Importance: P4 Trivial with 1 vote
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks: 62492
Reported: 2006-03-12 12:35 UTC by intersol
Modified: 2017-05-20 11:25 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
unicode text file with BOM header (119 bytes, application/octet-stream)
2006-03-12 12:50 UTC, intersol
Office will save this document with doc extension. (13.52 KB, application/msword)
2006-05-11 13:44 UTC, intersol
example html file that shows BOM issue (202 bytes, text/html)
2008-03-17 01:54 UTC, ccheney

Description intersol 2006-03-12 12:35:04 UTC
It seems that OOo does not recognize text and XML documents that start with a BOM
(byte order mark). The BOM is part of the Unicode specification. More information
can be found in issue #62492.

Also, I should mention that the BOM should (or must?) be kept when the file is
saved. I think preserving it would be better, to avoid future issues with this.
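
The request to keep the BOM on save can be illustrated with a minimal Python sketch. It is not OOo code: the helper names `load_text` and `save_text` are hypothetical, and only UTF-8 files are assumed for brevity. The idea is to remember whether the file started with a BOM and to write it back unchanged.

```python
import codecs


def load_text(path):
    """Read a UTF-8 file and report whether it started with a BOM.

    Hypothetical helper for illustration; assumes the file is UTF-8.
    """
    with open(path, "rb") as f:
        raw = f.read()
    had_bom = raw.startswith(codecs.BOM_UTF8)
    if had_bom:
        # Drop the three BOM bytes so they do not show up in the text.
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8"), had_bom


def save_text(path, text, had_bom):
    """Write the text back as UTF-8, restoring the BOM if it was present."""
    with open(path, "wb") as f:
        if had_bom:
            f.write(codecs.BOM_UTF8)
        f.write(text.encode("utf-8"))
```

With this approach a file that arrived with a BOM leaves with one, and a file without a BOM stays without one, so other tools see the same byte signature as before the edit.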
Comment 1 intersol 2006-03-12 12:50:56 UTC
Created attachment 34790
unicode text file with BOM header
Comment 2 Olaf Felka 2006-03-13 08:25:17 UTC
Looks like a 'filter' thing.
Comment 3 svante.schubert 2006-03-14 17:58:07 UTC
Comment:
If you drag and drop your test text document into the Office, you can choose the
right encoding for the text file. I assume this issue is meant especially for
the text subset of XML documents, which might be identified by their tags.

General Technical Comments concerning BOM (from 62492):
The starting bytes of a document, called the Byte Order Mark (BOM), are
used in Unicode-encoded documents to specify the encoding scheme.

They are part of the Unicode specification:
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G31703

There is a summary at
http://www.opentag.com/xfaq_enc.htm
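
As a rough illustration of how the leading bytes identify the encoding scheme, here is a small Python sketch. It is not taken from the OOo filter code; the function names `detect_bom` and `decode_with_bom` and the UTF-8 fallback are assumptions made for this example. The BOM constants come from Python's standard `codecs` module.

```python
import codecs

# Check the 4-byte UTF-32 BOMs first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return (encoding, bom_length), or (None, 0) if no BOM is found."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_with_bom(data, fallback="utf-8"):
    """Decode bytes using the BOM if present, else the assumed fallback."""
    encoding, skip = detect_bom(data)
    return data[skip:].decode(encoding or fallback)
```

A loader that runs a check like this before asking the user for an encoding would not need to show an encoding dialog for files that carry a BOM. Note that a UTF-16 LE file whose first character is U+0000 is indistinguishable from UTF-32 LE by BOM alone; the sketch accepts that ambiguity.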
Comment 4 intersol 2006-05-11 13:44:28 UTC
Created attachment 36400
Office will save this document with doc extension.
Comment 5 kpalagin 2007-03-28 20:36:16 UTC
Confirming with 2.2RC2 on WinXP: opening "unicode_text_with_bom.txt"
causes the "ASCII filter options" dialog to appear.
IE6 and Notepad open the file "unicode_text_with_bom.txt" just fine.
Comment 6 lohmaier 2007-03-30 15:49:22 UTC
not reproducible on linux → Windows only
Comment 7 jack.warchold 2007-09-14 14:37:10 UTC
confirmed on windows with an src680_m228 build.

works fine in windows

reassigned to os
Comment 8 ccheney 2008-03-17 01:54:22 UTC
This also happens with HTML files. I have attached an example file which
displays garbage where the BOM is located in the file. Firefox properly displays
the file.
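
One plausible way such garbage can arise (an assumption, not a confirmed diagnosis of the OOo HTML import) is that the three UTF-8 BOM bytes are decoded with a single-byte encoding. The Python sketch below uses made-up HTML bytes, not the attached file, to show the effect and how a BOM-aware decoder avoids it.

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by a tiny HTML document.
html_bytes = codecs.BOM_UTF8 + b"<html><body>Hello</body></html>"

# Decoding as Windows-1252 turns the BOM bytes EF BB BF into three
# visible characters at the start of the document.
garbled = html_bytes.decode("cp1252")

# Python's "utf-8-sig" codec skips a leading UTF-8 BOM if one is present.
clean = html_bytes.decode("utf-8-sig")

print(repr(garbled[:3]))  # the three stray characters
print(clean)              # the markup without any leading garbage
```

Decoding with plain `"utf-8"` would not produce garbage but would leave an invisible U+FEFF at the start of the text, which is why the BOM-skipping codec is used here.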
Comment 9 ccheney 2008-03-17 01:54:59 UTC
Created attachment 52141
example html file that shows BOM issue
Comment 10 ccheney 2008-03-17 01:57:12 UTC
Forgot to mention this still occurs on OOo 2.4.0~rc5 on Linux for the HTML file
example.
Comment 11 Marcus 2017-05-20 11:24:25 UTC
Reset assignee to the default "issues@openoffice.apache.org".
Comment 12 Marcus 2017-05-20 11:25:40 UTC
Reset assignee to the default "issues@openoffice.apache.org".