Apache OpenOffice (AOO) Bugzilla – Issue 63077
Incorrect recognition of text files with BOM (Byte Order Mark)
Last modified: 2017-05-20 11:25:40 UTC
It seems that OOo does not recognize text and XML documents with a BOM (Byte Order Mark). The BOM is part of the Unicode specification. More information can be found in issue #62492. I should also mention that the BOM header should (or must?) be kept when the file is saved. I think handling this as well would help avoid future issues.
Created attachment 34790 [details] unicode text file with BOM header
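To illustrate the request that the BOM be kept on save, here is a minimal Python sketch, not OOo's filter code. The helper names `split_bom` and `edit_preserving_bom` are hypothetical, and only the UTF-8 and UTF-16 BOMs are handled. It assumes the caller passes a BOM-less codec name such as "utf-8" or "utf-16-le", so the encoder does not add a second BOM.

```python
import codecs

def split_bom(data: bytes):
    """Split raw bytes into (bom, rest); bom is b"" when none is found.

    Only the UTF-8 and UTF-16 BOMs are checked in this sketch.
    """
    for bom in (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        if data.startswith(bom):
            return bom, data[len(bom):]
    return b"", data

def edit_preserving_bom(data: bytes, encoding: str, transform) -> bytes:
    """Decode the body, apply transform to the text, and re-attach the original BOM.

    `encoding` is assumed to be a BOM-less codec name ("utf-8", "utf-16-le", ...).
    """
    bom, body = split_bom(data)
    text = body.decode(encoding)
    return bom + transform(text).encode(encoding)
```

For example, `edit_preserving_bom(raw, "utf-8", str.upper)` returns the upper-cased content with the leading BOM bytes unchanged if `raw` had them, and adds none if it did not.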
Looks like a 'filter' thing.
Comment: If you drag and drop your test text document into the Office, you can choose the right encoding for the text file. I assume this issue is aimed mainly at the text subset of XML documents, which can be identified by their tags. General technical comments concerning the BOM (from issue 62492): The starting bytes of a document, called the Byte Order Mark (BOM), are used in Unicode-encoded documents to indicate the encoding scheme. They are part of the Unicode specification: http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G31703 There is a summary at http://www.opentag.com/xfaq_enc.htm
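As a rough illustration of how the BOM identifies the encoding scheme, the following Python sketch checks the leading bytes of a file against the standard BOM signatures. It is not the OOo filter detection code. The names `detect_bom` and `decode_with_bom` are hypothetical, and the UTF-8 fallback for files without a BOM is an assumption made for this example.

```python
import codecs

# Known BOM signatures. The UTF-32 entries come first because the UTF-32 LE
# BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0

def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode data using its BOM if present, else the (assumed) fallback encoding."""
    encoding, skip = detect_bom(data)
    return data[skip:].decode(encoding or fallback)
```

A reader built this way can skip the encoding prompt whenever `detect_bom` finds a signature, and ask the user only when it returns `None`. Note that a file whose UTF-16 LE content starts with a NUL character would be misread as UTF-32 LE by this ordering, which is a known ambiguity of BOM sniffing.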
Created attachment 36400 [details] Office will save this document with doc extension.
Confirming with 2.2RC2 on WinXP: opening "unicode_text_with_bom.txt" causes the "ASCII filter options" dialog to appear. IE6 and Notepad open the file "unicode_text_with_bom.txt" just fine.
not reproducible on linux → Windows only
confirmed on windows with an src680_m228 build. works fine in windows reassigned to os
This also happens with HTML files. I have attached an example file which displays garbage where the BOM is located in the file. Firefox properly displays the file.
Created attachment 52141 [details] example html file that shows BOM issue
Forgot to mention this still occurs on OOo 2.4.0~rc5 on Linux for the HTML file example.
Reset assignee to the default "issues@openoffice.apache.org".