This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
Summary: | [67cat] UTF-8 files with signature (Byte Order Mark) not supported | ||
---|---|---|---|
Product: | projects | Reporter: | denbo <denbo> |
Component: | Generic Infrastructure | Assignee: | Tomas Stupka <tstupka> |
Status: | REOPENED --- | ||
Severity: | blocker | CC: | beany, csersoft, Dmitry.Sokolov, emi, kAlvaro, kitfox, lativ, matafagafo, maxym, mmetelka, ovrabec, protasovams, shirojirou, szd, tstupka |
Priority: | P3 | ||
Version: | 8.2 | ||
Hardware: | PC | ||
OS: | Windows 7 x64 | ||
Issue Type: | ENHANCEMENT | Exception Reporter: | |
Attachments: | UTF-8+signature encoded .properties file; UTF-8+signature encoded .properties file as rendered in IDE; example of displaying BOM in Netbeans Editor | ||
Description
denbo
2009-03-24 13:07:31 UTC
Created attachment 78742 [details]
UTF-8+signature encoded .properties file
Created attachment 78743 [details]
UTF-8+signature encoded .properties file as rendered in IDE
This is in fact a JDK problem, see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058, which was closed as "Won't fix" for compatibility reasons. Applications are left on their own to handle the BOM in UTF-8 streams. Anyway, NetBeans uses the encoding from the project and expects that files in the project are stored using this encoding (unless a file can specify a different encoding itself, as e.g. JSP and XML files can). So, from the NetBeans point of view, the BOM in UTF-8 encoded files is superfluous and you don't have to use it at all.

Thank you for explaining this. As so often, I'm dealing with a production/legacy problem here. In an ideal world I could say "let's get rid of the signatures!" but I hit the reality wall and our customer's IT department insists on having the BOM/signature. Again, this leaves developer Denbo alone with the problem and I'm forced to edit my .properties files in Notepad2 instead of Netbeans :(

There were similar issues against different types of files. One of the most critical cases has been fixed: issue 83321. Probably a similar approach can be used in this case.

Please explain what you expect from the NetBeans editor. (It looks like the best place for general support is the openide.loaders module -> DataEditorSupport class.) I see 2 possible variants:
a) Shall we use the BOM to detect the encoding of a particular file? (As is done in the UnicodeReader class in the xml.core module.)
b) Or is it enough just to use the system or project encoding (which should be UTF-8 in this case), skip the BOM when reading and then prepend the BOM on save? (In that case you cannot mix different encodings in one project.)
It looks like variant a) is better (and safer).

Yes, definitely a) is the better choice.

*** Bug 183040 has been marked as a duplicate of this bug. ***

*** Bug 185868 has been marked as a duplicate of this bug. ***

b) is enough for me. If this is not possible, I have to search for another editor...

Is it possible to use the BOM only in C++ mode? Is it possible to do it with a plug-in?
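For illustration, variant a) from the discussion above (using the BOM to detect the encoding of a particular file) could look roughly like the following. This is a minimal sketch, not the NetBeans implementation and not the UnicodeReader class mentioned in the thread: the class name `BomSniffer` and its methods are hypothetical, and only the UTF-8 and UTF-16 signatures are checked.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of any NetBeans API. */
final class BomSniffer {

    private BomSniffer() {}

    /**
     * Inspects the first {@code n} bytes of a file and returns the charset
     * announced by a leading BOM, or {@code fallback} (for example the
     * project encoding) when no known BOM is present.
     * Assumption: only UTF-8 and UTF-16 signatures are recognized here.
     */
    static Charset fromPrefix(byte[] head, int n, Charset fallback) {
        if (n >= 3 && (head[0] & 0xFF) == 0xEF
                   && (head[1] & 0xFF) == 0xBB
                   && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return fallback;
    }

    /** Reads at most three bytes from the stream and delegates to fromPrefix. */
    static Charset detect(InputStream in, Charset fallback) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream before three bytes were available
            }
            n += r;
        }
        return fromPrefix(head, n, fallback);
    }
}
```

Note that `detect` consumes the bytes it inspects, so a caller would have to reopen the stream (or wrap it in a `PushbackInputStream`) before decoding the content.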
*** Bug 206511 has been marked as a duplicate of this bug. ***

I can understand that the JDK limits BOM support for Java files; however, the NetBeans editor supports many more file types (for instance JavaScript) and a default encoding cannot be specified for each file type. Even Windows Notepad inserts a BOM into UTF-8 encoded files (txt or whatever).

IMO the best place to handle this issue is a FileEncodingQuery so reassigning to queries module.

(In reply to Miloslav Metelka from comment #16)
> IMO the best place to handle this issue is a FileEncodingQuery so
> reassigning to queries module.
FileEncodingQuery itself only delegates to FileEncodingQueryImplementation services that can be in any module. As of now, however, it's not involved in handling the stream at all. It's passed a FileObject and returns an encoding. If I understand it right, it would have to open the stream and read the BOM, return the encoding value and close the stream. The FOQ api user would then load the file with the given encoding. However, who would strip the BOM from the opened file content? And who would write the BOM at the beginning of the file when saving?

I don't see this as a defect but rather a missing feature; additionally, I'm not convinced this can be fixed in the project system alone (by implementing a BOM-aware FileEncodingQueryImplementation service).

> I don't see this as defect but rather a missing feature
It's a standard (albeit rare) file format the editor doesn't support. To me it doesn't get any more basic than reading and writing a text file properly so I would pencil it as a bug. Since it's a bit uncommon we could say it has a low priority instead?
Last I looked into this (loong ago) there were problems in the editor but also in the versioning module (where diff was also looking at the BOM to compute differences and removing the BOM in the editor marked the file as 'changed', etc).
(In reply to Milos Kleint from comment #18)
> I don't see this as defect but rather a missing feature, additionally I'm
> not convinced this can be fixed in project system alone. (by implementing a
> BOM aware FileEncodingQueryImplementation service.
I also think it is rather a bug. Like Emi said, it is a basic file format. Another thing is that the lack of BOM support means that the BOM is loaded as characters (actually as one character) into the editor, which is definitely a bug.
> FileEncodingQuery itself only delegates to FileEncodingQueryImplementation
> services that can be in any module.
>
> as of now however it's not involved in handling the stream at all. It's
> passed a FileObject and returns encoding. If I understand it right, it would
> have to open the stream and read the BOM, return encoding value and close
> the stream. The FOQ api user would then load the file with the given
> encoding. However who would strip the BOM from the opened file content? and
> who would write the BOM at the beginning of the file when saving?
I thought that the Charset impl returned from FEQ would strip the BOM when decoding and possibly add the BOM when encoding. It could possibly maintain a static WeakSet<FileObject> of the files containing the BOM so that the BOM gets written back when saving. IMHO the BOM should not be present in the output produced by the java.io.Reader that feeds the javax.swing.text.Document, otherwise we could have problems with the correctness of position offsets if e.g. a refactoring manipulates the files directly. The DataEditorSupport.loadFromStreamToKit() produces the Reader as
new InputStreamReader (stream, decoder);
where decoder is obtained from FEQ and IMHO all other content manipulation impls should go through the FEQ.
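A rough sketch of the behaviour described above (the BOM is stripped before the Reader is handed to the document, and written back on save) is shown below. It is only an illustration under stated assumptions: the class `Utf8BomIo`, its nested `Opened` type and both methods are hypothetical names, the sketch handles UTF-8 only, and it wraps plain streams rather than plugging into FileEncodingQuery or a custom Charset as proposed in the comment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not NetBeans code. */
final class Utf8BomIo {

    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** A BOM-free Reader plus a flag telling the caller to restore the BOM on save. */
    static final class Opened {
        final Reader reader;
        final boolean hadBom;

        Opened(Reader reader, boolean hadBom) {
            this.reader = reader;
            this.hadBom = hadBom;
        }
    }

    private Utf8BomIo() {}

    /** Opens a UTF-8 Reader whose output never starts with the BOM character. */
    static Opened openReader(InputStream stream) throws IOException {
        PushbackInputStream in = new PushbackInputStream(stream, BOM.length);
        byte[] head = new byte[BOM.length];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // file shorter than a BOM
            }
            n += r;
        }
        boolean hadBom = n == BOM.length
                && head[0] == BOM[0] && head[1] == BOM[1] && head[2] == BOM[2];
        if (!hadBom && n > 0) {
            in.unread(head, 0, n); // not a BOM: give the bytes back to the decoder
        }
        return new Opened(new InputStreamReader(in, StandardCharsets.UTF_8), hadBom);
    }

    /** Opens a UTF-8 Writer, first emitting the BOM if the original file had one. */
    static Writer openWriter(OutputStream out, boolean writeBom) throws IOException {
        if (writeBom) {
            out.write(BOM);
        }
        return new OutputStreamWriter(out, StandardCharsets.UTF_8);
    }
}
```

With this shape, document offsets are computed on BOM-free text, and the `hadBom` flag plays the role that the proposed WeakSet<FileObject> would play in the IDE: remembering which files need the signature written back.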
*** Bug 207898 has been marked as a duplicate of this bug. ***

*** Bug 241478 has been marked as a duplicate of this bug. ***

*** Bug 247881 has been marked as a duplicate of this bug. ***

This old bug may not be relevant anymore. If you can still reproduce it in 8.2 development builds please reopen this issue. Thanks for your cooperation, NetBeans IDE 8.2 Release Boss

Reopening - I created a javascript file, encoded as UTF-8 with BOM and while gedit and mousepad both open the file just fine, netbeans renders the BOM as whitespace. This test was done with the state of core-main as of 2016-07-09.

Created attachment 164547 [details]
example of displaying BOM in Netbeans Editor
Confirming bug - created a js file (in Notepad++ with UTF-8 encoding). NetBeans displays BOM as a strange mark at the beginning of the file (see attachment of 2017-06-15).
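The symptom reported throughout this thread can be reproduced outside the IDE with plain JDK classes. The sketch below (the class name `BomAsCharacterDemo` is made up for this example) decodes UTF-8 bytes the way a standard decoder does; the JDK's UTF-8 charset does not skip the signature, so it arrives in the text as the single character U+FEFF.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class; shows how the JDK decodes a UTF-8 signature. */
final class BomAsCharacterDemo {

    private BomAsCharacterDemo() {}

    /** Decodes the bytes with the standard UTF-8 charset, with no BOM handling. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** True if the decoded text begins with U+FEFF, i.e. the BOM leaked into the content. */
    static boolean startsWithBomChar(String text) {
        return !text.isEmpty() && text.charAt(0) == '\uFEFF';
    }
}
```

For the six bytes EF BB BF 'v' 'a' 'r', `decode` returns a four-character string whose first character is U+FEFF, which is the "strange mark" (or whitespace) that an editor then has to render unless it strips the character.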