This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 94676 - Address file encoding issues in new templating system
Summary: Address file encoding issues in new templating system
Alias: None
Product: platform
Classification: Unclassified
Component: Data Systems
Version: 6.x
Hardware: All All
Importance: P2 blocker
Assignee: Jaroslav Tulach
Keywords: I18N
Depends on: 42638
Blocks: 13250 97848
Reported: 2007-02-06 17:26 UTC by Jesse Glick
Modified: 2008-12-22 11:42 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Exception Reporter:

scripting uses FEQ, UTF-8 is default encoding on SFS (14.75 KB, patch)
2007-03-21 17:42 UTC, Jaroslav Tulach

Description Jesse Glick 2007-02-06 17:26:38 UTC
JG05 in issue #13250, edited a bit:

Use of platform default encoding for OutputStreamWriter and InputStreamReader
(in ScriptingCreateFromTemplateHandler.createFromTemplate) is dangerous because
either the template or the output file (or both) might require an encoding
different from the platform default encoding. For example, a Mexican Windows
user named "Raúl" with encoding set to Cp1252 tries to instantiate an XML
template shipped with the IDE:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on ${date} by ${user} -->

Decoding the template in Cp1252 is in this case harmless, and substitution
inside the script engine probably proceeds without issue. But when the result is
then written in Cp1252, his name becomes garbage in UTF-8, possibly even making
the file malformed (causing parse errors).
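The failure described above can be reproduced outside the IDE. The following is a minimal sketch using only the JDK; the class name DefaultEncodingPitfall and the helper isValidUtf8 are illustrative and are not part of the NetBeans code base. It encodes the instantiated template once in windows-1252 (Cp1252) and once in UTF-8, then checks whether each byte sequence is well-formed UTF-8, which is what an XML parser honoring the declaration would expect.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DefaultEncodingPitfall {
    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The template after substitution of ${user}; the date is omitted for brevity.
        String content = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!-- Created by Ra\u00fal -->\n";

        // Stand-in for a platform default of Cp1252.
        byte[] cp1252 = content.getBytes(Charset.forName("windows-1252"));
        byte[] utf8 = content.getBytes(StandardCharsets.UTF_8);

        System.out.println("Cp1252 bytes valid as UTF-8: " + isValidUtf8(cp1252));
        System.out.println("UTF-8 bytes valid as UTF-8:  " + isValidUtf8(utf8));
    }
}
```

In Cp1252 the letter "ú" is the single byte 0xFA, which cannot start a UTF-8 sequence, so a file written this way contradicts its own encoding declaration.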

Issue #42638 proposes something like

public static String FileEncodingQuery.getEncoding(FileObject);

which if available should be used for both the reading and writing stages.

Note that there is a subtlety in the writing part: if you create an empty *.xml
file and then write the String

"<?xml version=\"1.0\" encoding=\"UTF-8\"?><!-- Created by Raúl --><root/>"

then the encoder would actually need to scan the first part of the content to
see what encoding should be used for the remainder. This cannot be done with a
simple String return value, I think.
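To illustrate the subtlety, here is a rough sketch of what "scanning the first part of the content" could look like. It is an assumption-laden illustration, not the API proposed in issue #42638: the class XmlDeclaredEncoding, its methods detect and encode, and the regular expression are all hypothetical, and the sketch ignores byte order marks and other details of the XML specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlDeclaredEncoding {
    // Matches the encoding pseudo-attribute of an XML declaration at the very start.
    private static final Pattern DECL = Pattern.compile(
            "^<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the charset named in the XML declaration, or the fallback if none is usable. */
    public static Charset detect(String content, Charset fallback) {
        Matcher m = DECL.matcher(content);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return fallback;
    }

    /** Encodes the content in whatever encoding the content itself declares. */
    public static byte[] encode(String content, Charset fallback) {
        return content.getBytes(detect(content, fallback));
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!-- Created by Ra\u00fal --><root/>";
        // The fallback stands in for a platform default such as Cp1252.
        Charset chosen = detect(xml, Charset.forName("windows-1252"));
        System.out.println("Encoding chosen from content: " + chosen.name());
        System.out.println("Encoded length in bytes: " + encode(xml, chosen).length);
    }
}
```

The point of the sketch is that the decision depends on the content being written, so it cannot be made from the (still empty) file alone, which is why a plain String return value is insufficient.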
Comment 1 Tomas Zezula 2007-02-12 15:43:52 UTC
I agree. I changed FileEncodingQuery and FileEncodingQueryImplementation to
return a Charset, which can be subclassed.

The new create-from-template code should do:
FileObject template;
FileObject destDir;

Charset inenc = FileEncodingQuery.getEncoding (template);
FileObject newFile = destDir.createData(name,ext);
Charset outenc = FileEncodingQuery.getEncoding (newFile);

// Reader and Writer are abstract, so wrap the streams with explicit charsets
Reader in = new InputStreamReader (template.getInputStream(),inenc);
Writer out = new OutputStreamWriter (newFile.getOutputStream(lck),outenc);
copy (in,out);

The Charset returned by the XML FileEncodingQueryImplementation must have
an encoder and decoder that determine the correct encoding, as Jesse described
in issue #42638 (Tue Feb 6 17:50:19 +0000 2007) [JG04] point 2.
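The read-with-one-charset, write-with-another pattern from this comment can be tried without the NetBeans APIs. The sketch below is an approximation under stated assumptions: plain java.io streams stand in for FileObject.getInputStream() and FileObject.getOutputStream(lock), the two charsets are passed in by the caller instead of coming from FileEncodingQuery, and the names TemplateCopy, copy and transcode are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TemplateCopy {
    /** Decodes the input with inEnc and re-encodes it to the output with outEnc. */
    public static void copy(InputStream in, Charset inEnc,
                            OutputStream out, Charset outEnc) throws IOException {
        Reader reader = new InputStreamReader(in, inEnc);
        Writer writer = new OutputStreamWriter(out, outEnc);
        char[] buffer = new char[4096];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, read);
        }
        // Flush so that all encoded bytes reach the underlying stream.
        writer.flush();
    }

    /** Convenience wrapper for in-memory data. */
    public static byte[] transcode(byte[] data, Charset inEnc, Charset outEnc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copy(new ByteArrayInputStream(data), inEnc, out, outEnc);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] template = "Created by Ra\u00fal".getBytes(cp1252);
        byte[] result = transcode(template, cp1252, StandardCharsets.UTF_8);
        System.out.println(new String(result, StandardCharsets.UTF_8));
    }
}
```

Because both charsets are explicit, the platform default encoding never enters the picture, which is the property the comment asks for.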
Comment 2 Jaroslav Tulach 2007-03-21 17:42:25 UTC
Created attachment 39769 [details]
scripting uses FEQ, UTF-8 is default encoding on SFS
Comment 3 Jaroslav Tulach 2007-03-21 17:44:17 UTC
Tomáš, Jesse, is this what you wanted me to integrate?
Comment 4 Tomas Zezula 2007-03-21 17:54:56 UTC
Seems good to me. The template system correctly uses the encoding when reading
the template as well as when writing the result. The default encoding for the
default filesystem is UTF-8.
Comment 5 Jesse Glick 2007-03-21 22:34:09 UTC
Looks OK to me.

In FEQT, don't you mean "UTF-8" rather than "utf-8"? AFAIK the canonical name is
Comment 6 Jaroslav Tulach 2007-03-22 09:25:05 UTC
Ok, so I'll take this as an approval and commit it, for M8. It has anyway been 
discussed during the review of issue 13250 and issue 42638.
Comment 7 Jaroslav Tulach 2007-03-22 10:21:03 UTC
"#94676: Using FileEncodingQuery in scripting"

Checking in openide/templates/nbproject/project.xml;
/shared/data/ccvs/repository/openide/templates/nbproject/project.xml,v  <--  
new revision: 1.3; previous revision: 1.2
Checking in 
new revision: 1.3; previous revision: 1.2
Checking in 
new revision: 1.4; previous revision: 1.3
file: /shared/data/ccvs/repository/openide/templates/test/unit/src/org/netbeans/modules/templates/utf8.xml,v
Checking in 
<--  utf8.xml
initial revision: 1.1
Checking in projects/queries/apichanges.xml;
/shared/data/ccvs/repository/projects/queries/apichanges.xml,v  <--  
new revision: 1.9; previous revision: 1.8
Checking in projects/queries/;
/shared/data/ccvs/repository/projects/queries/,v  <--
new revision: 1.13; previous revision: 1.12
Checking in 
new revision: 1.3; previous revision: 1.2
Checking in 
new revision: 1.3; previous revision: 1.2
Checking in ide/golden/deps.txt;
/shared/data/ccvs/repository/ide/golden/deps.txt,v  <--  deps.txt
new revision: 1.486; previous revision: 1.485
Comment 8 tprochazka 2007-03-23 20:55:19 UTC
I tested templates in NB build 200703221900.

I created a project with UTF-8 encoding.
Opened the Java Class template and put the characters ěščřžýáí into it.
Created a new Class file.
ěščřžýáíé is OK,
but __DATE__ is 23. b�ezen 2007 (I'm using the cs_CZ locale).

I switched the project to Windows-1250.
I created a new Class.
ěščřžýáíé is created as: ěščřžýáíé
but __DATE__ is correctly displayed as 23. březen 2007

Are these bugs related to this issue?

I don't understand it. Why doesn't NB use Unicode for all internal operations?
Conversion from/to Unicode would then happen only when the user loads or saves
a .java file.
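One mechanism that would produce exactly the "b�ezen" symptom is text encoded in Windows-1250 being decoded as UTF-8. Whether that is what happened in the build tested above is not established in this report; the sketch below only shows that such a mismatch yields the same visible result. The class DateMojibake and the method misread are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DateMojibake {
    /** Encodes text with one charset and decodes the bytes with another. */
    public static String misread(String text, Charset writtenAs, Charset readAs) {
        // new String(bytes, charset) replaces malformed input with U+FFFD.
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        Charset cp1250 = Charset.forName("windows-1250");
        String date = "23. b\u0159ezen 2007";

        // Mismatched charsets: the byte for "ř" is not valid UTF-8.
        System.out.println(misread(date, cp1250, StandardCharsets.UTF_8));

        // Matching charsets: the text survives the round trip.
        System.out.println(misread(date, cp1250, cp1250));
    }
}
```

In Windows-1250 the letter "ř" is the single byte 0xF8, which is never valid in UTF-8, so a UTF-8 decoder substitutes the replacement character U+FFFD for it.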
Comment 9 Jaroslav Tulach 2007-03-23 22:13:25 UTC
I guess you found a bug. There may be some subtleties around our current impl.
Please file a new bug with steps to reproduce it (using cs_CZ is perfect,
that is my locale as well).
Comment 10 tprochazka 2007-03-24 11:59:17 UTC
OK. I created a new issue: