Bug 10024

Summary: <import> uses ContentEncoding for java character set
Product: Taglibs Reporter: Gael Stevens <gael.stevens>
Component: Standard Taglib Assignee: Tomcat Developers Mailing List <dev>
Status: CLOSED LATER    
Severity: major    
Priority: P3    
Version: unspecified   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: Fixes 8bit encoding error for AbsoluteFtp.jsp (20020514)

Description Gael Stevens 2002-06-19 18:29:25 UTC
The problem shows up when running AbsoluteFTP.jsp.  In our environment,
 when the following bit of code is executed:

                     String responseAdvisoryEncoding =
                         uc.getContentEncoding();
                     if (responseAdvisoryEncoding != null)
                         r = new InputStreamReader(i,
                         responseAdvisoryEncoding);
                     else
                         r = new InputStreamReader(i, DEFAULT_ENCODING);

 The responseAdvisoryEncoding is 8bit, which is not a legal
 characterSet for the InputStreamReader, and
 a javax.servlet.jsp.JspException: 8bit is eventually thrown.

 One workaround is to put a try catch around it, and use the default
 encoding, as below.

                     String responseAdvisoryEncoding =
                         uc.getContentEncoding();
                     if (responseAdvisoryEncoding != null) {
                         try { // contentEncoding can be 8bit, not a java encoding
                             r = new InputStreamReader(i,
                                 responseAdvisoryEncoding);
                         } catch (java.io.UnsupportedEncodingException ex) {
                             r = new InputStreamReader(i, DEFAULT_ENCODING);
                         }
                     } else {
                         r = new InputStreamReader(i, DEFAULT_ENCODING);
                     }

 The basic issue is that the content encoding does not necessarily map to a
 Java character encoding.  I'm using JDK 1.3.1, so the new java.nio Charset
 class is not available.
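
 The failure and the fallback can be reproduced outside the taglib with a
 small standalone program.  This is only a sketch: the class name
 FallbackReaderDemo and the helper open() are hypothetical, and the value
 "ISO-8859-1" for DEFAULT_ENCODING is an assumption based on the default
 mentioned in the spec, not something taken from the taglib source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class; not part of the Standard Taglib.
class FallbackReaderDemo {
    // Assumed fallback encoding (labelled assumption, see lead-in).
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Try the advisory encoding first; if Java does not know it,
    // fall back to DEFAULT_ENCODING instead of propagating the error.
    static InputStreamReader open(InputStream in, String advisory)
            throws UnsupportedEncodingException {
        if (advisory != null) {
            try {
                return new InputStreamReader(in, advisory);
            } catch (UnsupportedEncodingException ex) {
                // e.g. "8bit": a content encoding, not a Java charset name
            }
        }
        return new InputStreamReader(in, DEFAULT_ENCODING);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { 'o', 'k' };
        InputStreamReader r = open(new ByteArrayInputStream(data), "8bit");
        // The reader is usable even though "8bit" was rejected.
        System.out.println((char) r.read());
    }
}
```

 The try/catch is used here because it also works on JDK 1.3, where there
 is no API to ask in advance whether an encoding name is supported.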
Comment 1 Gael Stevens 2002-06-22 01:58:55 UTC
Created attachment 2154
Fixes 8bit encoding error for AbsoluteFtp.jsp (20020514)
Comment 2 Gael Stevens 2002-06-22 02:07:37 UTC
It may be that the charset from the ContentType is what you want, rather than
the ContentEncoding, to create the InputStreamReader.  If so, then the
attached diff file may be of some help.   The charset attribute of the content
type (if present) provides a good mapping (earlier JDK versions had some issues
with IANA's TIS-620 vs. Java's TIS620; I don't know if it's fixed in a later
JDK).  The value of uc.getContentEncoding() really doesn't relate to the Java
encoding parameter (JDK 1.3) of the InputStreamReader's constructor.
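
As a rough illustration of the idea (this is not the attached diff), the
charset could be pulled out of the Content-Type value along these lines.
The class ContentTypeCharset and the method charsetOf() are hypothetical
names, the parsing is simplified (it does not handle a ';' inside a quoted
parameter value), and it uses String.split and Locale.ROOT, so it needs a
newer JDK than 1.3.

```java
import java.util.Locale;

// Hypothetical helper; not part of the Standard Taglib.
class ContentTypeCharset {
    // Returns the charset parameter of a Content-Type value such as
    // "text/html; charset=UTF-8", or null if there is none.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int k = 1; k < parts.length; k++) {
            String p = parts[k].trim();
            int eq = p.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = p.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = p.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"")
                    && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.length() == 0 ? null : value;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8")); // UTF-8
        System.out.println(charsetOf("text/plain"));               // null
    }
}
```

With uc.getContentType() as the input, the result (when non-null) would be
passed to the InputStreamReader in place of uc.getContentEncoding(), still
guarded against names that Java does not recognise.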
Comment 3 Gael Stevens 2002-06-23 16:13:21 UTC
It may be that the problem is with the example, AbsoluteFtp.jsp. As per the
spec, 7.4 under Character Encoding : 
  Note that the charEncoding attribute should normally only be required when
  accessing absolute URL resources where the protocol is not HTTP, and where the
  encoding is not ISO-8859-1.

If so, then the example should include the charEncoding attribute.  Perhaps
a clarification of the spec is needed here.  The above section also says :

  If the response has content encoding information (e.g.
  URLConnection.getContentEncoding() has a non null value), then the
  character encoding specified is used.

In the case of URLConnection.getContentEncoding() returning 8bit, which is
of course non-null and also not a valid Java character encoding, what should
the result be?  This case is not covered in the error section under For External
Resources, as the URLConnection class does not throw an exception.
Comment 4 Justyna Horwat 2002-06-27 22:22:27 UTC
This is an issue that needs to be resolved in the JSTL specification. Currently 
the reference implementation correctly implements section 7.4 of the spec.

I went ahead and filed your bug against the JSTL specification. Once the issue 
is addressed by the specification, it can be fixed in the RI.
Comment 5 Pierre Delisle 2003-03-31 14:01:57 UTC
JSTL 1.1 has been amended to properly handle this bug.
The advisory character encoding is now properly fetched from the "charset"
attribute of the "content-type" header.