Bug 66392

Summary: AccessLogValve's file encoding does not correspond to the documentation
Product: Tomcat 8 Reporter: Michael Osipov <michaelo>
Component: CatalinaAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: michaelo
Priority: P2    
Version: 8.5.84   
Target Milestone: ----   
Hardware: All   
OS: All   

Description Michael Osipov 2022-12-15 21:37:59 UTC
This [1] resource says:
encoding: Character set used to write the log file. An empty string means to use the system default character set. Default value: use the system default character set.
But the code [2] does ISO-8859-1:
>         if (encoding != null) {
>             try {
>                 charset = B2CConverter.getCharset(encoding);
>             } catch (UnsupportedEncodingException ex) {
>                 log.error(sm.getString(
>                         "accessLogValve.unsupportedEncoding", encoding), ex);
>             }
>         }
>         if (charset == null) {
>             charset = StandardCharsets.ISO_8859_1;
>         }


May java.nio.charset.Charset.defaultCharset() would be better here, no?

Note: I haven't verified with other Tomcat versions, but I guess they have the same problem.

[1] https://tomcat.apache.org/tomcat-8.5-doc/config/valve.html#Access_Log_Valve/Attributes
[2] https://github.com/apache/tomcat/blob/cf2015c1350a3f057182dd4c26c20f68df8b3400/java/org/apache/catalina/valves/AccessLogValve.java#L638-L648
Comment 1 Han Li 2022-12-16 02:58:01 UTC
(In reply to Michael Osipov from comment #0)
> May java.nio.charset.Charset.defaultCharset() would be better here, no?
No, I found the reason why not use java.nio.charset.Charset.defaultCharset() by git commit log. 
https://github.com/apache/tomcat/commit/972212836bf278e443b87418d961d6ddb04262a6

I also have a question based on the comment at BZ51408, is it time to use UTF-8 as the default encoding set now?
Comment 2 Michael Osipov 2022-12-16 08:01:19 UTC
(In reply to Han Li from comment #1)
> (In reply to Michael Osipov from comment #0)
> > May java.nio.charset.Charset.defaultCharset() would be better here, no?
> No, I found the reason why not use java.nio.charset.Charset.defaultCharset()
> by git commit log. 
> https://github.com/apache/tomcat/commit/
> 972212836bf278e443b87418d961d6ddb04262a6
> 
> I also have a question based on the comment at BZ51408, is it time to use
> UTF-8 as the default encoding set now?

Looking that the commit, in several spots US-ASCII would be a better choice because the usecase does not allow anything else but 7-bit chars. In this case, yes I'd prefer UTF-8 all the way -- at least the documentation and code must be consistent.

E.g., here https://github.com/apache/tomcat/commit/972212836bf278e443b87418d961d6ddb04262a6#diff-95ff2368b8571e4f1439f6a6c954993bd9a1de99e7083b9f0c07bb39af7382a2R100. US-ASCII is fully sufficient.

Similar case: https://github.com/apache/httpcomponents-core/pull/375
Comment 3 Han Li 2022-12-20 05:36:57 UTC
Fixed in:
- 11.0.x for 11.0.0-M2 onwards
- 10.1.x for 10.1.5 onwards
- 9.0.x for 9.0.71 onwards
- 8.5.x for 8.5.85 onwards