66392 – AccessLogValve's file encoding does not correspond to the documentation

Bug 66392 - AccessLogValve's file encoding does not correspond to the documentation

Summary: AccessLogValve's file encoding does not correspond to the documentation

Status:	RESOLVED FIXED

Alias:	None

Product:	Tomcat 8
Classification:	Unclassified
Component:	Catalina (show other bugs)
Version:	8.5.84
Hardware:	All All

Importance:	P2 normal (vote)
Target Milestone:	----
Assignee:	Tomcat Developers Mailing List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2022-12-15 21:37 UTC by Michael Osipov
Modified:	2023-01-24 16:43 UTC (History)
CC List:	1 user (show)

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Michael Osipov 2022-12-15 21:37:59 UTC

This [1] resource says:
encoding: Character set used to write the log file. An empty string means to use the system default character set. Default value: use the system default character set.
But the code [2] does ISO-8859-1:
>         if (encoding != null) {
>             try {
>                 charset = B2CConverter.getCharset(encoding);
>             } catch (UnsupportedEncodingException ex) {
>                 log.error(sm.getString(
>                         "accessLogValve.unsupportedEncoding", encoding), ex);
>             }
>         }
>         if (charset == null) {
>             charset = StandardCharsets.ISO_8859_1;
>         }


May java.nio.charset.Charset.defaultCharset() would be better here, no?

Note: I haven't verified with other Tomcat versions, but I guess they have the same problem.

[1] https://tomcat.apache.org/tomcat-8.5-doc/config/valve.html#Access_Log_Valve/Attributes
[2] https://github.com/apache/tomcat/blob/cf2015c1350a3f057182dd4c26c20f68df8b3400/java/org/apache/catalina/valves/AccessLogValve.java#L638-L648

Comment 1 Han Li 2022-12-16 02:58:01 UTC

(In reply to Michael Osipov from comment #0)
> May java.nio.charset.Charset.defaultCharset() would be better here, no?
No, I found the reason why not use java.nio.charset.Charset.defaultCharset() by git commit log. 
https://github.com/apache/tomcat/commit/972212836bf278e443b87418d961d6ddb04262a6

I also have a question based on the comment at BZ51408, is it time to use UTF-8 as the default encoding set now?

Comment 2 Michael Osipov 2022-12-16 08:01:19 UTC

(In reply to Han Li from comment #1)
> (In reply to Michael Osipov from comment #0)
> > May java.nio.charset.Charset.defaultCharset() would be better here, no?
> No, I found the reason why not use java.nio.charset.Charset.defaultCharset()
> by git commit log. 
> https://github.com/apache/tomcat/commit/
> 972212836bf278e443b87418d961d6ddb04262a6
> 
> I also have a question based on the comment at BZ51408, is it time to use
> UTF-8 as the default encoding set now?

Looking that the commit, in several spots US-ASCII would be a better choice because the usecase does not allow anything else but 7-bit chars. In this case, yes I'd prefer UTF-8 all the way -- at least the documentation and code must be consistent.

E.g., here https://github.com/apache/tomcat/commit/972212836bf278e443b87418d961d6ddb04262a6#diff-95ff2368b8571e4f1439f6a6c954993bd9a1de99e7083b9f0c07bb39af7382a2R100. US-ASCII is fully sufficient.

Similar case: https://github.com/apache/httpcomponents-core/pull/375

Comment 3 Han Li 2022-12-20 05:36:57 UTC

Fixed in:
- 11.0.x for 11.0.0-M2 onwards
- 10.1.x for 10.1.5 onwards
- 9.0.x for 9.0.71 onwards
- 8.5.x for 8.5.85 onwards