Summary: | Problem with Jsp character encoding configuration | ||
---|---|---|---|
Product: | Tomcat 8 | Reporter: | Lazar Kirchev <lazar.kirchev> |
Component: | Jasper | Assignee: | Tomcat Developers Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | 8.5.11 | ||
Target Milestone: | ---- | ||
Hardware: | PC | ||
OS: | All | ||
Attachments: |
The two sample applications reproducing the problems
A sample reproducing the problem with exotic encoding
Sample war with jspx in exotic encoding
Correct war for reproducing the exotic encoding problem |
Description
Lazar Kirchev
2017-02-23 08:16:44 UTC
Yes, there was a regression in the refactoring. The detected BOM encoding was incorrectly taking precedence over the prolog-specified encoding (if any). Thanks for the report and the test case.

Created attachment 34908
A sample reproducing the problem with exotic encoding
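
The precedence rule mentioned in the reply above (an encoding declared in the prolog should win over one inferred from a BOM) can be illustrated with a minimal sketch. The class name, the method name and the UTF-8 fallback are assumptions made for illustration; this is not Tomcat's actual code.

```java
// Hypothetical helper illustrating the intended precedence; not taken from Tomcat.
final class EncodingPrecedenceSketch {

    /**
     * Picks the encoding to use for a JSP document.
     *
     * @param bomEncoding    encoding inferred from a byte order mark, or null if none
     * @param prologEncoding encoding declared in the XML prolog, or null if none
     */
    static String chooseEncoding(String bomEncoding, String prologEncoding) {
        // An explicit declaration in the prolog takes precedence.
        if (prologEncoding != null) {
            return prologEncoding;
        }
        // Otherwise fall back to whatever the BOM indicated.
        if (bomEncoding != null) {
            return bomEncoding;
        }
        // Assumed default when neither source yields an encoding.
        return "UTF-8";
    }
}
```

The regression described above corresponds to checking the BOM result before the prolog result, so a declared encoding was ignored whenever a BOM-based guess was present.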
Hello Mark,

I noticed that the second scenario still fails if the encoding is more exotic. I tried with IBM871, IBM EBCDIC (Icelandic). I debugged a little and noticed that EncodingDetector.getPrologEncoding() returns null although there is an encoding attribute specified in the prolog. The if on lines 67-73 in EncodingDetector then takes the second branch, as if no encoding were specified in the prolog. I attach sample2.war, with which I reproduced it. It is essentially the same as sample1.war, only the encoding in enctest.jspx is IBM871. Could this be an issue with the XMLStreamReader?

I've done some further testing and fixed an unrelated bug, but as far as unusual encodings go, they have to be specified in the prolog, otherwise the JRE's XML parser doesn't have enough information to reliably determine the encoding.

The content of enctest.jspx is:

<?xml version="1.0" encoding="IBM871"?>
<html xmlns:jsp="http://java.sun.com/JSP/Page">
<jsp:directive.page pageEncoding="IBM871" />
<jsp:output omit-xml-declaration="no"/>
<body>
You should see this text.
</body>
</html>

So there actually is an encoding attribute in the prolog. For some reason the JRE XML parser does not detect it correctly. On the other hand, the deprecated XMLEncodingDetector from before the refactoring, which parsed the files itself, correctly detects the encoding from the prolog. For example, with Tomcat 8.5.4 the sample works correctly.

I apologise that my second attachment was incorrect: by mistake I attached the second war from the first attachment instead of the problematic war with IBM871 encoding. I now attach the correct one, named encsample.war.

Created attachment 34913
Sample war with jspx in exotic encoding
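
For readers who want to reproduce the prolog check outside Tomcat, the following is a minimal sketch of asking the JRE's StAX parser for the declared encoding. The class and method names are hypothetical; only `XMLInputFactory.createXMLStreamReader(InputStream)` and `XMLStreamReader.getCharacterEncodingScheme()` are standard API. Whether Tomcat's `getPrologEncoding()` is implemented exactly this way is an assumption.

```java
import java.io.ByteArrayInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; a sketch, not Tomcat's EncodingDetector.
final class PrologEncodingSketch {

    /**
     * Returns the encoding declared in the XML declaration of the given
     * document bytes, or null if none was declared or the parser could
     * not read the declaration.
     */
    static String declaredEncoding(byte[] document) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(document));
            try {
                // Documented to return the encoding from the xml declaration,
                // or null when no encoding was declared.
                return reader.getCharacterEncodingScheme();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // The parser could not make sense of the input; treat as "unknown".
            return null;
        }
    }
}
```

Feeding this helper the raw bytes of a file such as enctest.jspx is one way to check whether a given JRE reports the declared encoding for an EBCDIC-family document, which is the behaviour questioned in the comment above.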
Comment on attachment 34913
Sample war with jspx in exotic encoding
Invalid jspx file within.
Created attachment 34914
Correct war for reproducing the exotic encoding problem
The "unrelated bug" I fixed appears to have fixed the issue you were seeing. The fix is r1791298. If you can test with 9.0.x trunk or 8.5.x trunk to confirm, that would be great.

Thanks Mark! I tried the fix from 8.5 trunk and it works.

Something I noticed while debugging, which is probably not a problem but I prefer to mention it: in EncodingDetector's constructor, on line 61 (https://github.com/apache/tomcat85/blob/c29a2b45f57e481380d88a8fa0c6f4f0f242aca1/java/org/apache/jasper/compiler/EncodingDetector.java#L61), the buffered input stream is reset, but on the following lines the bytes that should be skipped are read from the initial input stream and not from the buffered input stream. When the buffered input stream is reset, the underlying input stream is not reset and its position stays where it was, e.g. at 4. When the bytes to be skipped are then read from it, its position moves to e.g. 8. Is this intended?

Good catch. That would be a bug. I'll get it fixed.
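
The reset-then-skip observation can be reproduced in isolation with the sketch below. It assumes a 4-byte buffer and a 4-byte mark limit, and it assumes that later reads go through the buffered stream; the class and method names are hypothetical and the code is not taken from Tomcat.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demonstration of skipping on the wrong stream after reset().
final class SkipAfterResetSketch {

    /**
     * Marks the buffered stream, peeks at the first 4 bytes, resets, skips
     * `skip` bytes on either the buffered or the underlying stream, and
     * returns the next byte seen through the buffered stream.
     * The data must contain at least 4 bytes.
     */
    static int firstByteAfterSkip(byte[] data, int skip, boolean skipOnBuffered) {
        try {
            InputStream raw = new ByteArrayInputStream(data);
            BufferedInputStream bis = new BufferedInputStream(raw, 4);
            bis.mark(4);
            // Peek at the first 4 bytes; this pulls them out of the raw stream.
            for (int i = 0; i < 4; i++) {
                bis.read();
            }
            // Rewinds only the buffered view; the raw stream stays at position 4.
            bis.reset();
            for (int i = 0; i < skip; i++) {
                if (skipOnBuffered) {
                    bis.read();   // consumes the buffered bytes, as intended
                } else {
                    raw.read();   // consumes bytes beyond the buffer instead
                }
            }
            return bis.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With input consisting of a three-byte UTF-8 BOM followed by `<?xml`, skipping three bytes on the buffered stream leaves `<` as the next byte, while skipping on the underlying stream leaves the first BOM byte in place and silently drops three bytes further along in the document.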