Bug 61316

Summary: Test files used by TestEncodingDetector test are broken in src.zip, src.tar.gz [8.5.18]
Product: Tomcat 8    Reporter: Konstantin Kolinko <knst.kolinko>
Component: Packaging    Assignee: Tomcat Developers Mailing List <dev>
Severity: normal    
Priority: P2    
Version: 8.5.x-trunk   
Target Milestone: ----   
Hardware: PC   
OS: All   

Description Konstantin Kolinko 2017-07-19 14:19:59 UTC
When testing 8.5.18 (release candidate) from apache-tomcat-8.5.18-src.zip,
the following test fails consistently:


Testing on Windows 10, Java 8/Java 7 x all connectors - all are failing.

Comparing the unpacked -src.zip with a fresh checkout of the tag shows that
the files used by this test differ.

>svn st
M       test\webapp\jsp\encoding\bom-none-prolog-utf16be.jspx
M       test\webapp\jsp\encoding\bom-none-prolog-utf16le.jspx
M       test\webapp\jsp\encoding\bom-utf16be-prolog-none.jsp
M       test\webapp\jsp\encoding\bom-utf16be-prolog-none.jspx
M       test\webapp\jsp\encoding\bom-utf16be-prolog-utf16be.jspx
M       test\webapp\jsp\encoding\bom-utf16be-prolog-utf16le.jspx
M       test\webapp\jsp\encoding\bom-utf16be-prolog-utf8.jspx
M       test\webapp\jsp\encoding\bom-utf16le-prolog-none.jsp
M       test\webapp\jsp\encoding\bom-utf16le-prolog-none.jspx
M       test\webapp\jsp\encoding\bom-utf16le-prolog-utf16be.jspx
M       test\webapp\jsp\encoding\bom-utf16le-prolog-utf16le.jspx
M       test\webapp\jsp\encoding\bom-utf16le-prolog-utf8.jspx

The list of differing files is the same for -src.zip and -src.tar.gz.

Looking into the files from src.zip with a hex editor
(e.g. bom-none-prolog-utf16be.jspx), I see sequences like
000D 0A00 0D0A

These '0D0A' sequences apparently originate from LF -> CRLF conversion that treated these 16-bit files as 8-bit ones.

Looking into the files from src.tar.gz, I see
000A 000A

All line breaks are doubled: there are additional empty lines everywhere.

The correct file in the svn repository has
000D 000A
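
The byte patterns above can be reproduced with a small sketch. It assumes the
conversion step works on single bytes and rewrites every CR, LF or CRLF byte
sequence to the target line ending. This is an illustration of the suspected
mechanism, not the actual packaging code, and convert_eol_bytewise is a
hypothetical helper name.

```python
import re

# Any 8-bit style line ending: CRLF, lone CR, or lone LF.
EOL_BYTES = re.compile(rb"\r\n|\r|\n")


def convert_eol_bytewise(data: bytes, eol: bytes) -> bytes:
    """Hypothetical helper: normalise line endings byte by byte,
    ignoring the fact that the data may use 16-bit code units."""
    return EOL_BYTES.sub(eol, data)


# One CRLF line break as stored in a UTF-16BE file: 00 0D 00 0A
original = "\r\n".encode("utf-16-be")

# Converting to CRLF (assumed for -src.zip) hits the 0D and 0A bytes separately.
as_zip = convert_eol_bytewise(original, b"\r\n")

# Converting to LF (assumed for -src.tar.gz) turns both bytes into 0A.
as_tar = convert_eol_bytewise(original, b"\n")

print(original.hex())  # 000d000a
print(as_zip.hex())    # 000d0a000d0a
print(as_tar.hex())    # 000a000a
```

In the LF case the result still decodes as UTF-16BE, but as two line feeds
instead of one line break, which matches the doubled empty lines.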

Comment 1 Christopher Schultz 2017-07-19 20:35:13 UTC
Could this be a svn client problem?

$ svn pget svn:mime-type test/webapp/jsp/encoding/bom-none-prolog-none.jspx
text/plain; charset=UTF-8
$ svn pget svn:eol-style test/webapp/jsp/encoding/bom-none-prolog-none.jspx

$ svn pget svn:mime-type test/webapp/jsp/encoding/bom-utf16be-prolog-utf16le.jspx
text/plain; charset=UTF-16BE
$ svn pget svn:eol-style test/webapp/jsp/encoding/bom-utf16be-prolog-utf16le.jspx
svn: warning: W200017: Property 'svn:eol-style' not found on 'test/webapp/jsp/encoding/bom-utf16be-prolog-utf16le.jspx'
svn: E200000: A problem occurred; see other errors for details

Should the eol-style be set for any of these files? Should the mime-type be application/binary, or do we trust svn clients not to botch the encoding?

Comment 2 Konstantin Kolinko 2017-07-20 13:13:29 UTC
svn is configured correctly:
If I run the tests from an svn checkout, they complete successfully on my Windows machine, and they complete successfully on Buildbot (on Linux).

From svn's point of view, the files with 16-bit characters should be treated as binary. They should not have the svn:eol-style property set on them.

The problem is with the configuration of the <fixcrlf> task in build.xml.

Comment 3 Mark Thomas 2017-07-24 11:55:11 UTC
Fixed in:
- trunk for 9.0.0.M25 onwards
- 8.5.x for 8.5.19 onwards