Bug 60467

Summary: Build fails if Saxon HE is on Ant's classpath
Product: Tomcat 8
Reporter: Michael Osipov <michaelo>
Component: Meta
Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 8.5.x-trunk   
Target Milestone: ----   
Hardware: All   
OS: All   
Attachments: Replaced invalid chars

Description Michael Osipov 2016-12-12 12:35:48 UTC
Created attachment 34519
Replaced invalid chars

This is basically the same issue as the one reported years ago in bug 51028.
Illegal characters make Saxon 9.7 choke:

     [xslt] Processing D:\Projekte\tc8.5.x\webapps\docs\config\jar-scan-filter.xml to D:\Projekte\tc8.5.x\output\build\webapps\docs\config\jar-scan-filter.html
     [xslt] D:\Projekte\tc8.5.x\webapps\docs\tomcat-docs.xsl:510:2: Fatal Error! Illegal HTML character: decimal 151
     [xslt] Failed to process null

BUILD FAILED
D:\Projekte\tc8.5.x\build.xml:939: Fatal error during transformation using D:\Projekte\tc8.5.x\webapps\docs\tomcat-docs.xsl: Illegal HTML character: decimal 151; SystemID: file:/D:/Projekte/tc8.5.x/webapps/docs/tomcat-docs.xsl; Line#: 510; Column#: 2



Find the fix attached.
Comment 1 Christopher Schultz 2016-12-12 16:02:35 UTC
Thanks for the find and the patch.

I'll note that the XML specification *does* allow that particular character, though I have no idea what it was supposed to be. You have replaced it with U+2003, which is the "em-width space". I don't know why it's in there. Konstantin's editor probably automatically-encoded something he typed on his keyboard without him knowing.

I think it would be better as either a regular space or as a hyphen (-), so I'll use a hyphen.
Comment 2 Christopher Schultz 2016-12-12 16:31:53 UTC
Fixed. Will be in Tomcat 9.0.0.M16, 8.5.10, and 8.0.40.
Comment 3 Michael Osipov 2016-12-12 17:00:09 UTC
(In reply to Christopher Schultz from comment #1)
> Thanks for the find and the patch.
> 
> I'll note that the XML specification *does* allow that particular character,
> though I have no idea what it was supposed to be. You have replaced it with
> U+2003 which is the "em-width space". I don't know why it's in there.
> Konstantin's editor probably automatically-encoded something he typed on his
> keyboard without him knowing.
> 
> I think it would be better as either a regular space or as a hyphen (-), so
> I'll use a hyphen.

No, read my explanation: https://bz.apache.org/bugzilla/show_bug.cgi?id=51028#c7. It is the same case as back then: the wrong encoding was used for a numeric character reference, which makes it invalid as a Unicode code point.
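
To illustrate the point about numeric references, here is a minimal Java sketch. It is not part of the attached patch. A numeric character reference such as &#151; always names a Unicode code point, and 151 (U+0097) falls in the C1 control range, which is presumably why the HTML output method rejects it. Read as a windows-1252 byte instead, the same value maps to U+2014 (em dash), which is likely what was intended. The class and method names below are hypothetical, and the sketch assumes the JDK in use ships the windows-1252 charset.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Tomcat or its build.
public class C1ReferenceCheck {

    // Matches decimal numeric character references such as &#151;
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    /** Returns true if the code point lies in the C1 control range U+0080..U+009F. */
    public static boolean isC1Control(int codePoint) {
        return codePoint >= 0x80 && codePoint <= 0x9F;
    }

    /**
     * Interprets a value in 0..255 as a single windows-1252 byte and returns
     * the Unicode code point it decodes to.
     * Assumption: the windows-1252 charset is available in this JDK.
     */
    public static int cp1252ToCodePoint(int value) {
        byte[] single = { (byte) value };
        return new String(single, Charset.forName("windows-1252")).codePointAt(0);
    }

    /** Collects every decimal reference in the text that points at a C1 control. */
    public static List<Integer> findSuspiciousRefs(String xml) {
        List<Integer> hits = new ArrayList<>();
        Matcher m = DECIMAL_REF.matcher(xml);
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            if (isC1Control(codePoint)) {
                hits.add(codePoint);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample: &#151; is suspicious, &#169; (copyright sign) is fine.
        String sample = "<p>Tomcat &#151; docs &#169;</p>";
        for (int cp : findSuspiciousRefs(sample)) {
            System.out.printf(
                "&#%d; is U+%04X (C1 control); as a windows-1252 byte it would be U+%04X%n",
                cp, cp, cp1252ToCodePoint(cp));
        }
    }
}
```

A check along these lines could be run over the documentation sources to find other references of the same kind before a stricter XSLT processor trips over them.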