Bug 64931

Summary: Implement validation of changelog.xml file at build time
Product: Tomcat 10
Reporter: Konstantin Kolinko <knst.kolinko>
Component: Documentation
Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: enhancement    
Priority: P2    
Version: 10.0.0-M10   
Target Milestone: ------   
Hardware: PC   
OS: All   

Description Konstantin Kolinko 2020-11-18 19:05:16 UTC
I have a fix for this that I will commit shortly. I am filing an issue to better document the problem and design decisions.


The file "webapps/docs/changelog.xml" sometimes has structural errors. Those errors are hard to spot. Thus it would be better to have an automated solution to catch and report them at build time.

For example, in Apache Tomcat 10.0.0-M10 the file has two such errors: at lines 182 and 1550.

https://github.com/apache/tomcat/blob/10.0.0-M10/webapps/docs/changelog.xml#L181


There are the following possibilities to implement the check:

(1) With XSLT, in the tomcat-docs.xsl stylesheet.

It is possible, but it would be an odd choice.

- Reporting an error can be done in XSLT 1.0 with

  <xsl:message terminate = "yes">...</xsl:message>

More recent versions of the XSLT specification support validation against an XML Schema.

- Custom behaviour could be triggered by file name. The tomcat-docs.xsl stylesheet declares a `<xsl:param name="filename"` parameter.
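
A minimal sketch of how option (1) might look, combining the two points above. It is an illustration only, not the contents of tomcat-docs.xsl: the rule being checked (a `subsection` must be a direct child of a `section`) is a hypothetical example, and it assumes that `filename` receives the bare name of the file being transformed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the build passes the name of the current file here. -->
  <xsl:param name="filename"/>

  <xsl:template match="/">
    <!-- Run the check only when the changelog is being processed. -->
    <xsl:if test="$filename = 'changelog.xml'">
      <!-- Hypothetical rule: a subsection must sit directly inside a section. -->
      <xsl:for-each select="//subsection[not(parent::section)]">
        <!-- terminate="yes" aborts the transformation on the first match. -->
        <xsl:message terminate="yes">
          <xsl:text>changelog.xml: misplaced subsection "</xsl:text>
          <xsl:value-of select="@name"/>
          <xsl:text>"</xsl:text>
        </xsl:message>
      </xsl:for-each>
    </xsl:if>
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Each structural rule would need its own hand-written XPath test, which is part of why this route is an odd choice compared with a declarative grammar.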


(2) With an XML Schema.

I tried this way, but failed.

- Validation against an XML Schema is triggered with Apache Ant Task schemavalidate.

- Running a check against the changelog file with a simple schema fails almost immediately with an error:

  Element type "document" must be declared.

- My investigation (running with `ant -verbose` and searching through source code) found that this message is generated when performing a validation against a DTD.

(MSG_ELEMENT_NOT_DECLARED, org.apache.xerces.impl.dtd.XMLDTDValidator, in Apache Xerces 2.12.0)

- I tried running with `<schemavalidate disableDTD="true"`, but that does not help: it then fails at the `<!DOCTYPE document` declaration at the top of the changelog.xml file.

- I did not find any other setting or parser feature that could selectively turn off validation against a DTD.
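
For reference, the attempt described above would look roughly like the following Ant fragment. This is a reconstruction under assumptions, not the build file that was actually tried: the target name and the schema file `changelog.xsd` are hypothetical. As reported above, this configuration does not get past the `<!DOCTYPE document` declaration.

```xml
<project name="changelog-schema-check" default="validate-changelog-xsd" basedir=".">
  <target name="validate-changelog-xsd">
    <!-- noNamespaceFile: hypothetical schema for the (no-namespace) changelog. -->
    <!-- disableDTD="true": the setting that was tried and did not help. -->
    <schemavalidate file="webapps/docs/changelog.xml"
                    noNamespaceFile="changelog.xsd"
                    disableDTD="true"/>
  </target>
</project>
```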


(3) With a DTD.

I went this way, and it worked.

Validation against a DTD can be performed with Apache Ant Task xmlvalidate.
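
A minimal sketch of such a check as a standalone Ant build file. The project and target names are hypothetical, and the actual commit may wire the task into the existing build differently.

```xml
<project name="changelog-check" default="validate-changelog" basedir=".">
  <target name="validate-changelog">
    <!-- lenient="false" requests validation against the DTD,
         not only a well-formedness check. -->
    <!-- failonerror="true" makes the build fail on a validation error. -->
    <xmlvalidate file="webapps/docs/changelog.xml"
                 lenient="false"
                 failonerror="true"
                 warn="true"/>
  </target>
</project>
```

Because the DTD is found through the document's own `<!DOCTYPE` declaration, the task needs no separate pointer to a grammar file.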

- Notes:

1. I defined the DTD inline in the changelog.xml file itself.

It could be moved to an external file, but there is no actual need as I am not going to validate other files.

2. Any XML element used in the changelog.xml and project.xml files must be declared in the DTD. Any of its attributes must be declared as well.

(The project.xml file is included into the changelog as an external entity.)

Thus far the only HTML markup elements that are actually used in Tomcat 10 changelog are <code> and <a>, but we may want to add others in the future.

A useful tutorial on DTDs:
https://www.w3schools.com/xml/xml_dtd_intro.asp

A simple generic way to declare an element is

 <!ELEMENT elementname ANY>

A simple generic way to declare an attribute of an element is

 <!ATTLIST elementname attributename CDATA #IMPLIED>
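
To show how such declarations fit together in an inline DTD, here is a small self-contained sketch. It is not the DTD that was committed: the element names and content models are assumptions chosen for illustration, and the external entity for project.xml is left out so that the example stands alone. Unlike `ANY`, the stricter content models used here make the validator reject misplaced elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document [
  <!-- Structural elements: each may contain only the listed children. -->
  <!ELEMENT document (section*)>
  <!ELEMENT section (subsection*)>
  <!ATTLIST section name CDATA #REQUIRED>
  <!ELEMENT subsection (changelog)>
  <!ATTLIST subsection name CDATA #REQUIRED>
  <!ELEMENT changelog (fix|add|update)*>

  <!-- Entries: text mixed with the HTML markup elements in use. -->
  <!ELEMENT fix (#PCDATA|code|a)*>
  <!ELEMENT add (#PCDATA|code|a)*>
  <!ELEMENT update (#PCDATA|code|a)*>
  <!ELEMENT code (#PCDATA)>
  <!ELEMENT a (#PCDATA)>
  <!ATTLIST a href CDATA #REQUIRED>
]>
<document>
  <section name="Example release">
    <subsection name="Example component">
      <changelog>
        <fix>Example entry that mentions <code>someMethod()</code>.</fix>
      </changelog>
    </subsection>
  </section>
</document>
```

With this grammar, a `fix` element placed directly under `subsection`, or a `section` without a `name` attribute, is reported as a validation error.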