Issue 18295 - XSLT filter does not generate valid XHTML
Summary: XSLT filter does not generate valid XHTML
Status: CLOSED FIXED
Alias: None
Product: xml
Classification: Code
Component: external filters
Version: OOo 1.1 RC2
Hardware: PC All
Importance: P2 Trivial
Target Milestone: OOo 2.0
Assignee: jogi
QA Contact: issues@xml
URL:
Keywords: needhelp
Depends on:
Blocks:
 
Reported: 2003-08-16 23:54 UTC by Unknown
Modified: 2004-07-19 08:38 UTC
CC List: 5 users

See Also:
Issue Type: FEATURE
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
xslt environment (updated stylesheets, testfiles, processor (XT with patch)) use it with JDK/JRE 1.4 (Xerces/Xalan) (3.13 MB, application/x-gzip)
2004-01-09 21:42 UTC, svante.schubert
Latest stylesheets (renamed oo2xhtml.xsl to main_html.xsl for OOo usage) (39.08 KB, application/x-gzip)
2004-01-19 19:08 UTC, svante.schubert

Description Unknown 2003-08-16 23:54:05 UTC
I've been playing with the XHTML export feature and discovered a few annoying
issues:

1. The output doesn't contain the DOCTYPE. Setting the DOCTYPE through the
Transformation tab in XML Filter Settings does nothing. I've tweaked
main_html.xsl to specify the DOCTYPE, but the generated string is not ok.
Instead of 
<!DOCTYPE HTML PUBLIC...
it should say
<!DOCTYPE html PUBLIC...

2. The output contains loads of XML namespace prefixes. I tweaked it by changing
exclude-result-prefixes="java" to exclude-result-prefixes="the_whole_list_of_prefixes".
I don't know if it's good or bad, but it worked for me.

3. The META element is not closed, which results in numerous error reports
from the validator.

I haven't seen 1.1 RC3 yet; possibly some of these issues are resolved in that release.
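
For reference on point 2, here is a minimal sketch of how exclude-result-prefixes is usually written in XSLT 1.0: a whitespace-separated list of prefixes that are declared on the stylesheet element. The prefixes and namespace URIs shown are assumptions for illustration (they follow the OOo 1.x file format) and are not copied from the shipped main_html.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the prefixes and URIs below are assumed for illustration
     and are not taken from the actual main_html.xsl. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="http://openoffice.org/2000/text"
    xmlns:table="http://openoffice.org/2000/table"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="office text table">

  <!-- Without exclude-result-prefixes, literal result elements such as
       <html> would carry the office/text/table declarations into the output. -->
  <xsl:template match="/">
    <html>
      <head><title>Example</title></head>
      <body>
        <!-- Emit each source paragraph as an XHTML paragraph. -->
        <xsl:for-each select="//text:p">
          <p><xsl:value-of select="."/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Note that the attribute only affects namespace declarations copied from literal result elements in the stylesheet; nodes copied from the source document with xsl:copy or xsl:copy-of keep their own namespaces.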
Comment 1 utomo99 2003-09-18 08:33:18 UTC
I wish we could follow the standard
http://www.w3.org/XML/

We need to recheck everything else besides this suggestion.
Comment 2 jogi 2003-10-01 08:38:05 UTC
First of all: sorry for being late, but the tooling set the initial owner
incorrectly. This was corrected today.

Enhancement:
It's correct that the output is not valid for now. Svante told me so,
and also that validity is planned for the future.
The target can only be 'Office Later'; another target only if Svante
already has this on his 'roadmap' for OOo 2.0?!
Comment 3 jogi 2003-10-01 08:38:52 UTC
Set correct sub component.
Comment 4 svante.schubert 2003-10-13 11:00:01 UTC
I am aware that the XHTML is invalid, and it is certainly my
personal goal to make it valid.

To your concerns:
1. I didn't add the DOCTYPE because the output is not valid yet, so that
nobody gets the idea to file bugs about it yet.

2. Seems to be a parser / processor problem; sorry, I cannot reproduce it.

3. An unclosed meta tag is not even well-formed XML; this is an XSLT
processor problem for sure.


Believe me, I am as annoyed by this missing feature as you are, and I
have put dozens of hours of private working time into the stylesheets, but
I don't think that marketing will give the XSLT filter a high
priority in the near future. In spite of that, I think that it will
replace the current HTML filter some day.

Currently, I have little time to spend on the filter, but I have nearly
finished a new version (unpublished). If you would like to work with me
on it, I would be very happy if we could make this filter our
filter and raise it to product quality.


The problems with XHTML (transitional) validity that I have found are:

- Nested paragraphs when a draw:text-box includes a paragraph
- Anchor names have to be transformed according to the XHTML specs



Comment 5 svante.schubert 2003-11-21 11:08:12 UTC
Thanks for offering your help, Leonid.
I will make the updated files available for download soon.
Comment 6 svante.schubert 2003-12-12 15:52:12 UTC
With my filterxslt01 CWS (ChildWorkSpace), I am going to fix at least some of these
problems. We should specify the detailed problems and write follow-up bugs.
--> changed target to OOo 2.0
Comment 7 svante.schubert 2004-01-09 20:21:06 UTC
- Added title and base element to header
- Made several changes to tables and paragraph nesting

The remaining problems will be split into several follow-up bugs, each explaining
one problem.

I suggest starting with transitional and working on strict afterwards.
Simply add
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
or, respectively,
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
to the xsl:output element of the stylesheet.
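
As an illustration of the xsl:output setting described above, here is a minimal, self-contained sketch for the transitional case. The doctype-public value, the encoding, and the template content are assumptions added for the example; they are not taken from the attached stylesheets.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: everything except the doctype-system value is an
     assumption added to make the example self-contained. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- method="xml": the DOCTYPE name is taken from the root element
       ("html"), and empty elements are written as self-closing tags. -->
  <xsl:output method="xml"
      encoding="UTF-8"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>Example</title>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
      </head>
      <body>
        <p>Placeholder body</p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

With the html output method, XSLT 1.0 allows the processor to write the DOCTYPE name as HTML and to omit the end tag of empty elements such as meta. That may explain points 1 and 3 of the original description, although whether the reporter's setup used method="html" is an assumption here.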

I added a tough test document (the OpenOffice Specification).
Validity can be tested with multiple tools (e.g. the W3C Validator).


KNOWN TRANSITIONAL VALIDITY PROBLEM:
Nesting of a paragraph in a span element.

KNOWN STRICT VALIDITY PROBLEM:
Table Alignment:
To align a table horizontally, I created a surrounding div element. Apparently that
is not a valid solution in strict XHTML.
Comment 8 svante.schubert 2004-01-09 21:19:30 UTC
I added my Office-independent XSLT environment. The first one was a >4 MB
environment with Xalan & Xerces, but don't download it; it is buggy with one document.
Simply install JDK/JRE 1.4, so that you use an earlier Xalan & Xerces version
without the issue.
The environment does not include the stylesheets, which are added separately.
Comment 9 svante.schubert 2004-01-09 21:21:20 UTC
Hopefully someone is able to help me out with this. I am not able to plan time
for this issue and want to close the CWS (ChildWorkSpace) at the end of next week (if
someone needs more time, we can extend it by a week). If no one helps, I
will have to move this issue out of the CWS.
Comment 10 svante.schubert 2004-01-09 21:42:25 UTC
Created attachment 12395
xslt environment (updated stylesheets, testfiles, processor (XT with patch)) use it with JDK/JRE 1.4 (Xerces/Xalan)
Comment 11 svante.schubert 2004-01-19 19:05:02 UTC
I tested the stylesheets in the Office and realized that the office filter does
not provide any parameters at all to the stylesheets.

Wrote #i24398 (i.e. http://www.openoffice.org/issues/show_bug.cgi?id=24398), so
relative links (when the document is stored to a different directory than the
original) and links to zipped graphics (JAR URLs) will remain.

Unfortunately, the XHTML filter achieves compatibility with zipped OOo, unzipped OOo
and flat OOo through parameters: the URLs of the meta and styles XML files are given
as parameters.

Because of this, I rewrote the internal usage of global variables to a global data
parameter (cf. the attached stylesheets).
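
To make the parameter issue concrete, here is a hedged sketch of a stylesheet that still works when the caller passes no parameters: a top-level xsl:param gets an empty default, and the template falls back to the input document. The parameter name metaFileURL and the namespace URIs are hypothetical; this is not the mechanism used in the attached stylesheets, only one way such a fallback can look in XSLT 1.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the parameter name and namespace URIs are hypothetical. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="office dc">

  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Empty default: used when the caller supplies no value. -->
  <xsl:param name="metaFileURL" select="''"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>
          <xsl:choose>
            <!-- Parameter given: read the title from the separate meta file. -->
            <xsl:when test="$metaFileURL != ''">
              <xsl:value-of select="document($metaFileURL)//office:meta/dc:title"/>
            </xsl:when>
            <!-- No parameter: look for the meta data in the input itself. -->
            <xsl:otherwise>
              <xsl:value-of select="//office:meta/dc:title"/>
            </xsl:otherwise>
          </xsl:choose>
        </title>
      </head>
      <body>
        <p>Placeholder body</p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The explicit test for the empty string matters because document('') refers to the stylesheet itself rather than to a missing file.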

With this enhancement, the new filter works with StarOffice 7 as well
(better than ever).

For compatibility reasons, I renamed the starting stylesheet 'ooo2xhtml.xsl' back
to 'main_html.xsl'. To use these new stylesheets in your office, simply unzip the
stylesheets (xhtml and common folder) into your <OFFICE>/share/xslt directory
(e.g. C:\Program Files\StarOffice7\share\xslt).
Comment 13 svante.schubert 2004-01-19 19:08:18 UTC
Created attachment 12591
Latest stylesheets (renamed oo2xhtml.xsl to main_html.xsl for OOo usage)
Comment 14 svante.schubert 2004-01-19 19:16:14 UTC
To use it directly in StarOffice / OpenOffice, not in the stand-alone test
environment, please rename the 'ooo2xhtml.xsl' stylesheet to 'main_html.xsl'.

Comment 15 svante.schubert 2004-01-27 13:30:42 UTC
Added the default XHTML strict public DTD to figure out whether there are further
constraints.
If further constraints are noticed, please add a new enhancement task with
a detailed description of the specific problem.

Comment 16 svante.schubert 2004-04-06 10:53:17 UTC
SUS->JSI: For testing, transform any arbitrary document and validate the XML
output with a third-party tool (e.g. XML Spy), and/or use the TEST
environment in the XML Filter Settings and validate with the Office.
Comment 17 svante.schubert 2004-04-20 14:07:04 UTC
Reassign to QA
Comment 18 svante.schubert 2004-04-22 14:03:40 UTC
SUS: Changed issue type to FEATURE as discussed with JSI, feature mail will follow
Comment 19 jogi 2004-04-23 09:44:16 UTC
Now we're exporting - first time ever (!) - valid XHTML files! If there is a bug
in the XSLT transformation (and we have some issues open), the validation may fail,
but a large percentage of documents have been exported correctly.
Comment 20 jogi 2004-04-23 09:48:17 UTC
Verified. Even though we do not support some features (like graphics at the moment),
the export from Draw / Impress is valid as well!
Comment 21 jogi 2004-04-23 09:49:19 UTC
ok.
Comment 22 jogi 2004-07-19 08:38:24 UTC
Seen okay in SRC680m48