This is a follow-up to bug 39019 (removing the read-only restriction in Word 2003). The goal is to output a .wordml file using a server-side xhtml2wordml transformation, edit it in Word 2003, save it, and transform it back using a server-side wordml2xhtml transformation.
Created attachment 18075
add ability to edit wordml files with webdav

This patch creates a second output file with a .wordml extension for each document, so the tutorial page shows up as both tutorial_en.html and tutorial_en.wordml in the webdav folder. I used wordml as the extension to make it explicit, but it could be changed to xml or even doc. The default webdav GET matcher was also changed from davget.xml to davget.html to match its output format and simplify the pipelines.

Two stylesheets were added for transforming to and from wordml on the server: xhtml2wordml.xsl and wordml2xhtml.xsl. Originally these transformations were done within Word, but Word has a bug when fetching stylesheets from a webdav or http address on saving, so we switched to doing the transformations server-side. A benefit is that the Professional version of Word 2003 is no longer required, because the advanced xml features aren't needed when the file is already wordml. There is also a free plugin available that lets you use the wordml format with older versions of Word, but I have not tested it yet.

The stylesheets only implement a basic subset of xhtml at the moment (p, h1-h6, em, strong, a). They also restrict which styles are available while editing in Word, which is useful for enforcing markup guidelines. The wordml2xhtml.xsl also supports tables and lists; a big thanks to Josias for adding this.
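For readers who have not opened the attachment, here is a rough Python sketch of the kind of mapping xhtml2wordml.xsl has to perform. It is not the stylesheet from the patch. The style ids (Heading1 and so on) and the helper names are assumptions made for illustration, only p, h1-h3, em and strong are handled, and link targets are dropped (the link text is kept).

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML documents (WordML).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)

# Assumed mapping from XHTML block elements to Word paragraph style ids.
BLOCK_STYLES = {"p": None, "h1": "Heading1", "h2": "Heading2", "h3": "Heading3"}


def w(tag):
    """Return the namespace-qualified name of a WordML element or attribute."""
    return f"{{{W_NS}}}{tag}"


def local(tag):
    """Strip the namespace part from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def add_run(para, text, bold, italic):
    """Append one w:r (run) holding the given text to a paragraph."""
    run = ET.SubElement(para, w("r"))
    if bold or italic:
        rpr = ET.SubElement(run, w("rPr"))
        if bold:
            ET.SubElement(rpr, w("b"))
        if italic:
            ET.SubElement(rpr, w("i"))
    ET.SubElement(run, w("t")).text = text


def add_runs(node, para, bold=False, italic=False):
    """Walk inline content, carrying strong/em down as bold/italic flags."""
    if node.text:
        add_run(para, node.text, bold, italic)
    for child in node:
        name = local(child.tag)
        add_runs(child, para, bold or name == "strong", italic or name == "em")
        if child.tail:
            add_run(para, child.tail, bold, italic)


def xhtml_to_wordml(xhtml):
    """Convert the supported XHTML subset into a w:wordDocument element."""
    root = ET.fromstring(xhtml)
    doc = ET.Element(w("wordDocument"))
    body = ET.SubElement(doc, w("body"))
    for el in root.iter():
        name = local(el.tag)
        if name not in BLOCK_STYLES:
            continue  # anything outside the subset is not emitted as a block
        para = ET.SubElement(body, w("p"))
        style = BLOCK_STYLES[name]
        if style:
            ppr = ET.SubElement(para, w("pPr"))
            ET.SubElement(ppr, w("pStyle"), {w("val"): style})
        add_runs(el, para)
    return doc
```

Calling ET.tostring(xhtml_to_wordml(page), encoding="unicode") on a small page gives one w:p per paragraph or heading. A file Word will actually open needs more than this (the mso-application processing instruction and style definitions, for example), which this sketch leaves out.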
Created attachment 18131
added change to odt module to previous patch
Could someone please review + apply this? Or should we schedule it for 1.4.1?
I don't use Word. The patch looks very clean, but I don't like how everything gets stuffed into the xhtml module. Using the format mechanism for wordml certainly makes sense, but I'd be more comfortable if we had a separate xhtml2wordml module.

But I wonder: how much demand is there for such a feature? The Word-to-web stuff I've seen in the past has been the antithesis of standards compliance and semantic structure. Does it really make sense to encourage people to publish from Word?
Considering that this is over a year old now, I'm sure the patch would need to be resubmitted with corrections to the sitemaps. We could also create a word module, similar to the opendocument module, to hold the stylesheets.

And Joern, if you are curious about the semantic structure, you may want to read the description above and have a look at the stylesheets. Most Word-to-web stuff doesn't use WordML directly but rather has other transformations performed before saving, which is where a lot of the non-compliance comes in. These stylesheets work directly on the WordML and keep only the elements you want. Right now only a minimal set of xhtml elements is supported, so the output is standards compliant.

I would save this for 1.4.1.
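To illustrate the "keep only the elements you want" point, here is a minimal Python sketch of a whitelist-style WordML-to-xhtml conversion. It is not wordml2xhtml.xsl itself. The style ids and function name are assumptions, the output is left without a namespace for brevity, and bold/italic toggles such as w:val="off" are ignored. Only paragraph style, bold and italic are read; fonts, colours and any other Word formatting are never copied, so they cannot leak into the output.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML documents (WordML).
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"

# Assumed whitelist: Word paragraph style ids that map to headings.
# Every other paragraph becomes a plain <p>.
STYLE_TO_TAG = {"Heading1": "h1", "Heading2": "h2", "Heading3": "h3"}


def wordml_to_xhtml(wordml):
    """Build a <body> from WordML, keeping only whitelisted structure."""
    root = ET.fromstring(wordml)
    body = ET.Element("body")
    for para in root.iter(W + "p"):
        style = para.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val") if style is not None else None
        block = ET.SubElement(body, STYLE_TO_TAG.get(style_id, "p"))
        last = None  # last inline child appended to this block, if any
        for run in para.findall(W + "r"):
            text = "".join(t.text or "" for t in run.findall(W + "t"))
            rpr = run.find(W + "rPr")
            bold = rpr is not None and rpr.find(W + "b") is not None
            italic = rpr is not None and rpr.find(W + "i") is not None
            if not (bold or italic):
                # Plain text: append to the block text or the previous tail.
                if last is None:
                    block.text = (block.text or "") + text
                else:
                    last.tail = (last.tail or "") + text
                continue
            target = block
            if bold:
                target = ET.SubElement(target, "strong")
            if italic:
                target = ET.SubElement(target, "em")
            target.text = text
            last = block[-1]
    return ET.tostring(body, encoding="unicode")
```

A run that carries, say, a w:color property comes out as plain text, because the converter only ever looks for w:b and w:i.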