Apache OpenOffice (AOO) Bugzilla – Issue 15799
MS-Word import fails, FlatXMLWriter produces invalid XML
Last modified: 2013-08-07 14:41:36 UTC
The attached MS-Word document doesn't convert quite right. Much worse, the XML that FlatXMLWriter produces from it is invalid. The problem is that the original MS-Word document contains a vertical tab character (0xb) on the "To:" line of this fax. In MS-Word, this puts the following text on the next line. OpenOffice can't handle this and displays a weird character on the screen instead of putting the text on the next line. OK, this is annoying, but what is disastrous is that when using the FlatXMLWriter, the resulting XML is invalid because the control character is put directly into the output, and Java XML parsers barf on it. From my reading of the W3C XML spec (http://www.w3.org/TR/2000/REC-xml-20001006.html#NT-Char), vertical tab characters are illegal. This is a big problem for our project at the National Archive, which plans to do long-term storage with OpenOffice flat format, because we have a lot of MS-Word documents that we can't convert. I think you need a dual resolution of this: fix the MS-Word import but, more importantly, put more checking in your FlatXMLFilter to stop it producing invalid XML EVER!
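The kind of check being asked for can be sketched as follows. This is only an illustration in Python, not OpenOffice code: the names `find_illegal_xml_chars` and `assert_xml_safe` are made up for this example, and the character class is transcribed from the Char production in the XML 1.0 spec linked above.

```python
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def find_illegal_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [(m.start(), hex(ord(m.group()))) for m in ILLEGAL_XML_CHARS.finditer(text)]


def assert_xml_safe(text):
    """Hypothetical pre-write guard: refuse text that would make the XML invalid."""
    bad = find_illegal_xml_chars(text)
    if bad:
        raise ValueError(f"illegal XML 1.0 characters at {bad}")
    return text


# The vertical tab from the fax's "To:" line is reported at offset 3.
print(find_illegal_xml_chars("To:\x0bJohn"))
```

Whether a writer should reject such text, drop the character, or map it to something legal is a design decision this sketch leaves open; it only shows that the condition is cheap to detect before output.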
Reassigned to MRU
MRU->CMC: if I understood correctly, by a "vertical tab" he means a "line break". Would it be possible to import the line break from the form field into Writer's input field? BTW: It is possible to have a line break in Writer's Input field.
> if I understood correctly, by a "vertical tab" he means a "line break"
No, I mean the ASCII character for "vertical tab", "VT", which is 0xb hex, 013 oct, 11 dec. This is different from either carriage return or line feed, although in the original MS-Word file it seems to be represented visually by a new line.
VT (hex 0x0b, dec 11) is used in Word as a hard line break. In Writer we use LF (hex 0x0a, dec 10) for that purpose. I'll look into it. We do the conversion elsewhere in office, just not inside field results.
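As a rough illustration of the mapping described here (Word's VT, 0x0b, to Writer's LF, 0x0a), a minimal Python sketch follows. It is not the actual import filter code, and `convert_field_result` is a hypothetical name used only for this example.

```python
VT = "\x0b"  # vertical tab, decimal 11: Word's hard line break
LF = "\x0a"  # line feed, decimal 10: Writer's line break


def convert_field_result(text):
    """Hypothetical helper: map Word's hard line breaks to Writer's."""
    return text.replace(VT, LF)


# The "To:" line from the fax, with the VT replaced by a line feed.
print(repr(convert_field_result("To:\x0bJohn")))
```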
Fixed, but a little risky for 1.1. Will make fix available in 2.0
reopen to reassign
cmc->mru: Working in limerickfilterteam08
Checked fix with internal CWS filterteam08.
Fix verified. Will be included in OO 2.0.
*** Issue 17493 has been marked as a duplicate of this issue. ***
Hi, I see that this issue is fixed, but the target is OOo 2.0. Please consider including this in OOo 1.1.1 if possible. Thank you.
No, as CMC pointed out, a bit too risky for OO 1.1.1. Will leave this as 2.0 fix.
Closed. Works with OO 2.0 snapshot build 680m28.