Apache OpenOffice (AOO) Bugzilla – Issue 21116
Numbering corrupted after export to HTML
Last modified: 2013-08-07 14:38:26 UTC
If I open a DOC or RTF document and save it as HTML, the numbering is incorrect. The original document is always displayed with correct numbering, so this is probably an export filter error rather than an import filter error. I have set HTML 3.2 as the default HTML export type, and I blocked font settings so that gray fields are not exported. I'll post demonstration files.
Created attachment 10259 [details] Original document (compressed bz2)
Created attachment 10261 [details] Exported to HTML 3.2
reassigned to es
I can reproduce the problem on OpenOffice 1.1 (default install, US), Win XP Pro SP1 (and MS Office XP SP2). It is a real problem when exporting to HTML: on page 2 the numbering is all wrong. It is an export filter problem. Changing OS to All; reproducible on Win XP.
The sample document has had wrong number formatting from the beginning. It uses simple numbering, while the structure of the document needs outline numbering (a defined hierarchy of numbering levels). Indeed, one can notice that the "levels" 1, 2, 3... and a), b), c)... are not levels of the *same* hierarchy but are each "level 1" of different hierarchies. When saving to HTML, OOo then applies the default format of level 1 to each corresponding level. So the effect: 1 2 1 2 3 is pretty normal.
ES->MIB: The fact remains that a correct hierarchy like: 1. a. i. will be saved as: 1. 1. i. to HTML 3.2 when the source document is a *.doc or *.rtf (see attachment below), which is wrong. HTML 3.2 supports the OL TYPE attribute. http://www.w3.org/TR/REC-html32#ol
Created attachment 10906 [details] Word document with a numbering hierarchy
The numbering a, b, c, aa, bb, cc, ... is not mapped to a, b, c, though it could be.
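For illustration only, here is a minimal Python sketch of the mapping discussed in the comments above: each numbering level gets its own OL element with the HTML 3.2 TYPE attribute (valid values are 1, a, A, i, I). The format names and the function render_outline are hypothetical and are not OOo's internal identifiers or its actual export code. Mapping the "a, b, ..., aa, bb, ..." style to TYPE=a is an assumption; browsers typically continue with aa, ab, ... after z, so the result would only match for the first 26 items.

```python
import html

# Hypothetical format names (not OOo's internal identifiers) mapped to
# the TYPE values that HTML 3.2 defines for OL: 1, a, A, i, I.
OL_TYPE = {
    "arabic": "1",
    "alpha_lower": "a",
    "alpha_upper": "A",
    "roman_lower": "i",
    "roman_upper": "I",
    # Assumption: the "a, b, ..., aa, bb, ..." style has no exact HTML
    # equivalent, so fall back to the closest one.
    "alpha_lower_repeat": "a",
    "alpha_upper_repeat": "A",
}


def render_outline(items, formats, level=0):
    """Render nested (text, children) tuples as nested OL elements.

    formats holds one format name per outline level; deeper levels
    reuse the last entry, and unknown names fall back to TYPE=1.
    """
    fmt = formats[min(level, len(formats) - 1)]
    ol_type = OL_TYPE.get(fmt, "1")
    indent = "  " * level
    lines = [f'{indent}<OL TYPE="{ol_type}">']
    for text, children in items:
        lines.append(f"{indent}  <LI>{html.escape(text)}")
        if children:
            # The sub-list is nested inside its parent LI.
            lines.extend(render_outline(children, formats, level + 1))
    lines.append(f"{indent}</OL>")
    return lines


doc = [("First", [("Sub item", [("Sub-sub item", [])])]), ("Second", [])]
print("\n".join(render_outline(doc, ["arabic", "alpha_lower", "roman_lower"])))
```

With one format per level, the hierarchy 1. a. i. keeps a distinct TYPE at every depth instead of falling back to the level 1 default.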
Created attachment 11609 [details] Another short example for problem study
Created attachment 11610 [details] More complex example, bz2-ed doc and html
On OOo 1.1.2, bug still present.
Created attachment 33737 [details] SXW document -another test case.
Created attachment 33738 [details] HTML export -numbering doesn't follow original document from paragraph 2
More on this:
-- it also affects current versions of OOo
-- there is a similar problem with RTF saves
-- yet the HTML issue seems to lie on the save/export side, whereas the RTF issue is likely on the open/import side
...so they should probably be reported separately, but I am not the expert and this *old* issue deserves some attention...
>>> swriter, ODT source:
1. save as HTML (both 3.2 and 4.1 transitional), close, open HTML:
1.1. OOo 2.4.1: numbering misses the Before and After text (e.g. "Section 1 - " gives "1. "). Same if opening in a web browser.
1.2. OOo 3 beta (build 9328): numbering misses the Before and After text (e.g. "Section 1 - " gives "1. "). Same if opening in a web browser.
2. save as RTF, close, open RTF:
2.1. OOo 2.4.1: strange text ('Left Page;Right Page;Envelope;Endnote', etc.) appears at the beginning of the document, and numbering misses spaces at the end of the Before text (e.g. "Section 1" reads "Section1"). None of this applies if opening in MS WordPad.
2.2. OOo 3 beta (build 9328): strange text ('Left Page;Right Page;Envelope;Endnote', etc.) appears at the beginning of the document, and numbering is completely lost (a blank paragraph appears if no other text followed). None of this applies if opening in MS WordPad.
HTML does not support prefix and suffix for numberings, so
If the numbering format cannot be kept when saving from ODT to HTML, sometimes one would prefer keeping the numbering structure, and sometimes one would prefer keeping the appearance (as when saving to TXT: there, since the numbering internals cannot be retained, the choice is clear). I am afraid there is no single 'best' choice; if you think there is, it's because you haven't been on the other side. So perhaps this raises the matter of the warning about format loss when exporting to other document formats. Instead of a generic warning, couldn't it be more specific and detailed? Perhaps with links to help pages with still more info, possibly tricks to bypass some limitations, etc.
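To make the two choices above concrete, here is a rough Python sketch, not anything OOo actually does. Both function names and their parameters are hypothetical. The first keeps the list structure and lets the browser generate the numbers, so any Before/After text is lost; the second writes the full label as literal text, so the appearance survives but the result is no longer a list.

```python
import html


def export_keep_structure(items):
    """Emit a real OL: the numbering is regenerated by the browser,
    so any Before/After text around the number is dropped."""
    body = "".join(f"<LI>{html.escape(text)}\n" for text in items)
    return f"<OL>\n{body}</OL>"


def export_keep_appearance(items, prefix="", suffix=". "):
    """Emit plain paragraphs with the label written out literally:
    the text looks like the source, but there is no list structure."""
    lines = []
    for number, text in enumerate(items, start=1):
        label = f"{prefix}{number}{suffix}"
        lines.append("<P>" + html.escape(label + text))
    return "\n".join(lines)


headings = ["Scope", "Terms"]
print(export_keep_structure(headings))
print(export_keep_appearance(headings, prefix="Section ", suffix=" - "))
```

An export dialog could, in principle, offer this as an explicit choice rather than a generic format-loss warning.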