Apache OpenOffice (AOO) Bugzilla – Issue 12317
Generated HTML changes default font and spacing
Last modified: 2013-08-07 14:43:39 UTC
(generated from file attached to 12294) The generated HTML file changes the default font from Arial to Times New Roman, and increases the line spacing (especially noticeable in the Roll Call section at the top).
The roll call is also left justified, instead of being indented.
Please add a sample file (if it's an HTML file, zip it because IssueZilla destroys them). What do you call "the roll call"?
Changing the export type from Netscape 4.x to HTML 3.2 fixed the font problem (it is now Arial) but the spacing problem remains, and the Page/Item text is now formatted even worse than it was when saving with Netscape 4.x compatibility (which didn't match the document either). The "roll call" is section 1.A of the document referred to at the beginning of comment one. I was hoping that you could generate whatever you needed from the attachment to issue 12294, but I will include a zip file with the Word document, and all five versions of HTML that OOwriter can output. There are quite a few problems with how the HTML appears in any given browser, let me know how far I should break the problems down. I don't want to write a bunch of issues and have them considered duplicates.
Created attachment 5142 [details] Word file and five HTML files generated from same
The IE4, NS4, and OO HTML versions use style sheets which get the font wrong (Times instead of Arial). All versions space the document incorrectly, see section 1.A. of the original document. All versions throw extra text at the bottom, which seems to be a random number. The 3.2 HTML and IE3 versions turn the number into a GIF and embed that at the bottom of the page. All versions poorly indent the page-format column in section 2.A. of the original document. Tables are overly indented, especially item 7E-1 in the original document. This may be due to the illegal nesting of <DL> tags mentioned in another of my bug reports.
> All versions space the document incorrectly, see section 1.A. of the
> original document.
Tabs (as they exist in the Word document) can't be exported to HTML (for they don't exist in HTML).
> All versions throw extra text at the bottom, which seems to be a
> random number.
I didn't see any extra text at the bottom of the page.
> The 3.2 HTML and IE3 versions turn the number into a
> GIF and embed that at the bottom of the page.
Because those versions of HTML didn't support text frames.
> All versions poorly indent the page-format column in section 2.A. of
> the original document.
Please write another issue concerning "negative text indent". In builds earlier than 6xx, negative text indents were correctly displayed. Now they don't get displayed. That's the problem you have in this section.
> Tables are overly indented, especially item 7E-1 in the original
> document.
I don't see any table on 7E-1.
>> All versions space the document incorrectly, see section 1.A. of the
>> original document.
>Tabs (as they exist in the Word document) can't be exported to HTML
>(for they don't exist in HTML).
I should have said line spacing. In section 1.A. everything is double spaced, whereas in the original document, viewed with OOo or Word, the lines are single spaced.
>> All versions throw extra text at the bottom, which seems to be a
>> random number.
>I didn't see any extra text at the bottom of the page.
HTML 3.2 and IE3 versions have an image element: img SRC="04022009RG_html_m777f5499.gif" and the GIF is the number 1. The text "Frame 1" also appears at the bottom of the rendered HTML. The oowriter and NS4 versions have the number 49 at the bottom of the page, and the IE4 version has the number 47. This is caused by the number between the start and end SDFIELD tags.
>> Tables are overly indented, especially item 7E-1 in the original
>> document.
>I don't see any table on 7E-1.
Under section 2.A. (page 4) there is an addendum item for page 43 section 7E-1 that is a Word table. The column names are 'Nominee', 'Representing', 'Seat No.' and 'Nominated by'. These columns are all the way to the left. The previous Word table (addendum item for page 43 section 7D-1) is pushed all the way to the right. Is this also the negative text indent problem?
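For anyone who needs a workaround in the meantime, the stray number described above could be stripped from the exported file after the fact. The following is only a minimal sketch: it assumes the number always sits between an opening SDFIELD tag and its closing tag, and the function name and the attributes in the sample string are made up for illustration, not taken from the attached files.

```python
import re

# Assumption: the unwanted number is the text between <SDFIELD ...> and
# </SDFIELD>. The whole element, including its content, is removed.
SDFIELD_RE = re.compile(r"<SDFIELD\b[^>]*>.*?</SDFIELD>",
                        re.IGNORECASE | re.DOTALL)


def strip_sdfields(html: str) -> str:
    """Return the HTML with every SDFIELD element removed."""
    return SDFIELD_RE.sub("", html)


# Hypothetical sample; the TYPE attribute is illustrative only.
sample = "<P>End of document</P>\n<P><SDFIELD TYPE=EXAMPLE>49</SDFIELD></P>"
print(strip_sdfields(sample))
```

This leaves an empty paragraph behind and does nothing about the GIF that the HTML 3.2 and IE3 exports embed instead.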
Created attachment 5225 [details] MS Office conversion to HTML
I have attached what MS Office generates when converting this document to HTML, for comparison purposes. While this file renders beautifully in a browser, matching the original document exactly, the code is a mess, doesn't validate, and has an 8K header from hell that is impossible to reduce. The goal of this defect (and other defects that may be spawned) is to have this same quality of rendered output, with valid, simple HTML.
I also want to point out the original comment, that the HTML version changes the font of the original document from Arial to Times Roman, for the versions that use styles (IE4, NS4, and oowriter).
Negative text indent issue created as #12758.
Eric, please finish this one.
This issue cannot continue like this because from 1 problem (summary), we now have 3-4 different problems. 1 problem = 1 issue, please. But answering your comments:
- Font problems: looks fine on an internal 644m13 (Arial remains Arial) -> please use the current milestones to report issues.
- Spacing: cannot be "fixed". Writer uses the "Default" style as... default document style. But this style does not exist in HTML and there is a fallback to the "Txt body" style (which has a double line spacing) when exporting to HTML.
- The image or frame with a number: on one hand, it does not make sense in HTML to have page numbers (there's only 1 page) but of course, the export filters don't "know" about that. They just consider a frame and a field and export it the best they can (following the Netscape, IE etc... standards).
- The table of 7E-1: this table is not right aligned in the original document; it is aligned left but its size reaches the right margin. The table is still aligned to the left, indented 0.50cm from the left, and has the same size after exporting to HTML. But the text area is bigger in HTML because you have no margins, and the table looks "more" aligned to the left than it was.
Conclusion: if you intend to export a document from a word processor to HTML, you should use a layout compatible with HTML from the beginning. Avoid tabs (use tables instead), use styles instead of hard formatting, and so on.
closed
Please don't close defects as invalid when the problems still exist. Per my previous comment: "There are quite a few problems with how the HTML appears in any given browser, let me know how far I should break the problems down. I don't want to write a bunch of issues and have them considered duplicates." Use this defect for the originally reported problem: OOo changes the default font and spacing, and this has not been fixed.
>looks fine on an internal 644m13
Which output version? OOo can generate six different types of HTML. Does it work for all six versions? Is this fixed on a released build?
>Spacing: cannot be "fixed".
MS Word seems to handle it fine. I fixed it manually by removing <DD><P ALIGN=JUSTIFY> and replacing it with <BR>. This allowed me to remove the extra <FONT> tags, since all the names were now one inline element.
>...it does not make sense in HTML to have page numbers...They just consider a frame and a field and export it the best they can...
This is not the issue. The number added to the bottom of the HTML was never in the original document, and isn't a page number in any case. It's an undocumented tag used by OOo.
>The table of 7E-1: this table is not right aligned...
The problem is that the HTML has it aligned to the right: <DIV ALIGN=RIGHT> instead of to the left as in the original document.
>Conclusion: if you intend to export a document from a word processor to HTML, you should use a layout compatible with HTML from the beginning.
A nice thought, but there are quite a few documents written before HTML conversion became possible, or even an idea. Given that there are other conversion programs that handle these stumbling blocks, it would seem the better solution would be to fix the program, rather than manually reformatting thousands of large documents.
I will be writing defects for each item mentioned. I will also indicate the offending HTML and submit proposed changes (but not for all six HTML outputs).
This defect should remain to handle the font problems.
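The manual spacing fix mentioned above (replacing <DD><P ALIGN=JUSTIFY> with <BR>) could be scripted for people with many documents to convert. This is a rough sketch, not a proposed change to the export filter: the function name and sample markup are hypothetical, and it assumes the two tags appear back to back, separated by whitespace at most.

```python
import re

# Assumption: the double-spaced lines all open with <DD> immediately followed
# by <P ALIGN=JUSTIFY>, as described in the comment above.
OPENING_RE = re.compile(r"<DD>\s*<P\s+ALIGN=JUSTIFY>", re.IGNORECASE)


def single_space(html: str) -> str:
    """Replace each <DD><P ALIGN=JUSTIFY> opening pair with a <BR>."""
    return OPENING_RE.sub("<BR>", html)


# Hypothetical roll-call fragment, not copied from the attachments.
sample = ("<DL><DD><P ALIGN=JUSTIFY>Member A</P>\n"
          "<DD><P ALIGN=JUSTIFY>Member B</P></DL>")
print(single_space(sample))
```

The sketch only touches the opening tags. The now unmatched </P> tags and the redundant <FONT> tags would still have to be cleaned up separately, as was done by hand.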
D*mn, forgot to reopen.
I have written defects 14600, 14601, 14602, and 14603 to deal with additional issues that have been brought up in this defect.
The font problem does not happen in OOo 1.1 beta2.