Issue 12317

Summary: Generated HTML changes default font and spacing
Product: Writer
Reporter: Unknown <non-migrated>
Component: code
Assignee: eric.savary
Status: CLOSED IRREPRODUCIBLE
QA Contact: issues@sw <issues>
Severity: Trivial
Priority: P3
CC: issues
Version: OOo 1.0.0
Target Milestone: ---
Hardware: PC
OS: Windows NT
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description: Word file and five HTML files generated from same (Flags: none)
Description: MS Office conversion to HTML (Flags: none)

Description Unknown 2003-03-13 18:58:45 UTC
(generated from file attached to 12294)
The generated HTML file changes the default font from Arial to Times New Roman,
and increases the line spacing (especially noticeable in the Roll Call section at
the top).
Comment 1 Unknown 2003-03-17 22:02:33 UTC
The roll call is also left justified, instead of being indented.
Comment 2 eric.savary 2003-03-18 19:26:57 UTC
Please add a sample file (if it's an HTML file, zip it because
IssueZilla destroys them).
What do you call "the roll call"???
Comment 3 Unknown 2003-03-19 02:27:41 UTC
Changing the export type from Netscape 4.x to HTML 3.2 fixed the font
problem (it is now Arial) but the spacing problem remains, and the
Page/Item text is now formatted even worse than it was when saving
with Netscape 4.x compatibility (which didn't match the document either).

The "roll call" is section 1.A of the document referred to at the
beginning of comment one.

I was hoping that you could generate whatever you needed from the
attachment to issue 12294, but I will include a zip file with the Word
document, and all five versions of HTML that OOwriter can output.

There are quite a few problems with how the HTML appears in any given
browser, let me know how far I should break the problems down.  I
don't want to write a bunch of issues and have them considered duplicates.
Comment 4 Unknown 2003-03-19 02:29:33 UTC
Created attachment 5142 [details]
Word file and five HTML files generated from same
Comment 5 Unknown 2003-03-19 02:37:24 UTC
The IE4, NS4, and OO HTML versions use style sheets which get the font
wrong (Times instead of Arial).
All versions space the document incorrectly, see section 1.A. of the
original document.
All versions throw extra text at the bottom, which seems to be a
random number.  The 3.2 HTML and IE3 versions turn the number into a
GIF and embed that at the bottom of the page.
All versions poorly indent the page-format column in section 2.A. of
the original document.
Tables are overly indented, especially item 7E-1 in the original
document.  This may be due to the illegal nesting of <DL> tags
mentioned in another of my bug reports.
Comment 6 eric.savary 2003-03-24 17:19:32 UTC
> All versions space the document incorrectly, see section 1.A. of the
> original document.

Tabs (as they exist in the Word document) can't be exported to HTML
(for they don't exist in HTML).

> All versions throw extra text at the bottom, which seems to be a
> random number.

I didn't see any extra text at the bottom of the page.

> The 3.2 HTML and IE3 versions turn the number into a
> GIF and embed that at the bottom of the page.

Because those versions of HTML didn't support text frames.

> All versions poorly indent the page-format column in section 2.A. of
> the original document.

Please file another issue concerning "negative text indent". In
builds earlier than 6xx, negative text indents were correctly
displayed. Now they don't get displayed.
That's the problem you have in this section.

> Tables are overly indented, especially item 7E-1 in the original
> document.

I don't see any table on 7E-1.
Comment 7 Unknown 2003-03-25 06:26:24 UTC
>> All versions space the document incorrectly, see section 1.A. of the
>> original document.
>Tabs (as they exist in the Word document) can't be exported to HTML
>(for they don't exist in HTML).
I should have said line spacing.  In section 1.A. everything is double
spaced, and in the original document when viewed with OOo or Word the
lines are single spaced.

>> All versions throw extra text at the bottom, which seems to be a
>> random number.
>I didn't see any extra text at the bottom of the page.
HTML 3.2 and IE3 versions have an image element: img
SRC="04022009RG_html_m777f5499.gif" and the GIF is the number 1.  The
text "Frame 1" also appears at the bottom of the rendered HTML.
The oowriter and NS4 versions have the number 49 at the bottom of the
page, and the IE4 version has the number 47.  This is caused by the
number between the start and end SDFIELD tags.
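
As a rough illustration of the SDFIELD point above, the stray number can be
removed after export by deleting each SDFIELD element together with its
content. This is only a post-processing sketch, not a fix for the export
filter. The function name strip_sdfields, the TYPE=PAGE attribute and the
sample fragment are assumptions made for the example and are not copied
from the attached files.

```python
import re

# Matches an opening SDFIELD tag (with any attributes), its content, and
# the closing tag. Case-insensitive, and the content may span lines.
SDFIELD = re.compile(r"<SDFIELD\b[^>]*>.*?</SDFIELD>", re.IGNORECASE | re.DOTALL)


def strip_sdfields(html: str) -> str:
    """Return html with every SDFIELD element and its content removed."""
    return SDFIELD.sub("", html)


# Hypothetical fragment; the attribute shown is an assumption.
sample = "<P>Minutes</P>\n<P><SDFIELD TYPE=PAGE>49</SDFIELD></P>"
cleaned = strip_sdfields(sample)
print(cleaned)
```

The enclosing paragraph is left in place (empty), so the surrounding markup
stays balanced.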

>> Tables are a overly indented, especially item 7E-1 in the original
>> document.
>I don't see any table on 7E-1.
Under section 2.A. (page 4) there is an addendum item for page 43
section 7E-1 that is a Word table.  The column names are 'Nominee',
'Representing', 'Seat No.' and 'Nominated by'.  These columns are all
the way to the left.  The previous Word table (addendum item for page
43 section 7D-1) is pushed all the way to the right.  Is this also the
negative text indent problem?
Comment 8 Unknown 2003-03-25 06:28:57 UTC
Created attachment 5225 [details]
MS Office conversion to HTML
Comment 9 Unknown 2003-03-25 06:41:06 UTC
I have attached what MS Office generates when converting this document
to HTML for comparison purposes.  While this file renders beautifully
in a browser matching the original document exactly, the code is a
mess, doesn't validate, and has an 8K header from hell that is
impossible to reduce.
The goal of this defect (and other defects that may be spawned) is to
have this same quality of rendered output, with valid, simple HTML.
Comment 10 Unknown 2003-03-26 03:04:42 UTC
I also want to point out the original comment, that the HTML version
changes the font of the original document from Arial to Times Roman,
for the versions that use styles (IE4, NS4, and oowriter).
Comment 11 Unknown 2003-03-28 03:16:22 UTC
Negative text indent issue created as #12758.
Comment 12 michael.bemmer 2003-05-09 10:20:57 UTC
Eric, please finish this one.
Comment 13 eric.savary 2003-05-16 16:49:44 UTC
This issue cannot continue any further like this because from 1
problem (summary), we have now 3-4 different problems.
1 problem = 1 issue, please.

But answering your comments:
- Font problems: looks fine on an internal 644m13 (Arial remains
Arial) -> when reporting issues, please use the current milestones.
- Spacing: cannot be "fixed". Writer uses the "Default" style as...
default document style. But this style does not exist in HTML, so there
is a fallback to the "Txt body" style (which has double line
spacing) when exporting to HTML.
- The image or frame with a number: on one hand, it does not make sense
in HTML to have page numbers (there's only 1 page) but of course, the
export filters don't "know" about that. They just consider a frame
and a field and export it the best they can (following the Netscape,
IE, etc. standards)
- The table of 7E-1: this table is not right aligned in the original
document, it is aligned left but its size reaches the right margin.
The table is still aligned to the left, indented 0.50cm from the
left and has the same size after exporting to HTML. But the text area
is bigger in HTML because you have no margins and the table looks
"more" aligned to the left than it was.

Conclusion: if you intend to export a document from a word processor
to HTML, you should use a layout compatible with HTML from the beginning.
Avoid tabs (use tables instead), use styles instead of hard
formatting, and so on.
Comment 14 eric.savary 2003-05-16 16:50:01 UTC
closed
Comment 15 Unknown 2003-05-16 19:06:33 UTC
Please don't close defects as invalid when the problems still exist.
Per my previous comment "There are quite a few problems with how the
HTML appears in any given browser, let me know how far I should break
the problems down.  I don't want to write a bunch of issues and have
them considered duplicates."

Use this defect for the originally reported problem, that OOo changes the
default font and spacing; this has not been fixed.

>looks fine on an internal 644m13
Which output version?  OOo can generate six different types of HTML. 
Does it work for all six versions?  Is this fixed on a released build?

>Spacing: cannot be "fixed".
MS Word seems to handle it fine.  I fixed it manually by removing
<DD><P ALIGN=JUSTIFY> and replacing it with <BR>.  This allowed me to
remove the extra <FONT> tags, since all the names were now one inline
element.
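
The manual fix described above can be sketched as a small script, for
anyone who wants to try it on their own export. It does only the literal
substitution: each <DD><P ALIGN=JUSTIFY> opener becomes a <BR>. It does not
remove the <FONT> tags or any closing tags. The function name
collapse_roll_call and the sample fragment are made up for illustration and
are not taken from the attachment.

```python
import re

# Matches the <DD><P ALIGN=JUSTIFY> opener. Whitespace between the two
# tags is tolerated and matching is case-insensitive.
DD_P_JUSTIFY = re.compile(r"<DD>\s*<P\s+ALIGN=JUSTIFY>", re.IGNORECASE)


def collapse_roll_call(html: str) -> str:
    """Replace each <DD><P ALIGN=JUSTIFY> opener with a plain <BR>."""
    return DD_P_JUSTIFY.sub("<BR>", html)


# Hypothetical fragment standing in for the roll call section.
sample = "<DL><DD><P ALIGN=JUSTIFY>Alice\n<DD><P ALIGN=JUSTIFY>Bob\n</DL>"
fixed = collapse_roll_call(sample)
print(fixed)
```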

>...it does not make sense in HTML to have page numbers...They just
consider a frame and a field and export it the best they can...

This is not the issue.  The number added to the bottom of the HTML was
never in the original document, and isn't a page number in any case. 
It's an undocumented tag used by OOo.

>The table of 7E-1: this table is not right aligned...

The problem is that the HTML has it aligned to the right: <DIV
ALIGN=RIGHT> instead of to the left as in the original document.

>Conclusion: if you intend to export a document from a word processor
to HTML, you should use a layout compatible with HTML from the beginning.

A nice thought, but there are quite a few documents written before
HTML conversion became possible, or even an idea.  Given that there
are other conversion programs that handle these stumbling blocks, it
would seem the better solution would be to fix the program, rather
than manually reformatting thousands of large documents.

I will be writing defects for each item mentioned.  I will also
indicate the offending HTML and submit proposed changes (but not for
all six HTML outputs).  This defect should remain to handle the font
problems.
Comment 16 Unknown 2003-05-16 19:07:37 UTC
D*mn, forgot to reopen.
Comment 17 Unknown 2003-05-19 04:50:38 UTC
I have written defects 14600, 14601, 14602, and 14603 to deal with
additional issues that have been brought up in this defect.
Comment 18 eric.savary 2003-05-19 13:09:31 UTC
The font problem does not happen in OOo 1.1 beta2.
Comment 19 eric.savary 2003-05-19 13:09:49 UTC
closed