Issue 16900 - XHTML Export using wrong encoding for Hebrew
Summary: XHTML Export using wrong encoding for Hebrew
Status: CLOSED FIXED
Alias: None
Product: Internationalization
Classification: Code
Component: BiDi
Version: OOo 1.1 RC
Hardware: PC Windows 2000
Importance: P3 Trivial
Target Milestone: ---
Assignee: jogi
QA Contact: issues@l10n
URL:
Keywords: Hebrew, oooqa
Depends on:
Blocks:
 
Reported: 2003-07-15 20:38 UTC by kosherjava
Modified: 2013-08-07 15:00 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Original file (4.86 KB, application/octet-stream)
2003-07-15 20:47 UTC, kosherjava
xhtml file (renamed to .html) (1.36 KB, text/html)
2003-07-15 20:48 UTC, kosherjava

Description kosherjava 2003-07-15 20:38:46 UTC
Hi,
I just tried out OOo for the first time and am quite impressed. I am using 1.1 RC
on Win2k.
Hebrew support worked well, but if not for the documentation at
http://www.openoffice.org.il/arabic_hebrew_howto_OS_643.txt I would have been
lost. This belongs in the regular help files.
Back to the issue at hand. I typed a small Hebrew doc and tried saving it in the
native and .doc formats. All was well. The same went for saving it as HTML.
Exporting to PDF worked, but exporting to XHTML did not produce legible HTML.
Any ideas?
Comment 1 kosherjava 2003-07-15 20:47:22 UTC
Created attachment 7715
Original file
Comment 2 kosherjava 2003-07-15 20:48:06 UTC
Created attachment 7716
xhtml file (renamed to .html)
Comment 3 kosherjava 2003-07-15 21:11:11 UTC
After additional testing I noticed that it does work in Mozilla
Firebird 0.6 (I assume it works in other Mozilla builds as well). It
does not display properly in IE (or NS 4.x). Maybe it's an IE
"feature", but the regular Save as HTML output does display properly in IE.
It seems that the problem is the missing
<meta http-equiv="Content-Type"
 content="text/html;charset=ISO-8859-8-I"> tag in the HTML. Adding it
fixed the problem (as does adding <META HTTP-EQUIV="CONTENT-TYPE"
CONTENT="text/html; charset=windows-1252">, which is what the regular
Save as HTML uses).
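The manual workaround described in this comment can be scripted. The following is only a minimal sketch: the function name `add_charset_meta` and the `META_TEMPLATE` constant are hypothetical, it is not part of OOo or its export filter, and the charset passed in is assumed to match the actual byte encoding of the exported file.

```python
import re

# Hypothetical template for the tag added by hand in this comment.
META_TEMPLATE = (
    '<meta http-equiv="Content-Type" '
    'content="text/html; charset={charset}" />'
)


def add_charset_meta(xhtml: str, charset: str = "UTF-8") -> str:
    """Insert a Content-Type meta tag right after <head>, unless one exists."""
    # Leave the document alone if it already declares a Content-Type.
    if re.search(r'http-equiv\s*=\s*["\']content-type["\']', xhtml, re.IGNORECASE):
        return xhtml
    meta = META_TEMPLATE.format(charset=charset)
    # Append the tag to the first opening <head ...> tag only.
    return re.sub(
        r"(<head\b[^>]*>)",
        lambda m: m.group(1) + meta,
        xhtml,
        count=1,
        flags=re.IGNORECASE,
    )


doc = "<html><head><title>t</title></head><body/></html>"
patched = add_charset_meta(doc)
print(patched)
```

Running the function a second time on its own output returns the text unchanged, so it is safe to apply to a file that has already been patched.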
Comment 4 sforbes 2003-07-28 15:52:36 UTC
How exactly are you saving the file? And what do you have under
Preferences --> Save/Load --> HTML --> Character set?

I got three different results depending on how I saved the file as XHTML.

The closest I got was saving it as .xhtml instead of .html, which
made both Mozilla and IE display the content inline.

After renaming the file, it worked fine in both IE and Mozilla, and
the file was correctly encoded as Unicode, as XHTML should be.

Tested with RC2 on my Windows 2003.
Comment 5 kosherjava 2003-07-29 14:09:54 UTC
Hi,
Under Options > Load/Save > HTML Compatibility I have "Western Europe
Windows 1252".
Since the document properties were set to Hebrew, I would expect that
exporting the document to XHTML (I don't think there is a Save as XHTML)
would set the XHTML encoding for Hebrew (as the Save as HTML option
properly does), even though my HTML preferences were left at the default
Windows character set.
Does the attached HTML document display properly in your IE?

That aside, is there any reason there is no Save as XHTML? Why is it
in the "Export" menu?
Comment 6 Dieter.Loeschky 2003-08-20 11:19:42 UTC
DL->MIB: Would you please takeover?
Comment 7 michael.brauer 2003-08-20 11:56:57 UTC
Svante has already fixed this bug, but the fix has not been committed to any
release. I think we should consider placing the bugfix in OOo 1.1.1.
Comment 8 michael.brauer 2003-08-21 11:31:34 UTC
.
Comment 9 lo 2003-09-01 13:58:50 UTC
accepted
Comment 10 lo 2003-10-06 13:03:42 UTC
XHTML export is an export filter based on an XSL transformation
(Tools/XML Filter Settings...), while HTML export/saving is implemented
as an internal filter that accesses the internal document
representation, not the XML representation.
Settings in the Load/Save options have no effect on this XHTML
export.
Although XHTML does not require a meta tag to select an encoding
(which is UTF-8 in this case), IE seems to need it, since it does not seem
to use the encoding="UTF-8" pseudo-attribute of the XML declaration at the
start of the XHTML file.
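The two ways of declaring an encoding mentioned in this comment, and the effect of a reader ignoring the XML declaration, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the helper `declared_encodings`, the regular expressions, and the `SAMPLE` document are hypothetical, and falling back to windows-1252 only models one possible browser behaviour, not IE's actual detection logic.

```python
import re

# Encoding pseudo-attribute of a leading XML declaration (no BOM handling).
XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)
# Charset given inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def declared_encodings(data: bytes):
    """Return (xml_declaration_encoding, meta_charset); None where absent."""
    xml = XML_DECL.search(data)
    meta = META_CHARSET.search(data)
    return (
        xml.group(1).decode("ascii") if xml else None,
        meta.group(1).decode("ascii") if meta else None,
    )


# Hypothetical export: UTF-8 bytes, XML declaration present, no meta tag.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<html><head><title>t</title></head><body><p>"
    + "שלום".encode("utf-8")
    + b"</p></body></html>"
)

print(declared_encodings(SAMPLE))

# A reader that honours the declaration recovers the Hebrew text.
as_utf8 = SAMPLE.decode("utf-8")
# A reader that ignores it and assumes windows-1252 sees mojibake instead.
as_cp1252 = SAMPLE.decode("cp1252", errors="replace")
print("שלום" in as_utf8, "שלום" in as_cp1252)
```

The bytes are the same in both cases; only the assumed encoding differs, which is consistent with the file displaying correctly in one browser and not in another.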
Comment 11 lo 2003-10-23 13:34:47 UTC
Will be integrated with other XHTML enhancements in 1.1.2.
Comment 12 svante.schubert 2003-11-21 11:30:35 UTC
Lars, this bug will be fixed with my update of the XHTML XSLT stylesheets
(I will add a link for a beta download ASAP).
Comment 13 svante.schubert 2003-11-21 11:31:26 UTC
fixed by adding meta tag
Comment 14 svante.schubert 2004-01-27 13:22:12 UTC
In a patch, only changes to the existing documents will be committed. In our case
the whole filter (the stylesheets) has been refactored and reworked, so I changed
the target to OOo 2.0.
Comment 15 svante.schubert 2004-01-27 13:25:29 UTC
Maybe, aside from the explicit UTF-8 encoding via the meta tag, a new 'dir' (direction)
attribute has to be added. Due to my lack of knowledge of Hebrew I am not
able to validate this issue, so I am going to add the stylesheets, which the submitter
might test.
If it fails, we should file a follow-up task, as some enhancements concerning
this issue have already been implemented.

A gzip file attachment containing the stylesheets will follow...
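One way to decide on a value for the 'dir' attribute raised in this comment is the "first strong character" heuristic from the Unicode bidirectional algorithm. The sketch below is an assumption for illustration only: the function `text_direction` is hypothetical and is not how the XSLT stylesheets determine direction.

```python
import unicodedata


def text_direction(text: str, default: str = "ltr") -> str:
    """Guess a 'dir' value from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew letters are "R", Arabic letters "AL"
            return "rtl"
        if bidi == "L":  # Latin and other left-to-right letters
            return "ltr"
    # No strong character found (digits, spaces, punctuation only).
    return default


for sample in ("שלום", "Hello", "123 שלום"):
    print(repr(sample), text_direction(sample))
```

Digits and whitespace are not strongly directional, so a paragraph that starts with a number followed by Hebrew text is still classified as "rtl".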
Comment 16 svante.schubert 2004-04-20 14:19:08 UTC
SUS:Reassigned to the QA
Comment 17 jogi 2004-04-23 09:41:53 UTC
I also cannot validate it, but we have fixed the META tags, so it should work now.
Comment 18 jogi 2004-04-23 09:42:22 UTC
blind verified.
Comment 19 kosherjava 2004-04-23 16:00:45 UTC
Short of updating OOo, is there any way you can attach the latest XHTML output
based on the original file (why.swx) that I attached?
Thanks
Comment 20 jogi 2004-04-26 06:12:07 UTC
Sorry, the XHTML filter is based on more than one file, and that
behavior is not specified in the 'XML Filter Settings' feature. So I cannot
create a .jar file for you to test it. I will write down here the milestone in
which it can be tested, okay?
Comment 21 jogi 2004-07-19 11:36:41 UTC
Should be okay in SRC680m48, but I cannot test Hebrew.