Apache OpenOffice (AOO) Bugzilla – Issue 2144
Export of non latin1 characters in HTML format broken
Last modified: 2008-05-18 00:00:33 UTC
So, the problem is that in the WYSIWYG part of the HTML editor all the words typed in Latvian are displayed correctly, but when I save the HTML file and then look at the source, the special characters like â,?,? are translated by the program into something like &d5r2,&wr32 (just an example) instead of normal iso8859-13 encoded characters. So the resulting text of an HTML document in my native language (Latvian) also doesn't look like it should. Bye
Reassigned to Éric.
*** Issue 1810 has been marked as a duplicate of this issue. ***
ES->FLO: please evaluate the effort and target milestone. Note: we have maybe 2 solutions for displaying these characters: - The human-readable one: write the character as is in the code, the way a UTF-8 export allows it. - The HTML-compliant one: work with named entities (possible?). In this case we could have, instead of a 'â', a '&acirc;' (which corresponds in iso-8859-1 to an a with a circumflex accent), which the encoding tag "charset=iso-8859-13" would transform to a "latin small letter a with macron".
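To illustrate the byte-level idea behind the second option, here is a minimal Python sketch. It is not taken from the exporter; Python's standard codecs are only assumed here as a stand-in. It shows that the same byte value 0xE2 reads as "a with circumflex" under iso-8859-1 and as "a with macron" under iso-8859-13.

```python
import unicodedata

# Illustration only: Python's built-in codecs stand in for the HTML exporter.
RAW = b"\xe2"  # one byte as it would sit in an 8-bit encoded HTML file

as_latin1 = RAW.decode("iso-8859-1")    # interpreted as Latin-1
as_baltic = RAW.decode("iso-8859-13")   # interpreted as ISO-8859-13 (Baltic)

for label, char in (("iso-8859-1", as_latin1), ("iso-8859-13", as_baltic)):
    # Print the code point and its Unicode name for each interpretation.
    print(f"{label}: U+{ord(char):04X} {unicodedata.name(char)}")
```

Note that this only holds for raw bytes in the file. Whether a named entity is affected by the charset declaration in the same way is a separate question.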
Currently the HTML export is completely unusable for non-Latin-1 charsets. Please try to fix it. I think that the correct way should be to export it in the UTF-8 encoding without these stupid named entities, and the thing you've specified (using named entities from Latin-1 with the same character code) wouldn't work anyway. If someone wants to have his pages in a different encoding, he can use some external tool to convert UTF-8 to his preferred encoding. And I think that a correct UTF-8 export should be very easy to do.
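A short Python sketch of why borrowing Latin-1 named entities would not work, assuming the standard `html` module behaves like a conforming HTML parser in this respect: a named entity resolves to a fixed Unicode code point no matter which charset the page declares, so `&acirc;` never turns into an a with macron. A numeric reference or a plain UTF-8 encoded character does give the intended letter.

```python
import html

# A named entity always maps to the same Unicode code point,
# independent of the charset declared for the page.
named = html.unescape("&acirc;")    # a with circumflex (U+00E2)

# A numeric character reference names the code point directly.
numeric = html.unescape("&#257;")   # a with macron (U+0101)

# With a UTF-8 export the character is simply written as its UTF-8 bytes.
utf8_bytes = "\u0101".encode("utf-8")

print(f"&acirc; -> U+{ord(named):04X}")
print(f"&#257;  -> U+{ord(numeric):04X}")
print(f"UTF-8 bytes for U+0101: {utf8_bytes.hex()}")
```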
If you want default UTF-8 export just say so under menu /Tools/Options/LoadSave/HTML_Compatibility and select character set Unicode (UTF-8). If you want a different encoding just select it, you don't even need external tools for that.
OK. I didn't know about the setting. But it could be better anyway: it should export all characters as UTF-8, but it exports characters which are the same in latin-1 and latin-2 (or maybe those that have named entities) as named entities (aacute, iacute), which is completely unnecessary. If a browser supports the UTF-8 encoding, it should display them fine even if they are written as UTF-8 characters and not as named entities. And the entities make the source only partially readable. But the export to other charsets (OK I've tried only the latin-2 and windows-1250) is broken completely.
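The behaviour asked for here could be sketched in Python as follows. This is a hypothetical helper (`export_text` is an invented name, not the actual exporter code): it escapes only the characters that are significant in markup, writes every character the target charset can represent as is, and falls back to numeric character references only for characters the charset lacks.

```python
import html

def export_text(text: str, charset: str) -> bytes:
    """Hypothetical export step: encode text for an HTML file in `charset`."""
    # Escape only the markup-significant characters (&, <, >).
    escaped = html.escape(text, quote=False)
    # Characters missing from the target charset become numeric references
    # such as &#257; everything else is written directly.
    return escaped.encode(charset, errors="xmlcharrefreplace")

sample = "\u010d\u0101"  # c with caron, a with macron

print(export_text(sample, "utf-8"))         # both characters written as UTF-8
print(export_text(sample, "iso-8859-2"))    # a with macron is not in latin-2
print(export_text(sample, "windows-1250"))  # nor in windows-1250
```

With UTF-8 nothing needs a reference at all, which keeps the source readable; with the 8-bit charsets only the characters that truly cannot be represented are replaced.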
FL: Please see latest comment from Tomas Mraz "But the export to other charsets (OK I've tried only the latin-2 and windows-1250) is broken completely." and clarify issue. (UTF-8 and named entity are not the problem)
ES->Artis & Tomas: please check whether you still have problems with a current build. If yes: - describe step by step what you do (which settings, encoding), how you save, etc. Note any error message you see - attach a sample file (!!! but zipped if it is an HTML file, because IssueZilla destroys HTML docs !!!) or provide a URL to this file - reassign to me. If not: close the issue.
Thanks a lot! Good work!
The Issue you raised has been marked as 'Resolved' and not updated within the last 1 year+. I am therefore setting this issue to 'Verified' as the first step towards Closing it. If you feel this is incorrect, please re-open the issue and add any comments. Many thanks, Andrew Cleaning-up and Closing old Issues ~ The Grand Bug Squash, pre v3 ~ http://marketing.openoffice.org/3.0/announcementbeta.html
As per previous posting: Verified -> Closed. A Closed Issue is a Happy Issue (TM). Regards, Andrew