Apache OpenOffice (AOO) Bugzilla – Issue 28241
Greek characters as HTML entities
Last modified: 2013-08-07 14:41:36 UTC
Greek characters in HTML export are saved as HTML entities, which is wrong (according to the W3C).
Created attachment 14702 [details] patch to resolve the problem
Can you point out the W3C URL for reference for this issue, please? Thanks.
MRU->ES: pls evaluate...
ES->MIB: can you handle this patch, please?
First of all, I assume you are referring to section "24.3 Character entity references for symbols, mathematical symbols, and Greek letters" of the HTML 4.01 specification (http://www.w3.org/TR/html4/sgml/entities.html). I also agree that using entities for Greek characters is not reasonable if Greek encodings are used. I had a look at your patch, and want to thank you for sending it to us. However, it seems to me that the patched code is not very efficient, because it contains many string comparisons. For this reason, I would suggest that you 1) add a parameter containing the current encoding to the function "lcl_svhtml_GetEntityForChar", and 2) divide the switch statement in that function into two: one that contains only the non-Greek characters, and one that contains only the Greek characters. The second switch statement should be executed only for non-Greek encodings. Do you think it would be possible for you to send us an updated patch?
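A minimal sketch of the two-switch idea suggested above, for illustration only. It is not the code in htmlout.cxx: the `Encoding` enum, `isGreekEncoding`, `getEntityForChar`, and `checkSplitSwitch` are hypothetical stand-ins (the real function is lcl_svhtml_GetEntityForChar and uses the project's own character and encoding types), and only a handful of entities are listed.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the real text-encoding type (assumption).
enum class Encoding { Latin1, Greek_ISO_8859_7, Greek_Windows_1253 };

// True for encodings that can represent Greek letters directly.
bool isGreekEncoding(Encoding e) {
    return e == Encoding::Greek_ISO_8859_7 || e == Encoding::Greek_Windows_1253;
}

// Returns the entity name for c, or nullptr if c should be written as is.
const char* getEntityForChar(char16_t c, Encoding enc) {
    // First switch: non-Greek characters, checked for every encoding.
    switch (c) {
        case u'&':   return "amp";
        case u'<':   return "lt";
        case u'>':   return "gt";
        case 0x00A0: return "nbsp";
        default:     break;
    }

    // Greek encodings can store Greek letters directly, so skip the second switch.
    if (isGreekEncoding(enc)) {
        return nullptr;
    }

    // Second switch: Greek characters, only reached for non-Greek encodings.
    switch (c) {
        case 0x0391: return "Alpha";
        case 0x03A9: return "Omega";
        case 0x03B1: return "alpha";
        case 0x03C9: return "omega";
        default:     return nullptr;
    }
}

// Usage example: alpha becomes an entity only when the encoding is not Greek.
void checkSplitSwitch() {
    assert(std::strcmp(getEntityForChar(0x03B1, Encoding::Latin1), "alpha") == 0);
    assert(getEntityForChar(0x03B1, Encoding::Greek_ISO_8859_7) == nullptr);
}
```

The encoding is tested once per character with integer comparisons, so no string comparisons are needed on this path.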
Hi, first of all, I see that I sent the wrong patch by accident. The correct one is exactly the same (codewise) but also has credits. It's based on the patch by Pavel Janík for ISO_8859_2 characters found at: http://puma.feld.cvut.cz/~pavel/ Second, you are right, it's inefficient, but the solution you suggest would mean changes to the lcl_svhtml_GetEntityForChar prototype. Wouldn't that require bigger changes throughout the program? Greek characters are consecutive. Could we do simple arithmetic on the switch, i.e. if the character is greater than 0386 and less than 03CE (some gaps exist in there, but you get the point), just do nothing (pStr=0)? Do you think that would work?
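The range-check alternative proposed above could look roughly like the following sketch. The names `kGreekFirst`, `kGreekLast`, `isInGreekRange`, `entityOrNull`, and `checkRange` are hypothetical, and treating the bounds 0386 and 03CE as inclusive is an assumption, since the comment does not say whether the end points are included.

```cpp
#include <cassert>
#include <cstring>

// Bounds quoted in the comment above; treating them as inclusive is an assumption.
constexpr char16_t kGreekFirst = 0x0386;
constexpr char16_t kGreekLast  = 0x03CE;

// One arithmetic range test instead of one switch case per Greek letter.
bool isInGreekRange(char16_t c) {
    return c >= kGreekFirst && c <= kGreekLast;
}

// Returns the entity name for c, or nullptr (the "pStr=0" case) if none applies.
const char* entityOrNull(char16_t c) {
    if (isInGreekRange(c)) {
        return nullptr;  // Greek block: write the character as is
    }
    switch (c) {
        case u'&':   return "amp";
        case u'<':   return "lt";
        case u'>':   return "gt";
        case 0x00A0: return "nbsp";
        default:     return nullptr;
    }
}

// Usage example: Greek letters get no entity, markup characters still do.
void checkRange() {
    assert(entityOrNull(0x03B1) == nullptr);
    assert(std::strcmp(entityOrNull(u'<'), "lt") == 0);
}
```

This keeps the function prototype unchanged, but because the encoding is not known at this point, the sketch skips entities for the Greek range regardless of the target encoding.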
Hi, we are coming closer to 2.0 and we would like to clean up the issues with target milestone 2.0. 'izone', are you going to work on this patch? Michael (mib) will not see your comments. You may want to ask him at the dev@sw.openoffice.org list. Greetings, Stefan
Hi, well, I thought I'd get an answer to the question I asked on June 15th first!!! But I'll try to come up with something to see if it is OK.
Hi, I attach a zip file containing two patches. The one named htmlout.patch applies to the 1.1.4 source. The one named htmlout-2.patch applies to CVS Head (updated 14/1/2005). They were created by Alexios Zavras and tested by Dimitrios Korbetis, and they solve the problem with the Greek characters on HTML export.
Created attachment 21543 [details] Patches for the file svtools/source/svhtml/htmlout.cxx
Hi, let me reassign the issue to Michael (mib) for review of the new patch...
Patch has been manually applied.
re-open issue and reassign to es@openoffice.org
reassign to es@openoffice.org
reset resolution to FIXED
Verified in cws fwkfinal1
Ok in src680m89