Apache OpenOffice (AOO) Bugzilla – Issue 22579
incorrect import : HTML page with CKJ characters coded in hexadecimal
Last modified: 2023-01-04 14:10:57 UTC
incorrect import: HTML page with CKJ (Chinese Korean Japanese) characters coded in hexadecimal. This problem may affect all OpenOffice.org programs: Writer, Spreadsheet, Presentation, Draw, HTML writer... CKJ characters can be coded in two ways, for example あ: decimal &#12354;, hexadecimal &#x3042;. When importing into Writer or Spreadsheet, the decimal reference is imported correctly as the character, but the hexadecimal reference is imported as literal HTML code. I'll post a zip file containing HTML files for testing.
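To illustrate the two reference forms for あ (U+3042), here is a minimal C++ sketch of decoding one numeric character reference. Both `&#12354;` and `&#x3042;` should yield the same code point. The function name `parseNumericCharRef` is hypothetical and is not OpenOffice code. The sketch also skips the special cases that HTML5 defines (for example, mapping NUL or surrogates to U+FFFD).

```cpp
#include <cctype>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (not OpenOffice code): parse a single HTML numeric
// character reference such as "&#12354;" (decimal) or "&#x3042;" (hex)
// and return its Unicode code point, or std::nullopt if malformed.
std::optional<char32_t> parseNumericCharRef(std::string_view ref) {
    if (ref.size() < 4 || ref.substr(0, 2) != "&#" || ref.back() != ';')
        return std::nullopt;
    std::string_view digits = ref.substr(2, ref.size() - 3);  // between "&#" and ";"
    int base = 10;
    if (!digits.empty() && (digits[0] == 'x' || digits[0] == 'X')) {
        base = 16;  // "&#x..." or "&#X..." means hexadecimal
        digits.remove_prefix(1);
    }
    if (digits.empty()) return std::nullopt;
    std::uint32_t value = 0;
    for (char c : digits) {
        unsigned char uc = static_cast<unsigned char>(c);
        int d;
        if (std::isdigit(uc)) d = c - '0';
        else if (base == 16 && std::isxdigit(uc)) d = std::tolower(uc) - 'a' + 10;
        else return std::nullopt;  // invalid digit for this base
        value = value * base + d;
        if (value > 0x10FFFF) return std::nullopt;  // beyond the Unicode range
    }
    return static_cast<char32_t>(value);
}
```

A bug that handles only the decimal branch would leave `&#x3042;` unrecognized, so the caller would copy it through as text, which matches the symptom reported here.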
Created attachment 11352: HTML files for testing. MD5 signature: da3d80ef80f1c010e68755b20aae48a6
Hi, since this is not only a Calc problem but an edit engine problem, I'm changing the component to framework and reassigning it to the appropriate developer. Frank
Henning...
accepted
It seems that this affects not only CKJ but all characters (ASCII, accented, CKJ, ...) coded in hexadecimal in HTML pages.
Reset assignee to the default "issues@openoffice.apache.org".
(In reply to lcn from comment #5)
> Seems that it affects not only CKJ but all characters (ASCII,
> accentued, CKJ,... ) coded in hexadecimal in HTML pages.

All 3 sample documents look the same now, and my tests show that hexadecimally coded ASCII (e.g. &#x5A; for "Z") renders correctly. Please confirm whether this is still an issue. I believe the parsing happens in HTMLParser::ScanText() in main/svtools/source/svhtml/parhtml.cxx, and it supports both hexadecimal and decimal encoding.
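One way to recheck the behavior outside the application is a small text scanner. The following is a standalone C++ sketch and is not the actual HTMLParser::ScanText() logic. It replaces each numeric character reference in a string with its UTF-8 encoding and copies anything malformed through unchanged. The names `decodeNumericRefs` and `appendUtf8` are hypothetical. The sketch also does not apply HTML5's U+FFFD substitutions for NUL or surrogates.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: append code point cp to out as UTF-8.
static void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical scanner: decode "&#NNN;" and "&#xHHH;" to UTF-8; any text that
// is not a well-formed numeric reference is copied unchanged.
std::string decodeNumericRefs(std::string_view text) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text.compare(i, 2, "&#") == 0) {
            std::size_t j = i + 2;
            bool hex = j < text.size() && (text[j] == 'x' || text[j] == 'X');
            if (hex) ++j;
            std::size_t start = j;  // first digit position
            char32_t cp = 0;
            while (j < text.size() && cp <= 0x10FFFF) {
                char c = text[j];
                unsigned d;
                if (c >= '0' && c <= '9') d = static_cast<unsigned>(c - '0');
                else if (hex && c >= 'a' && c <= 'f') d = static_cast<unsigned>(c - 'a' + 10);
                else if (hex && c >= 'A' && c <= 'F') d = static_cast<unsigned>(c - 'A' + 10);
                else break;  // end of digits
                cp = static_cast<char32_t>(cp * (hex ? 16u : 10u) + d);
                ++j;
            }
            // Accept only if there was at least one digit, a closing ';',
            // and the value is within the Unicode range.
            if (j > start && j < text.size() && text[j] == ';' && cp <= 0x10FFFF) {
                appendUtf8(out, cp);
                i = j + 1;
                continue;
            }
        }
        out += text[i];  // ordinary character or malformed reference
        ++i;
    }
    return out;
}
```

For example, `decodeNumericRefs("&#x5A;")` yields `"Z"`, and both `&#12354;` and `&#x3042;` yield the UTF-8 bytes for あ. If the hexadecimal bug were present, the hex input would come back unchanged as literal text.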