Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
Summary: | HTML import mangles non-BMP unicodes | ||||||
---|---|---|---|---|---|---|---|
Product: | Writer | Reporter: | hdu <hdu> | ||||
Component: | open-import | Assignee: | AOO issues mailing list <issues> | ||||
Status: | CONFIRMED --- | QA Contact: | |||||
Severity: | Trivial | ||||||
Priority: | P3 | CC: | damjan, issues | ||||
Version: | OOO310m14 | ||||||
Target Milestone: | --- | ||||||
Hardware: | All | ||||||
OS: | All | ||||||
Issue Type: | DEFECT | Latest Confirmation in: | 4.2.0-dev | ||||
Developer Difficulty: | --- | ||||||
Issue Depends on: | |||||||
Issue Blocks: | 102943 | ||||||
Attachments: |
Description
hdu@apache.org
2009-07-03 08:30:23 UTC
Created attachment 63344
bugdoc
Fixing the method "sal_Unicode CSS1Parser::GetNextChar()" in sw/source/filter/html/parcss1.cxx is probably a good starting point.

set target

Reset assignee to the default "issues@openoffice.apache.org".

(In reply to hdu@apache.org from comment #2)
> Fixing the method "sal_Unicode CSS1Parser::GetNextChar()" in
> sw/source/filter/html/parcss1.cxx is
> probably a good starting point.

Yes, but that's just CSS parsing. The remainder of the HTML parsing is in main/svtools/source/svhtml/parhtml.cxx, which, sadly like most of our codebase, also operates one Unicode code unit at a time, retrieved from SvParser::GetNextChar().

The function

    inline sal_uInt16 GetCharSize() const;

got my hopes up: does it tell us the code point size?

    inline sal_uInt16 SvParser::GetCharSize() const
    {
        return (RTL_TEXTENCODING_UCS2 == eSrcEnc) ? 2 : 1;
    }

No, just the bytes per BMP character for the current encoding, a useless statistic.

SvParser does not have any functions for code points. We'd have to add them and change a lot of code - not just HTML parsing - to use them.
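
For illustration only, here is a rough sketch of what a code-point-aware read could look like. It assumes the input is already available as a UTF-16 buffer. The helper name `nextCodePoint`, the use of `std::u16string`, and the choice to return U+FFFD for unpaired surrogates are all assumptions made for this example. None of it is existing SvParser or AOO API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, NOT part of SvParser: reads one Unicode code point
// from a UTF-16 buffer starting at index i and advances i past it.
// Precondition: i < s.size().
// Unpaired surrogates are reported as U+FFFD (an assumption of this sketch).
inline char32_t nextCodePoint(const std::u16string& s, std::size_t& i)
{
    const char16_t c = s[i++];

    if (c >= 0xD800 && c <= 0xDBFF) {          // high (lead) surrogate
        if (i < s.size()) {
            const char16_t d = s[i];
            if (d >= 0xDC00 && d <= 0xDFFF) {  // low (trail) surrogate
                ++i;
                // Combine the two 10-bit halves and add the 0x10000 offset.
                return 0x10000
                     + ((static_cast<char32_t>(c) - 0xD800) << 10)
                     + (static_cast<char32_t>(d) - 0xDC00);
            }
        }
        return 0xFFFD;                         // lead surrogate without trail
    }

    if (c >= 0xDC00 && c <= 0xDFFF) {
        return 0xFFFD;                         // trail surrogate without lead
    }

    return c;                                  // ordinary BMP character
}
```

A parser that reads one code unit at a time sees a non-BMP character as two separate surrogate values. A helper along these lines consumes both and yields a single value above U+FFFF, which is the kind of function the last comment says SvParser currently lacks.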