Issue 103308 - HTML import mangles non-BMP unicodes
Summary: HTML import mangles non-BMP unicodes
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: open-import
Version: OOO310m14
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks: 102943
Reported: 2009-07-03 08:30 UTC by hdu@apache.org
Modified: 2017-05-20 11:18 UTC
CC List: 1 user

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
bugdoc (1.43 KB, text/html)
2009-07-03 08:33 UTC, hdu@apache.org

Description hdu@apache.org 2009-07-03 08:30:23 UTC
The HTML filter uses the 16-bit type sal_Unicode for all its text processing, so it strips off the
most significant bits of Unicode code points beyond the Basic Multilingual Plane (above U+FFFF). This results in a mangled import.
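To illustrate the truncation described above, here is a minimal sketch, not taken from the OpenOffice sources. `Utf16Unit` stands in for the 16-bit sal_Unicode, and the function names `truncateToUnit` and `encodeUtf16` are hypothetical. It contrasts narrowing a code point to 16 bits with encoding it as a UTF-16 surrogate pair, which is one way to keep the value intact in 16-bit storage.

```cpp
#include <cstdint>
#include <vector>

// Assumption: stand-in for sal_Unicode, which the report describes as 16-bit.
using Utf16Unit = std::uint16_t;

// Narrowing a code point to 16 bits drops everything above bit 15.
// This mirrors the kind of loss the report describes.
Utf16Unit truncateToUnit(std::uint32_t codePoint)
{
    return static_cast<Utf16Unit>(codePoint & 0xFFFF);
}

// Lossless alternative: code points above U+FFFF become a surrogate pair.
// No validation is done here (surrogate-range or > U+10FFFF input is not rejected).
std::vector<Utf16Unit> encodeUtf16(std::uint32_t codePoint)
{
    if (codePoint < 0x10000)
        return { static_cast<Utf16Unit>(codePoint) };

    const std::uint32_t offset = codePoint - 0x10000;
    return { static_cast<Utf16Unit>(0xD800 + (offset >> 10)),     // high surrogate
             static_cast<Utf16Unit>(0xDC00 + (offset & 0x3FF)) }; // low surrogate
}
```

For example, U+1D11E narrows to 0xD11E, an unrelated BMP code point, whereas the surrogate-pair encoding yields the two units 0xD834 0xDD1E.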
Comment 1 hdu@apache.org 2009-07-03 08:33:06 UTC
Created attachment 63344
bugdoc
Comment 2 hdu@apache.org 2009-07-03 08:38:13 UTC
Fixing the method "sal_Unicode CSS1Parser::GetNextChar()" in sw/source/filter/html/parcss1.cxx is 
probably a good starting point.
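One possible direction, sketched under assumptions: a reader whose "next character" call returns a full 32-bit code point rather than a single 16-bit unit. The class `CodePointReader` and its members are hypothetical and are not the actual CSS1Parser interface. The sketch also assumes the input is already held as UTF-16, which the report does not state.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical reader; not the real CSS1Parser.
class CodePointReader
{
public:
    explicit CodePointReader(std::u16string text) : m_text(std::move(text)) {}

    bool atEnd() const { return m_pos >= m_text.size(); }

    // Returns the next full code point. Call only when !atEnd().
    char32_t nextCodePoint()
    {
        const char16_t first = m_text[m_pos++];
        const bool isHigh = first >= 0xD800 && first <= 0xDBFF;
        if (isHigh && m_pos < m_text.size())
        {
            const char16_t second = m_text[m_pos];
            if (second >= 0xDC00 && second <= 0xDFFF)
            {
                ++m_pos; // consume the low surrogate as well
                return 0x10000
                     + ((static_cast<char32_t>(first) - 0xD800) << 10)
                     + (static_cast<char32_t>(second) - 0xDC00);
            }
        }
        // BMP character, or an unpaired surrogate passed through unchanged.
        return first;
    }

private:
    std::u16string m_text;
    std::size_t m_pos = 0;
};
```

Widening the return type is only part of such a change: every caller that stores the result in a 16-bit variable would need the same treatment, otherwise the truncation just moves one level up.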
Comment 3 openoffice 2009-07-03 08:54:58 UTC
set target
Comment 4 Marcus 2017-05-20 11:18:15 UTC
Reset assignee to the default "issues@openoffice.apache.org".