Issue 103308 - HTML import mangles non-BMP unicodes
Summary: HTML import mangles non-BMP unicodes
Alias: None
Product: Writer
Classification: Application
Component: open-import
Version: OOO310m14
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
Depends on:
Blocks: 102943
Reported: 2009-07-03 08:30 UTC by
Modified: 2017-05-20 11:18 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---

bugdoc (1.43 KB, text/html)
2009-07-03 08:33 UTC,

Description 2009-07-03 08:30:23 UTC
The HTML filter uses the 16-bit type sal_Unicode for all of its text processing, so it strips off the
most significant bits of Unicode code points beyond the Basic Multilingual Plane (BMP). This results in a mangled import.
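To illustrate the truncation described above, here is a minimal sketch. It assumes only that the code unit type is 16 bits wide, as the description states; the names Utf16Unit and truncateToUnit are hypothetical and do not come from the OpenOffice sources.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 16-bit code unit type such as sal_Unicode.
using Utf16Unit = std::uint16_t;

// Storing a full code point in a 16-bit type keeps only the low 16 bits;
// the plane number in bits 16 and above is discarded.
constexpr Utf16Unit truncateToUnit(std::uint32_t codePoint) {
    return static_cast<Utf16Unit>(codePoint & 0xFFFFu);
}

// U+1D11E (MUSICAL SYMBOL G CLEF) collapses to U+D11E, which lies in the
// Hangul Syllables block, so an unrelated character is imported.
static_assert(truncateToUnit(0x1D11E) == 0xD11E, "plane bits are lost");

// Code points inside the BMP pass through unchanged.
static_assert(truncateToUnit(0x00E9) == 0x00E9, "BMP is unaffected");
```

This is why only characters beyond the BMP are affected, while ordinary text imports correctly.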
Comment 1 2009-07-03 08:33:06 UTC
Created attachment 63344
Comment 2 2009-07-03 08:38:13 UTC
Fixing the method "sal_Unicode CSS1Parser::GetNextChar()" in sw/source/filter/html/parcss1.cxx is 
probably a good starting point.
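One possible direction, sketched here under assumptions rather than taken from the actual parser, is to have the character-reading step return a full 32-bit code point and to write non-BMP values back as UTF-16 surrogate pairs. The helpers nextCodePoint and appendCodePoint are hypothetical names, and the sketch assumes the input is already available as UTF-16 text.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: reads one full code point from UTF-16 text at pos
// and advances pos past it. The caller must ensure pos < text.size().
inline std::uint32_t nextCodePoint(const std::u16string& text, std::size_t& pos) {
    const std::uint32_t lead = text[pos++];
    if (lead >= 0xD800 && lead <= 0xDBFF && pos < text.size()) {
        const std::uint32_t trail = text[pos];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++pos;
            // Combine the high and low surrogate into one code point.
            return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
        }
    }
    // BMP code point, or an unpaired surrogate returned as-is.
    return lead;
}

// Hypothetical helper: appends a code point to UTF-16 text, splitting
// values beyond the BMP into a surrogate pair instead of truncating them.
inline void appendCodePoint(std::u16string& out, std::uint32_t codePoint) {
    if (codePoint < 0x10000) {
        out.push_back(static_cast<char16_t>(codePoint));
    } else {
        const std::uint32_t offset = codePoint - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));
    }
}
```

With these helpers, U+1D11E is stored as the pair 0xD834 0xDD1E and reads back as the same code point, so no bits are lost along the way.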
Comment 3 openoffice 2009-07-03 08:54:58 UTC
set target
Comment 4 Marcus 2017-05-20 11:18:15 UTC
Reset assignee to the default "".