Issue 101552

Summary: enable dynamic glyph fallback under Windows system
Product: gsl
Reporter: jiayanmin
Component: code
Assignee: stefan.baltzer
Status: CLOSED FIXED
QA Contact: issues@gsl <issues>
Severity: Trivial
Priority: P3
CC: hdu, issues, khirano, maho.nakata, outshade, zhf
Version: OOo 3.0
Target Milestone: OOo 3.3
Hardware: PC
OS: Windows, all
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on: 97086
Issue Blocks: 104940, 109067, 110205
Attachments (Description | Flags):
non-bmp chinese characters represent in dialog | none
a patch for dynamic font fallback on windows | none
Another way to implement dynamic font fallback on Windows by traversing ImplDevFontList | none
A new implementation | none
the implementation of 2-level font fallback algorithm | none
update of LangFromCodeChart in salgdi3.cxx | none

Description jiayanmin 2009-05-05 07:31:58 UTC
Under Windows, OOo only uses a static font list (please refer to the function
ImplDevFontList::InitGenericFallback in vcl/source/gdi/outdev3.cxx) for glyph
fallback. Evidently, a static font list cannot contain every font installed on
every user's Windows system. So in some circumstances the missing characters
can't be matched to a suitable font even though the system has one. For
example, non-BMP Chinese characters cannot be displayed in a dialog box even
when there is a font on the system that contains those characters.

Suggestion: improve the glyph fallback mechanism to support dynamic glyph
fallback, as is done on Linux by leveraging fontconfig: develop a class
WinGlyphFallbackSubstitution that inherits from ImplGlyphFallbackFontSubstitution.
Comment 1 philipp.lohmann 2009-05-05 09:46:56 UTC
reassign
Comment 2 hdu@apache.org 2009-05-05 14:03:16 UTC
confirmed
Comment 3 jiayanmin 2009-05-06 02:22:38 UTC
When would a CWS be created for this issue? I think this enhancement could
help reduce many text rendering complaints from users. In addition, a
large improvement has already been made on Linux by leveraging the fontconfig lib.
Comment 4 jiayanmin 2009-05-18 06:56:11 UTC
Is this enhancement planned for an upcoming OOo release?
Comment 5 hdu@apache.org 2009-05-18 08:13:14 UTC
I'm booked out for a while, sorry.
Comment 6 jiayanmin 2009-05-19 02:41:17 UTC
I have tried to implement a class WinGlyphFallbackSubstitution inheriting from
ImplGlyphFallbackFontSubstitution in vcl/win/source/gdi/salgdi3.cxx, like the class
FcGlyphFallbackSubstitution that uses fontconfig on the Linux platform. The Windows
API EnumFontFamiliesExW was used to enumerate the fonts on the Windows system. But the
work is blocked because I failed to find a Win API to determine whether the missing
characters are contained in a specified font. So I'm not sure if this feature
can be added?
Comment 7 hdu@apache.org 2009-05-19 08:07:43 UTC
Indeed, there are no public GDI-APIs that work on all of our supported Windows platforms that provide
this info directly. But as I wrote in
http://gsl.openoffice.org/servlets/BrowseList?list=dev&by=thread&from=2225826
we can still get all the information we need to implement this enhancement: Get each font file's CMAP
table (either by using GDI's GetFontData() or by calling VCL's UpdateFromHDC()) and then parse it using
VCL's ParseCMAP() function.
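The approach described in this comment can be illustrated with a small stand-alone sketch. It is not VCL's ParseCMAP(); cmapCoversCodepoint() is a hypothetical helper that assumes the raw CMAP table bytes have already been fetched (for example with GetFontData()) and that only looks at format 12 subtables, the kind that can carry non-BMP code points. Other subtable formats, such as format 4 for the BMP, are ignored here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// OpenType tables are stored big-endian, so integers are assembled by hand.
static std::uint16_t readU16(const std::vector<std::uint8_t>& t, std::size_t off) {
    assert(off + 2 <= t.size());
    return static_cast<std::uint16_t>((t[off] << 8) | t[off + 1]);
}

static std::uint32_t readU32(const std::vector<std::uint8_t>& t, std::size_t off) {
    assert(off + 4 <= t.size());
    return (static_cast<std::uint32_t>(t[off]) << 24) |
           (static_cast<std::uint32_t>(t[off + 1]) << 16) |
           (static_cast<std::uint32_t>(t[off + 2]) << 8) |
           static_cast<std::uint32_t>(t[off + 3]);
}

// Hypothetical helper: true if any format 12 subtable maps `cp` to a
// glyph index other than 0 (glyph 0 is the notdef glyph).
bool cmapCoversCodepoint(const std::vector<std::uint8_t>& cmap, std::uint32_t cp) {
    if (cmap.size() < 4) return false;
    const std::uint16_t numTables = readU16(cmap, 2);
    for (std::uint16_t i = 0; i < numTables; ++i) {
        // Encoding record: platformID (2), encodingID (2), offset (4).
        const std::size_t rec = 4 + 8 * static_cast<std::size_t>(i);
        if (rec + 8 > cmap.size()) return false;
        const std::size_t sub = readU32(cmap, rec + 4);
        // A format 12 header is 16 bytes; skip anything shorter or different.
        if (sub > cmap.size() || cmap.size() - sub < 16) continue;
        if (readU16(cmap, sub) != 12) continue;
        const std::uint32_t numGroups = readU32(cmap, sub + 12);
        for (std::uint32_t g = 0; g < numGroups; ++g) {
            // Group: startCharCode (4), endCharCode (4), startGlyphID (4).
            const std::size_t grp = sub + 16 + 12 * static_cast<std::size_t>(g);
            if (grp + 12 > cmap.size()) break;
            const std::uint32_t first = readU32(cmap, grp);
            const std::uint32_t last = readU32(cmap, grp + 4);
            const std::uint32_t startGlyph = readU32(cmap, grp + 8);
            if (cp >= first && cp <= last)
                return startGlyph + (cp - first) != 0;
        }
    }
    return false;
}
```

A linear scan over the groups keeps the sketch short; since the groups in a format 12 subtable are sorted by start code, a binary search would be the natural refinement for fonts with many groups.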
Comment 8 jiayanmin 2009-05-19 11:01:38 UTC
yanminjia->hdu: If there is no technical problem blocking the implementation,
performance would be the main concern. In particular, the performance of OOo has
been criticized by so many users. :)
Comment 9 hdu@apache.org 2009-05-19 12:48:41 UTC
Yes, the performance impact of this enhancement would be a major concern as I already wrote in http://gsl.openoffice.org/servlets/ReadMsg?list=dev&msgNo=2276 Other than concerns like that it is 
probably not too difficult to implement a solution that makes almost all use cases behave better.
Comment 10 jiayanmin 2009-05-22 10:20:28 UTC
Created attachment 62432 [details]
non-bmp chinese characters represent in dialog
Comment 11 jiayanmin 2009-05-22 10:22:42 UTC
Created attachment 62433 [details]
a patch for dynamic font fallback on windows
Comment 12 jiayanmin 2009-05-22 10:33:56 UTC
yanminjia->hdu: Many thanks for your suggestion. I implemented a class
WinGlyphFallbackSubstitution in vcl/win/source/gdi/salgdi3.cxx. It does work.
Please see the picture I submitted. Maybe it's the first time that non-BMP
Chinese characters are displayed correctly in an OOo dialog on Windows. :) But
performance is impacted, as expected. The source code snippet is just a naive
experiment. Would you please take a look and give me more suggestions? Thank you.
Comment 13 hdu@apache.org 2009-05-25 13:08:30 UTC
@yanminjia: it works and that is a good start!
Here are some more hints:
1. each SalGetSubstituteFontProcExW step leaks an ImplWinFontData item
2. these ImplWinFontData items are already known in the ImplDevFontList
3. these ImplWinFontData items probably have their ImplFontCharMap already cached, so setting the 
font and getting the coverage over and over can be avoided
4. since the set of fonts on the system is mostly constant the expensive enumeration loop to determine 
the glyph coverage can be avoided for most of the fallbacks after the first one
5. for glyph-fallback we want to make sure that the fallback glyphs are valid. Some fonts claim 
coverage for a glyph, but provide empty ones or ones that look like the notdef glyph. Unless the font is 
known to be good an additional check is probably needed.
6. the style of the fallback glyph should try to match the style of the font originally requested. E.g. the 
fallback for a bold oblique glyph should be bold oblique too if possible
... more later

So there is a lot of work ahead of us to get this done properly.
For the urgent problem you mentioned (non-BMP CJK glyphs) there is an easy workaround that gives us 
time: we should find out which fonts were used to resolve the urgent problematic cases; then add these 
fontnames to the aGlyphFallbackList[] list in vcl/source/gdi/outdev3.cxx
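Hints 3 and 4 in this comment both come down to computing a font's coverage once and reusing it. The following is a minimal sketch of that idea only; CoverageCache, Coverage and the loader callback are hypothetical names for illustration and do not correspond to ImplFontCharMap or any other VCL class.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Inclusive [first, last] code point range.
using CodeRange = std::pair<std::uint32_t, std::uint32_t>;
using Coverage = std::vector<CodeRange>;

// Hypothetical cache: the expensive coverage lookup runs at most once per font.
class CoverageCache {
public:
    explicit CoverageCache(std::function<Coverage(const std::string&)> loader)
        : loader_(std::move(loader)) {
        assert(static_cast<bool>(loader_));
    }

    bool covers(const std::string& fontName, std::uint32_t cp) {
        auto it = cache_.find(fontName);
        if (it == cache_.end()) {
            // First request for this font: load and remember its coverage.
            it = cache_.emplace(fontName, loader_(fontName)).first;
        }
        for (const CodeRange& r : it->second) {
            if (cp >= r.first && cp <= r.second) return true;
        }
        return false;
    }

private:
    std::function<Coverage(const std::string&)> loader_;
    std::map<std::string, Coverage> cache_;
};
```

Keying the cache by font name is an assumption made for brevity. It also relies on the observation in hint 4 that the set of installed fonts is mostly constant; a real implementation would need some way to invalidate entries when fonts are added or removed.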
Comment 14 jiayanmin 2009-05-26 03:01:40 UTC
@hdu: Many thanks for your suggestions.

For your points 2 and 3, I will try another way to implement the function
WinGlyphFallbackSubstitution::FindFontSubstitute, by traversing ImplDevFontList
as you suggested.

As for your points 4, 5 and 6, there is really a lot of work to do to make the
text output smarter with dynamic font fallback. I think the goal is quite clear:
render text with the most appropriate font available on the operating system.

Comment 15 jiayanmin 2009-06-08 09:25:23 UTC
Created attachment 62844 [details]
Another way to implement dynamic font fallback on Windows by traversing ImplDevFontList
Comment 16 jiayanmin 2009-06-08 09:34:07 UTC
@hdu: I have just implemented dynamic glyph fallback in another way, by
traversing each ImplWinFontData item in ImplDevFontList as you suggested.
ImplDevFontList does indeed contain ImplWinFontData items, but unfortunately there's
no ImplFontCharMap cached in each ImplWinFontData. Would you please take a look at
the patch I submitted and give me more suggestions? Thanks.
Comment 17 hdu@apache.org 2009-06-08 16:17:58 UTC
@yanminjia: It looks much better, thanks!
It is still very expensive though:
- unfortunately the GetDevFontList() call is relatively expensive in this context. It does way too much
copying. For the existing use cases it just was not worth it to do more sharing => I suggest using the
ImplDevFontList directly (knowing that all entries there are WinFontData objects) and ignoring any
non-scalable entries
- the first part of the loop in FindFontSubstitute() where it gets the font face details (by calling
UpdateFromHDC()) is not needed when they are already known for that WinFontData object. Testing if the
details are already known maybe deserves a helper function. Or it can be done by calling
ImplWinFontData::GetImplFontCharMap() which has been modified a bit to also allow a NULL return
Comment 18 jiayanmin 2009-06-09 08:28:38 UTC
@hdu: Many thanks for your suggestions. 

I use UpdateFromHDC() just for getting the CMAP data. I think the function
ReadCmapTable() may be more appropriate for the same job, but it's a private
member function of class ImplWinFontData, so it can't be called from outside.
Anyway, it's a good idea to test whether the ImplWinFontData object already knows
the CMAP before reading it.
Comment 19 hdu@apache.org 2009-06-09 16:37:49 UTC
Yes, using UpdateFromHDC() when we just need the codepoint coverage is fine. Almost all of its cost is 
the CMAP loading and parsing anyway. Avoiding it if the coverage is already available is important though.
Comment 20 jiayanmin 2009-06-12 03:17:20 UTC
Created attachment 62929 [details]
A new implementation
Comment 21 jiayanmin 2009-06-12 03:28:24 UTC
@hdu: I submitted a new implementation which traverses the WinFontData objects by
using ImplDevFontList directly, as you suggested. And before reading the CMAP table,
ImplWinFontData::GetImplFontCharMap is used to test if the CMAP data is available.
It works much better. Thank you. And now I don't think performance is still a
concern. For the user experience it is perhaps like a river flowing past a small
stone. Would you please take a look again?
Comment 22 hdu@apache.org 2009-06-17 09:56:32 UTC
I didn't have time for more than a first look, but the latest patch is quite good now, thanks!

I'd like some assurance from automatic testing, from performance testing (e.g. when many fonts are 
installed), the empty glyph problem (item 5 in desc#14), etc. Or maybe we should just start by enabling it 
for our CJK-Locales only in OOo3.2 and other locales later... Anyway, this change deserves its own CWS.
Comment 23 jiayanmin 2009-06-19 06:25:53 UTC
Many thanks. I have never opened a CWS in OOo development. Would you please give
me some tips? Actually I don't know what I should do next.
Comment 24 hdu@apache.org 2009-06-19 10:35:11 UTC
I created the CWS gfb4win02 and will check in the latest version of your patch.
Please see http://wiki.services.openoffice.org/wiki/OOo_and_Subversion to check out this CWS 
(ssh://svn@svn.services.openoffice.org/ooo/cws/gfb4win02)
Comment 25 hdu@apache.org 2009-06-19 15:28:23 UTC
Applied in CWS gfb4win02 with a minor fix to prevent a side effect:
the availability of the unicodemap currently also means that all the interesting details from the
matching HDC are known, so we should use UpdateFromHDC() instead of calling ReadCmapTable() directly

I also added a big TODO: we are currently checking if the font face can resolve all missing codepoints.
This approach is problematic for some corner cases, e.g.
- user types some text that can be resolved by fontA
- user types another character that cannot be resolved by fontA but only by fontB
=> then the whole text switches from fontA to fontB, linebreaking changes etc.
This scenario is already bad enough, but it starts to get really hairy when fontB cannot resolve all other 
codepoints

And then I used FindBestFontFace() instead of iterating over the font faces manually. This not only helps
with finding a better stylistic match but it also avoids leaking WinGlyphFallbackSubstitution into one of
the central header files of the platform-independent part of vcl.
Comment 26 jiayanmin 2009-06-22 06:34:31 UTC
My implementation is the simplest solution, which only focuses on resolving missing
codepoints. A better font matching algorithm that takes style into account really
makes sense for text output quality. Maybe a Windows-specific font management
library similar to fontconfig should be developed, but it would cost a large
amount of effort. :)
Comment 27 hdu@apache.org 2009-06-29 08:50:27 UTC
The CWS is available under the branch svn://svn@svn.services.openoffice.org/ooo/cws/gfb4win02/
This issue is still on my radar but I don't know yet when I get around to testing it. In the meantime I'd like 
to collect some feedback from third parties. Is anyone else testing this too?
Comment 28 jiayanmin 2009-08-20 10:17:49 UTC
@hdu: I see gfb4win02 is still pending testing. I'm not sure if I can take some
action to push the testing process along. :)
Comment 29 hdu@apache.org 2009-08-20 10:56:01 UTC
@yanminjia: my other CWSses like graphite01, otf01, vcl104 need a lot of attention currently...
Comment 30 jiayanmin 2009-09-23 07:59:07 UTC
@hdu: In consideration of performance and text output quality, I designed a
2-level font fallback algorithm for the Windows platform based on the previous
implementation. The pseudo-code below gives a simple description:

input:  missing string
output: font face

//First level fall back, be fast and cover most cases
determine the language of missing string
get UI default font face df with language type of missing string 

if df contains the characters of missing string
      return df;

//second level fall back, slow but only a few cases need 
for each fontface in ImplDevFontList {
	if fontface contains the missing string
		return fontface;
	else
		next;
}

The advantages of the above algorithm are:

1. The 1st level fallback is quick, and an appropriate font can be found for the
missing characters in most circumstances.

2. Though the 2nd level fallback seems awkward and stupid, it is useful in a very
few circumstances. For example, some non-BMP Chinese characters can be displayed
in controls with the 2nd level fallback.

Several experiments show it really works well. I will submit a patch after the
implementation has been refined.
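The two-level algorithm outlined in this comment can be sketched in self-contained C++ as follows. This is an illustration only, not the attached patch: FontFace, coversAll() and findFallback() are hypothetical names, the language detection step is assumed to have already selected defaultForLanguage, and allFaces stands in for the ImplDevFontList traversal.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a font face and its code point coverage test.
struct FontFace {
    std::string name;
    std::function<bool(char32_t)> covers;
};

// True if the face can render every code point of the missing string.
bool coversAll(const FontFace& face, const std::u32string& missing) {
    for (char32_t cp : missing) {
        if (!face.covers(cp)) return false;
    }
    return true;
}

// Returns the chosen face, or nullptr if no face covers the whole string.
const FontFace* findFallback(const std::u32string& missing,
                             const FontFace* defaultForLanguage,
                             const std::vector<FontFace>& allFaces) {
    assert(!missing.empty());

    // First level: fast, expected to handle most cases.
    if (defaultForLanguage && coversAll(*defaultForLanguage, missing)) {
        return defaultForLanguage;
    }

    // Second level: slow scan over every known face, needed only rarely.
    for (const FontFace& face : allFaces) {
        if (coversAll(face, missing)) return &face;
    }
    return nullptr;
}
```

As in the pseudo-code, the second level returns the first face that covers the string, so the result depends on the order of allFaces; this sketch makes no attempt at stylistic matching.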
Comment 31 jiayanmin 2009-11-17 07:42:27 UTC
Created attachment 66148 [details]
the implementation of 2-level font fallback algorithm
Comment 32 jiayanmin 2009-11-17 07:53:28 UTC
@hdu: The latest implementation totally changes the way font fallback works under
Windows. It really works well and has been integrated into the code stream of
Symphony development. In most scenarios, the fallback algorithm finds the best
font face. What's more, the font fallback algorithm can be refined by updating the
font configuration file vcl.xcu. Performance also benefits from the new
implementation. Would you please take a look at the patch I just delivered? Thanks.
Comment 33 hdu@apache.org 2009-11-17 10:03:53 UTC
It looks very good, thank you!
I think it is ready for a OOo330 target.
Comment 34 hdu@apache.org 2010-03-05 10:00:23 UTC
The patch is now committed into CWS gfb4win02.
There were a few remaining problems:
A. the first level fallback always defaulted to ZHS glyphs even for the CJK Unified Ideographs
B. the second level fallback treated fonts with unknown CMAP types as match
C. the fallback font depends on the first missing char which can easily result in layout instabilities e.g. 
when loading/reloading a document or when scrolling, selecting, zooming, etc.
D. the fallback mechanism has problems with the lifetime of its device font list
E. the use of OutputDevice layer methods in the SalGraphics layer may be problematic

I already fixed A and B in the CWS and will look into C and D. We can live with E for now but it needs to 
be carefully reviewed that it doesn't introduce problems such as issue 108914.

Regarding A (unified CJK ideographs) I am not yet sure if the ranges that I changed from
LANGUAGE_CHINESE_SIMPLIFIED to LANGUAGE_DEFAULT_CJK are the correct ones: U+3000..U+303F,
U+31C0..U+31EF, U+3400..U+4DBF, U+4E00..U+9FCF, U+F900..U+FAFF, U+20000..U+2A6DF and
U+2F800..U+2FA1F. For the language detection in glyph fallback these ranges get mapped back to the
default CJK language.
@yanminjia,@khirano,@maho: Are the glyphs used in these unicode ranges locale specific or do they
provide hints regarding the language (e.g. like U+3040..U+30FF indicate Japanese)? Are the ranges
mentioned above sufficient or do we need more?
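The range mapping discussed in this comment can be pictured as a small lookup table. The sketch below uses exactly the ranges listed in the comment; ScriptLang, LangRange and langFromCodepoint() are hypothetical names chosen for illustration and are not the LangFromCodeChart code in salgdi3.cxx.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical language tags; DefaultCjk plays the role of LANGUAGE_DEFAULT_CJK.
enum class ScriptLang { Unknown, DefaultCjk, Japanese };

struct LangRange {
    std::uint32_t first;
    std::uint32_t last;  // inclusive
    ScriptLang lang;
};

// Ranges as quoted in the comment above.
static const LangRange kLangRanges[] = {
    {0x3000,  0x303F,  ScriptLang::DefaultCjk},
    {0x3040,  0x30FF,  ScriptLang::Japanese},
    {0x31C0,  0x31EF,  ScriptLang::DefaultCjk},
    {0x3400,  0x4DBF,  ScriptLang::DefaultCjk},
    {0x4E00,  0x9FCF,  ScriptLang::DefaultCjk},
    {0xF900,  0xFAFF,  ScriptLang::DefaultCjk},
    {0x20000, 0x2A6DF, ScriptLang::DefaultCjk},
    {0x2F800, 0x2FA1F, ScriptLang::DefaultCjk},
};

// Maps a code point to a language hint, or Unknown if no range matches.
ScriptLang langFromCodepoint(std::uint32_t cp) {
    for (const LangRange& r : kLangRanges) {
        assert(r.first <= r.last);
        if (cp >= r.first && cp <= r.last) return r.lang;
    }
    return ScriptLang::Unknown;
}
```

With only a handful of ranges a linear scan is sufficient; a larger table covering more scripts would probably be kept sorted and searched with a binary search.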
Comment 35 jiayanmin 2010-03-09 01:49:31 UTC
@hdu: I'm not sure if there is a node for a certain language in vcl.xcu that
matches LANGUAGE_DEFAULT_CJK. If no such node is available, no font can be
hit when the missing characters are in a range mapped back to
LANGUAGE_DEFAULT_CJK. Actually, few fonts can cover all the CJK characters,
which are so many that they take up most of the Unicode encoding space. :)

As is well known, there is no accurate map from languages to blocks of the Unicode
encoding space. So the map I composed is just meant to meet the requirements of
font fallback in OOo in accordance with the config file vcl.xcu.

I have made a small change. Please see the attachment I just submitted.

Comment 36 jiayanmin 2010-03-09 01:52:53 UTC
Created attachment 68235 [details]
update of LangFromCodeChart in salgdi3.cxx
Comment 37 hdu@apache.org 2010-03-09 10:47:18 UTC
The idea with the LANGUAGE_DEFAULT_CJK entries is that their language gets dynamically replaced by the
preferred CJK language: either the UI language or the language selected in the registry key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\Language\Default
It would be best if the language from Tools->Options->Languages->DefaultCJK could be used,
but this is not yet available in the SalGraphics layer.

By the way, the current version of salgdi3.cxx in CWS gfb4win02 can be seen in
http://hg.services.openoffice.org/cws/gfb4win02/file/08d71e4c61e4/vcl/win/source/gdi/salgdi3.cxx
The mercurial service unfortunately does not provide a way to link to the tip of the CWS repository.
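The placeholder replacement described at the start of this comment could look roughly like the sketch below. All names are hypothetical, and two points are assumptions rather than statements from the thread: that the UI language is tried before the system default, and that the placeholder is returned unchanged when neither is a CJK language. Reading the registry value itself is left out.

```cpp
#include <cassert>

// Hypothetical language tags; DefaultCjk plays the role of LANGUAGE_DEFAULT_CJK.
enum class Lang {
    Other, DefaultCjk, ChineseSimplified, ChineseTraditional, Japanese, Korean
};

static bool isConcreteCjk(Lang l) {
    return l == Lang::ChineseSimplified || l == Lang::ChineseTraditional ||
           l == Lang::Japanese || l == Lang::Korean;
}

// Replaces the DefaultCjk placeholder by a preferred concrete CJK language.
// Assumed order: UI language first, then the system default language.
Lang resolveDefaultCjk(Lang detected, Lang uiLang, Lang systemDefault) {
    // The candidates must be real languages, never the placeholder itself.
    assert(uiLang != Lang::DefaultCjk && systemDefault != Lang::DefaultCjk);
    if (detected != Lang::DefaultCjk) return detected;
    if (isConcreteCjk(uiLang)) return uiLang;
    if (isConcreteCjk(systemDefault)) return systemDefault;
    return Lang::DefaultCjk;  // nothing better known; the caller decides
}
```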
Comment 38 jiayanmin 2010-03-10 05:47:51 UTC
I think your idea can help to determine the language of the unified CJK missing
characters more accurately. That's great. The language of the characters in the
blocks you mentioned (such as U+3400..U+4DBF) cannot be known from the value of
the code point alone, though they are more likely to appear in a
Chinese document. :) Thank you.
Comment 41 hdu@apache.org 2010-03-24 14:38:37 UTC
In CWS gfb4win02 all points have been addressed. Item E still worries me a bit but as I mentioned it is 
more than good enough for now.
Comment 42 hdu@apache.org 2010-03-25 11:26:54 UTC
@sba: please verify in CWS gfb4win02. Changed behaviour is only expected on WIN platforms.
Comment 43 hdu@apache.org 2010-04-08 12:52:01 UTC
.
Comment 44 stefan.baltzer 2010-04-21 13:10:26 UTC
Verified in CWS gfb4win02. 
Best user scenario on my machine:
 - New Writer doc
 - Insert some characters from U+3400 upwards (use e.g. font "SunBatang")
 - select all and set them to another font without CJK characters
-> Chosen font for replacement is much better than before
Comment 45 hdu@apache.org 2010-09-17 12:51:30 UTC
*** Issue 106798 has been marked as a duplicate of this issue. ***