Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing |
Summary: | enable dynamic glyph fallback under Windows system | ||
---|---|---|---|
Product: | gsl | Reporter: | jiayanmin |
Component: | code | Assignee: | stefan.baltzer |
Status: | CLOSED FIXED | QA Contact: | issues@gsl <issues> |
Severity: | Trivial | ||
Priority: | P3 | CC: | hdu, issues, khirano, maho.nakata, outshade, zhf |
Version: | OOo 3.0 | ||
Target Milestone: | OOo 3.3 | ||
Hardware: | PC | ||
OS: | Windows, all | ||
Issue Type: | ENHANCEMENT | Latest Confirmation in: | --- |
Developer Difficulty: | --- | ||
Issue Depends on: | 97086 | ||
Issue Blocks: | 104940, 109067, 110205 | ||
Attachments: |
Description
jiayanmin
2009-05-05 07:31:58 UTC
reassign

confirmed

When will a CWS be created for this issue? I think this enhancement could reduce many of the complaints from users about text rendering. In addition, a large improvement has already been made on Linux by leveraging the fontconfig library. Is this enhancement planned for an upcoming OOo release?

I'm booked out for a while, sorry.

I have tried to implement a class WinGlyphFallbackSubstitution, inheriting from ImplGlyphFallbackFontSubstitution, in vcl/win/source/gdi/salgdi3.cxx, modeled on the class FcGlyphFallbackSubstitution that uses fontconfig on the Linux platform. The Windows API EnumFontFamiliesExW was used to enumerate the fonts on the Windows system. But the work was blocked because I failed to find a Windows API that determines whether a specified font contains the missing characters. So I'm not sure whether this feature can be added.

Indeed, there are no public GDI APIs that work on all of our supported Windows platforms and provide this info directly. But as I wrote in http://gsl.openoffice.org/servlets/BrowseList?list=dev&by=thread&from=2225826 we can still get all the information we need to implement this enhancement: get each font file's CMAP table (either by using GDI's GetFontData() or by calling VCL's UpdateFromHDC()) and then parse it using VCL's ParseCMAP() function.

yanminjia->hdu: If no technical problem blocks the implementation, performance would be the main concern, especially since the performance of OOo has been criticized by so many users. :)

Yes, the performance impact of this enhancement would be a major concern, as I already wrote in http://gsl.openoffice.org/servlets/ReadMsg?list=dev&msgNo=2276 Other than concerns like that, it is probably not too difficult to implement a solution that makes almost all use cases behave better.

Created attachment 62432 [details]
non-bmp chinese characters represent in dialog
Created attachment 62433 [details]
a patch for dynamic font fallback on windows
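The coverage check that is missing from the public GDI API becomes a simple range lookup once a font's CMAP table has been fetched and parsed. The following is only a minimal sketch of that idea: `CharCoverage`, `hasChar` and `coversAll` are hypothetical names invented for illustration, not VCL's ImplFontCharMap API, and the ranges are assumed to have been produced by an earlier CMAP parsing step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for parsed CMAP data: sorted, non-overlapping,
// inclusive codepoint ranges [first, last].
struct CharCoverage {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> ranges;

    // True if the codepoint falls inside one of the ranges.
    bool hasChar(std::uint32_t cp) const {
        // Find the first range that starts after cp, then step back one.
        auto it = std::upper_bound(
            ranges.begin(), ranges.end(), cp,
            [](std::uint32_t value,
               const std::pair<std::uint32_t, std::uint32_t>& range) {
                return value < range.first;
            });
        if (it == ranges.begin())
            return false;
        --it;
        return cp <= it->second;
    }

    // True if every missing codepoint is covered by this font.
    bool coversAll(const std::vector<std::uint32_t>& missing) const {
        return std::all_of(missing.begin(), missing.end(),
                           [this](std::uint32_t cp) { return hasChar(cp); });
    }
};
```

A caller would build one such object per font face and ask `coversAll()` with the codepoints that the originally requested font could not resolve.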
yanminjia->hdu: Many thanks for your suggestion. I implemented a class WinGlyphFallbackSubstitution in vcl/win/source/gdi/salgdi3.cxx, and it does work. Please see the picture I submitted. Maybe this is the first time that non-BMP Chinese characters are displayed correctly in an OOo dialog on Windows. :) But performance is impacted, as expected. The source code snippet is just a naive experiment. Would you please take a look and give me more suggestions? Thank you.

@yanminjia: it works and that is a good start! Here are some more hints:
1. each SalGetSubstituteFontProcExW step leaks an ImplWinFontData item
2. these ImplWinFontData items are already known in the ImplDevFontList
3. these ImplWinFontData items probably have their ImplFontCharMap already cached, so setting the font and getting the coverage over and over can be avoided
4. since the set of fonts on the system is mostly constant, the expensive enumeration loop to determine the glyph coverage can be avoided for most of the fallbacks after the first one
5. for glyph fallback we want to make sure that the fallback glyphs are valid. Some fonts claim coverage for a glyph, but provide empty ones or ones that look like the notdef glyph. Unless the font is known to be good, an additional check is probably needed.
6. the style of the fallback glyph should try to match the style of the font originally requested. E.g. the fallback for a bold oblique glyph should be bold oblique too if possible
... more later

So there is a lot of work ahead of us to get this done properly. For the urgent problem you mentioned (non-BMP CJK glyphs) there is an easy workaround that gives us time: we should find out which fonts were used to resolve the urgent problematic cases, then add these font names to the aGlyphFallbackList[] list in vcl/source/gdi/outdev3.cxx

@hdu: Many thanks for your suggestions. Regarding your points 2 and 3, I will try another way to implement the function WinGlyphFallbackSubstitution::FindFontSubstitute, by traversing ImplDevFontList as you suggested. As for your points 4, 5 and 6, there is really a lot of work to do to make text output smarter with dynamic font fallback. I think the goal is now much clearer: render text with the most appropriate font available on the operating system.

Created attachment 62844 [details]
Another way to implement dynamic font fallback on Windows by traversing ImplDevFontList
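Hint 4 in the review above (avoid the expensive enumeration loop after the first fallback) amounts to memoizing the result of the font scan. The sketch below illustrates this under loud assumptions: `FallbackCache`, its `Scan` callback and the use of an empty string for "no font found" are all hypothetical and do not correspond to existing VCL classes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical memoizing wrapper: the expensive scan over all installed
// fonts runs at most once per missing codepoint.
class FallbackCache {
public:
    // The scan returns the name of a covering font, or "" if none exists.
    using Scan = std::function<std::string(std::uint32_t)>;

    explicit FallbackCache(Scan scan) : scan_(std::move(scan)) {}

    std::string find(std::uint32_t cp) {
        auto it = cache_.find(cp);
        if (it != cache_.end())
            return it->second;            // cached hit or cached miss
        std::string result = scan_(cp);   // expensive path, taken once
        cache_.emplace(cp, result);
        return result;
    }

    // Must be called when the set of installed fonts changes.
    void invalidate() { cache_.clear(); }

private:
    Scan scan_;
    std::unordered_map<std::uint32_t, std::string> cache_;
};
```

Caching misses as well as hits matters here, because a codepoint that no installed font covers would otherwise trigger the full scan on every repaint.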
@hdu: I have just implemented dynamic glyph fallback in another way, by traversing each ImplWinFontData item in ImplDevFontList as you suggested. ImplDevFontList does contain ImplWinFontData items, but unfortunately no ImplFontCharMap is cached in any of them. Would you please take a look at the patch I submitted and give me more suggestions? Thanks.

@yanminjia: It looks much better, thanks! It is still very expensive though:
- unfortunately the GetDevFontList() call is relatively expensive in this context. It does way too much copying. For the existing use cases it just was not worth it to do more sharing => I suggest using the ImplDevFontList directly (knowing that all entries there are WinFontData objects) and ignoring any non-scalable entries
- the first part of the loop in FindFontSubstitute(), where it gets the font face details (by calling UpdateFromHDC()), is not needed when they are already known for that WinFontData object. Testing whether the details are already known maybe deserves a helper function. Or it can be done by calling ImplWinFontData::GetImplFontCharMap(), which has been modified a bit to also allow a NULL return

@hdu: Many thanks for your suggestions. I use UpdateFromHDC() just to get the CMAP data. I think the function ReadCmapTable() may be more appropriate for that job, but it is a private member function of class ImplWinFontData, so it cannot be called from outside. Anyway, it is a good idea to test whether the ImplWinFontData object already knows the CMAP before reading it.

Yes, using UpdateFromHDC() when we just need the codepoint coverage is fine. Almost all of its cost is the CMAP loading and parsing anyway. Avoiding it if the coverage is already available is important though.

Created attachment 62929 [details]
A new implementation
@hdu: I submitted a new implementation which traverses the WinFontData objects by using ImplDevFontList directly, as you suggested. Before reading the CMAP table, ImplWinFontData::GetImplFontCharMap is used to test whether the CMAP data is already available. It works much better. Thank you. I no longer think performance is a concern. For the user experience, the remaining cost is perhaps like a river flowing past a small stone. Would you please take a look again?

I didn't have time for more than a first look, but the latest patch is quite good now, thanks! I'd like some assurance from automatic testing, from performance testing (e.g. when many fonts are installed), the empty glyph problem (item 5 in desc#14), etc. Or maybe we should just start by enabling it for our CJK locales only in OOo3.2 and for other locales later... Anyway, this change deserves its own CWS.

Many thanks. I have not opened a CWS in OOo development before. Would you please give me some tips? Actually I don't know what I should do next.

I created the CWS gfb4win02 and will check in the latest version of your patch. Please see http://wiki.services.openoffice.org/wiki/OOo_and_Subversion to check out this CWS (ssh://svn@svn.services.openoffice.org/ooo/cws/gfb4win02)

Applied in CWS gfb4win02 with minor fixes to prevent a side effect: the availability of the unicodemap currently also means that all the interesting details from the matching HDC are known, so we should use UpdateFromHDC() instead of calling ReadCmapTable() directly.

I also added a big TODO: we are currently checking if the font face can resolve all missing codepoints. This approach is problematic for some corner cases, e.g.
- user types some text that can be resolved by fontA
- user types another character that cannot be resolved by fontA but only by fontB
=> then the whole text switches from fontA to fontB, linebreaking changes etc.
This scenario is already bad enough, but it starts to get really hairy when fontB cannot resolve all other codepoints.

I then used FindBestFontFace() instead of iterating over the font faces manually. This not only helps with finding a better stylistic match but also avoids leaking WinGlyphFallbackSubstitution into one of the central header files of the platform-independent part of vcl.

My implementation is the simplest solution and focuses only on resolving missing codepoints. A better font matching algorithm that takes style into account would really improve text output quality. Maybe a Windows-specific font management library similar to fontconfig should be developed, but it would cost a large amount of effort. :)

The CWS is available under the branch svn://svn@svn.services.openoffice.org/ooo/cws/gfb4win02/

This issue is still on my radar but I don't know yet when I will get around to testing it. In the meantime I'd like to collect some feedback from third parties. Is anyone else testing this too?

@hdu: I see gfb4win02 is still pending testing. I'm not sure whether I can do anything to push the testing process along. :)

@yanminjia: my other CWSes like graphite01, otf01, vcl104 need a lot of attention currently...

@hdu: With performance and text output quality in mind, I designed a 2-level font fallback algorithm for the Windows platform, based on the previous implementation. The pseudo-code below gives a simple description:

    input: missing string
    output: font face

    // First level fallback: fast, covers most cases
    determine the language of the missing string
    get the UI default font face df for that language
    if df contains the characters of the missing string
        return df

    // Second level fallback: slow, but needed in only a few cases
    for each fontface in ImplDevFontList {
        if fontface contains the missing string
            return fontface
        else
            next
    }

The advantages of the above algorithm are:
1. The 1st level fallback is quick, and an appropriate font can be found for the missing characters in most circumstances.
2. Though the 2nd level fallback seems awkward and crude, it is useful in a few circumstances. For example, some non-BMP Chinese characters can be displayed in controls with the 2nd level fallback.
Several experiments show that it works really well. I will submit a patch after the implementation has been refined.

Created attachment 66148 [details]
the implementation of 2-level font fallback algorithm
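The two-level algorithm described in the pseudo-code can be sketched in C++ as follows. This is an illustration under stated assumptions, not the code from the attachment: `FontFace`, its `coverage` set and `findFallback` are hypothetical names, and the language detection step is assumed to have happened already, so the caller passes in the UI default face for the detected language.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical font face: a name plus the set of codepoints it can render.
struct FontFace {
    std::string name;
    std::set<std::uint32_t> coverage;

    bool coversAll(const std::vector<std::uint32_t>& cps) const {
        for (std::uint32_t cp : cps)
            if (coverage.count(cp) == 0)
                return false;
        return true;
    }
};

// Returns the chosen fallback face, or nullptr if no face covers the text.
// uiDefault is the UI default face for the language of the missing string
// (assumed to be determined by the caller); it may be nullptr.
const FontFace* findFallback(const std::vector<std::uint32_t>& missing,
                             const FontFace* uiDefault,
                             const std::vector<FontFace>& allFaces)
{
    // First level: fast, expected to cover most cases.
    if (uiDefault && uiDefault->coversAll(missing))
        return uiDefault;

    // Second level: linear scan over every known face, slow but rarely needed.
    for (const FontFace& face : allFaces)
        if (face.coversAll(missing))
            return &face;

    return nullptr;
}
```

Note that this sketch shares the weakness raised earlier in the thread: a face is accepted only if it covers all missing codepoints, so a mixed run that no single face covers yields no fallback at all.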
@hdu: The latest implementation completely changes the way font fallback works under Windows. It works really well and has been integrated into the code stream of Symphony development. In most scenarios the fallback algorithm finds the best font face. What's more, the font fallback algorithm can be refined by updating the font configuration file vcl.xcu. Performance also benefits from the new implementation. Would you please take a look at the patch I just delivered? Thanks.

It looks very good, thank you! I think it is ready for an OOo330 target. The patch is now committed into CWS gfb4win02. There were a few remaining problems:
A. the first level fallback always defaulted to ZHS glyphs, even for the CJK Unified Ideographs
B. the second level fallback treated fonts with unknown CMAP types as a match
C. the fallback font depends on the first missing char, which can easily result in layout instabilities, e.g. when loading/reloading a document or when scrolling, selecting, zooming, etc.
D. the fallback mechanism has problems with the lifetime of its device font list
E. the use of OutputDevice layer methods in the SalGraphics layer may be problematic
I already fixed A and B in the CWS and will look into C and D. We can live with E for now, but it needs to be carefully reviewed so that it doesn't introduce problems such as issue 108914.

Regarding A (unified CJK ideographs) I am not yet sure if the ranges that I changed from LANGUAGE_CHINESE_SIMPLIFIED to LANGUAGE_DEFAULT_CJK are the correct ones: U+3000..U+303F, U+31C0..U+31EF, U+3400..U+4DBF, U+4E00..U+9FCF, U+F900..U+FAFF, U+20000..U+2A6DF and U+2F800..U+2FA1F. For the language detection in glyph fallback these ranges get mapped back to the default CJK language.

@yanminjia,@khirano,@maho: Are the glyphs used in these unicode ranges locale specific or do they provide hints regarding the language (e.g. like U+3040..U+30FF indicate Japanese)? Are the ranges mentioned above sufficient or do we need more?
@hdu: I'm not sure whether vcl.xcu has a node for any language that matches LANGUAGE_DEFAULT_CJK. If no such node is available, no font can be found when the missing characters are in a range that is mapped back to LANGUAGE_DEFAULT_CJK. Actually, few fonts can cover all the CJK characters, which are so numerous that they occupy a large part of the Unicode encoding space. :) As is well known, there is no accurate map from language to blocks of the Unicode encoding space. So the map I composed is only meant to meet the requirements of font fallback in OOo in accordance with the config file vcl.xcu. I have made a small change. Please see the attachment I just submitted.

Created attachment 68235 [details]
update of LangFromCodeChart in salgdi3.cxx
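The range-to-language mapping discussed in this thread can be pictured as a small lookup table in which "default CJK" entries are replaced at lookup time by a preferred CJK language. The sketch below uses only the ranges named in the discussion; the `Lang` enum, `CodeRange` and `langFromCodepoint` are hypothetical names for illustration and are not the LangFromCodeChart code from the attachment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical language tags; DefaultCjk is a placeholder that is resolved
// to the caller's preferred CJK language at lookup time.
enum class Lang { Unknown, Japanese, Korean, ChineseSimplified, DefaultCjk };

struct CodeRange {
    std::uint32_t first;
    std::uint32_t last;   // inclusive
    Lang lang;
};

// Ranges taken from the discussion; everything else maps to Unknown here.
static const CodeRange kRanges[] = {
    {0x3000,  0x303F,  Lang::DefaultCjk},
    {0x3040,  0x30FF,  Lang::Japanese},   // range that hints at Japanese
    {0x31C0,  0x31EF,  Lang::DefaultCjk},
    {0x3400,  0x4DBF,  Lang::DefaultCjk},
    {0x4E00,  0x9FCF,  Lang::DefaultCjk},
    {0xF900,  0xFAFF,  Lang::DefaultCjk},
    {0x20000, 0x2A6DF, Lang::DefaultCjk},
    {0x2F800, 0x2FA1F, Lang::DefaultCjk},
};

// Maps a codepoint to a language, substituting preferredCjk for the
// unified ranges whose language cannot be derived from the codepoint.
Lang langFromCodepoint(std::uint32_t cp, Lang preferredCjk)
{
    for (const CodeRange& r : kRanges)
        if (cp >= r.first && cp <= r.last)
            return r.lang == Lang::DefaultCjk ? preferredCjk : r.lang;
    return Lang::Unknown;
}
```

Where the preferred CJK language comes from (UI language, a system setting, or a user option) is left to the caller in this sketch.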
The idea with the LANGUAGE_DEFAULT_CJK entries is that their language gets dynamically replaced by the preferred CJK language: either the UI language or the language selected in the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\Language\Default. It would be best if the language from the Tools->Options->Languages->DefaultCJK setting could be used, but this is not yet available in the SalGraphics layer.

By the way, the current version of salgdi3.cxx in CWS gfb4win02 can be seen at http://hg.services.openoffice.org/cws/gfb4win02/file/08d71e4c61e4/vcl/win/source/gdi/salgdi3.cxx Unfortunately the mercurial service does not provide a way to link to the tip of the CWS repository.

@hdu: I think your idea can help to determine the language of the unified CJK missing characters more accurately. That's great. The language of the characters in the blocks you mentioned (such as U+3400..U+4DBF) cannot be known from the value of the code point alone, though they are more likely to appear in a Chinese document. :) Thank you.

In CWS gfb4win02 all points have been addressed. Item E still worries me a bit, but as I mentioned it is more than good enough for now.

@sba: please verify in CWS gfb4win02. Changed behaviour is only expected on WIN platforms.
Verified in CWS gfb4win02. Best user scenario on my machine:
- New Writer doc
- Insert some characters from U+3400 upwards (use e.g. font "SunBatang")
- select all and set them to another font without CJK characters
-> Chosen font for replacement is much better than before

*** Issue 106798 has been marked as a duplicate of this issue. ***