Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing |
Summary: | Display of invalid Thai combining character sequences broken on Windows | ||
---|---|---|---|
Product: | gsl | Reporter: | samphan |
Component: | code | Assignee: | AOO issues mailing list <issues> |
Status: | CONFIRMED --- | QA Contact: | |
Severity: | Trivial | ||
Priority: | P3 | CC: | arthit, hin.stone, issues, jjc, khirano, markpeak, nusorn |
Version: | 680m74 | ||
Target Milestone: | AOO PleaseHelp | ||
Hardware: | PC | ||
OS: | Windows XP | ||
Issue Type: | ENHANCEMENT | Latest Confirmation in: | --- |
Developer Difficulty: | --- | ||
Issue Depends on: | |||
Issue Blocks: | 41707 | ||
Attachments: |
Description
samphan
2005-02-07 07:26:30 UTC
Created attachment 22278
Text document with invalid Thai combining character sequences
Created attachment 22279
Screenshot of the document displayed correctly on Linux
Created attachment 22280
Screenshot of the document displayed on Windows
Created attachment 22281
Screenshot of the document displayed on Windows, reformatted to use Angsana
Hi Karl, it seems that for some reason the iterator is broken (only under Windows?). Can you please check whether this can be fixed or whether this is a font-specific matter (just a wild guess, though)? Thx in advance.

Karl: This is not a breakiterator issue, but a layout engine issue. Linux and Windows use different engines: Windows uses native Uniscribe while Linux uses the ICU layout engine. To prevent invalid sequences from being entered, we do have input sequence checking, but it was broken. I will create a new issue to fix the broken input sequence checking and transfer this one to Herbert for fixing the layout engine.

Can reproduce. Unfortunately we are 100% compatible here with an important legacy application from a major competitor, because we use the same layout engine... so the problem is in the Uniscribe library, which is outside OOo's scope. Thanks for the great bugdocs and the excellent bug report, which made reproducing the problem easy.

Thanks for looking into this issue. So if I understand correctly, the situation is that:
a) Uniscribe has a bug/limitation in that it displays invalid combining character sequences poorly
b) OOo sometimes gives Uniscribe invalid combining character sequences to display
I don't think it follows from this that nothing needs changing in OOo. For example, if the document contains 0e01+0e48+0e35, which Uniscribe cannot display properly, the OOo display engine might transform that to 0e01+0e48+25cc+0e35 before giving it to Uniscribe to display. Alternatively, the Sequence Input Checking could be made more rigorous on Windows so that it is impossible for the user to enter such invalid sequences (which I believe is the case with some competitor products). The current situation may well be Uniscribe's fault, but it is not an acceptable situation for OOo Thai users on Windows, and I find it hard to believe that there is nothing OOo can do to improve the situation.

Ok, it is possible to work around the issue by changing invalid sequences to valid ones.
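
To make the workaround suggested above concrete, here is a minimal Python sketch of a display-time transformation that turns 0e01+0e48+0e35 into 0e01+0e48+25cc+0e35. It is not OOo code and not Uniscribe's actual rule set. The two-level ranking of Thai marks (above/below vowels first, then tone marks and other top signs) and the function name make_displayable are assumptions made for illustration; a real implementation would follow the full Thai sequence rules.

```python
DOTTED_CIRCLE = "\u25cc"  # U+25CC, used as a visible placeholder base


def _rank(ch):
    """Simplified (assumed) classification of Thai combining marks.

    0 = not a combining mark, 1 = above/below vowel,
    2 = tone mark or other top-level sign.
    """
    cp = ord(ch)
    if cp == 0x0E31 or 0x0E34 <= cp <= 0x0E3A:
        return 1
    if 0x0E47 <= cp <= 0x0E4E:
        return 2
    return 0


def _is_thai_consonant(ch):
    return 0x0E01 <= ord(ch) <= 0x0E2E


def make_displayable(text):
    """Return a copy of text with U+25CC inserted before every combining
    mark that has no valid base under the simplified rules above.

    The input string is not modified, so the logical document content
    stays as it was; only the string handed to the renderer changes.
    """
    out = []
    prev_rank = None  # None: the previous character cannot carry a mark
    for ch in text:
        rank = _rank(ch)
        if rank == 0:
            out.append(ch)
            prev_rank = 0 if _is_thai_consonant(ch) else None
            continue
        # A mark is valid only after a consonant or a lower-ranked mark.
        if prev_rank is None or prev_rank >= rank:
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev_rank = rank
    return "".join(out)


if __name__ == "__main__":
    fixed = make_displayable("\u0e01\u0e48\u0e35")
    print("+".join(f"{ord(c):04x}" for c in fixed))
```

A valid sequence such as 0e01+0e35+0e48 (consonant, vowel, tone mark) passes through unchanged under these rules, so the transformation only affects sequences the renderer would otherwise mishandle.
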
HDU->FME: please work with Karl to convert invalid character sequences into valid ones...

FME->FT: And finally back to you. I think this means we should implement a "type and replace" feature for sequence input checking, as known from a competitor. In this case we need a more detailed description of the functionality of this feature.

"Type and replace" is issue 42661. That is a separate (although related) issue. "Type and replace" is about how to prevent invalid combining character sequences from getting into your document. The issue here is what happens if your document already contains an invalid combining character sequence; that can happen when you load a document or when you turn off sequence input checking and "type and replace". In order to display invalid combining character sequences with Uniscribe, it is necessary to transform them, as part of the display process, into sequences that Uniscribe can display (e.g. by inserting dotted circle glyphs); this wouldn't change the logical content of the document, which would still contain the invalid combining character sequences.

I'm wondering why Uniscribe doesn't support displaying invalid combining character sequences. It is described here
http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb
and
http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx#invalid
and
http://www.microsoft.com/typography/OpenType%20Dev/lao/shaping.mspx#invalid
Maybe it is implemented in every CTL language mentioned here
http://www.microsoft.com/typography/SpecificationsOverview.mspx

FT: Back to you Samphan. For the moment I do not see that we can do such a thing without help from the outside. Please provide a spec and patch/code first.

Please do not assign this issue to me again since I'm leaving this position. Thx.

Can any Windows user confirm whether this still occurs in the latest OOo?

Reset assignee to the default "issues@openoffice.apache.org".