Issue 42171 - Display of invalid Thai combining character sequences broken on Windows
Summary: Display of invalid Thai combining character sequences broken on Windows
Alias: None
Product: gsl
Classification: Code
Component: code
Version: 680m74
Hardware: PC Windows XP
Importance: P3 Trivial
Target Milestone: AOO PleaseHelp
Assignee: AOO issues mailing list
QA Contact:
Depends on:
Blocks: 41707
Reported: 2005-02-07 07:26 UTC by samphan
Modified: 2017-05-20 11:29 UTC
CC List: 7 users

See Also:
Latest Confirmation in: ---
Developer Difficulty: ---

Text document with invalid Thai combining character sequences (6.23 KB, application/vnd.oasis.opendocument.text)
2005-02-07 07:27 UTC, samphan
Screenshot of the document displayed correctly on Linux (43.52 KB, image/jpeg)
2005-02-07 07:30 UTC, samphan
Screenshot of the document displayed on Windows (29.39 KB, image/jpeg)
2005-02-07 07:31 UTC, samphan
Screenshot of the document displayed on Windows, reformat to use Angsana (28.64 KB, image/jpeg)
2005-02-07 07:33 UTC, samphan

Description samphan 2005-02-07 07:26:30 UTC
A combining character sequence such as gor gai+mai ek+sara ii (0e01+0e48
+0e35) is not displayed properly on Windows.  It should be displayed as
gor gai with the mai ek and then dotted circle with sara ii.  It *is*
displayed in this way in Linux. On Windows, with the old Windows Thai
fonts, such as Angsana and Browallia, an ugly black box is shown, and it
is not clear that there is a sara ii there. Much more seriously, with
more recent fonts such as Tahoma, the sara ii does not show up at all.

The combining character sequences that are not displayed properly are
sequences that Windows cannot display in a single cell.  Such sequences
never occur in correct Thai.  Conventionally, most applications on
Windows prevent the input of such invalid sequences.  However, OOo does
not always do this and it is anyway possible for such sequences to occur
in imported data.  It is important that such sequences be highly visible
to the user so that the user can correct them.

Test case:
1) Load the attached document (with invalid combining character sequences) on
Linux. The display uses dotted circles to ensure that all combining characters in
invalid combining character sequences are clearly displayed. See the first
screenshot attached.

2) Load the same document on Windows. You will not see any dotted circles. See the
second screenshot. So you will not know that this document has errors in it.

3) Reformat the document to use the font Angsana (or Browallia or other Windows
Thai fonts). You'll see black boxes where there are invalid combining character
sequences. See the third screenshot. This lets you know that there are errors, but
you can't tell what the errors are. If you use Tahoma, Microsoft Sans Serif or Lucida
Sans Unicode (which have a glyph for the dotted circle) instead, there are no
black boxes, but there are no dotted circles either.
Comment 1 samphan 2005-02-07 07:27:57 UTC
Created attachment 22278
Text document with invalid Thai combining character sequences
Comment 2 samphan 2005-02-07 07:30:40 UTC
Created attachment 22279
Screenshot of the document displayed correctly on Linux
Comment 3 samphan 2005-02-07 07:31:57 UTC
Created attachment 22280
Screenshot of the document displayed on Windows
Comment 4 samphan 2005-02-07 07:33:13 UTC
Created attachment 22281
Screenshot of the document displayed on Windows, reformatted to use Angsana
Comment 5 falko.tesch 2005-02-09 16:36:39 UTC
Hi Karl, it seems that for some reason the iterator is broken (only under Windows?).
Can you please check whether this can be fixed or whether it is a font-specific matter
(just a wild guess, though)? Thanks in advance.
Comment 6 karl.hong 2005-02-10 22:57:48 UTC
Karl: This is not a breakiterator issue, but a layout engine issue. Linux and
Windows use different engines: Windows uses native Uniscribe, while Linux uses the ICU
layout engine.

To prevent invalid sequences from being entered, we do have input sequence checking,
but it is broken.

I will create a new issue to fix broken input sequence checking and transfer
this one to Herbert for fixing layout engine.
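
For illustration, input sequence checking of the kind mentioned above could look roughly like the sketch below. It is not OOo's actual checker: the function name accepts_keystroke, the one-character lookback, and the two-class split of Thai combining marks are all assumptions, and the rule is much simpler than a full Thai input-sequence specification.

```python
# Simplified, hypothetical sketch of input sequence checking for Thai.
# The character classes below are a deliberate simplification.

def _is_consonant(ch):
    # Thai consonants: U+0E01 (ko kai) .. U+0E2E (ho nokhuk)
    return 0x0E01 <= ord(ch) <= 0x0E2E

def _is_vowel_mark(ch):
    # Combining vowels above/below the base: U+0E31, U+0E34..U+0E3A
    cp = ord(ch)
    return cp == 0x0E31 or 0x0E34 <= cp <= 0x0E3A

def _is_tone_or_top_mark(ch):
    # Tone marks and other top marks: U+0E47..U+0E4E
    return 0x0E47 <= ord(ch) <= 0x0E4E

def accepts_keystroke(preceding, typed):
    """Return True if `typed` may be appended to `preceding`.

    Assumed rule: a vowel mark must directly follow a consonant; a tone
    mark must directly follow a consonant or a vowel mark. Anything that
    is not a combining mark is always accepted.
    """
    prev = preceding[-1] if preceding else ""
    if _is_vowel_mark(typed):
        return bool(prev) and _is_consonant(prev)
    if _is_tone_or_top_mark(typed):
        return bool(prev) and (_is_consonant(prev) or _is_vowel_mark(prev))
    return True
```

Under this rule, typing sara ii (0e35) after gor gai+mai ek (0e01+0e48) would be rejected, while typing mai ek after gor gai+sara ii would be accepted.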
Comment 7 2005-02-15 15:56:59 UTC
Can reproduce.
Comment 8 2005-02-15 16:49:30 UTC
Unfortunately we are 100% compatible here with an important legacy application
from a major competitor, because we use the same layout engine... so the problem
is in the Uniscribe library which is outside OOo's scope.

Thanks for the great bugdocs and the excellent bug report which made reproducing
the problem easy.
Comment 9 jjc 2005-02-15 17:14:24 UTC
Thanks for looking into this issue.  So if I understand correctly, the situation
is that:

a) Uniscribe has a bug/limitation in that it displays invalid combining character
sequences poorly

b) OOo sometimes gives Uniscribe invalid combining character sequences to display

I don't think it follows from this that nothing needs changing in OOo.

For example, if the document contains 0e01+0e48+0e35, which Uniscribe cannot
display properly, the OOo display engine might transform that to
0e01+0e48+25cc+0e35 before giving it to Uniscribe to display.

Alternatively, the Sequence Input Checking could be made more rigorous on Windows
so that it is impossible for the user to enter such invalid sequences (which I
believe is the case with some competitor products).

The current situation may well be Uniscribe's fault, but it is not an acceptable
situation for OOo Thai users on Windows, and I find it hard to believe that
there is nothing OOo can do to improve the situation.
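
As a rough sketch of the transformation suggested above (0e01+0e48+0e35 becoming 0e01+0e48+25cc+0e35), the following could run on a copy of the text just before it is handed to the layout engine. The function name insert_dotted_circles and the simplified "consonant, then optional vowel mark, then optional tone mark" model are assumptions made for illustration; this is not Uniscribe's or OOo's actual logic.

```python
# Hypothetical display-time pass: insert U+25CC (dotted circle) before any
# Thai combining mark that cannot attach to the preceding cell.
# The cell model here is a simplification, not a complete Thai grammar.

DOTTED_CIRCLE = "\u25cc"

def _mark_level(ch):
    """0 = not a combining mark, 1 = vowel above/below, 2 = tone/top mark."""
    cp = ord(ch)
    if cp == 0x0E31 or 0x0E34 <= cp <= 0x0E3A:
        return 1
    if 0x0E47 <= cp <= 0x0E4E:
        return 2
    return 0

def _is_consonant(ch):
    return 0x0E01 <= ord(ch) <= 0x0E2E

def insert_dotted_circles(text):
    """Return a display string; the input (logical text) is left unchanged."""
    out = []
    # state: 0 = no base, 1 = base only, 2 = base + vowel mark,
    #        3 = base (+ vowel mark) + tone mark
    state = 0
    for ch in text:
        level = _mark_level(ch)
        if level == 0:
            state = 1 if _is_consonant(ch) else 0
        else:
            if level == 1:
                allowed = state == 1
            else:
                allowed = state in (1, 2)
            if not allowed:
                # The dotted circle becomes the base for this mark.
                out.append(DOTTED_CIRCLE)
            state = level + 1
        out.append(ch)
    return "".join(out)
```

Because the function returns a new string, the document content itself would still hold the original sequence; only what is passed on for display changes.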
Comment 10 2005-02-21 18:21:32 UTC
OK, it is possible to work around the issue by changing invalid sequences to
valid ones.
Comment 11 2005-02-21 18:22:37 UTC
HDU->FME: please work with Karl to convert invalid character sequences into
valid ones...
Comment 12 frank.meies 2005-02-22 07:49:24 UTC
FME->FT: And finally back to you. I think this means we should implement a "type
and replace" feature for sequence input checking, as known from a competitor. In
this case we need a more detailed description of the functionality of this feature.
Comment 13 frank.meies 2005-02-22 07:50:10 UTC
Comment 14 jjc 2005-02-22 08:17:27 UTC
"Type and replace" is issue 42661.  That is a separate (although related)
issue.  "Type and replace" is about how to prevent invalid combining character
sequences getting into your document.  The issue here is what happens if your
document contains an invalid combining character sequence; that can happen when
you load a document or when you turn off sequence input checking and "type and
replace".  In order to display invalid combining character sequences with
Uniscribe, it is necessary to transform invalid combining character sequences to
sequences that can be displayed by Uniscribe (e.g. by inserting dotted circle
glyphs) as part of the display process; this wouldn't change the logical content
of the document which would still contain invalid combining character sequences.
Comment 15 samphan 2005-02-22 08:36:05 UTC
I'm wondering why Uniscribe doesn't support displaying invalid combining
character sequences. It is said here
Maybe it is implemented in every CTL language mentioned here
Comment 16 falko.tesch 2005-10-20 20:34:56 UTC
FT: Back to you, Samphan. For the moment I do not see that we can do such a thing
without help from outside. Please provide a spec and a patch/code first.
Please do not assign this issue to me again, since I'm leaving this position. Thanks.
Comment 17 arthit 2008-04-23 08:01:14 UTC
Can any Windows user confirm whether this still occurs in the latest OOo?
Comment 18 Marcus 2017-05-20 11:29:19 UTC
Reset assignee to the default "".