Issue 42661 - RFE: Thai need feature for automatic sequence input correction
Summary: RFE: Thai need feature for automatic sequence input correction
Status: CLOSED FIXED
Alias: None
Product: Internationalization
Classification: Code
Component: code
Version: 680m79
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: stefan.baltzer
QA Contact: issues@l10n
URL:
Keywords: oooqa
Depends on:
Blocks: 48117
Reported: 2005-02-13 05:26 UTC by samphan
Modified: 2013-08-07 15:03 UTC

See Also:
Issue Type: FEATURE
Latest Confirmation in: ---
Developer Difficulty: ---


Description samphan 2005-02-13 05:26:09 UTC
OOo has sequence input checking implemented for Thai (which is currently broken,
see issue 42469). This is the most basic form of Thai input support on every
platform. It rejects input characters that would make illegal combining
character sequences. However, in complex applications like word processors,
Thai users are used to a feature I call automatic sequence input correction, or
sequence correction. The international version of MS Office implements this
feature. Both Thai-specific versions of OOo (OfficeTLE and Pladao) have also
implemented it independently. The feature is very convenient for Thai users;
without it, OOo will not be competitive with other products for them.

When accepting an input character from the user, if appending the character
would make an illegal combining character sequence, the sequence correction
algorithm tries to insert the character into the sequence, or to delete an
existing combining character, so that the sequence becomes legal. It gives the
last character priority, on the assumption that the last character is what the
user really wants.

Example in Thai:

In Thai, a diacritic or tone mark (a combining character) always follows an
upper/lower vowel (also a combining character), which always follows a base
consonant. That is, only this form of combining character sequence is valid:
C [V] [M]. The vowel and the mark are optional.
If the buffer contains C M (e.g. gor-gai + mai-ek) and the user types a V (e.g.
sara-ii), the V can't be appended to the sequence 'C M' because that would make
an illegal sequence. So it is INSERTed before the M to make the legal C V M
sequence (gor-gai + sara-ii + mai-ek).
	This makes it possible for users to type a combining character sequence in
either 'C V M' order or 'C M V' order.
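
The C [V] [M] form described above can be checked mechanically. The following
is only a minimal sketch, assuming each character has already been reduced to
one of the letters C, V or M; the name is_legal_cell is hypothetical and is not
part of OOo.

```python
import re

# One cell: a base consonant, then an optional vowel, then an optional mark.
CELL = re.compile(r"CV?M?")

def is_legal_cell(classes):
    """Return True if a class string such as 'CVM' has the form C [V] [M]."""
    return CELL.fullmatch(classes) is not None

for seq in ("C", "CV", "CM", "CVM", "CMV", "CVMM"):
    print(seq, is_legal_cell(seq))
```

Under this check 'C M V' is rejected as typed, which is exactly the case where
sequence correction would step in and reorder instead of refusing the input.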

When the user types another M (mai-toe), the M can't be appended to the C V M
sequence because that would make C V M M. Since this means that the user is
trying to correct the existing sequence, the algorithm REPLACEs the existing M
with the new M (gor-gai + sara-ii + mai-toe). This saves the user one backspace.
When the user types another V (sara-i), the V can't be appended because that
would make C V M V. This means that the user is trying to correct the existing
V, so the algorithm REPLACEs the existing V with the new one (gor-gai + sara-i +
mai-toe). This saves the user two backspaces and one retyped character.
	The two cases above let users correct existing combining character sequences
without needing to backspace or retype; they can recompose the sequences through
automatic sequence correction. Since this happens quite often, it is a nice
feature to have for any language with a lot of combining characters, like Thai.
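
To make the walk-through above concrete, here is a rough sketch that replays it
on the actual code points: gor-gai (U+0E01), mai-ek (U+0E48), sara-ii (U+0E35),
mai-toe (U+0E49) and sara-i (U+0E34). The classification table is an assumption
that covers only these five characters, and type_char is a hypothetical helper,
not the OOo implementation.

```python
# Assumed classification, limited to the characters used in the example.
VOWELS = {"\u0e34", "\u0e35"}   # sara-i, sara-ii
MARKS = {"\u0e48", "\u0e49"}    # mai-ek, mai-toe

def kind(ch):
    """Classify a character as V, M, or (by default) C."""
    if ch in VOWELS:
        return "V"
    if ch in MARKS:
        return "M"
    return "C"

def type_char(cell, ch):
    """Return (new_cell, action) after typing ch at the end of one legal cell."""
    kinds = "".join(kind(c) for c in cell)
    k = kind(ch)
    if k == "M" and kinds.endswith("M"):
        return cell[:-1] + ch, "REPLACE"             # C [V] M1 + M2
    if k == "V" and "V" in kinds:
        i = kinds.index("V")
        return cell[:i] + ch + cell[i + 1:], "REPLACE"   # C V1 [M] + V2
    if k == "V" and kinds.endswith("M"):
        return cell[:-1] + ch + cell[-1], "INSERT"   # C M + V
    return cell + ch, "APPEND"                       # already legal

cell = "\u0e01\u0e48"                                # gor-gai + mai-ek
for ch in ("\u0e35", "\u0e49", "\u0e34"):            # sara-ii, mai-toe, sara-i
    cell, action = type_char(cell, ch)
    print(action, [f"U+{ord(c):04X}" for c in cell])
```

The three keystrokes correspond to the INSERT case and the two REPLACE cases
described in this report.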
Comment 1 jjc 2005-02-18 04:12:46 UTC
I wonder whether this is something that could be handled outside of OOo by an
input method (using e.g. IIIMF).
Comment 2 arthit 2005-02-20 19:59:27 UTC
This feature is called "Type and Replace" in Microsoft Office:

http://office.microsoft.com/en-us/assistance/HP030745481033.aspx
Comment 3 arthit 2005-02-22 11:37:34 UTC
confirmed.
Comment 4 Martin Hollmichel 2005-05-11 16:37:34 UTC
set target to 2.0.1
Comment 5 Martin Hollmichel 2005-08-25 08:16:57 UTC
reassign
Comment 6 frank.meies 2005-08-25 10:44:46 UTC
FME->all: So basically there are five rules:

1) CM + V = CVM
2) CM1 + M2 = CM2
3) CVM1 + M2 = CVM2
4) CV1M + V2 = CV2M
5) CV1 + V2 = CV2

Right?
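
One way to read the five rules above is as a "slot" model: a cell has one slot
per class, and a newly typed character of a class overwrites whatever is in
that slot. The sketch below is only an illustration under that assumption; the
tokens ('C', 'V1', 'M2', ...) and the function name correct are hypothetical.

```python
def correct(buffer, typed):
    """Apply the slot model to one cell.

    buffer is a list of tokens such as ['C', 'V1', 'M']; typed is one token.
    The first letter of each token is its class (C, V or M).
    """
    slots = {"C": None, "V": None, "M": None}
    for tok in buffer + [typed]:
        slots[tok[0]] = tok          # a later token of the same class wins
    # Emit the slots in the only legal order: C, then V, then M.
    return [slots[k] for k in "CVM" if slots[k] is not None]

print(correct(["C", "M"], "V"))            # rule 1
print(correct(["C", "M1"], "M2"))          # rule 2
print(correct(["C", "V", "M1"], "M2"))     # rule 3
print(correct(["C", "V1", "M"], "V2"))     # rule 4
print(correct(["C", "V1"], "V2"))          # rule 5
```

This model only covers the three classes C, V and M used in the five rules.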
Comment 7 frank.meies 2005-08-25 11:57:02 UTC
FME->KHONG: Please add the correctInputSequence functionality to i18n.
Comment 8 jjc 2005-08-26 03:27:57 UTC
As implemented in MSO, it's a little more complicated than this, because it also
handles following vowels (sara aa, sara am, sara a).  Using F for these, it also
does

  CFM => CMF
  CM1FM2 => CM2F
  CVF => CF
  CFV => CV
  CVMF => CMF

Also where L is a leading vowel, it does

  L1L2=>L2
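
The extra mappings listed so far can be written down as a small lookup table.
This is just a sketch of the rules as stated in this comment, not of how MSO
actually implements them; the table name, the token format and the function
rewrite are all assumptions made for illustration.

```python
# Key: class string of the sequence including the newly typed character.
# Value: indices of the tokens that are kept, in output order.
MSO_STYLE_RULES = {
    "CFM": (0, 2, 1),    # CFM    => CMF
    "CMFM": (0, 3, 2),   # CM1FM2 => CM2F
    "CVF": (0, 2),       # CVF    => CF
    "CFV": (0, 2),       # CFV    => CV
    "CVMF": (0, 2, 3),   # CVMF   => CMF
    "LL": (1,),          # L1L2   => L2
}

def rewrite(tokens):
    """Rewrite a token list such as ['C', 'M1', 'F', 'M2'] if a rule matches.

    The first letter of each token is its class; sequences with no matching
    rule are returned unchanged.
    """
    key = "".join(t[0] for t in tokens)
    keep = MSO_STYLE_RULES.get(key)
    if keep is None:
        return list(tokens)
    return [tokens[i] for i in keep]

print(rewrite(["C", "M1", "F", "M2"]))
print(rewrite(["L1", "L2"]))
```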

Also thanthakhat (karan) is not treated the same as a tone mark, since it's only
allowed with certain vowels, e.g. [gor gai][sara uu][karan] will map to [gor
gai][karan]

We need something similar to this:

  http://linux.thai.net/~thep/th-xim/#Correction

There's also an interaction with normal vs restricted input sequence checking.
If you're not doing "restricted" input sequence checking, then only corrections
within a single cell make sense.

I've reassigned this to myself to provide a proper spec.

Comment 9 frank.meies 2005-08-31 09:48:00 UTC
FME: Application code (based on KHONG's new XExtendedInputSequenceChecker
interface) has been implemented in cws thaiissues:

sw/inc/checkit.hxx rev. 1.2.1242.1
sw/source/core/bastyp/checkit.cxx rev. 1.3.1242.1
sw/source/core/txtnode/ndtxt.cxx rev. 1.51.70.2

Comment 10 jjc 2005-09-15 18:41:23 UTC
After some discussions, we reached the conclusion that making sequence
correction very clever (as in MSO) was actually a misfeature, because it made it
hard for users to understand and predict behaviour.

I'll specify this in terms of WTT character classes, but it is convenient to
have a few extra classes:

<abv> = <av1>|<av2>|<av3>|<bv1>|<bv2>
<abv1> = <av1>|<bv1>
<thanthakhat> = 0E4C (karan)

Then we have 8 rules:

<cons> <abv>_x + <abv>_y => <cons> <abv>_y (replace)
<cons> <tone>_x + <tone>_y => <cons> <tone>_y (replace)
<cons> <abv> <tone>_x + <tone>_y => <cons> <abv> <tone>_y (replace)
<cons> <abv>_x <tone> + <abv>_y => <cons> <abv>_y <tone> (replace, reorder)
<cons> <tone> + <abv> => <cons> <abv> <tone> (reorder)
[same as fme's 5 rules so far]
<cons> <fv1> + <tone> => <cons> <tone> <fv1> (reorder)
<cons> <tone>_x <fv1> + <tone>_y => <cons> <tone>_y <fv1> (replace, reorder)
<cons> <thanthakhat> + <abv1> => <cons> <abv1> <thanthakhat> (reorder)

In any other situation, sequence correction behaves the same as sequence checking.
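
As a rough illustration of the 8 rules above, the sketch below applies them to
one cell given as (class, character) pairs. It assumes the caller has already
classified each character into the WTT classes named in the rules and treats
thanthakhat as its own class; it does not perform WTT sequence checking itself.
The function name correct and the labels in the usage lines are hypothetical.

```python
ABV = {"av1", "av2", "av3", "bv1", "bv2"}
ABV1 = {"av1", "bv1"}

def correct(cell, typed):
    """Return the corrected cell, or None if no correction rule applies.

    cell is a list of (wtt_class, char) pairs starting with a consonant;
    typed is the (wtt_class, char) pair just entered. None means the input
    falls back to ordinary sequence checking.
    """
    if not cell or cell[0][0] != "cons":
        return None
    base, rest = cell[0], [cls for cls, _ in cell[1:]]
    t = typed[0]
    abv_tone = len(rest) == 2 and rest[0] in ABV and rest[1] == "tone"
    if t in ABV:
        if len(rest) == 1 and rest[0] in ABV:
            return [base, typed]                 # replace
        if abv_tone:
            return [base, typed, cell[2]]        # replace, reorder
        if rest == ["tone"]:
            return [base, typed, cell[1]]        # reorder
        if rest == ["thanthakhat"] and t in ABV1:
            return [base, typed, cell[1]]        # reorder
    elif t == "tone":
        if rest == ["tone"]:
            return [base, typed]                 # replace
        if abv_tone:
            return [base, cell[1], typed]        # replace
        if rest == ["fv1"]:
            return [base, typed, cell[1]]        # reorder
        if rest == ["tone", "fv1"]:
            return [base, typed, cell[2]]        # replace, reorder
    return None

k = ("cons", "gor-gai")
print(correct([k, ("tone", "mai-ek")], ("av1", "sara-i")))
print(correct([k, ("thanthakhat", "karan")], ("bv2", "sara-uu")))
```

The second call returns None because sara uu is not in <abv1>, so that input is
left to sequence checking rather than being corrected.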

The ideas behind this choice of rules are:

- take effect only when the character typed is a combining character
- apply only to sequences that both strict and basic WTT 2.0 checking disallow
- allow the combining characters in a cell to be typed in any order
- allow tone marks to be typed after following vowels 
- don't provide rules for the 4 combining character combinations that are
optional in WTT 2.0 (e.g. sara ii + maitokhu)
- only replace like by like (tone marks by tone marks, vowels by vowels)

There needs to be a UI, adding a checkbox "Type and replace" beneath the current
"Restricted" checkbox.  Like the "Restricted" checkbox, the "Type and replace"
checkbox is enabled only if the "Use sequence checking" checkbox is enabled.
"Type and replace" would be enabled by default. (This might be a good
opportunity to fix issue 42967, and change "Restricted" to "Strict".)

Comment 11 jjc 2005-09-16 04:28:35 UTC
Need one more rule:

<cons> <abv1>_x <thanthakhat> + <abv1>_y => <cons> <abv1>_y <thanthakhat>
(reorder, replace)
Comment 12 frank.meies 2005-09-16 06:56:39 UTC
FME->KHONG: Please change your implementation to the new set of rules.
Comment 13 karl.hong 2005-09-19 23:53:03 UTC
New rules are implemented and checked in.
Comment 14 falko.tesch 2005-09-21 06:27:53 UTC
FT: This issue also needs a UI to control the setting of this feature. The
specification for this can be found at issue 48117.
Comment 15 frank.meies 2005-09-21 11:42:00 UTC
FME->OS: Please implement the new ui.
Comment 16 Oliver Specht 2005-10-13 14:53:44 UTC
Set to fixed, UI is issue 48117
Comment 17 Oliver Specht 2005-10-31 11:17:08 UTC
Reassigned for verification

re-open issue and reassign to sba@openoffice.org
Comment 18 Oliver Specht 2005-10-31 11:17:15 UTC
reassign to sba@openoffice.org
Comment 19 Oliver Specht 2005-10-31 11:17:23 UTC
reset resolution to FIXED
Comment 20 stefan.baltzer 2005-11-10 17:18:59 UTC
SBA: Verified in CWS thaiissues (See issue 52055).
Comment 21 samphan 2006-01-26 09:46:10 UTC
Shouldn't this also work in Calc, Impress, Draw and Base? Right now it only
works in Writer.
Comment 22 stefan.baltzer 2006-03-14 12:05:08 UTC
SBA: OK in Master. Closed.
For further input sequence fixes, see issue 54913, issue 61397 and issue 61994.