Apache OpenOffice (AOO) Bugzilla – Issue 42661
RFE: Thai need feature for automatic sequence input correction
Last modified: 2013-08-07 15:03:05 UTC
OOo has sequence input checking implemented for Thai (currently broken, see issue 42469). This is the most basic form of Thai input support on every platform. It rejects input characters that would produce illegal combining character sequences.

However, in complex applications such as word processors, Thai users are used to a feature I call automatic sequence input correction, or sequence correction. The international version of MS Office implements this feature, and both Thai-specific versions of OOo (OfficeTLE and Pladao) have implemented it independently. It is very convenient for Thai users; without it, OOo will not be competitive with other products for them.

When appending an input character would produce an illegal combining character sequence, the sequence correction algorithm tries to insert the character into the sequence, or to delete an existing combining character, so that the sequence becomes legal. It gives the last character priority, on the assumption that the last character typed is what the user really wants.

Example in Thai:

In Thai, a diacritic or tone mark (a combining character) always follows an upper/lower vowel (also a combining character), which always follows a base consonant. That is, the only valid form of combining character sequence is C [V] [M], where the vowel and the mark are optional.

If the buffer contains C M (e.g. gor-gai + mai-ek) and the user types a V (e.g. sara-ii), the V can't be appended to the sequence C M because that would make an illegal sequence. So it is INSERTed before the M to make the legal sequence C V M (gor-gai + sara-ii + mai-ek). This makes it possible for users to type a combining character sequence in either C V M or C M V order.

When the user types another M (mai-toe), the M can't be appended to the C V M sequence because that would make C V M M. Since this means the user is trying to correct the existing sequence, the algorithm REPLACEs the existing M with the new M (gor-gai + sara-ii + mai-toe). This saves the user one backspace.

When the user types another V (sara-i), the V can't be appended because that would make C V M V. This means the user is trying to correct the existing V, so the algorithm REPLACEs the existing V with the new one (gor-gai + sara-i + mai-toe). This saves the user two backspaces and one character.

The two cases above let users correct existing combining character sequences without backspacing or retyping; they can recompose the sequence through automatic sequence correction. Since this happens quite often, it is a nice feature to have for any language with many combining characters, like Thai.
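To make the C [V] [M] behaviour described above concrete, here is a minimal Python sketch that models each cell as three slots (consonant, vowel, mark). It is only an illustration of the idea, not OOo code: the names `Cell`, `type_char` and `render` are made up for this example, and the character sets are a deliberately simplified assumption (upper/lower vowels as V, tone marks and other upper diacritics as M).

```python
from dataclasses import dataclass

# Simplified, assumed classification (not the full WTT 2.0 table):
# V = upper/lower vowels, M = tone marks and other upper diacritics.
VOWELS = {"\u0E31"} | {chr(c) for c in range(0x0E34, 0x0E3A)}  # U+0E34..U+0E39
MARKS = {chr(c) for c in range(0x0E47, 0x0E4F)}                # U+0E47..U+0E4E


def is_consonant(ch: str) -> bool:
    # Rough range check; good enough for this illustration.
    return "\u0E01" <= ch <= "\u0E2E"


@dataclass
class Cell:
    cons: str          # base character
    vowel: str = ""    # optional V slot
    mark: str = ""     # optional M slot

    def text(self) -> str:
        # Always emitted in the legal order C [V] [M].
        return self.cons + self.vowel + self.mark


def type_char(cells: list, ch: str) -> bool:
    """Apply one keystroke; return False if the keystroke is rejected."""
    if ch in VOWELS or ch in MARKS:
        if not cells or not is_consonant(cells[-1].cons):
            return False  # combining character without a base consonant
        if ch in VOWELS:
            cells[-1].vowel = ch   # fills the slot (INSERT) or overwrites it (REPLACE)
        else:
            cells[-1].mark = ch
        return True
    cells.append(Cell(ch))         # any other character starts a new cell
    return True


def render(cells: list) -> str:
    return "".join(cell.text() for cell in cells)


# Walk through the example: gor-gai, mai-ek, sara-ii, mai-toe, sara-i.
cells = []
for key in ["\u0E01", "\u0E48", "\u0E35", "\u0E49", "\u0E34"]:
    type_char(cells, key)
    print([f"U+{ord(c):04X}" for c in render(cells)])
```

Because the last keystroke always wins its slot, the sketch gives the last character priority, and the insert and replace cases fall out of the same assignment.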
I wonder whether this is something that could be handled outside of OOo by an input method (using e.g. IIIMF).
This feature is called "Type and Replace" in Microsoft Office http://office.microsoft.com/en-us/assistance/HP030745481033.aspx
confirmed.
set target to 2.0.1
reassign
FME->all: So basically there are five rules:
1) CM + V = CVM
2) CM1 + M2 = CM2
3) CVM1 + M2 = CVM2
4) CV1M + V2 = CV2M
5) CV1 + V2 = CV2
Right?
FME->KHONG: Please add the correctInputSequence functionality to i18n.
As implemented in MSO, it's a little more complicated than this, because it also handles following vowels (sara aa, sara am, sara a). Using F for these, it also does:
CFM => CMF
CM1FM2 => CM2F
CVF => CF
CFV => CV
CVMF => CMF
Also, where L is a leading vowel, it does:
L1L2 => L2
Also, thanthakhat (karan) is not treated the same as a tone mark, since it's only allowed with certain vowels, e.g. [gor gai][sara uu][karan] will map to [gor gai][karan].

We need something similar to this: http://linux.thai.net/~thep/th-xim/#Correction

There's also an interaction with normal vs restricted input sequence checking. If you're not doing "restricted" input sequence checking, then only corrections within a single cell make sense.

I've reassigned this to myself to provide a proper spec.
FME: Application code (based on KHONG's new XExtendedInputSequenceChecker interface) has been implemented in cws thaiissues:
sw/inc/checkit.hxx rev. 1.2.1242.1
sw/source/core/bastyp/checkit.cxx rev. 1.3.1242.1
sw/source/core/txtnode/ndtxt.cxx rev. 1.51.70.2
After some discussions, we reached the conclusion that making sequence correction very clever (as in MSO) was actually a misfeature, because it made it hard for users to understand and predict behaviour.

I'll specify this in terms of WTT character classes, but it is convenient to have a few extra classes:

<abv> = <av1>|<av2>|<av3>|<bv1>|<bv2>
<abv1> = <av1>|<bv1>
<thanthakhat> = 0E4C (karan)

Then we have 8 rules:

<cons> <abv>_x + <abv>_y => <cons> <abv>_y (replace)
<cons> <tone>_x + <tone>_y => <cons> <tone>_y (replace)
<cons> <abv> <tone>_x + <tone>_y => <cons> <abv> <tone>_y (replace)
<cons> <abv>_x <tone> + <abv>_y => <cons> <abv>_y <tone> (replace, reorder)
<cons> <tone> + <abv> => <cons> <abv> <tone> (reorder)
[same as fme's 5 rules so far]
<cons> <fv1> + <tone> => <cons> <tone> <fv1> (reorder)
<cons> <tone>_x <fv1> + <tone>_y => <cons> <tone>_y <fv1> (replace, reorder)
<cons> <thanthakhat> + <abv1> => <cons> <abv1> <thanthakhat> (reorder)

In any other situation, sequence correction behaves the same as sequence checking.

The ideas behind this choice of rules are:
- take effect only when the character typed is a combining character
- apply only to sequences that both strict and basic WTT 2.0 checking disallow
- allow the combining characters in a cell to be typed in any order
- allow tone marks to be typed after following vowels
- don't provide rules for the 4 combining character combinations that are optional in WTT 2.0 (e.g. sara ii + maitokhu)
- only replace like by like (tone marks by tone marks, vowels by vowels)

There needs to be a UI: a checkbox "Type and replace" added beneath the current "Restricted" checkbox. Like the "Restricted" checkbox, the "Type and replace" checkbox is enabled only if the "Use sequence checking" checkbox is enabled. "Type and replace" would be enabled by default. (This might be a good opportunity to fix issue 42967, and change "Restricted" to "Strict".)
Need one more rule: <cons> <abv1>_x <thanthakhat> + <abv1>_y => <cons> <abv1>_y <thanthakhat> (reorder, replace)
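For reference, the eight rules plus the additional one can be sketched as a small rule table in Python. This is an illustrative sketch only, not the XExtendedInputSequenceChecker implementation: the function name `correct` is hypothetical, and the code points assigned to each WTT class are written from memory of the WTT 2.0 table and should be checked against the spec. When no rule applies, the function returns `None`, meaning the caller should fall back to ordinary sequence checking.

```python
# Assumed WTT 2.0 class membership (verify against the spec before relying on it).
CONS = {chr(c) for c in range(0x0E01, 0x0E2F)} - {"\u0E24", "\u0E26"}
ABV1 = {"\u0E34", "\u0E38"}                                      # <av1> | <bv1>
ABV = ABV1 | {"\u0E31", "\u0E36", "\u0E35", "\u0E37", "\u0E39"}  # + <av2>, <av3>, <bv2>
TONE = {"\u0E48", "\u0E49", "\u0E4A", "\u0E4B"}
FV1 = {"\u0E30", "\u0E32", "\u0E33"}                             # sara a, sara aa, sara am
THANTHAKHAT = {"\u0E4C"}

# Each rule: (classes of the characters already following <cons>,
#             class of the typed character,
#             function building the new tail from (old_tail, typed)).
RULES = [
    ((ABV,), ABV, lambda t, y: y),                    # replace
    ((TONE,), TONE, lambda t, y: y),                  # replace
    ((ABV, TONE), TONE, lambda t, y: t[0] + y),       # replace
    ((ABV, TONE), ABV, lambda t, y: y + t[1]),        # replace, reorder
    ((TONE,), ABV, lambda t, y: y + t[0]),            # reorder
    ((FV1,), TONE, lambda t, y: y + t[0]),            # reorder
    ((TONE, FV1), TONE, lambda t, y: y + t[1]),       # replace, reorder
    ((THANTHAKHAT,), ABV1, lambda t, y: y + t[0]),    # reorder
    ((ABV1, THANTHAKHAT), ABV1, lambda t, y: y + t[1]),  # reorder, replace
]


def correct(buffer: str, typed: str):
    """Return the corrected buffer, or None if no correction rule applies."""
    for tail_classes, typed_class, build in RULES:
        n = len(tail_classes)
        if typed not in typed_class or len(buffer) < n + 1:
            continue
        if buffer[-(n + 1)] not in CONS:
            continue  # the tail must sit directly on a consonant
        tail = buffer[-n:]
        if all(ch in cls for ch, cls in zip(tail, tail_classes)):
            return buffer[:-n] + build(tail, typed)
    return None  # caller falls back to plain sequence checking


# Example: gor-gai + sara-aa, then mai-toe typed after the following vowel.
result = correct("\u0E01\u0E32", "\u0E49")
print([f"U+{ord(c):04X}" for c in result])
```

Keeping the rules in a flat table mirrors the way they are written in the spec, so adding or dropping a rule is a one-line change. Since every rule requires a consonant immediately before its tail, at most one rule can match a given buffer and keystroke.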
FME->KHONG: Please change your implementation to the new set of rules.
New rules are implemented and checked in.
FT: This issue also needs a UI to control the setting of this feature. The specification for this can be found at issue 48117.
FME->OS: Please implement the new ui.
Set to fixed, UI is issue 48117
Reassigned for verification; re-open issue and reassign to sba@openoffice.org
reassign to sba@openoffice.org
reset resolution to FIXED
SBA: Verified in CWS thaiissues (See issue 52055).
Shouldn't this also work in Calc, Impress, Draw and Base? Right now it only works in Writer.
SBA: OK in Master. Closed. For further input sequence fixes, see issue 54913, issue 61397 and issue 61994.