125469 – Japanese input not working under Japanese OS: text from IME is never inserted

Summary: Japanese input not working under Japanese OS: text from IME is never inserted

Status:	UNCONFIRMED

Alias:	None

Product:	General
Classification:	Code
Component:	ui
Version:	4.1.0
Hardware:	PC OS/2

Importance:	P3 Major
Target Milestone:	---
Assignee:	AOO issues mailing list
QA Contact:

URL:
Keywords:

Depends on:
Blocks:	126518

Reported:	2014-08-20 13:43 UTC by Alex Taylor
Modified:	2019-05-19 11:43 UTC
CC List:	2 users

See Also:
Issue Type:	DEFECT
Latest Confirmation in:	---
Developer Difficulty:	---

Description Alex Taylor 2014-08-20 13:43:41 UTC

Using 4.1.1 OS/2 rc3 under DBCS OS version (Japanese). The following has been tested and confirmed with both Text and Spreadsheet documents.

The input of Japanese text is effectively impossible through the system IME (front-end processor) - either using the default "Writing Heads" mode, or the simplified "kana-kanji conversion" mode. 

Steps to reproduce as follows:
 1. Open or create an OpenOffice document and give focus to the document area.
 2. Activate kanji input conversion (Alt+` on a US keyboard).
 3. Type characters. The IME entry box appears in the bottom left of the screen (instead of at the cursor, which is the normal, desired behaviour), showing the entered text as it is converted (as applicable) to Japanese characters.
 4. Press Enter to accept the converted text and insert into the document.

Actual result:
The IME entry box disappears, but the converted text is not inserted into the document; it is simply lost.

Expected result:
The IME entry box disappears, and the converted text is inserted into the document at the cursor position.

Additional information:
Input of normal (SBCS) keyboard text works normally.

This problem is also confirmed to occur with versions 2.4 and 3.2 for OS/2.

Comment 1 Alex Taylor 2014-08-24 01:19:11 UTC

This is conjecture, but it may be that the WM_QUERYCONVERTPOS message is not being handled correctly (or at all?) by OpenOffice?

A similar issue with the mis-positioned IME input box in Mozilla is discussed (and a patch provided) here: https://bugzilla.mozilla.org/show_bug.cgi?id=684487
(This does not cover the problem of input not working at all, but the underlying cause might be related...?)

Comment 2 Alex Taylor 2019-03-28 15:41:07 UTC

I've been working on my own IME (https://github.com/altsan/os2-wnnim) which has allowed me to do detailed testing of how DBCS input actually works, and how AOO is receiving the messages.

When inserting a character via WM_CHAR, the first USHORT of mp2 is the character code. However, in the event of a DBCS character (for the current codepage), _both_ bytes are passed in the same USHORT.  This is how my IME (and others) send the character value:

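        /* Start with the current byte of the converted string */
        usChar = (USHORT) pszBuffer[ i ];
        /* For a DBCS lead byte, pack the following byte into the same USHORT */
        if ( IsDBCSLeadByte( usChar, global.dbcs ))
            usChar |= pszBuffer[ ++i ] << 0x8;
        /* Deliver the character to the target window as a single WM_CHAR */
        WinSendMsg( hwndSource, WM_CHAR,
                    MPFROMSH2CH( KC_CHAR, 1, 0 ),
                    MPFROM2SHORT( usChar, 0 ));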

So the first DBCS byte is in the high-order byte of the USHORT, and the second is in the low-order byte.  Standard OS/2 PM behaviour is to separate the two bytes and either combine them into a DBCS character or treat them as two individual characters, depending on the current codepage.  For example, if the value 0x82A0 is passed in a WM_CHAR message to (say) E.EXE, it will render as "あ" under codepage 932, or as "éá" under codepage 850.

I had a look at main/vcl/os2/source/window/salframe.cxx and I think I see the problem.  The function ImplConvertKey() casts the first USHORT of mp2 to UCHAR, which discards the high-order byte:

        UCHAR nCharCode = (UCHAR)SHORT1FROMMP( aMP2 );
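
To make the effect of that cast concrete, here is a minimal standalone sketch in plain C. It does not use the PM headers or any AOO code; low_byte_only() and high_byte_lost() are hypothetical helpers that only illustrate what an 8-bit cast does to a 16-bit character code such as the 0x82A0 example above.

```c
#include <assert.h>

/* Hypothetical stand-in for the (UCHAR) cast: converting a 16-bit
   character code to an 8-bit type keeps only the low-order byte. */
static unsigned char low_byte_only(unsigned short usChar)
{
    unsigned char nCharCode = (unsigned char) usChar;
    /* The cast is equivalent to masking off everything above bit 7. */
    assert(nCharCode == (usChar & 0xFF));
    return nCharCode;
}

/* Hypothetical helper: the high-order byte that the cast throws away. */
static unsigned char high_byte_lost(unsigned short usChar)
{
    return (unsigned char) (usChar >> 8);
}
```

For 0x82A0, low_byte_only() returns 0xA0 and high_byte_lost() returns 0x82, so only half of the double-byte character reaches the conversion step.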

Now, normally I could work around this the way I do for some other applications (like MED) which do the same thing.  The usual workaround is to simply send both bytes as separate WM_CHAR messages (i.e. 0x82 then 0xA0).  However, this won't work for AOO because it converts each byte to a Unicode character value, instead of combining them into a single character first.  Also, a workaround like that would only work for my IME program, not for the standard OS/2 IME (which is the original subject of this ticket).

It seems to me that the solution in AOO is to adjust ImplConvertKey() so that it detects a high-order byte and treats double-byte characters as such.  (It might be as simple as casting aMP2 to USHORT instead of UCHAR, but that depends on how sal_Char and OUString are defined and how gsl_getSystemTextEncoding() works -- I wasn't able to trace the code that far.)

That should allow both my new IME and the standard OS/2 one to work properly.  (The position of the IME entry box would still be wonky as long as WM_QUERYCONVERTPOS is not handled, but that's mainly a cosmetic problem.)

An alternative (or additional) approach, which would only work for applications/hooks that are aware of it, would be to implement a new message like WM_UNICHAR, and allow a UCS-2 code to be passed in the MPARAMs directly.  That might be a nice feature to allow IMEs to input Unicode directly, which would really enhance AOO for OS/2. :)

Comment 3 Alex Taylor 2019-04-15 16:26:43 UTC

OK, at a minimum it looks as if the following changes are needed.  [Line numbers] are from rev 1840571.

main/vcl/os2/source/window/salframe.cxx [2743] (ImplConvertKey):

- Change variable nCharCode from UCHAR to USHORT, and remove cast to UCHAR ahead of SHORT1FROMMP( aMP2 ).


main/vcl/os2/source/window/salframe.cxx [2707-2715] (ImplGetCharCode):

- Change parameter type of nCharCode from sal_Char to USHORT.
- If high-byte of nCharCode is 0, call 
    return OUString( (sal_Char *)&nCharCode, 1, gsl_getSystemTextEncoding()).toChar();
- Else declare a 2-byte sal_Char buffer and pass that to OUString as parameter 1, e.g.:
    sal_Char nChars[2];
    nChars[0] = HIBYTE( nCharCode );
    nChars[1] = LOBYTE( nCharCode );
    return OUString( nChars, 2, gsl_getSystemTextEncoding()).toChar();

It's possible that might be sufficient (although, again, I acknowledge that more might be going on under the hood that I can't see).
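
The byte-splitting step proposed above can be sketched in isolation as follows. This is plain C rather than the actual AOO code: split_char_code() is a hypothetical helper, the OUString conversion is not reproduced, and the byte order (high-order byte first when it is non-zero) is simply the assumption made in the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: unpack a 16-bit WM_CHAR character code into the
   byte sequence that would be handed to the text-encoding conversion.
   Assumes, as in the proposal above, that a non-zero high-order byte
   comes first. Returns the number of bytes written (1 or 2). */
static size_t split_char_code(unsigned short nCharCode, unsigned char out[2])
{
    unsigned char hi = (unsigned char) (nCharCode >> 8);
    unsigned char lo = (unsigned char) (nCharCode & 0xFF);

    assert(out != NULL);
    if (hi == 0) {
        /* Single-byte character: only the low-order byte is meaningful. */
        out[0] = lo;
        return 1;
    }
    /* Double-byte character: emit both bytes, high-order byte first. */
    out[0] = hi;
    out[1] = lo;
    return 2;
}
```

With this layout, 0x82A0 yields the two bytes 0x82 0xA0, while an SBCS code such as 0x0041 yields the single byte 0x41. The resulting buffer and length would then be passed to the encoding conversion in place of the single sal_Char.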

Comment 4 Matthias Seidel 2019-05-19 11:43:16 UTC

Very interesting, I saw your presentation at Warpstock EU yesterday.

Can you please coordinate with Yuri Dario?

I think OS/2 is the only system using DBCS, so this would be an improvement and wouldn't "hurt" the other platforms.