Issue 102623 - Character set conversion doesn't work correctly for field and table names using ODBC
Summary: Character set conversion doesn't work correctly for field and table names usi...
Status: UNCONFIRMED
Alias: None
Product: Base
Classification: Application
Component: code (show other issues)
Version: OOO300m9
Hardware: PC Windows XP
: P3 Trivial (vote)
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-06-09 17:22 UTC by ludob
Modified: 2013-01-29 21:47 UTC (History)
1 user (show)

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description ludob 2009-06-09 17:22:17 UTC
When selecting in Database properties / Additional settings / Data conversion /
Character Set the value Unicode(utf8), field and columns names are not displayed
or used correctly everywhere when non ascii characters are present in the names.
The errors: 
  - column headers in the Table Data View are not correctly displayed. Non ascii
characters are replaced with a "white question mark in black square" or other
symbol. The fields listed when editing the table structure are correct however.
Table was created using originale database client software (Oracle web client,
Mysql Query Browser,...)   
  - create a new table in design view with field names containing non-ascii
characters. The Table Data View will display the field name correctly but when
looking at the database with the database vendor's tools, the field name was
created with every non ascii character replaced by 2 other characters. The
fields listed when editing the table structure correspond with what is seen on
the database.
  - create a new table with a table name containing non-ascii characters. The
table will be created with wrong characters in it's name.

This behavior is noticed with ODBC drivers from Oracle and MySQL. I'm also
developping a special ODBC driver and I've noticed that for example in case 2,
the driver will receive a call to SQLExecDirectW with a wide string "Create
table ..." containing in the field name the characters é instead of the
expected character é. Since é is the byte representation of é in UTF8, I have
to conclude that the UTF8 code is encoded by a single byte codepage (system?) to
unicode conversion. When I looked at the character codes sent and received in
the other test cases, the conclusion was similar: a wrong encoding/decoding is
used: probably system-unicode instead of utf8-unicode.

Same problem is noticed in 2.4.1.

Ludo Brands
Comment 1 ludob 2009-06-10 09:28:28 UTC
same behavior in OOO310m11
Comment 2 ludob 2009-06-10 19:41:48 UTC
same behavior in OOO310m11
Comment 3 r4zoli 2010-03-10 12:27:01 UTC
@ Ludo This issue still in OOo 3.2?
Comment 4 ludob 2010-03-10 16:53:30 UTC
Tested on OOO320m12 and the problem is still there. Used the MyQL 5.1 driver and
my own driver to test.
Created a table named "test" with a field named "tést". Open table and insert
row -> error: unknown column 'tÃf st' in field list. The odbc log shows a
SQLPREPARE with the following query "INSERT INTO `test` (`ID`,
`t\ff\ff\ff\ffst`) VALUES ...) . Clearly the é was translated twice with
probably system->utf8 to become 4 bytes. According to the same log the table was
created with the field name `t\ff\ffst` which is OK (windows ODBC manager
doesn't know utf8 and logs all none ascii characters as \ff).    
Comment 5 ludob 2010-03-10 17:18:35 UTC
I had a further look into this. Part of the problem is the windows ODBC manager.
All drivers tested support wide character ODBC function calls (SQLExecuteW, etc)
and the single byte character ones (SQLExecute, etc). The windows driver manager
sits between OOO and the driver. OOO uses the single byte version of the ODBC
function calls and the windows driver manager translates the single byte
function calls to wide character ones. It translates also the parameters from
system to utf16 in doing so (é in UTF8 is two bytes, translating with system to
utf16 gives two wide characters). The drivers continue with the utf16 values and
will use wrong field names. This explains the first 2 errors reported in the
original issue. I have to conclude that this is not an OOO issue. 
The test case in my earlier comment today does however still show a problem with
a double translation in OOO.