Apache OpenOffice (AOO) Bugzilla – Issue 102623
Character set conversion doesn't work correctly for field and table names using ODBC
Last modified: 2013-01-29 21:47:10 UTC
When selecting in Database properties / Additional settings / Data conversion / Character Set the value Unicode(utf8), field and columns names are not displayed or used correctly everywhere when non ascii characters are present in the names. The errors: - column headers in the Table Data View are not correctly displayed. Non ascii characters are replaced with a "white question mark in black square" or other symbol. The fields listed when editing the table structure are correct however. Table was created using originale database client software (Oracle web client, Mysql Query Browser,...) - create a new table in design view with field names containing non-ascii characters. The Table Data View will display the field name correctly but when looking at the database with the database vendor's tools, the field name was created with every non ascii character replaced by 2 other characters. The fields listed when editing the table structure correspond with what is seen on the database. - create a new table with a table name containing non-ascii characters. The table will be created with wrong characters in it's name. This behavior is noticed with ODBC drivers from Oracle and MySQL. I'm also developping a special ODBC driver and I've noticed that for example in case 2, the driver will receive a call to SQLExecDirectW with a wide string "Create table ..." containing in the field name the characters é instead of the expected character é. Since é is the byte representation of é in UTF8, I have to conclude that the UTF8 code is encoded by a single byte codepage (system?) to unicode conversion. When I looked at the character codes sent and received in the other test cases, the conclusion was similar: a wrong encoding/decoding is used: probably system-unicode instead of utf8-unicode. Same problem is noticed in 2.4.1. Ludo Brands
same behavior in OOO310m11
@ Ludo This issue still in OOo 3.2?
Tested on OOO320m12 and the problem is still there. Used the MyQL 5.1 driver and my own driver to test. Created a table named "test" with a field named "tést". Open table and insert row -> error: unknown column 'tÃf st' in field list. The odbc log shows a SQLPREPARE with the following query "INSERT INTO `test` (`ID`, `t\ff\ff\ff\ffst`) VALUES ...) . Clearly the é was translated twice with probably system->utf8 to become 4 bytes. According to the same log the table was created with the field name `t\ff\ffst` which is OK (windows ODBC manager doesn't know utf8 and logs all none ascii characters as \ff).
I had a further look into this. Part of the problem is the windows ODBC manager. All drivers tested support wide character ODBC function calls (SQLExecuteW, etc) and the single byte character ones (SQLExecute, etc). The windows driver manager sits between OOO and the driver. OOO uses the single byte version of the ODBC function calls and the windows driver manager translates the single byte function calls to wide character ones. It translates also the parameters from system to utf16 in doing so (é in UTF8 is two bytes, translating with system to utf16 gives two wide characters). The drivers continue with the utf16 values and will use wrong field names. This explains the first 2 errors reported in the original issue. I have to conclude that this is not an OOO issue. The test case in my earlier comment today does however still show a problem with a double translation in OOO.