Apache OpenOffice (AOO) Bugzilla – Issue 95628
UserLayer location defaults to boot drive if username contains non ansi characters
Last modified: 2017-05-20 10:55:25 UTC
Full, lengthy reproduction:
- Use Windows XP
- Localize it to some western locale (German, en_US, whatever)
- Create a user with a name that contains Asian or Cyrillic characters
- Install the office as administrator (western locale)
- Switch to the Asian/Cyrillic user
- Start the office
-> The user layer will be created in the root of the Windows boot drive
-> All users with strange names share the user layer (locking applies)
-> No installation of extensions is possible (Exception: Operating on read-only context)

As discussed with HRO, this scenario is a rare one. Actually it might only apply to e.g. foreign workers who insist on having their "un-ascii-fied" name as login name. The workaround is to install the language pack for the foreign user.

The root cause of the issue is that some parts of the office code still use ANSI strings (hope I got that right), whereas most of the office core is already Unicode only. So if a character outside the ANSI table is used, we end up not being able to correctly handle the %USERCONFIG% from the Windows registry.

A short-term solution would be to warn and abort if the user directory could not be determined, so that we do not write to the boot drive. The long-term solution would be to make the entire office Unicode safe, which is definitely out of scope for 3.1.

HRO: Please evaluate the effort and re-target the issue to whatever you find appropriate.
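For illustration only, the short-term "warn and abort" idea from the description can be sketched in Python. This is not the office code: the names `to_ansi`, `resolve_user_layer` and `UserDirError`, the `cp1252` code page, and the trailing `\user` directory are all assumptions made for the example. The sketch mimics a lossy Unicode to ANSI round trip and refuses to continue when the path was damaged, instead of silently falling back to some other location.

```python
class UserDirError(RuntimeError):
    """Hypothetical error: the user directory could not be determined safely."""


def to_ansi(path: str, codepage: str = "cp1252") -> str:
    # Mimic a Unicode -> ANSI -> Unicode round trip.
    # Characters missing from the code page are replaced by '?'.
    return path.encode(codepage, errors="replace").decode(codepage)


def resolve_user_layer(profile_dir: str, codepage: str = "cp1252") -> str:
    # Abort instead of continuing with a damaged path.
    converted = to_ansi(profile_dir, codepage)
    if converted != profile_dir:
        raise UserDirError(
            "user directory cannot be represented in " + codepage
        )
    # The "\user" suffix is an illustrative assumption.
    return converted + "\\user"
```

With a Latin-only profile path the function returns normally; with a Cyrillic or Asian user name it raises, which is the point at which a real implementation would show a warning and stop.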
Accepted
This issue does not have an actual use case. Retargeted.
*** Issue 95753 has been marked as a duplicate of this issue. ***
This is not a rare scenario outside of Europe and the USA (10 votes for 95753): it is very common for home users and small-office workers with no dedicated IT person to use their native language for usernames. Apparently users with non-Latin scripts are second-class citizens for OO developers, as opposed to, for example, Linux and Microsoft developers, who do recognise real-world usage cases and are not policing their customers into ASCII(32)-ASCII(127).
Even though I disapprove of kpalagin's emotional outburst, I agree that we should not push the issue too far into the future. My reasoning is that we apparently do some unnecessary and potentially harmful string conversions based on the locale we are running on. So we have some outdated code which should be brought up to current standards and a scenario where the issue actually surfaces.
Reassign
Joerg, is 3.2 a real target or just a way to keep the issue in sight? WBR, KP.
That depends... For now this is a real target and treated as such. However, OOo release guys might still ditch it in favor of other things. Please do not underestimate the scope of this issue, this is not a trivial fix.
Thanks a lot for your response! WBR, KP.
I think this bug should not have been there when the program was released. It is the only program I have that behaves in this way. It took me several days to find the workaround and I nearly quit OOo. The bug makes OOo look "cheap" when you can't use basic spellcheck. At the OOo forum there are several issues every week about non-working spellcheck. I suspect many complaints are caused by this bug. It means OOo loses users each week and gets a bad reputation. If it is not possible to fix the bug fast, you should at the very least spread information about it. The only place you can find information and how to work around it is here, deeply buried in the issues. A readme document in the installation and well-written tutorials at the big forums are a must. All the best!
hdu: Please take over. If I understand things right, this is something in the Windows part of SAL.
Herbert, are we on track for 3.2 with this issue? I suggest committing the fix early in order to find possible bugs before RC stage. Regards, K. Palagin.
This has nothing to do with the Windows-specific part of sal; it is a general cross-platform problem: those code parts like to use unrelated encodings for converting from 8-bit to Unicode. For the issue reported here, the use of the thread-specific encoding is exactly the problem. E.g. convertToFileUrl() used by registercomponent.cxx does its conversion based on osl_getThreadTextEncoding(). That sounds like a bad idea, and most other uses are suspicious (http://svn.services.openoffice.org/opengrok/search?q=osl_getThreadTextEncoding&project=%2FCurrent+%28trunk%29).
@sb: reassigning to you since all other related devs (jsc/jbu/obo/hro) are no longer responsible for anything in that area
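To make the "unrelated encodings" point in the comment above concrete, here is a minimal Python sketch. It is not sal code; `to_unicode` is a hypothetical stand-in for any 8-bit to Unicode conversion that depends on a thread or locale encoding, and the two code pages are chosen only as examples. The same bytes decode to different text depending on which 8-bit encoding the converting thread happens to assume.

```python
def to_unicode(raw: bytes, thread_encoding: str) -> str:
    # Hypothetical stand-in for a conversion that trusts the
    # thread text encoding to interpret 8-bit data.
    return raw.decode(thread_encoding, errors="replace")


name = "Иван"
# Bytes as produced on a system whose 8-bit encoding is Cyrillic.
raw = name.encode("cp1251")

# Correct only when the thread encoding matches the data's encoding.
matching = to_unicode(raw, "cp1251")
# A western thread encoding silently yields different characters.
mismatched = to_unicode(raw, "cp1252")
```

No error is raised in the mismatched case, which is why such conversions tend to go unnoticed until a path or user name falls outside the active code page.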
@hdu: OOo internally keeps file URLs in UTF-8 and, when interfacing with external entities, converts relevant data between osl_getThreadTextEncoding and UTF-8. This is a very old design decision that directly impacts large bodies of code. Of course, if the relevant data contains characters that cannot be represented in osl_getThreadTextEncoding, this causes problems. This appears to be a problem mainly on Windows (where the system uses UTF-16, i.e., full Unicode, but osl_getThreadTextEncoding is always one of the legacy 8-bit character encodings, each only covering a small subset of Unicode).

However, for Windows, a solution could be as follows: Use the ...W instead of the ...A variants of relevant system calls (so that, e.g., Asian characters in a user name are transported correctly regardless of locale). Depending on surrounding code, either directly translate between the Windows UTF-16 data and OOo's UTF-16-based rtl::OUString or, in a two-step process, translate in one step between the Windows UTF-16 data and temporary UTF-8 data and in another step between the temporary UTF-8 data and the OOo data representation via a new osl_getThreadTextEncoding'. That osl_getThreadTextEncoding' would be like osl_getThreadTextEncoding, but for Windows would always return UTF-8 (all places that currently call osl_getThreadTextEncoding would have to be checked whether or not they have to be switched to osl_getThreadTextEncoding').
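The difference between the two routes proposed above and the current code-page route can be sketched in Python. This is only an illustration under assumptions: the function names are hypothetical, the UTF-16LE bytes stand in for what a ...W system call would return, and `cp1252` stands in for a legacy thread encoding. No Windows API is called.

```python
def from_windows_wide(raw_utf16: bytes) -> str:
    # Direct route: UTF-16 data straight into the internal Unicode string.
    return raw_utf16.decode("utf-16-le")


def via_utf8(raw_utf16: bytes) -> str:
    # Two-step route: UTF-16 -> temporary UTF-8 -> internal Unicode string.
    utf8 = raw_utf16.decode("utf-16-le").encode("utf-8")
    return utf8.decode("utf-8")


def via_codepage(raw_utf16: bytes, codepage: str = "cp1252") -> str:
    # Legacy route: squeeze the data through an 8-bit code page first.
    ansi = raw_utf16.decode("utf-16-le").encode(codepage, errors="replace")
    return ansi.decode(codepage)


name = "山田"
wide = name.encode("utf-16-le")

direct = from_windows_wide(wide)
two_step = via_utf8(wide)
legacy = via_codepage(wide)
```

Both Unicode routes preserve the name, because UTF-8 and UTF-16 cover the same character set. The code-page route replaces every character it cannot represent, so the original name is unrecoverable afterwards.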
Adjusting target to priority unless someone with that itch to scratch provides the patch. All the original developers of that area fled in disgust...
> OOo internally keeps file URLs in UTF-8 and, when interfacing with external entities, converts relevant data
This comment from #desc 15 is wishful thinking, as this would mean that osl_getThreadTextEncoding()-based conversions occur only in the code paths just above the system calls. The generous sprinkling of osl_getThreadTextEncoding() all over the code base, well outside the sal layers, shows that this is obviously not so. Especially convertToFileUrl() (in cpputools/source/registercomponent/registercomponent.cxx) proves that the optimistic assumption in #desc 15 was wrong.
@hdu: In tags/DEV300_m46/cpputools/source/registercomponent/registercomponent.cxx it is not convertToFileUrl that uses osl_getThreadTextEncoding, but rather functions like parseOptions that take external data (main's char**argv) and convert it to OOo's internal UTF-16. Maybe it makes sense to shift this discussion to a mailing list, or even to face-to-face.
OOo 3.2.1 on Windows XP SP3 es-AR and the problem is still there. Will this be fixed some day soon?
Reset assignee to the default "issues@openoffice.apache.org".