Issue 95628 - UserLayer location defaults to boot drive if username contains non ansi characters
Status: ACCEPTED
Alias: None
Product: General
Classification: Code
Component: code
Version: DEV300m34
Hardware: All Windows, all
Importance: P3 Trivial with 17 votes
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords: needhelp
Duplicates: 95753
Depends on:
Blocks:
 
Reported: 2008-10-30 09:51 UTC by joerg.skottke
Modified: 2017-05-20 10:55 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Description joerg.skottke 2008-10-30 09:51:16 UTC
Full, lengthy Reproduction
- Use Windows XP
- Localize it to some western locale (German, en_US, whatever)
- Create a user with a name that contains Asian or Cyrillic characters
- Install the office as administrator (western locale)
- Switch to the Asian/Cyrillic user
- Start the office
-> The user layer will be created in the root of the Windows boot drive
-> All users with strange names share the user layer (locking applies)
-> No installation of extensions is possible (Exception: Operating on read-only
context)

As discussed with HRO this scenario is a rare one. Actually it might only apply
to e.g. foreign workers who insist on having their "un-ascii-fied" name as login
name.

The workaround is to install the language pack for the foreign user.

The root cause for the issue is that some parts of the office code still use
ANSI strings (hope I got that right), whereas most of the office core is already
Unicode only. So if a character outside the ANSI table is used, we end up not
being able to correctly handle the %USERCONFIG% from the Windows registry.
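
The loss described above can be illustrated with a small Python sketch. It is
not OOo code: cp1252 stands in for the 8-bit "ANSI" code page of a western
Windows locale, the '?' substitution only approximates what a lossy conversion
does, and the user names and profile paths are made-up examples.

```python
# Illustration only: cp1252 stands in for the ANSI code page of a western
# Windows locale; the user names and paths below are hypothetical.
ANSI_CODEPAGE = "cp1252"


def to_ansi_and_back(text: str) -> str:
    """Round-trip text through the 8-bit code page, roughly as an ANSI code path would."""
    raw = text.encode(ANSI_CODEPAGE, errors="replace")  # unmappable characters become '?'
    return raw.decode(ANSI_CODEPAGE)


western = r"C:\Documents and Settings\Jörg\Application Data"
cyrillic = r"C:\Documents and Settings\Иван\Application Data"

# 'ö' exists in cp1252, so the western path survives the round trip.
print(to_ansi_and_back(western) == western)
# The Cyrillic letters have no cp1252 mapping, so the name is destroyed.
print(to_ansi_and_back(cyrillic))
```

Once the name has been replaced, the resulting string no longer refers to the
user's real profile directory, which fits the symptom reported here.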

A short term solution would be to warn and abort if the user directory could not
be determined so that we do not write to the boot drive. The long term solution
would be to make the entire office unicode safe which is definitely out of scope
for 3.1.
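
A minimal sketch of the short term idea (warn and abort instead of writing to
the boot drive), again in Python rather than in the office's own code. The
function and exception names are hypothetical, and the only checks shown are
"nothing was determined" and "the result is a bare drive root".

```python
import ntpath  # Windows path rules, usable on any platform


class UserLayerError(RuntimeError):
    """Raised when no usable per-user directory could be determined."""


def checked_user_layer(user_dir):
    """Return user_dir unchanged, or raise instead of falling back to a drive root."""
    if not user_dir:
        raise UserLayerError("user directory could not be determined")
    _drive, rest = ntpath.splitdrive(user_dir)
    if rest in ("", "\\", "/"):
        raise UserLayerError(f"refusing to use drive root {user_dir!r} as user layer")
    return user_dir


# Hypothetical startup code: warn and stop rather than create files in C:\
try:
    checked_user_layer("C:\\")
except UserLayerError as err:
    print(f"warning: {err}; aborting startup")
```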

HRO: Please evaluate the effort and re-target the issue to whatever you find
appropriate
Comment 1 hennes.rohling 2008-11-14 16:37:10 UTC
Accepted
Comment 2 hennes.rohling 2009-01-21 10:21:59 UTC
This issue does not have an actual usecase. Retargeted.
Comment 3 stefan.baltzer 2009-02-09 16:58:35 UTC
*** Issue 95753 has been marked as a duplicate of this issue. ***
Comment 4 kpalagin 2009-02-09 20:39:22 UTC
This is not a rare scenario outside of Europe & USA (10 votes for 95753) - it is
very common for home users and small office workers with no dedicated IT
person to use their native language for usernames.
Apparently users with non-Latin scripts are second-class citizens for OO
developers, as opposed to, for example, Linux and Microsoft developers, who do
recognise real-world use cases and are not policing their customers into
ASCII(32)-ASCII(127).
Comment 5 joerg.skottke 2009-02-10 08:51:45 UTC
Even though I disapprove of kpalagin's emotional outburst, I agree that we should
not push the issue too far into the future.

My reasoning is that we apparently do some unnecessary and potentially harmful
string conversions based on the locale we are running on.
So we have some outdated code which should be brought up to current standards
and a scenario where the issue actually surfaces.
Comment 6 joerg.skottke 2009-02-10 11:32:39 UTC
Reassign
Comment 7 kpalagin 2009-02-10 18:41:24 UTC
Joerg,
is 3.2 a real target or just a way to keep issue in sight?

WBR,
KP.
Comment 8 joerg.skottke 2009-02-12 07:29:14 UTC
That depends... For now this is a real target and treated as such.
However, OOo release guys might still ditch it in favor of other things.
Please do not underestimate the scope of this issue, this is not a trivial fix.
Comment 9 kpalagin 2009-02-12 07:57:40 UTC
Thanks a lot for your response!
WBR,
KP.
Comment 10 adlerbeth 2009-03-10 06:46:05 UTC
I think this bug should not have been there when the program was released. It is
the only program I have that behaves in this way. It took me several days to
find the workaround and I nearly quit OOo. The bug makes OOo look "cheap" when
you can't use basic spellcheck. On the OOo forum there are several issues
every week about non-working spellcheck. I suspect many complaints are caused by
this bug. It means OOo loses users each week and gets a bad reputation. If it is
not possible to fix the bug fast, you should at the very least spread
information about it. The only place you can find information on how to
work around it is here, buried deep in the issues. A readme document in the
installation and well-written tutorials on the big forums are a must.
All the best!
Comment 11 kai.sommerfeld 2009-04-02 12:30:08 UTC
hdu: Please take over. If I understand things right, this is something in the
Windows part of SAL.
Comment 12 kpalagin 2009-04-22 09:10:10 UTC
Herbert,
are we on track for 3.2 with this issue? I suggest committing the fix early in
order to find possible bugs before the RC stage.

Regards,
K. Palagin.
Comment 13 hdu@apache.org 2009-04-22 09:49:45 UTC
This has nothing to do with the Windows-specific part of sal, but is a general cross-platform problem:
those code parts like to use unrelated encodings for converting from 8-bit to Unicode. For the issue
reported here, the use of the thread-specific encoding is exactly the problem.
E.g. convertToFileUrl() used by registercomponent.cxx does its conversion based on
osl_getThreadTextEncoding(). That sounds like a bad idea, and most other uses are suspicious
(http://svn.services.openoffice.org/opengrok/search?q=osl_getThreadTextEncoding&project=%2FCurrent+%28trunk%29)
@sb: reassigning to you since all other related devs (jsc/jbu/obo/hro) are no longer responsible for
anything in that area
Comment 14 Stephan Bergmann 2009-04-22 13:47:46 UTC
@hdu:  OOo internally keeps file URLs in UTF-8 and, when interfacing with
external entities, converts relevant data between osl_getThreadTextEncoding and
UTF-8.  This is a very old design decision that directly impacts large bodies of
code.  Of course, if the relevant data contains characters that cannot be
represented in osl_getThreadTextEncoding, this causes problems.  This appears to
be a problem mainly on Windows (where the system uses UTF-16, i.e., full
Unicode, but osl_getThreadTextEncoding is always one of the legacy 8-bit
character encodings, each only covering a small subset of Unicode).  However,
for Windows, a solution could be as follows:

Use the ...W instead of the ...A variants of relevant system calls (so that,
e.g., Asian characters in a user name are transported correctly regardless of
locale).  Depending on surrounding code, either directly translate between the
Windows UTF-16 data and OOo's UTF-16--based rtl::OUString or, in a two-step
process, translate in one step between the Windows UTF-16 data and temporary
UTF-8 data and in another step between the temporary UTF-8 data and the OOo data
representation via a new osl_getThreadTextEncoding'.  That
osl_getThreadTextEncoding' would be like osl_getThreadTextEncoding, but for
Windows would always return UTF-8 (all places that currently call
osl_getThreadTextEncoding would have to be checked whether or not they have to
be switched to osl_getThreadTextEncoding').
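
The two-step variant proposed above could look roughly like the following
Python sketch. Everything here is an assumption made for illustration:
thread_text_encoding_prime() is a hypothetical stand-in for the proposed
osl_getThreadTextEncoding', a Python str stands in for rtl::OUString, and the
UTF-16 bytes stand in for what a ...W system call would return.

```python
import locale
import sys


def thread_text_encoding_prime(platform: str = sys.platform) -> str:
    """Hypothetical osl_getThreadTextEncoding': always UTF-8 on Windows."""
    if platform == "win32":
        return "utf-8"
    # Elsewhere, behave like the existing call and use the locale's encoding.
    return locale.getpreferredencoding(False)


def from_wide_api(utf16_bytes: bytes, platform: str = sys.platform) -> str:
    """Translate UTF-16 data from a ...W style call into the internal string type."""
    # Step 1: Windows UTF-16 data -> temporary UTF-8 data.
    temp_utf8 = utf16_bytes.decode("utf-16-le").encode("utf-8")
    # Step 2: temporary UTF-8 data -> internal representation via the new query.
    return temp_utf8.decode(thread_text_encoding_prime(platform))


name = "Иван"                    # made-up user name
wide = name.encode("utf-16-le")  # stands in for the result of a ...W call
print(from_wide_api(wide, platform="win32") == name)
```

Because both steps use full Unicode encodings, no character of the user name
depends on the locale's legacy 8-bit code page.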
Comment 15 hdu@apache.org 2009-04-22 15:11:21 UTC
Adjusting target to priority unless someone with that itch to scratch provides the patch. All the original 
developers of that area fled in disgust...
Comment 16 hdu@apache.org 2009-04-23 07:55:39 UTC
> OOo internally keeps file URLs in UTF-8 and, when interfacing with external entities, converts relevant
> data

This comment from #desc 15 is wishful thinking, as it would mean that osl_getThreadTextEncoding()-based
conversions occur only in the code paths just above the system calls. The generous sprinkling of
osl_getThreadTextEncoding() all over the code base, outside the sal layers, shows that this is
obviously not so. Especially convertToFileUrl() (in
cpputools/source/registercomponent/registercomponent.cxx) proves that the optimistic assumption in
#desc 15 was wrong.
Comment 17 Stephan Bergmann 2009-04-23 08:14:34 UTC
@hdu:  In
tags/DEV300_m46/cpputools/source/registercomponent/registercomponent.cxx it is
not convertToFileUrl that uses osl_getThreadTextEncoding, but rather functions
like parseOptions that take external data (main's char**argv) and convert it to
OOo's internal UTF-16.

Maybe it makes sense to shift this discussion to a mailing list, or even to
face-to-face.
Comment 18 paisand 2010-10-03 16:11:36 UTC
OOo 3.2.1 on Windows XP SP3 es-AR and the problem is still there.
Will this be fixed some day soon?
Comment 19 Marcus 2017-05-20 10:55:25 UTC
Reset assignee to the default "issues@openoffice.apache.org".