Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
Summary: | Bug in Encoding of Hebrew letters in filename | |
---|---|---|---
Product: | porting | Reporter: | alan
Component: | code | Assignee: | tino.rachui
Status: | CLOSED FIXED | QA Contact: | issues@porting <issues>
Severity: | Trivial | |
Priority: | P3 | CC: | asari, bjoern.zessack, issues, smokey.ardisson, xslf
Version: | OOo 1.1 RC5 | |
Target Milestone: | OOo 2.0 | |
Hardware: | Mac | |
OS: | Mac OS X, all | |
Issue Type: | DEFECT | Latest Confirmation in: | ---
Developer Difficulty: | --- | |
Attachments: | | |
Description
alan
2004-02-12 15:03:39 UTC
mh->ayaniger: is this a MacOSX problem only?

reassigned. See also discussion (Alan Yaniger) on openoffice.porting.dev from 02/17/2004 for additional information.

Attached is a file with patches for *.c and *.cxx files in sal/osl/unx. There is also a patch for sal/inc/osl/thread.h.

Created attachment 13553 [details]
Patches for sal/inc/osl/thread.h, sal/osl/unx/*.c, and sal/osl/unx/*.cxx
Interestingly, I got a separate email today from Boris Reznik in Israel claiming that while my "Start OpenOffice.org" launcher did not support opening files with Hebrew letters in the name, the "CoooL" launcher did work. However, Boris didn't state what version of OOo he was using; I have to suspect OOo 1.0.3GM. If OOo requires source changes to support Hebrew filenames, then I can't see how "CoooL" would be working. However, it does seem this bug report is discussing SAVE rather than OPEN.

Because of limited resources for OOo 1.1.2 we decided to shift this task to OOo 2.0. Please have a look at #i28928#.
Kind Regards,
Tino

Hi Tino,
I looked at issue 28928, and it wasn't obvious to me how it was relevant to this issue. Could you explain more fully?
Thanks,
Alan

Hi Alan,
Well, sal converts file names which it gets from the system to UTF8. Because no encoding is linked to such a system file name, sal uses the current thread text encoding for the conversion. But some systems always use a specific encoding for file names (UTF8, for instance, as in the current case); in this case the sal conversion fails, as we saw. On the other hand, we cannot patch osl_getThreadTextEncoding to always deliver the encoding used at the file system interface, as this function is also used in cases which have nothing to do with the aforementioned issues and where we do indeed want the current thread text encoding. That is the reason for the proposal to introduce a pair of new functions to set the encoding which will be used for file name to file URL conversion. In the concrete case this function would return UTF8, for instance.
HTH,
Tino

Tino, you probably wanted to mention issue 28982 instead of 28928 :-)

*** Issue 16281 has been marked as a duplicate of this issue. ***

Well, if a Linux bug was marked as a dup of a Mac bug, then the PLATFORM and OS need to be changed to ALL.

*** Issue 29224 has been marked as a duplicate of this issue. ***

I agree with Tino's point that changing how osl_getThreadTextEncoding() works will cause other things to break. The better solution (and one that I have been using in released versions of NeoOffice/J) is to #define osl_getThreadTextEncoding() as RTL_TEXTENCODING_UTF8 when MACOSX is defined, in the following sal/osl/unx files: file.c, module.c, pipe.c, process.c, process_impl.cxx, profile.c, security.c, tempfile.c, uunxapi.cxx.

Hi *,
a more concrete proposal: we would like to introduce two new functions in sal:

    osl_setFileSystemEncoding
    osl_getFileSystemEncoding

These functions deliver the encoding which should be used for encoding/decoding system paths to or from file URLs. For platforms which use a fixed encoding these functions could deliver the required encoding directly, while on other systems they could just call osl_getThreadTextEncoding to get an encoding. In the desktop project there is some code which detects specific desktop environments like Gnome, etc.; this would be a good place to set the file system encoding to be used, if necessary. Hopefully I can fix this bug before OOo 2.0 beta. I will propose this sal extension on openoffice.interface-discuss too.

Hi Alan,
I played a little bit with a Mac (though my Mac knowledge is very limited) in order to investigate the problem with regard to this bug. To me it seems that the problem has something to do with a "misconfigured" system. It would be nice if some Mac gurus could verify my findings and maybe suggest some fixes which might be more appropriate than the suggested fix of overriding osl_getThreadTextEncoding in the osl file system interface.
It is known that osl uses osl_getThreadTextEncoding in order to get the encoding used for converting system paths to file URLs and vice versa. osl_getThreadTextEncoding is initialized by a function osl_getProcessLocale, which calls a function _imp_getProcessLocale (see osl/unx/nlsupport.c). This function basically looks as follows:

    void _imp_getProcessLocale(...)
    {
        /* set the locale defined by the env vars */
        char* locale = setlocale( LC_CTYPE, "" );

        /* fallback to the current locale */
        if( NULL == locale )
            locale = setlocale( LC_CTYPE, NULL );

        /* return the LC_CTYPE locale */
        *ppLocale = _parse_locale( locale );
    }

If the function fails to provide a valid locale, the "C" locale will be used by sal/osl (see _parse_locale in the same file). It seems that under MacOS X the "C" locale is always active no matter which language is configured, which would be a reasonable explanation for the problems on Hebrew systems. Does MacOS X have means to query the currently configured locale, and wouldn't it be more useful to implement osl_getProcessLocale in a Mac-specific way? I am happy to accept and integrate patches into sal. If there is no better patch than the currently suggested one, we can also take this one.
Kind Regards,
Tino

Because of limited resources deferred to OOo later.

Meanwhile I've got a Mac of my own and can pick up the problem. As described already, the problem is that a Mac-specific way of detecting the system locale is necessary. The Mac has its own API for this. A '.UTF-8' suffix must be appended to each returned locale, e.g. 'en_US.UTF-8', because this part is used to determine which encoding shall be used for encoding/decoding file names.

*** Issue 46963 has been marked as a duplicate of this issue. ***

Created attachment 27507 [details]
Patch just tested under Panther!
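
The locale handling described earlier in this thread (take the locale identifier the Mac API returns and append '.UTF-8' so that the codeset part selects the file name encoding) can be sketched in portable C. This is not the code from the attached patch. `append_utf8_codeset` is a hypothetical helper, and both the fallback to "en_US" for a missing identifier and the stripping of an existing codeset or modifier are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of sal: writes "<locale>.UTF-8" into buf.
 * - A NULL or empty identifier falls back to "en_US" (assumption).
 * - An existing codeset or modifier (".xxx" or "@xxx") is dropped first.
 * Returns 0 on success, -1 if buf is too small (buf then holds a
 * truncated, NUL-terminated string). */
int append_utf8_codeset(const char *locale_id, char *buf, size_t buflen)
{
    const char *base;
    size_t base_len;
    int n;

    assert(buf != NULL && buflen > 0);

    base = (locale_id != NULL && locale_id[0] != '\0') ? locale_id : "en_US";

    /* Length of the "language_TERRITORY" part, up to the first '.' or '@'. */
    base_len = strcspn(base, ".@");

    n = snprintf(buf, buflen, "%.*s.UTF-8", (int)base_len, base);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With a helper of this shape, an identifier such as "he_IL" becomes "he_IL.UTF-8", which a locale parser can then map to UTF-8 for file name conversion.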
Created attachment 27508 [details]
Hack in file.cxx no longer needed with osxlocale patch
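
For comparison, the getter/setter pair proposed earlier in this thread (a file system encoding that falls back to the thread text encoding when nothing has been set) might look roughly like the sketch below. The thread gives no signatures for the proposed functions, so everything here is an assumption: the `demo_` names, the `demo_TextEncoding` enum standing in for sal's encoding type, and the fixed value returned by the thread-encoding stand-in are all hypothetical.

```c
#include <assert.h>

/* Stand-in for sal's text encoding type; values are illustrative only. */
typedef enum {
    DEMO_ENCODING_DONTKNOW = 0,
    DEMO_ENCODING_ISO_8859_8,
    DEMO_ENCODING_UTF8
} demo_TextEncoding;

/* Stand-in for the thread text encoding lookup. In this sketch it simply
 * returns a fixed value. */
demo_TextEncoding demo_getThreadTextEncoding(void)
{
    return DEMO_ENCODING_ISO_8859_8;
}

/* Process-wide override; DONTKNOW means "nothing set".
 * Not thread-safe: a real implementation would need synchronization. */
static demo_TextEncoding g_fsEncoding = DEMO_ENCODING_DONTKNOW;

/* Hypothetical setter: stores the override and returns the previous one. */
demo_TextEncoding demo_setFileSystemEncoding(demo_TextEncoding enc)
{
    demo_TextEncoding old = g_fsEncoding;
    assert((int)enc >= 0 && (int)enc <= (int)DEMO_ENCODING_UTF8);
    g_fsEncoding = enc;
    return old;
}

/* Hypothetical getter: the override if one is set, otherwise the
 * thread text encoding. */
demo_TextEncoding demo_getFileSystemEncoding(void)
{
    if (g_fsEncoding != DEMO_ENCODING_DONTKNOW)
        return g_fsEncoding;
    return demo_getThreadTextEncoding();
}
```

A platform with a fixed file name encoding would call the setter once at startup with UTF-8; other platforms would never call it and keep the existing thread-encoding behaviour.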
Platform -> 'Macintosh'
OS -> 'Mac OS X'

Fixed on cws macosx10

Verified with m112 / Mac OSX Tiger

Patches for 27 June will only work with OOo 1.9_m series, not with SRX645. Appropriate patches for SRX645 are in macxjoin1153.

*** Issue 50503 has been marked as a duplicate of this issue. ***

thanks I can input Japanese with this patch: verified with: 1.9m119/kinput2.macim

Compile fails for me, saying:

    =============
    Building project udkapi
    =============
    /sw/src/fink.build/openoffice.org-ja-1.9m121-50/udkapi/com/sun/star
    mkout -- version: 1.4
    idlc @/tmp/mkDmCDfr
    Could not get Canonical Locale Identifier from AppleLanguages value!
    Bus error
    dmake: Error code 138, while making '../../../unxmacxp.pro/misc/urd_css.don'
    '---* tg_merge.mk *---'
    ERROR: Error 65280 occurred while making /sw/src/fink.build/openoffice.org-ja-1.9m121-50/udkapi/com/sun/star
    dmake: Error code 1, while making 'build_all'

In my environment (Tiger),

    $ defaults read 'Apple Global Domain' AppleLanguages

returns:

    The domain/default pair of (kCFPreferencesAnyApplication, AppleLanguages) does not exist

Any help?

TRA: Verified on master -> ok. Closing issue.