Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing

Summary: | WW6: Writer incorrectly displays cyrillic characters on import | | |
---|---|---|---
Product: | Writer | Reporter: | kpalagin <kpalagin>
Component: | open-import | Assignee: | AOO issues mailing list <issues>
Status: | CONFIRMED --- | QA Contact: | |
Severity: | Trivial | | |
Priority: | P3 | CC: | issues, rb.henschel
Version: | 680m178 | | |
Target Milestone: | --- | | |
Hardware: | PC | | |
OS: | Windows XP | | |
Issue Type: | DEFECT | Latest Confirmation in: | ---
Developer Difficulty: | --- | | |
Issue Depends on: | 103475 | | |
Issue Blocks: | | | |
Attachments: | | | |
Description
kpalagin
2006-07-27 15:41:52 UTC
Created attachment 38077
testcase
Created attachment 38078
Illustration
MRU->HBRINKM: open the attached Word 95 document -> the Cyrillic characters are not imported correctly.

Dear developers, is there any estimate of when this issue will be resolved? Maybe in 2.0.5? WBR, KP.

Dear developers, my users (40 people) are badly affected by this issue: documents like the one attached are produced by an external application that is quite popular in Russia. Please see if this issue can be fixed in 2.1. Thanks a lot for your attention. WBR, K. Palagin.

This affects me as well. As a member of the Coldwell Banker Residential Brokerage, our mandate requires careful, open, and honest communication with our respective clients, and MA law requires it regardless of race, creed, color, or **national origin**. The inability to import mixed-mode documents like the ones kpalagin mentions seriously impacts our ability to communicate effectively with customers whose native language is written in the Cyrillic alphabet, without resorting to other, non-open-source solutions. Thanks! Jim

Probably the same problem as in issue 29006. The behavior is probably triggered simply by mapping byte values > 126 to the ANSI code page instead of the code page specified in the WW6 file. The solution would be to check the encoding of the WW6 file and use the proper encoding function (which probably already exists) instead of the ANSI one.

The offending place is probably sw_w4wpar1.cxx, in Read_ExtendCharSet(). As you can see, only a few charsets are converted there; the switch should simply be extended to cover all charsets. Here is the snippet:

    void SwW4WParser::Read_ExtendCharSet()          // (XCS)
    {
        BYTE c;
        long nValue;
        if( W4WR_TXTERM == GetDecimal( nValue ) && !nError &&
            GetHexByte( c ) && !nError )
        {
            rtl_TextEncoding eCodeSet = RTL_TEXTENCODING_MS_1252;
            if( nValue == 850 && c == 0xef )        //! special handling for the hook
            {                                       //  character from WordPerfect
                nValue = 819;
                c = 180;                            // W4W quirk??
            }
            if ( !( nIniFlags & W4WFL_NO_WW_SPECCHAR )
                && ( nDocType == 44 || nDocType == 49 )  //! WW2: the umlauts
                && nValue == 9998                         //  "A, "U, "s are wrong here
                && ( c == 0xc4 || c == 0xdc || c == 0xdf ))
                nValue = 819;                       // so turn the symbols into umlauts
            switch( nValue )
            {
            case 9999:                              // complete Macintosh character set
                eCodeSet = RTL_TEXTENCODING_APPLE_ROMAN;
    #ifdef MAC
                if ( nDocType == 1 && rVersion == "0" )   // DOS ASCII
                    eCodeSet = RTL_TEXTENCODING_IBM_850;  // work around an error
                                                          // in the DOS filter
    #endif
                break;
            case 437:                               // standard US PC code page
                eCodeSet = RTL_TEXTENCODING_IBM_437;
                break;
            case 850:                               // standard international PC code page
                eCodeSet = RTL_TEXTENCODING_IBM_850;
                break;
            case 819:                               // ANSI code page
                eCodeSet = ( 39 == nDocType && rVersion.EqualsAscii( "0" ))  // MS Works for DOS?
                                ? RTL_TEXTENCODING_IBM_850
                                : RTL_TEXTENCODING_MS_1252;
                break;
            case 8591:                              // ISO 8859-1
                eCodeSet = RTL_TEXTENCODING_ISO_8859_1;
                break;
            case 8592:                              // ISO 8859-2
                eCodeSet = RTL_TEXTENCODING_ISO_8859_2;
                break;
            case 9998:                              // Windows standard symbol charset
                {
                    SvxFontItem aFont( FAMILY_DONTKNOW,
                        String::CreateFromAscii( RTL_CONSTASCII_STRINGPARAM( "Symbol" )),
                        aEmptyStr, PITCH_DONTKNOW, RTL_TEXTENCODING_SYMBOL );
                    Flush();
                    SetAttr( aFont );               // new font
                    FlushChar( c );
                    Flush();                        // special character
                                                    // restore the previous font
                    pCtrlStck->SetAttr( *pCurPaM->GetPoint(), RES_CHRATR_FONT );
                    bWasXCS = TRUE;
                    return;
                }
            }
            // ... (the rest of Read_ExtendCharSet() was not quoted)

*** Issue 96565 has been marked as a duplicate of this issue. ***

This bug dates from 2006! Please check it. Version 3.0.1: the bug is still here.

Why doesn't the import converter use the language text attribute to select the correct ANSI charset?

Please have a look at issue 61927 and its attached patch to see whether it would solve this issue too.

Reset the assignee to the default "issues@openoffice.apache.org".