Issue 67838

Summary: WW6: Writer incorrectly displays cyrillic characters on import
Product: Writer Reporter: kpalagin <kpalagin>
Component: open-import Assignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: issues, rb.henschel
Version: 680m178   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Issue Type: DEFECT Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on: 103475    
Issue Blocks:    
Attachments:
Description    Flags
testcase       none
Illustration   none

Description kpalagin 2006-07-27 15:41:52 UTC
(This is possibly related to 
http://qa.openoffice.org/issues/show_bug.cgi?id=63105
and
http://qa.openoffice.org/issues/show_bug.cgi?id=67768)

Writer displays garbled symbols instead of the correct characters when the
attached "garbled-cyrillic.doc" is opened (see the areas circled in red in the
attached "garbled-cyrillic-screenshot"). Word displays the file correctly (see
the areas circled in green in the attached screenshot).
Comment 1 kpalagin 2006-07-27 15:42:37 UTC
Created attachment 38077 [details]
testcase
Comment 2 kpalagin 2006-07-27 15:43:54 UTC
Created attachment 38078 [details]
Illustration
Comment 3 michael.ruess 2006-07-27 16:29:54 UTC
MRU->HBRINKM: open the attached Word 95 document -> the Cyrillic characters are
not imported correctly.
Comment 4 kpalagin 2006-08-14 12:35:33 UTC
Dear developers,
any estimate of when this issue will be resolved? Maybe in 2.0.5?
WBR,
KP.
Comment 5 kpalagin 2006-09-26 07:21:33 UTC
Dear developers,
my users (40 people) are badly affected by this issue: documents like the one
attached are produced by an external application that is quite popular in
Russia.
Please see if this issue can be fixed in 2.1.

Thanks a lot for your attention.
WBR,
K. Palagin.
Comment 6 jharris1993 2006-09-28 20:51:47 UTC
This affects me as well - as a member of the Coldwell Banker Residential 
Brokerage, our mandate requires careful, open, and honest communication with 
our respective clients - and MA law requires it to be regardless of race,
creed, color, or **national origin**.

The inability to import mixed-mode documents like the ones kpalagin mentions
seriously impacts our ability to communicate effectively with customers whose
native language is written in the Cyrillic alphabet, unless we fall back on
other, non-open-source solutions.

Thanks!

Jim
Comment 7 milek_pl 2006-12-17 13:25:06 UTC
Probably the same problem as in issue 29006. The behavior is probably triggered
simply by mapping byte values > 126 to the ANSI code page instead of the code
page specified in the WW6 file. The solution would be to check the encoding of
the WW6 file and use the proper conversion function (which probably already
exists) instead of the ANSI one.
Comment 8 milek_pl 2006-12-17 13:36:35 UTC
The offending place is probably sw_w4wpar1.cxx, Read_ExtendCharSet(). As you
can see, only a few charsets are converted. It should simply be extended to
cover all charsets.

Including the snippet:

void SwW4WParser::Read_ExtendCharSet()   // (XCS)
{
 BYTE c;
 long nValue;
 if( W4WR_TXTERM == GetDecimal( nValue ) && !nError &&
  GetHexByte( c ) && !nError )
 {
  rtl_TextEncoding eCodeSet = RTL_TEXTENCODING_MS_1252;

  if( nValue == 850 && c == 0xef ) //! special handling for the accent tick
  {                                 // from WordPerfect
   nValue = 819;
   c = 180;      // W4W quirk ??
  }
  if ( !( nIniFlags & W4WFL_NO_WW_SPECCHAR )
    && ( nDocType == 44 || nDocType == 49 ) //! WW2: the umlauts "A, "U, "s
    && nValue == 9998      // are wrong here
    && ( c == 0xc4 || c == 0xdc || c == 0xdf ))
   nValue = 819;     // so turn the symbols into umlauts

  switch( nValue )
  {
  case 9999: // Complete Macintosh Char Set
   eCodeSet = RTL_TEXTENCODING_APPLE_ROMAN;
#ifdef MAC
   if ( nDocType == 1 && rVersion == "0" )  // Dos-Ascii
    eCodeSet = RTL_TEXTENCODING_IBM_850; // work around a bug
              // in the DOS filter
#endif
   break;
  case 437: // Standard US PC code page
   eCodeSet = RTL_TEXTENCODING_IBM_437;
   break;
  case 850: // Standard international PC code page
   eCodeSet = RTL_TEXTENCODING_IBM_850;
   break;
  case 819: // ANSI code page
   eCodeSet = ( 39 == nDocType &&
       rVersion.EqualsAscii( "0" ))  // MS Works f. DOS
      ? RTL_TEXTENCODING_IBM_850
      : RTL_TEXTENCODING_MS_1252;
   break;
  case 8591: // ISO 8859-1
   eCodeSet = RTL_TEXTENCODING_ISO_8859_1;
   break;
  case 8592: // ISO 8859-2
   eCodeSet = RTL_TEXTENCODING_ISO_8859_2;
   break;
  case 9998: // Windows Standard-Symbol-Charset
   {
    SvxFontItem aFont( FAMILY_DONTKNOW,  String::CreateFromAscii(
         RTL_CONSTASCII_STRINGPARAM( "Symbol" )),
         aEmptyStr, PITCH_DONTKNOW,
         RTL_TEXTENCODING_SYMBOL );
    Flush();
    SetAttr( aFont );    // new font
    FlushChar( c );
    Flush();        // special character
    // restore the previous font
    pCtrlStck->SetAttr( *pCurPaM->GetPoint(), RES_CHRATR_FONT );
    bWasXCS = TRUE;
    return;
   }
  }
  // ... (rest of Read_ExtendCharSet() not quoted here)
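One way to read the suggestion to cover all charsets is to replace the hand-written switch with a lookup table and, for unknown code sets, fall back to the document's own code page instead of a fixed MS_1252. The sketch below is self-contained and hypothetical: `TextEnc` and `encodingForXcs` are stand-ins invented for illustration (the enumerator names only mirror `RTL_TEXTENCODING_*`), the MS Works for DOS special case of 819 and the 9998 Symbol-font branch are left out, and the extra entries (866, 8595, 1250-1254) assume W4W reports those numbers in XCS, which the quoted code does not confirm.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for rtl_TextEncoding.
enum class TextEnc {
    MS_1250, MS_1251, MS_1252, MS_1253, MS_1254,
    IBM_437, IBM_850, IBM_866,
    ISO_8859_1, ISO_8859_2, ISO_8859_5,
    APPLE_ROMAN
};

// Map a W4W XCS code-set number to an encoding; unknown numbers fall back
// to the code page declared by the document instead of a hard-coded MS_1252.
TextEnc encodingForXcs(long nValue, TextEnc docDefault) {
    static const std::unordered_map<long, TextEnc> table = {
        // Cases already present in Read_ExtendCharSet().
        {9999, TextEnc::APPLE_ROMAN},
        {437, TextEnc::IBM_437},
        {850, TextEnc::IBM_850},
        {819, TextEnc::MS_1252},
        {8591, TextEnc::ISO_8859_1},
        {8592, TextEnc::ISO_8859_2},
        // Hypothetical additions: assumes W4W emits these numbers.
        {866, TextEnc::IBM_866},
        {8595, TextEnc::ISO_8859_5},
        {1250, TextEnc::MS_1250},
        {1251, TextEnc::MS_1251},
        {1252, TextEnc::MS_1252},
        {1253, TextEnc::MS_1253},
        {1254, TextEnc::MS_1254},
    };
    auto it = table.find(nValue);
    return it != table.end() ? it->second : docDefault;
}
```

With a table, supporting another code set is a one-line change rather than a new `case`, and the fallback addresses the point from comment 7 that the document's own code page, not ANSI, should be the default.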
Comment 9 michael.ruess 2008-11-25 13:32:21 UTC
*** Issue 96565 has been marked as a duplicate of this issue. ***
Comment 10 nassaja 2008-11-25 21:25:13 UTC
This bug dates from 2006! Please check it!
Comment 11 urmasd 2009-04-08 03:16:17 UTC
Version 3.0.1: the bug is still here. Why doesn't the import converter use the
text's language attribute to select the correct ANSI charset?
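The language-based idea could look roughly like the hypothetical helper below, which derives a Windows ANSI code page from a Windows language ID (LANGID). It assumes the importer has a LANGID available for each text run, which this issue does not show; `ansiCodePageForLangId` is an invented name, and only a handful of languages are listed, with 1252 as the fallback.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pick the Windows ANSI code page for a LANGID.
unsigned ansiCodePageForLangId(std::uint16_t langId) {
    // The primary language is the low 10 bits (what PRIMARYLANGID() extracts).
    switch (langId & 0x3FF) {
        case 0x19:  // Russian
        case 0x22:  // Ukrainian
        case 0x23:  // Belarusian
        case 0x02:  // Bulgarian
            return 1251;
        case 0x05:  // Czech
        case 0x0E:  // Hungarian
        case 0x15:  // Polish
            return 1250;
        case 0x08:  // Greek
            return 1253;
        case 0x1F:  // Turkish
            return 1254;
        default:
            return 1252;  // Western European fallback
    }
}
```

For a Russian run (LANGID 0x0419) this returns 1251. The language alone does not always determine the script: Serbian and Croatian share the primary ID 0x1A but normally use different code pages, which is why Serbian is left out of the sketch.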
Comment 12 Regina Henschel 2014-12-25 13:41:53 UTC
Please have a look at issue 61927 and its attached patch to see whether it would solve this issue too.
Comment 13 Marcus 2017-05-20 10:44:30 UTC
Reset the assignee to the default "issues@openoffice.apache.org".