Issue 15672 - please add iscii-devanagari to sal/textenc to support Hindi
Summary: please add iscii-devanagari to sal/textenc to support Hindi
Status: CLOSED FIXED
Alias: None
Product: porting
Classification: Code
Component: code
Version: OOo 1.1 Beta2
Hardware: All All
Importance: P3 Trivial
Target Milestone: OOo 1.1 RC
Assignee: khendricks
QA Contact: issues@porting
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-06-16 12:46 UTC by khendricks
Modified: 2003-06-24 19:52 UTC
CC List: 2 users

See Also:
Issue Type: TASK
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
conversions routines from devanagari to unicode (3.19 KB, text/plain)
2003-06-16 12:47 UTC, khendricks
conversion routine from unicode to Devanagari iscii (3.52 KB, text/plain)
2003-06-16 12:48 UTC, khendricks

Description khendricks 2003-06-16 12:46:55 UTC
Hi,  
  
The Hindi developers are well under way to creating a Hindi dictionary for OOo.  But MySpell has 
a constraint that all dictionaries and affix files be encoded in a 1 byte = 1 char style encoding, and 
OOo conversion routines are used to convert from Unicode to this encoding and back. 
 
Since there is currently no 1 byte = 1 char encoding for Hindi in OOo, the author of the Hindi 
dictionary would very much like to add ISCII for Devanagari to sal/textenc. 
 
They have written conversion routines from Unicode to ISCII and back, which I will 
attach to this issue in case they help. 
 
Thanks, 
 
Kevin
Comment 1 khendricks 2003-06-16 12:47:50 UTC
Created attachment 6919
conversions routines from devanagari to unicode
Comment 2 khendricks 2003-06-16 12:48:31 UTC
Created attachment 6920
conversion routine from unicode to Devanagari iscii
Comment 3 khendricks 2003-06-16 12:50:54 UTC
Hi, 
 
Hopefully re-assigning to Stephan (hope he is sb@openoffice.org) at his request. 
 
Kevin 
 
Comment 4 Stephan Bergmann 2003-06-16 15:13:37 UTC
Stephan->Kevin:  Setting this to "OOo 2.0."  Let me know if this
causes any problems.
Comment 5 khendricks 2003-06-16 15:17:47 UTC
Hi, 
 
Any chance we could give this a target of 1.1.1 so that we can roll out full Hindi 
support in some post-1.1 follow-up build (all of the other pieces are already in place to 
support Hindi)? 
 
I would be happy to backport any changes to the 1.1 tree if need be. 
 
Thanks, 
 
Kevin 
 
Comment 6 Stephan Bergmann 2003-06-16 17:27:25 UTC
As discussed internally in Hamburg, set to "OOo 1.1 RC."
Comment 7 Stephan Bergmann 2003-06-17 12:28:24 UTC
Added RTL_TEXTENCODING_ISCII_DEVANAGARI to rtl/textenc.h, and added
corresponding mapping tables to convert to/from Unicode.

These tables do not follow the code in the two attachments
(iscii-to-unicode.c and unicode-to-iscii.c) precisely, however.  The
attachments support conversion between the single Unicode characters
U+0958--095E and sequences of two ISCII characters, of which the
second is the combining nukta (ISCII 0xE9).  I did not add this, as it
goes against the general rtl/textenc design for single byte character
encodings.  Such (multi-character) mappings should be done when the
(not yet implemented, for any RTL_TEXTENCODING)
RTL_UNICODETOTEXT_FLAGS_UNDEFINED_REPLACESTR flag is set.  Hopefully,
the Unicode characters U+0958--095E are rare enough that we can handle
this in a follow-up bug, if need be.
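
For illustration only, the multi-character mapping that was left out could be sketched as below. This is not part of rtl/textenc: the helper name `iscii_encode_nukta_form` and the table `nukta_base` are hypothetical, and the base consonants are assumed to follow the Unicode canonical decompositions (for example U+0958 = U+0915 + U+093C), translated through the single-byte mapping in the patch quoted later in this issue. The attached routines may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* ISCII code of the combining nukta, as stated in the comment above. */
#define ISCII_NUKTA 0xE9

/* Assumed ISCII base consonant for each of U+0958..U+095E, in order,
 * taken from the Unicode canonical decompositions. */
static const uint8_t nukta_base[7] = {
    0xB3, /* U+0958 = KA   (U+0915) + nukta */
    0xB4, /* U+0959 = KHA  (U+0916) + nukta */
    0xB5, /* U+095A = GA   (U+0917) + nukta */
    0xBA, /* U+095B = JA   (U+091C) + nukta */
    0xBF, /* U+095C = DDA  (U+0921) + nukta */
    0xC0, /* U+095D = DDHA (U+0922) + nukta */
    0xC9  /* U+095E = PHA  (U+092B) + nukta */
};

/* Hypothetical helper: writes the two-byte ISCII sequence for c into out
 * and returns 2.  Returns 0 if c is outside U+0958..U+095E or if the
 * buffer holds fewer than two bytes. */
size_t iscii_encode_nukta_form(uint16_t c, uint8_t *out, size_t out_len)
{
    if (c < 0x0958 || c > 0x095E || out_len < 2) {
        return 0;
    }
    out[0] = nukta_base[c - 0x0958];
    out[1] = ISCII_NUKTA;
    return 2;
}
```

Because one Unicode character becomes two bytes here, such a helper does not fit a plain one-to-one lookup table, which is the design conflict described above.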
Comment 8 Stephan Bergmann 2003-06-18 15:48:51 UTC
.
Comment 9 Stephan Bergmann 2003-06-18 15:51:11 UTC
This fix is not really QA-testable.  (Only available tests are unit
tests in sal/test/testtextenc.cxx.)
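
As a rough idea of what a unit-level check for this encoding might look like, here is a standalone round-trip sketch. It does not use the real rtl converter API or sal/test/testtextenc.cxx: the two tables are copied from the patch quoted later in this issue (comment 18), while the function names, the use of `uint8_t`/`uint16_t` in place of the sal types, and the pass-through of bytes below 0x80 are assumptions made for the example.

```c
#include <stdint.h>

#define ISCII_START 0xA1
#define ISCII_END   0xFA
#define UNI_START   0x0901
#define UNI_END     0x096F

/* ISCII Devanagari byte -> Unicode; 0 marks an unmapped code. */
static const uint16_t iscii_to_uni[ISCII_END - ISCII_START + 1] = {
            0x0901, 0x0902, 0x0903, 0x0905, 0x0906, 0x0907, 0x0908,
    0x0909, 0x090A, 0x090B, 0x090E, 0x090F, 0x0910, 0x090D, 0x0912,
    0x0913, 0x0914, 0x0911, 0x0915, 0x0916, 0x0917, 0x0918, 0x0919,
    0x091A, 0x091B, 0x091C, 0x091D, 0x091E, 0x091F, 0x0920, 0x0921,
    0x0922, 0x0923, 0x0924, 0x0925, 0x0926, 0x0927, 0x0928, 0x0929,
    0x092A, 0x092B, 0x092C, 0x092D, 0x092E, 0x092F, 0x095F, 0x0930,
    0x0931, 0x0932, 0x0933, 0x0934, 0x0935, 0x0936, 0x0937, 0x0938,
    0x0939,      0, 0x093E, 0x093F, 0x0940, 0x0941, 0x0942, 0x0943,
    0x0946, 0x0947, 0x0948, 0x0945, 0x094A, 0x094B, 0x094C, 0x0949,
    0x094D, 0x093C, 0x0964,      0,      0,      0,      0,      0,
         0, 0x0966, 0x0967, 0x0968, 0x0969, 0x096A, 0x096B, 0x096C,
    0x096D, 0x096E, 0x096F };

/* Unicode U+0901..U+096F -> ISCII Devanagari byte; 0 marks unmapped. */
static const uint8_t uni_to_iscii[UNI_END - UNI_START + 1] = {
          0xA1, 0xA2, 0xA3,    0, 0xA4, 0xA5, 0xA6,
    0xA7, 0xA8, 0xA9, 0xAA,    0, 0xAE, 0xAB, 0xAC,
    0xAD, 0xB2, 0xAF, 0xB0, 0xB1, 0xB3, 0xB4, 0xB5,
    0xB6, 0xB7, 0xB8, 0xB9, 0xBA, 0xBB, 0xBC, 0xBD,
    0xBE, 0xBF, 0xC0, 0xC1, 0xC2, 0xC3, 0xC4, 0xC5,
    0xC6, 0xC7, 0xC8, 0xC9, 0xCA, 0xCB, 0xCC, 0xCD,
    0xCF, 0xD0, 0xD1, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6,
    0xD7, 0xD8,    0,    0, 0xE9,    0, 0xDA, 0xDB,
    0xDC, 0xDD, 0xDE, 0xDF,    0, 0xE3, 0xE0, 0xE1,
    0xE2, 0xE7, 0xE4, 0xE5, 0xE6, 0xE8,    0,    0,
       0,    0,    0,    0,    0,    0,    0,    0,
       0,    0,    0,    0,    0,    0,    0, 0xCE,
       0,    0,    0,    0, 0xEA,    0, 0xF1, 0xF2,
    0xF3, 0xF4, 0xF5, 0xF6, 0xF7, 0xF8, 0xF9, 0xFA };

/* Returns the Unicode value for an ISCII byte.  Bytes below 0x80 are
 * passed through as ASCII (an assumption); for any other input a
 * result of 0 means "no mapping". */
uint16_t iscii_to_unicode(uint8_t b)
{
    if (b < 0x80) return b;
    if (b < ISCII_START || b > ISCII_END) return 0;
    return iscii_to_uni[b - ISCII_START];
}

/* Returns the ISCII byte for a Unicode value, with the same conventions. */
uint8_t unicode_to_iscii(uint16_t u)
{
    if (u < 0x80) return (uint8_t)u;
    if (u < UNI_START || u > UNI_END) return 0;
    return uni_to_iscii[u - UNI_START];
}

/* Returns 1 if every mapped ISCII byte survives a round trip through
 * Unicode and back, 0 otherwise. */
int iscii_round_trips(void)
{
    unsigned b;
    for (b = ISCII_START; b <= ISCII_END; ++b) {
        uint16_t u = iscii_to_unicode((uint8_t)b);
        if (u != 0 && unicode_to_iscii(u) != b) return 0;
    }
    return 1;
}
```

Calling `iscii_round_trips()` and checking a few known pairs, such as byte 0xE9 against U+093C, would be the core of such a test.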
Comment 10 Stephan Bergmann 2003-06-18 15:52:42 UTC
.
Comment 11 stefan.baltzer 2003-06-20 10:22:46 UTC
SBA->Kevin: Nothing to test for me.
Please take this one and close it if all works smoothly. Thanks!
Comment 12 stefan.baltzer 2003-06-20 10:23:49 UTC
Reassigned to Kevin.
Comment 13 stefan.baltzer 2003-06-20 10:24:12 UTC
Set to fixed again.
Comment 14 stefan.baltzer 2003-06-20 10:25:43 UTC
Marked verified. This tool is a pain :-(
-> Kevin: Now it's yours with the appropriate flags. ;-)
Comment 15 khendricks 2003-06-23 17:07:00 UTC
Hi, 
 
Reopening this issue. 
 
Looking in sal/inc/rtl/textenc.h in cws_srx645_ooo11rc 
and doing  
 
 
[kbhend@base1 rtl]$ grep -i iscii * 
[kbhend@base1 rtl]$ 
 
So I do not think this ever made it into the OOo 1.1 rc tree at all. 
 
Kevin 
 
 
Comment 16 khendricks 2003-06-23 17:07:54 UTC
Hi Stephan, 
 
iscii support does not seem to have made it into OOo 1.1 RC 
 
Kevin 
 
Comment 17 khendricks 2003-06-23 17:08:46 UTC
Hi Stephan, 
 
If you can tell me the sal files that are changed and the required versions I will 
commit them to cws_srx645_ooo11rc. 
 
Thanks, 
 
Kevin 
 
Comment 18 khendricks 2003-06-23 19:45:38 UTC
Hi Stephan, 
 
I found your related changes in HEAD and generated the following 
diff.  If you approve it I can commit it to cws_srx645_ooo11rc if you 
want me to. 
 
Thanks, 
 
Kevin 
 
 
--- sal.keep/inc/rtl/textenc.h  2003-03-26 11:45:50.000000000 -0500 
+++ sal/inc/rtl/textenc.h       2003-06-20 06:11:16.000000000 -0400 
@@ -2,9 +2,9 @@ 
  * 
  *  $RCSfile: textenc.h,v $ 
  * 
- *  $Revision: 1.10 $ 
+ *  $Revision: 1.11 $ 
  * 
- *  last change: $Author: hr $ $Date: 2003/03/26 16:45:50 $ 
+ *  last change: $Author: vg $ $Date: 2003/06/20 10:11:16 $ 
  * 
  *  The Contents of this file are made available subject to the terms of 
  *  either of the following licenses 
@@ -175,6 +175,7 @@ 
 #define RTL_TEXTENCODING_BIG5_HKSCS             (RTL_TEXTENC_CAST( 86 ))
 #define RTL_TEXTENCODING_TIS_620                (RTL_TEXTENC_CAST( 87 )) 
 #define RTL_TEXTENCODING_KOI8_U                 (RTL_TEXTENC_CAST( 88 )) 
+#define RTL_TEXTENCODING_ISCII_DEVANAGARI       (RTL_TEXTENC_CAST( 89 ))
 /* ATTENTION!  Whenever some encoding is added here, make sure to update 
  * rtl_isOctetEncoding in tencinfo.c. 
  */ 
@@ -248,6 +249,8 @@ 
 
 Latin 3 (ISO-8859-3)                            RTL_TEXTENCODING_ISO_8859_3 
 
+Indian (ISCII Devanagari)                       RTL_TEXTENCODING_ISCII_DEVANAGARI
+ 
 Japanese (Apple Macintosh)                      RTL_TEXTENCODING_APPLE_JAPANESE
 Japanese (EUC-JP)                               RTL_TEXTENCODING_EUC_JP 
 # Japanese (ISO-2022-JP)                          RTL_TEXTENCODING_ISO_2022_JP 
diff -urN sal.keep/textenc/convertiscii.tab sal/textenc/convertiscii.tab 
--- sal.keep/textenc/convertiscii.tab   1969-12-31 19:00:00.000000000 -0500 
+++ sal/textenc/convertiscii.tab        2003-06-20 06:11:41.000000000 -0400 
@@ -0,0 +1,146 @@ 
+/************************************************************************* 
+ * 
+ *  $RCSfile: convertiscii.tab,v $ 
+ * 
+ *  $Revision: 1.2 $ 
+ * 
+ *  last change: $Author: vg $ $Date: 2003/06/20 10:11:41 $ 
+ * 
+ *  The Contents of this file are made available subject to the terms of 
+ *  either of the following licenses 
+ * 
+ *         - GNU Lesser General Public License Version 2.1 
+ *         - Sun Industry Standards Source License Version 1.1 
+ * 
+ *  Sun Microsystems Inc., October, 2000 
+ * 
+ *  GNU Lesser General Public License Version 2.1 
+ *  ============================================= 
+ *  Copyright 2000 by Sun Microsystems, Inc. 
+ *  901 San Antonio Road, Palo Alto, CA 94303, USA 
+ * 
+ *  This library is free software; you can redistribute it and/or 
+ *  modify it under the terms of the GNU Lesser General Public 
+ *  License version 2.1, as published by the Free Software Foundation. 
+ * 
+ *  This library is distributed in the hope that it will be useful, 
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of 
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  Lesser General Public License for more details. 
+ * 
+ *  You should have received a copy of the GNU Lesser General Public 
+ *  License along with this library; if not, write to the Free Software 
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, 
+ *  MA  02111-1307  USA 
+ * 
+ * 
+ *  Sun Industry Standards Source License Version 1.1 
+ *  ================================================= 
+ *  The contents of this file are subject to the Sun Industry Standards 
+ *  Source License Version 1.1 (the "License"); You may not use this file 
+ *  except in compliance with the License. You may obtain a copy of the 
+ *  License at http://www.openoffice.org/license.html. 
+ * 
+ *  Software provided under this License is provided on an "AS IS" basis, 
+ *  WITHOUT WARRUNTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING,
+ *  WITHOUT LIMITATION, WARRUNTIES THAT THE SOFTWARE IS FREE OF DEFECTS,
+ *  MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE, OR NON-INFRINGING.
+ *  See the License for the specific provisions governing your rights and 
+ *  obligations concerning the Software. 
+ * 
+ *  The Initial Developer of the Original Code is: Sun Microsystems, Inc.. 
+ * 
+ *  Copyright: 2000 by Sun Microsystems, Inc. 
+ * 
+ *  All Rights Reserved. 
+ * 
+ *  Contributor(s): _______________________________________ 
+ * 
+ * 
+ ************************************************************************/ 
+ 
+#include "tenchelp.h" 
+ 
+#include "rtl/tencinfo.h" 
+#include "sal/types.h" 
+ 
+#include <stddef.h> 
+ 
+/* Conversion tables for the Devanagari version of ISCII (IS 13194:1991). 
+ * 
+ * They do not map the ISCII characters INV (0xD9), ATR (0xEF), and EXT (0xF0). 
+ * They do not map U+0958--095E to sequences of two ISCII characters, of which 
+ * the second would be the combining nukta (0xE9). 
+ */ 
+ 
+/* The following table is based on LGPL code by Sandeep Patnaik 
+ * (patnaik@students.iiit.net) and Sunil Mohan Adapa 
+ * (sunilmohanadapa@postmark.net). 
+ */ 
+#define RTL_TEXTENC_ISCII_DEVANAGARI_START 0xA1 
+#define RTL_TEXTENC_ISCII_DEVANAGARI_END 0xFA 
+static sal_uInt16 const 
+aImplIsciiDevanagariToUniTab[RTL_TEXTENC_ISCII_DEVANAGARI_END 
+                             - RTL_TEXTENC_ISCII_DEVANAGARI_START + 1] 
+= {         0x0901, 0x0902, 0x0903, 0x0905, 0x0906, 0x0907, 0x0908, /* A0 */ 
+    0x0909, 0x090A, 0x090B, 0x090E, 0x090F, 0x0910, 0x090D, 0x0912, 
+    0x0913, 0x0914, 0x0911, 0x0915, 0x0916, 0x0917, 0x0918, 0x0919, /* B0 */ 
+    0x091A, 0x091B, 0x091C, 0x091D, 0x091E, 0x091F, 0x0920, 0x0921, 
+    0x0922, 0x0923, 0x0924, 0x0925, 0x0926, 0x0927, 0x0928, 0x0929, /* C0 */ 
+    0x092A, 0x092B, 0x092C, 0x092D, 0x092E, 0x092F, 0x095F, 0x0930, 
+    0x0931, 0x0932, 0x0933, 0x0934, 0x0935, 0x0936, 0x0937, 0x0938, /* D0 */ 
+    0x0939,      0, 0x093E, 0x093F, 0x0940, 0x0941, 0x0942, 0x0943, 
+    0x0946, 0x0947, 0x0948, 0x0945, 0x094A, 0x094B, 0x094C, 0x0949, /* E0 */ 
+    0x094D, 0x093C, 0x0964,      0,      0,      0,      0,      0, 
+         0, 0x0966, 0x0967, 0x0968, 0x0969, 0x096A, 0x096B, 0x096C, /* F0 */ 
+    0x096D, 0x096E, 0x096F }; 
+ 
+#define RTL_TEXTENC_UNICODE_DEVANAGARI_START 0x0901 
+#define RTL_TEXTENC_UNICODE_DEVANAGARI_END 0x096F 
+static sal_uChar const 
+aImplUniToIsciiDevanagariTab[RTL_TEXTENC_UNICODE_DEVANAGARI_END 
+                             - RTL_TEXTENC_UNICODE_DEVANAGARI_START + 1] 
+= {       0xA1, 0xA2, 0xA3,    0, 0xA4, 0xA5, 0xA6, /* U+0900 */ 
+    0xA7, 0xA8, 0xA9, 0xAA,    0, 0xAE, 0xAB, 0xAC, 
+    0xAD, 0xB2, 0xAF, 0xB0, 0xB1, 0xB3, 0xB4, 0xB5, /* U+0910 */ 
+    0xB6, 0xB7, 0xB8, 0xB9, 0xBA, 0xBB, 0xBC, 0xBD, 
+    0xBE, 0xBF, 0xC0, 0xC1, 0xC2, 0xC3, 0xC4, 0xC5, /* U+0920 */ 
+    0xC6, 0xC7, 0xC8, 0xC9, 0xCA, 0xCB, 0xCC, 0xCD, 
+    0xCF, 0xD0, 0xD1, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6, /* U+0930 */ 
+    0xD7, 0xD8,    0,    0, 0xE9,    0, 0xDA, 0xDB, 
+    0xDC, 0xDD, 0xDE, 0xDF,    0, 0xE3, 0xE0, 0xE1, /* U+0940 */ 
+    0xE2, 0xE7, 0xE4, 0xE5, 0xE6, 0xE8,    0,    0, 
+       0,    0,    0,    0,    0,    0,    0,    0, /* U+0950 */ 
+       0,    0,    0,    0,    0,    0,    0, 0xCE, 
+       0,    0,    0,    0, 0xEA,    0, 0xF1, 0xF2, /* U+0960 */ 
+    0xF3, 0xF4, 0xF5, 0xF6, 0xF7, 0xF8, 0xF9, 0xFA }; 
+ 
+static ImplByteConvertData const aImplIsciiDevanagariConvertData 
+= { aImplIsciiDevanagariToUniTab, 
+    NULL, 
+    RTL_TEXTENC_ISCII_DEVANAGARI_START, RTL_TEXTENC_ISCII_DEVANAGARI_END,
+    NOTABUNI_START, NOTABUNI_END, 
+    aImplUniToIsciiDevanagariTab, 
+    NULL, 
+    NULL, 
+    RTL_TEXTENC_UNICODE_DEVANAGARI_START, RTL_TEXTENC_UNICODE_DEVANAGARI_END,
+    NOTABCHAR_START, NOTABCHAR_END, 
+    0 }; 
+ 
+static ImplTextEncodingData const aImplIsciiDevanagariTextEncodingData 
+    = { { &aImplIsciiDevanagariConvertData, 
+          &ImplCharToUnicode, 
+          &ImplUnicodeToChar, 
+          NULL, 
+          NULL, 
+          NULL, 
+          NULL, 
+          NULL, 
+          NULL }, 
+        1, 
+        1, 
+        1, 
+        1, 
+        NULL, 
+        NULL, 
+        RTL_TEXTENCODING_INFO_ASCII }; 
diff -urN sal.keep/textenc/tencinfo.c sal/textenc/tencinfo.c 
--- sal.keep/textenc/tencinfo.c 2003-04-11 10:25:01.000000000 -0400 
+++ sal/textenc/tencinfo.c      2003-06-20 06:11:52.000000000 -0400 
@@ -2,9 +2,9 @@ 
  * 
  *  $RCSfile: tencinfo.c,v $ 
  * 
- *  $Revision: 1.21 $ 
+ *  $Revision: 1.22 $ 
  * 
- *  last change: $Author: vg $ $Date: 2003/04/11 14:25:01 $ 
+ *  last change: $Author: vg $ $Date: 2003/06/20 10:11:52 $ 
  * 
  *  The Contents of this file are made available subject to the terms of 
  *  either of the following licenses 
@@ -86,7 +86,8 @@ 
 sal_Bool SAL_CALL rtl_isOctetTextEncoding(rtl_TextEncoding nEncoding) 
 { 
     return nEncoding > RTL_TEXTENCODING_DONTKNOW 
-           && nEncoding <= RTL_TEXTENCODING_KOI8_U /* always update this! */ 
+           && nEncoding <= RTL_TEXTENCODING_ISCII_DEVANAGARI 
+                              /* always update this! */ 
            && nEncoding != 9; /* RTL_TEXTENCODING_SYSTEM */ 
 } 
 
diff -urN sal.keep/textenc/textenc.c sal/textenc/textenc.c 
--- sal.keep/textenc/textenc.c  2003-03-26 11:47:17.000000000 -0500 
+++ sal/textenc/textenc.c       2003-06-20 06:12:03.000000000 -0400 
@@ -2,9 +2,9 @@ 
  * 
  *  $RCSfile: textenc.c,v $ 
  * 
- *  $Revision: 1.9 $ 
+ *  $Revision: 1.10 $ 
  * 
- *  last change: $Author: hr $ $Date: 2003/03/26 16:47:17 $ 
+ *  last change: $Author: vg $ $Date: 2003/06/20 10:12:03 $ 
  * 
  *  The Contents of this file are made available subject to the terms of 
  *  either of the following licenses 
@@ -155,6 +155,7 @@ 
 #include "tcvttcn2.tab" 
 #include "tcvttcn6.tab" 
 #include "tcvtuni1.tab" 
+#include "convertiscii.tab" 
 
 #include "convertbig5hkscs.tab" 
 #include "converteuctw.tab" 
@@ -255,7 +256,8 @@ 
             &aImplGb18030TextEncodingData, /* GB_18030 */ 
             &aImplBig5HkscsTextEncodingData, /* BIG5_HKSCS */ 
             &aImplTis620TextEncodingData, /* TIS_620 */ 
-            &aImplKoi8UTextEncodingData }; /* KOI8_U */ 
+            &aImplKoi8UTextEncodingData, /* KOI8_U */ 
+            &aImplIsciiDevanagariTextEncodingData }; /* ISCII_DEVANAGARI */ 
     OSL_ENSURE(nEncoding >= RTL_TEXTENCODING_DONTKNOW 
                && nEncoding <= RTL_TEXTENCODING_UNICODE, 
                "specification violation"); 
 
Comment 19 Stephan Bergmann 2003-06-24 08:07:53 UTC
SB->Kevin:  I implemented this on cws "welsh," targeted for OOo 1.1
RC, and now integrated into SRX645m8.  (Not everything that goes into
OOo 1.1 RC is implemented on cws "ooo11rc," at least that is my
understanding.)  The best solution now, I think, is to ask Martin to
resync cws "ooo11rc" against SRX645m8.

SB->MH:  Can you do that?
Comment 20 pavel 2003-06-24 08:54:40 UTC
ooo11rc is not buildable right now, because it contains only part of this patch.
Comment 21 Stephan Bergmann 2003-06-24 09:11:36 UTC
SB->Kevin:  Just discussed my above comments with MH.  There seems to
be some confusion as to which cwss are appropriate for which
targets...  We came to the conclusion that it would be best if you
apply the listed patches you generated from head to cws ooo11rc; I
approve them.  Sorry for the inconvenience.
Comment 22 khendricks 2003-06-24 19:48:35 UTC
Hi Stephan,

Thanks, I just committed it.

Kevin
Comment 23 khendricks 2003-06-24 19:52:34 UTC
Closing