Bug 10976

Summary: [Patch] Unicode Support for sheetname , refactor SSTDeserializer & UnicodeString class
Product: POI
Reporter: SioLam Patrick Lee <patrickl>
Component: HSSF
Assignee: POI Developers List <dev>
Status: RESOLVED INVALID    
Severity: normal    
Priority: P3    
Version: 2.0-dev   
Target Milestone: ---   
Hardware: Other   
OS: other   
Attachments: Attachment of Unicode Support for sheetname , refactor SSTDeserializer & UnicodeString class

Description SioLam Patrick Lee 2002-07-19 07:11:28 UTC
Hi all

I have refined the patch that adds Unicode support for sheet names.  This patch
also refactors the SSTDeserializer and UnicodeString classes, moving more of the
code relating to the BIFF8 format from the former class into the latter, where
it belongs.  Please review it and consider it for inclusion in the project.
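
For reviewers, the string layout that UnicodeString is meant to handle can be
sketched roughly as below.  This is only a minimal standalone illustration under
my own assumptions, not code from the patch: the class name Biff8StringSketch
and the method read() are hypothetical, and rich-text and Far East extension
data are ignored.  The character count field is 1 byte for a sheet name and
2 bytes for SST strings, followed by one option flag byte whose low bit selects
8-bit or 16-bit characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of POI or of this patch.
final class Biff8StringSketch {

    /**
     * Reads a string laid out as: char count (ccSize bytes, little endian),
     * one option flag byte, then the character data.
     */
    static String read(byte[] data, int offset, int ccSize) {
        int charCount = (ccSize == 2)
                ? ((data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8))
                : (data[offset] & 0xFF);
        int flags = data[offset + ccSize];
        int start = offset + ccSize + 1;

        if ((flags & 0x01) == 0) {
            // 8-bit characters: one byte each, decoded here as ISO-8859-1
            return new String(data, start, charCount, StandardCharsets.ISO_8859_1);
        }

        // 16-bit characters: two bytes each, low byte first
        char[] chars = new char[charCount];
        for (int j = 0; j < charCount; j++) {
            int lo = data[start + 2 * j] & 0xFF;
            int hi = data[start + 2 * j + 1] & 0xFF;
            chars[j] = (char) (lo | (hi << 8));
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // 1-byte count (6), flag 0 => 8-bit characters
        byte[] narrow = {6, 0, 'S', 'h', 'e', 'e', 't', '1'};
        System.out.println(read(narrow, 0, 1)); // prints "Sheet1"

        // 2-byte count (2), flag 1 => 16-bit little-endian characters
        byte[] wide = {2, 0, 1, 'H', 0, 'i', 0};
        System.out.println(read(wide, 0, 2)); // prints "Hi"
    }
}
```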

Thanks
Patrick Lee




Index: jakarta-poi/src/java/org/apache/poi/hssf/record/BoundSheetRecord.java
===================================================================
RCS file:
/home/cvspublic/jakarta-poi/src/java/org/apache/poi/hssf/record/BoundSheetRecord.java,v
retrieving revision 1.4
diff -u -r1.4 BoundSheetRecord.java
--- jakarta-poi/src/java/org/apache/poi/hssf/record/BoundSheetRecord.java	1 Mar 2002 13:27:10 -0000	1.4
+++ jakarta-poi/src/java/org/apache/poi/hssf/record/BoundSheetRecord.java	19 Jul 2002 07:00:22 -0000
@@ -54,7 +54,7 @@
  */
 
 package org.apache.poi.hssf.record;
-
+import org.apache.poi.util.BinaryTree;
 import org.apache.poi.util.LittleEndian;
 import org.apache.poi.util.StringUtil;
 
@@ -117,6 +117,16 @@
         }
     }
 
+    /**
+     *  lifted from SSTDeserializer
+     */
+
+    private void arraycopy( byte[] src, int src_position,
+                            byte[] dst, int dst_position,
+                            int length )
+    {
+        System.arraycopy( src, src_position, dst, dst_position, length );
+    }
     protected void fillFields(byte [] data, short size, int offset)
     {
         field_1_position_of_BOF         = LittleEndian.getInt(data,
@@ -125,8 +135,24 @@
                 4 + offset);
         field_3_sheetname_length        = data[ 6 + offset ];
         field_4_compressed_unicode_flag = data[ 7 + offset ];
-        field_5_sheetname               = new String(data, 8 + offset,
-                LittleEndian.ubyteToInt( field_3_sheetname_length));
+
+        int length = LittleEndian.ubyteToInt( field_3_sheetname_length);
+        if ((field_4_compressed_unicode_flag & 0x01)==1) {
+            UnicodeString ucs =new
UnicodeString(data,(short)length,6+offset,(short)1);
+            field_5_sheetname = ucs.getString();
+        }
+        else {
+          try {
+            // should get the codepage from the excel workbook record
+            // instead of hard coded "ISO-8859-1" here.
+            field_5_sheetname =   new String(data, 8 + offset,
+            LittleEndian.ubyteToInt( field_3_sheetname_length),"ISO-8859-1");
+          }
+          catch (Exception e){
+            e.printStackTrace();
+          }
+        }
+        System.out.println("f_5_sn is "+field_5_sheetname);
     }
 
     /**
@@ -175,13 +201,63 @@
     }
 
     /**
+     * Check if String use 16-bit encoding character
+     * Lifted from SSTRecord.addString
+     */
+    public boolean is16bitString(String string)
+    {
+            // scan for characters greater than 255 ... if any are
+            // present, we have to use 16-bit encoding. Otherwise, we
+            // can use 8-bit encoding
+            boolean useUTF16 = false;
+            int strlen = string.length();
+
+            for ( int j = 0; j < strlen; j++ )
+            {
+                if ( string.charAt( j ) > 255 )
+                {
+                    useUTF16 = true;
+                    break;
+                }
+            }
+            return useUTF16 ;
+   }
+    /**
+     * Check if String use all 8-bit (outside ASCII 7 bit range) to
+     * encoding character
+     * Lifted from SSTRecord.addString
+     */
+    public boolean is8bitString(String string)
+    {
+            // scan for characters greater than 255
+            //  ... if any are present then return false
+            boolean use8bit = true;
+            int strlen = string.length();
+
+            for ( int j = 0; j < strlen; j++ )
+            {
+                if ( string.charAt( j ) > 255 )
+                {
+                    use8bit = false;
+                    break;
+                }
+            }
+            return use8bit ;
+   }
+
+    /**
      * Set the sheetname for this sheet.  (this appears in the tabs at the bottom)
      * @param sheetname the name of the sheet
      */
 
     public void setSheetname(String sheetname)
     {
+        boolean is16bit = is16bitString(sheetname);
+        setSheetnameLength((byte) sheetname.length() );
+        setCompressedUnicodeFlag((byte ) (is16bit?1:0));
         field_5_sheetname = sheetname;
+
+
     }
 
     /**
@@ -263,20 +339,34 @@
     {
         LittleEndian.putShort(data, 0 + offset, sid);
         LittleEndian.putShort(data, 2 + offset,
-                              ( short ) (0x08 + getSheetnameLength()));
+                              ( short ) (0x08 + getSheetnameLength()*
(getCompressedUnicodeFlag()==0?1:2)));
         LittleEndian.putInt(data, 4 + offset, getPositionOfBof());
         LittleEndian.putShort(data, 8 + offset, getOptionFlags());
         data[ 10 + offset ] = getSheetnameLength();
         data[ 11 + offset ] = getCompressedUnicodeFlag();
 
-        // we assume compressed unicode (bein the dern americans we are ;-p)
-        StringUtil.putCompressedUnicode(getSheetname(), data, 12 + offset);
+        if (getCompressedUnicodeFlag()==0){
+          // we assume compressed unicode (bein the dern americans we are ;-p)
+          StringUtil.putCompressedUnicode(getSheetname(), data, 12 + offset);
+        }
+        else {
+          try {
+            StringUtil.putUncompressedUnicode(getSheetname(), data, 12 + offset);
+  //          String unicodeString = new
String(getSheetname().getBytes("Unicode"),"Unicode");
+  //          StringUtil.putUncompressedUnicode(unicodeString, data, 12 + offset);
+          }
+          catch (Exception e){
+            System.out.println("encoding exception in
BoundSheetRecord.serialize!");
+          }
+
+
+        }
         return getRecordSize();
     }
 
     public int getRecordSize()
     {
-        return 12 + getSheetnameLength();
+        return 12 + getSheetnameLength()* (getCompressedUnicodeFlag()==0?1:2);
     }
 
     public short getSid()
Index: jakarta-poi/src/java/org/apache/poi/hssf/record/SSTDeserializer.java
===================================================================
RCS file:
/home/cvspublic/jakarta-poi/src/java/org/apache/poi/hssf/record/SSTDeserializer.java,v
retrieving revision 1.4
diff -u -r1.4 SSTDeserializer.java
--- jakarta-poi/src/java/org/apache/poi/hssf/record/SSTDeserializer.java	12 Jun 2002 09:10:04 -0000	1.4
+++ jakarta-poi/src/java/org/apache/poi/hssf/record/SSTDeserializer.java	19 Jul 2002 07:00:25 -0000
@@ -204,24 +204,13 @@
     private int processString( final byte[] data, final int dataIndex, final
int characters )
     {
 
-        // length is the length we store it as.  not the length that is read.
-        int length = SSTRecord.STRING_MINIMAL_OVERHEAD + calculateByteCount(
characters );
-        byte[] unicodeStringBuffer = new byte[length];
-
-        int offset = 0;
-
-        // Set the length in characters
-        LittleEndian.putUShort( unicodeStringBuffer, offset, characters );
-        offset += LittleEndianConsts.SHORT_SIZE;
-        // Set the option flags
-        unicodeStringBuffer[offset] = data[dataIndex + offset];
-        // Copy in the string data
-        int bytesRead = unicodeStringBuffer.length -
SSTRecord.STRING_MINIMAL_OVERHEAD;
-        arraycopy( data, dataIndex + stringHeaderOverhead(),
unicodeStringBuffer, SSTRecord.STRING_MINIMAL_OVERHEAD, bytesRead );
-        // Create the unicode string
-        UnicodeString string = new UnicodeString( UnicodeString.sid,
-                (short) unicodeStringBuffer.length,
-                unicodeStringBuffer );
+        // use the new UnicodeString constructor
+
+        int bytesRead =calculateByteCount( characters );
+        UnicodeString string = new UnicodeString( data,
+          (short)(bytesRead +stringHeaderOverhead()),
+           dataIndex, stringHeaderOverhead()-SSTRecord.STRING_MINIMAL_OVERHEAD
+           ,(short)UnicodeString.CHARCOUNTSIZE_DOUBLEBYTE );
 
         if ( isStringFinished() )
         {
@@ -339,7 +328,9 @@
         LittleEndian.putShort( unicodeStringData, 0, (short)
getContinuationExpectedChars() );
 
         // write the options flag
-        unicodeStringData[LittleEndianConsts.SHORT_SIZE] = createOptionByte(
wideChar, richText, extendedText );
+//        Can't deal with richtext or Far-East Info.
+//        unicodeStringData[LittleEndianConsts.SHORT_SIZE] = createOptionByte(
wideChar, richText, extendedText );
+        unicodeStringData[LittleEndianConsts.SHORT_SIZE] = createOptionByte(
wideChar, false,false );
 
         // copy the bytes/words making up the string; skipping
         // past all the overhead of the str_data array
@@ -348,9 +339,14 @@
                 unicodeStringData.length - SSTRecord.STRING_MINIMAL_OVERHEAD );
 
         // use special constructor to create the final string
-        UnicodeString string = new UnicodeString( UnicodeString.sid,
-                (short) unicodeStringData.length, unicodeStringData,
-                unfinishedString );
+//        UnicodeString string = new UnicodeString( UnicodeString.sid,
+//                (short) unicodeStringData.length, unicodeStringData,
+//                unfinishedString );
+        UnicodeString string = new UnicodeString( unicodeStringData,
+                (short) unicodeStringData.length,
+                unfinishedString , UnicodeString.CHARCOUNTSIZE_DOUBLEBYTE );
+
+
         Integer integer = new Integer( strings.size() );
 
         addToStringTable( strings, integer, string );
@@ -411,8 +407,9 @@
 
         LittleEndian.putShort( unicodeStringData, (byte) 0, (short)
calculateCharCount( dataLengthInBytes ) );
         arraycopy( record, 0, unicodeStringData, LittleEndianConsts.SHORT_SIZE,
record.length );
-        UnicodeString ucs = new UnicodeString( UnicodeString.sid, (short)
unicodeStringData.length, unicodeStringData );
-
+        //UnicodeString ucs = new UnicodeString( UnicodeString.sid, (short)
unicodeStringData.length, unicodeStringData );
+        UnicodeString ucs = new UnicodeString( unicodeStringData,
+            (short)
unicodeStringData.length,UnicodeString.CHARCOUNTSIZE_DOUBLEBYTE );
         unfinishedString = unfinishedString + ucs.getString();
         setContinuationExpectedChars( getContinuationExpectedChars() -
calculateCharCount( dataLengthInBytes ) );
     }
Index: jakarta-poi/src/java/org/apache/poi/hssf/record/SeriesTextRecord.java
===================================================================
RCS file:
/home/cvspublic/jakarta-poi/src/java/org/apache/poi/hssf/record/SeriesTextRecord.java,v
retrieving revision 1.2
diff -u -r1.2 SeriesTextRecord.java
--- jakarta-poi/src/java/org/apache/poi/hssf/record/SeriesTextRecord.java	19 May 2002 15:55:12 -0000	1.2
+++ jakarta-poi/src/java/org/apache/poi/hssf/record/SeriesTextRecord.java	19 Jul 2002 07:00:25 -0000
@@ -169,7 +169,7 @@
         LittleEndian.putShort(data, 4 + offset, field_1_id);
         data[ 6 + offset ] = field_2_textLength;
         data[ 7 + offset ] = field_3_undocumented;
-        StringUtil.putUncompressedUnicodeHigh(field_4_text, data, 8 + offset);
+        StringUtil.putUncompressedUnicode(field_4_text, data, 8 + offset);
 
         return getRecordSize();
     }
Index: jakarta-poi/src/java/org/apache/poi/hssf/record/UnicodeString.java
===================================================================
RCS file:
/home/cvspublic/jakarta-poi/src/java/org/apache/poi/hssf/record/UnicodeString.java,v
retrieving revision 1.5
diff -u -r1.5 UnicodeString.java
--- jakarta-poi/src/java/org/apache/poi/hssf/record/UnicodeString.java	26 Jun 2002 12:25:44 -0000	1.5
+++ jakarta-poi/src/java/org/apache/poi/hssf/record/UnicodeString.java	19 Jul 2002 07:00:28 -0000
@@ -56,6 +56,7 @@
 package org.apache.poi.hssf.record;
 
 import org.apache.poi.util.LittleEndian;
+import org.apache.poi.util.LittleEndianConsts;
 import org.apache.poi.util.StringUtil;
 
 /**
@@ -74,10 +75,17 @@
     extends Record
     implements Comparable
 {
+    /**  character count field size in byte */
+    static final short CHARCOUNTSIZE_SINGLEBYTE = LittleEndianConsts.BYTE_SIZE;
+    static final short CHARCOUNTSIZE_DOUBLEBYTE = LittleEndianConsts.SHORT_SIZE;
+
+
     public final static short sid = 0xFFF;
     private short             field_1_charCount;     // = 0;
     private byte              field_2_optionflags;   // = 0;
     private String            field_3_string;        // = null;
+    private byte              field_4_ccAdjust =1;   // charCount size adjustment
+                                                     // default to 1 i.e.
charCount as short
     private final int RICH_TEXT_BIT = 8;
     private final int EXT_BIT = 4;
 
@@ -85,7 +93,6 @@
     {
     }
 
-
     public int hashCode()
     {
         int stringHash = 0;
@@ -114,6 +121,7 @@
                 && field_3_string.equals(other.field_3_string));
     }
 
+
     /**
      * construct a unicode string record and fill its fields, ID is ignored
      * @param id - ignored
@@ -127,6 +135,50 @@
     }
 
     /**
+     * construct a unicode string record and fill its fields, ID is ignored
+     * @param data - the bytes of the string/fields
+     * @param size - size of the data
+     * @param ccSize - the size (in byte) of the charCount field
+     */
+
+    public UnicodeString(byte [] data, short size,  short ccSize )
+    {
+        super();
+        field_4_ccAdjust =  (byte)((ccSize == (short) 2)?1:0) ;
+        fillFields(data, size);
+    }
+
+    /**
+     * construct a unicode string record and fill its fields, ID is ignored
+     * @param data - the bytes of the string/fields
+     * @param size - size of the data
+     * @param offset - offset of the record
+     * @param ccSize - the size (in byte) of the charCount field
+     */
+
+    public UnicodeString(byte [] data, short size, int offset , short ccSize )
+    {
+        super();
+        //UnicodeString us = new UnicodeString();
+        field_4_ccAdjust =  (byte)((ccSize == (short) 2)?1:0) ;
+        fillFields(data, size,offset);
+
+    }
+    /**
+     * construct a unicode string from a string fragment + data with an
+     * additional offset for extended(Far East and Rich Text) info and
+     * the size (in byte) of the charCount field given as last parameter
+     */
+    public UnicodeString(byte [] data, short size, int offset ,int extOffset,
+                         short ccSize )
+    {
+        super();
+        //UnicodeString us = new UnicodeString();
+        field_4_ccAdjust =  (byte)((ccSize == (short) 2)?1:0) ;
+        fillFields(data, size,offset,extOffset);
+    }
+
+    /**
      * construct a unicode string from a string fragment + data
      */
 
@@ -138,36 +190,67 @@
     }
 
     /**
-     * NO OP
+     * construct a unicode string from a string fragment + data with the size
+     * (in byte) of the charCount field given as last parameter
      */
 
-    protected void validateSid(short id)
+    public UnicodeString( byte [] data, short size, String prefix,
+                         short ccSize)
     {
+        super();
+        field_4_ccAdjust =  (byte)((ccSize == (short) 2)?1:0) ;
+        fillFields(data,  size, 0);
+        field_3_string = prefix + field_3_string;
+        setCharCount();
+    }
+
+    /**
+     * construct a unicode string from a string fragment + data with the size
+     * (in byte) of the charCount field given as last parameter
+     */
+
+    public UnicodeString( byte [] data, short size, int offset, String prefix,
+                         short ccSize)
+    {
+        super();
+        field_4_ccAdjust =  (byte)((ccSize == (short) 2)?1:0) ;
+        fillFields(data,  size, offset);
+        field_3_string = prefix + field_3_string;
+        setCharCount();
 
-        // included only for interface compliance
     }
 
-    protected void fillFields(byte [] data, short size)
+    /**
+     * construct a unicode string from a string fragment + data with an
+     * additional offset for extended(Far East and Rich Text) info and the size
+     * (in byte) of the charCount field given as last parameter
+     */
+
+    public UnicodeString( byte [] data, short size, int offset, int extOffset,
+                           String prefix, short ccSize)
+    {
+/*
+        super();
+        field_4_ccAdjust =  (byte)((ccSize == (short) 2)?1:0) ;
+        fillFields(data,  size, offset);
+        field_3_string = prefix + field_3_string;
+        setCharCount();
+*/
+
+    }
+
+
+    /**
+     * NO OP
+     */
+
+    protected void validateSid(short id)
     {
-        field_1_charCount   = LittleEndian.getShort(data, 0);
-        field_2_optionflags = data[ 2 ];
-        if ((field_2_optionflags & 1) == 0)
-        {
-            field_3_string = new String(data, 3, getCharCount());
-        }
-        else
-        {
-            char[] array = new char[ getCharCount() ];
 
-            for (int j = 0; j < array.length; j++)
-            {
-                array[ j ] = ( char ) LittleEndian.getShort(data,
-                                                            3 + (j * 2));
-            }
-            field_3_string = new String(array);
-        }
+        // included only for interface compliance
     }
 
+
     /**
      * get the number of characters in the string
      *
@@ -286,6 +369,7 @@
             .append(Integer.toHexString(getOptionFlags())).append("\n");
         buffer.append("    .string          = ").append(getString())
             .append("\n");
+        buffer.append("    .extOffset       =
").append(field_4_ccAdjust).append("\n");
         buffer.append("[/UNICODESTRING]\n");
         return buffer.toString();
     }
@@ -300,8 +384,14 @@
         }
 
         // byte[] retval = new byte[ 3 + (getString().length() * charsize)];
-        LittleEndian.putShort(data, 0 + offset, getCharCount());
-        data[ 2 + offset ] = getOptionFlags();
+        if ( field_4_ccAdjust == 1) {
+          LittleEndian.putShort(data, 0 + offset, getCharCount());
+        }
+        else {
+          data[ offset]= (byte)getCharCount();
+        }
+
+        data[ 1+ field_4_ccAdjust+ offset ] = getOptionFlags();
 
 //        System.out.println("Unicode: We've got "+retval[2]+" for our option
flag");
         try {
@@ -309,25 +399,25 @@
 String(getString().getBytes("Unicode"),"Unicode");
             if (getOptionFlags() == 0)
             {
-                StringUtil.putCompressedUnicode(unicodeString, data, 0x3 +
-offset);
+                StringUtil.putCompressedUnicode(unicodeString, data,
+                  0x2 + field_4_ccAdjust + offset);
             }
             else
             {
                 StringUtil.putUncompressedUnicode(unicodeString, data,
-                                                    0x3 + offset);
+                  0x2 + field_4_ccAdjust + offset);
             }
         }
         catch (Exception e) {
             if (getOptionFlags() == 0)
             {
-                StringUtil.putCompressedUnicode(getString(), data, 0x3 +
-                                                offset);
+                StringUtil.putCompressedUnicode(getString(), data,
+                  0x2 + field_4_ccAdjust + offset);
             }
             else
             {
                 StringUtil.putUncompressedUnicode(getString(), data,
-                                                  0x3 + offset);
+                  0x2 +field_4_ccAdjust + offset);
             }
         }
         return getRecordSize();
@@ -341,7 +431,8 @@
     public int getRecordSize()
     {
         int charsize = isUncompressedUnicode() ? 2 : 1;
-        return 3 + (getString().length() * charsize);
+        return  2 + field_4_ccAdjust +
+        (getString().length() * charsize);
     }
 
     public short getSid()
@@ -349,6 +440,11 @@
         return this.sid;
     }
 
+    protected void fillFields(byte [] data, short size)
+    {
+        fillFields(data,  size, 0);
+    }
+
     /**
      * called by the constructor, should set class level fields.  Should throw
      * runtime exception for bad/icomplete data.
@@ -360,6 +456,85 @@
 
     protected void fillFields(byte [] data, short size, int offset)
     {
+      fillFields(data, size,  offset,0);
+/*
+        if ( field_4_ccAdjust == 1) {// CharCount field is 2 bytes
+          field_1_charCount   = LittleEndian.getShort(data, offset);
+        }
+        else {
+          field_1_charCount   = (short)LittleEndian.getUnsignedByte(data, offset);
+        }
+
+        field_2_optionflags = data[ offset+ 1+field_4_ccAdjust ];
+        if ((field_2_optionflags & 1) == 0)
+        {
+            field_3_string = new String(data,offset + 2+field_4_ccAdjust,
getCharCount());
+        }
+        else
+        {
+            char[] array = new char[ getCharCount() ];
+
+            for (int j = 0; j < array.length; j++)
+            {
+                array[ j ] = ( char ) LittleEndian.getShort(data,offset+
+                                      2+field_4_ccAdjust + (j * 2));
+            }
+            field_3_string = new String(array);
+        }
+*/
+    }
+
+    /**
+     * called by the constructor, should set class level fields.  Should throw
+     * runtime exception for bad/icomplete data.
+     *
+     * @param data raw data
+     * @param size size of data
+     * @param offset of the records data (provided a big array of the file)
+     * @param extOffset skip additional offset after header and before string start
+     */
+
+    protected void fillFields(byte [] data, short size, int offset,int extOffset)
+    {
+
+        field_2_optionflags = data[ offset+ 1+field_4_ccAdjust ];
+        short charByteSize = (short)(size - stringHeaderOverhead());
+
+        if ( field_4_ccAdjust == 1) {// CharCount field is 2 bytes
+          field_1_charCount   = LittleEndian.getShort(data, offset);
+        }
+        else {
+          field_1_charCount   = (short)LittleEndian.getUnsignedByte(data, offset);
+        }
+
+        if ((field_2_optionflags & 1) == 0) // 8bit char
+        {
+          if ( field_1_charCount  > charByteSize) { // string in this record is
broken over continuation
+            field_1_charCount  = charByteSize;
+          }
+          try {
+            field_3_string = new String(data,offset + 2 + field_4_ccAdjust +
+                              extOffset , getCharCount(),"ISO-8859-1");
+          }
+          catch (Exception e){
+            e.printStackTrace();
+          }
+
+        }
+        else // 16bit char
+        {
+            if ( field_1_charCount  >( charByteSize / 2)) { // string in this
record is broken over continuation
+              field_1_charCount  = (short) (charByteSize / 2) ;
+            }
+            char[] array = new char[ getCharCount() ];
+
+            for (int j = 0; j < array.length; j++)
+            {
+                array[ j ] = ( char ) LittleEndian.getShort(data,offset+
+                                      2+field_4_ccAdjust + extOffset  + (j * 2));
+            }
+            field_3_string = new String(array);
+        }
     }
 
     public int compareTo(Object obj)
@@ -380,13 +555,13 @@
 
         if (isUncompressedUnicode())
         {
-            int proposedStringLength = proposedBrokenLength - 3;
+            int proposedStringLength = proposedBrokenLength - (2 +
field_4_ccAdjust);
 
             if ((proposedStringLength % 2) == 1)
             {
                 proposedStringLength--;
             }
-            rval = proposedStringLength + 3;
+            rval = proposedStringLength + 2 + field_4_ccAdjust;
         }
         return rval;
     }
@@ -394,6 +569,15 @@
     public boolean isExtendedText()
     {
         return (getOptionFlags() & EXT_BIT) != 0;
+    }
+
+    private short stringHeaderOverhead()
+    {
+        return  (short)(2 + field_4_ccAdjust +
+                 + ( ((field_2_optionflags & RICH_TEXT_BIT) != 0)?
+                     LittleEndianConsts.SHORT_SIZE : 0 )
+                 + (  ((field_2_optionflags & EXT_BIT) != 0)?
+                     LittleEndianConsts.INT_SIZE : 0 )  );
     }
 
 }
Index: jakarta-poi/src/java/org/apache/poi/util/StringUtil.java
===================================================================
RCS file: /home/cvspublic/jakarta-poi/src/java/org/apache/poi/util/StringUtil.java,v
retrieving revision 1.2
diff -u -r1.2 StringUtil.java
--- jakarta-poi/src/java/org/apache/poi/util/StringUtil.java	19 May 2002 17:54:07 -0000	1.2
+++ jakarta-poi/src/java/org/apache/poi/util/StringUtil.java	19 Jul 2002 07:00:28 -0000
@@ -74,7 +74,7 @@
      */
     private StringUtil() { }
 
-    
+
     /**
      *  given a byte array of 16-bit unicode characters, compress to 8-bit and
      *  return a string
@@ -113,8 +113,8 @@
         }
         return new String(bstring);
     }
-    
-    
+
+
 
     /**
      *  given a byte array of 16-bit unicode characters, compress to 8-bit and
@@ -227,12 +227,12 @@
             char c = input.charAt(k);
 
             output[offset + (2 * k)] = (byte) (c >> 8);
-            output[offset + (2 * k)] = (byte) c;
+            output[offset + (2 * k)+1] = (byte) c;
         }
     }
-    
-    
-    
+
+
+
 
     /**
      *  Description of the Method
Index:
jakarta-poi/src/testcases/org/apache/poi/hssf/record/TestSSTRecordSizeCalculator.java
===================================================================
RCS file:
/home/cvspublic/jakarta-poi/src/testcases/org/apache/poi/hssf/record/TestSSTRecordSizeCalculator.java,v
retrieving revision 1.3
diff -u -r1.3 TestSSTRecordSizeCalculator.java
--- jakarta-poi/src/testcases/org/apache/poi/hssf/record/TestSSTRecordSizeCalculator.java	17 Jul 2002 14:18:03 -0000	1.3
+++ jakarta-poi/src/testcases/org/apache/poi/hssf/record/TestSSTRecordSizeCalculator.java	19 Jul 2002 07:00:30 -0000
@@ -186,7 +186,10 @@
         int offset = LittleEndianConsts.SHORT_SIZE;
         unicodeStringBuffer[offset++] = 0;
         System.arraycopy( s.getBytes(), 0, unicodeStringBuffer, offset,
s.length() );
-        return new UnicodeString( UnicodeString.sid, (short)
unicodeStringBuffer.length, unicodeStringBuffer );
+//        return new UnicodeString( UnicodeString.sid, (short)
unicodeStringBuffer.length, unicodeStringBuffer );
+        return new UnicodeString(  unicodeStringBuffer,(short)
unicodeStringBuffer.length,
+                    UnicodeString.CHARCOUNTSIZE_DOUBLEBYTE     );
+
     }
 
 }
Index:
jakarta-poi/src/testcases/org/apache/poi/hssf/record/TestSeriesTextRecord.java
===================================================================
RCS file:
/home/cvspublic/jakarta-poi/src/testcases/org/apache/poi/hssf/record/TestSeriesTextRecord.java,v
retrieving revision 1.1
diff -u -r1.1 TestSeriesTextRecord.java
--- jakarta-poi/src/testcases/org/apache/poi/hssf/record/TestSeriesTextRecord.java	18 May 2002 15:56:21 -0000	1.1
+++ jakarta-poi/src/testcases/org/apache/poi/hssf/record/TestSeriesTextRecord.java	19 Jul 2002 07:00:30 -0000
@@ -83,7 +83,7 @@
             throws Exception
     {
         SeriesTextRecord record = new SeriesTextRecord((short)0x100d,
(short)data.length, data);
-        
+
 
         assertEquals( (short)0, record.getId());
 
@@ -118,5 +118,15 @@
         assertEquals(recordBytes.length - 4, data.length);
         for (int i = 0; i < data.length; i++)
             assertEquals("At offset " + i, data[i], recordBytes[i+4]);
+    }
+    public static void main(String[] args)
+        throws Exception
+    {
+      TestSeriesTextRecord t = new TestSeriesTextRecord ("test instance");
+      //t.setUp();
+      //t._test_file_path =
+      //"J:\\poi-cvs\\jakarta-poi\\src\\testcases\\org\\apache\\poi\\HSSF\\data";
+      t.testLoad();
+      t.testStore();
     }
 }
Index: jakarta-poi/src/testcases/org/apache/poi/util/TestStringUtil.java
===================================================================
RCS file:
/home/cvspublic/jakarta-poi/src/testcases/org/apache/poi/util/TestStringUtil.java,v
retrieving revision 1.2
diff -u -r1.2 TestStringUtil.java
--- jakarta-poi/src/testcases/org/apache/poi/util/TestStringUtil.java	11 Feb 2002 04:23:11 -0000	1.2
+++ jakarta-poi/src/testcases/org/apache/poi/util/TestStringUtil.java	19 Jul 2002 07:00:31 -0000
@@ -103,7 +103,7 @@
      * Test more complex form of getFromUnicode
      */
 
-    public void testComplexGetFromUnicode()
+  public void testComplexGetFromUnicode()
     {
         byte[] test_data = new byte[ 32 ];
         int    index     = 0;
@@ -172,8 +172,13 @@
             ( byte ) 'o', ( byte ) ' ', ( byte ) 'W', ( byte ) 'o',
             ( byte ) 'r', ( byte ) 'l', ( byte ) 'd', ( byte ) 0xAE
         };
-        String input           = new String(expected_output);
-
+        String input =new String("");
+        try {
+          input           = new String(expected_output,"ISO-8859-1");
+        }
+        catch (Exception e){
+            e.printStackTrace();
+        }
         StringUtil.putCompressedUnicode(input, output, 0);
         for (int j = 0; j < expected_output.length; j++)
         {
Comment 1 Andy Oliver 2002-07-21 03:23:07 UTC
Please resubmit this using "create new attachment".  I can't apply patches that
are pasted into the bug because they wrap.

Read the last paragraph: http://jakarta.apache.org/poi/getinvolved/index.html

Thanks,

Andy
Comment 2 Andy Oliver 2002-07-21 03:23:44 UTC
(See previous comment; reopen when submitted.)
Comment 3 SioLam Patrick Lee 2002-07-22 04:29:34 UTC
Created attachment 2431 [details]
Attachment of  Unicode Support for sheetname , refactor SSTDeserializer & UnicodeString class
Comment 4 Andy Oliver 2002-07-25 12:23:50 UTC
Sorry, I didn't notice it; you didn't reopen the bug.  (I may sound silly or
pedantic, but I'm just busy: I scan the list of bugs for [PATCH] and open status
when reviewing.)  I'll review this shortly.  Right now, I have to go to work.
Comment 5 Andy Oliver 2002-07-27 01:24:38 UTC
Hi.  I tried to apply this but the unit tests were failing so I'm going to look
at it later along with that bug.
Comment 6 SioLam Patrick Lee 2002-07-27 10:34:32 UTC
Could it be due to the newly committed change that added TestBoundSheetRecord.java?

glens       2002/07/26 18:45:44

  Added:       src/testcases/org/apache/poi/hssf/record
                        TestBoundSheetRecord.java
  Log:
  Test case for bound sheet record... it seems okay.
  
  Revision  Changes    Path
  1.1                  jakarta-poi/src/testcases/org/apache/poi/hssf/record/TestBoundSheetRecord.java

Thanks for looking at it

Patrick Lee
Comment 7 Andy Oliver 2002-07-28 23:24:16 UTC
Class org.apache.poi.hssf.record.TestBoundSheetRecord

Name                  Tests  Errors  Failures  Time(s)
TestBoundSheetRecord  2      0       1         0.581

Tests

Name                  Status   Time(s)  Type
testRecordLength      Success  0.020
testWideRecordLength  Failure  0.020    2 + 2 + 4 + 2 + 1 + 1 + len(str) * 2 expected:<24> but was:<18>

junit.framework.AssertionFailedError: 2 + 2 + 4 + 2 + 1 + 1 + len(str) * 2 expected:<24> but was:<18>
    at org.apache.poi.hssf.record.TestBoundSheetRecord.testWideRecordLength(TestBoundSheetRecord.java:93)


Fix this and I'll apply it.
Comment 8 SioLam Patrick Lee 2002-07-29 08:59:38 UTC
Hi Andy, Sergei and all 

I have checked the following unit test failure: 

Class org.apache.poi.hssf.record.TestBoundSheetRecord

Name                  Tests  Errors  Failures  Time(s)
TestBoundSheetRecord  2      0       1         0.581

Tests

Name                  Status   Time(s)  Type
testRecordLength      Success  0.020
testWideRecordLength  Failure  0.020    2 + 2 + 4 + 2 + 1 + 1 + len(str) * 2 expected:<24> but was:<18>

junit.framework.AssertionFailedError: 2 + 2 + 4 + 2 + 1 + 1 + len(str) * 2 expected:<24> but was:<18>
    at org.apache.poi.hssf.record.TestBoundSheetRecord.testWideRecordLength(TestBoundSheetRecord.java:93)

The test is as follows:

    public void testWideRecordLength()
            throws Exception
    {
        BoundSheetRecord record = new BoundSheetRecord();
        record.setCompressedUnicodeFlag((byte)0x01);
        record.setSheetname("Sheet1");
        record.setSheetnameLength((byte)6);

        assertEquals(" 2  +  2  +  4  +   2   +    1     +    1    + len(str) * 2",
                     24, record.getRecordSize());
    }

The setSheetname method of BoundSheetRecord is as follows:

    public void setSheetname(String sheetname)
    {
        boolean is16bit = is16bitString(sheetname);
        setSheetnameLength((byte) sheetname.length() );
        setCompressedUnicodeFlag((byte ) (is16bit?1:0));
        field_5_sheetname = sheetname;
    }

The unit test fails because setSheetname uses autodetection to decide
whether the sheet name can be represented in Excel's compressed Unicode format.  If
it can, it sets the CompressedUnicodeFlag to 0, i.e. 8-bit representation.  That
overwrites the flag set by the following earlier statement in testWideRecordLength():
        record.setCompressedUnicodeFlag((byte)0x01);
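
The two numbers in the failure message follow from that flag. The sketch below only reproduces the arithmetic in the assertion text; the class and method names are hypothetical and not part of POI, and the field labels in the comment are my reading of the terms in the assertion message.

```java
// Hypothetical helper, not POI code: reproduces the size arithmetic from the
// assertion message "2 + 2 + 4 + 2 + 1 + 1 + len(str) * 2".
final class BoundSheetSizeSketch {
    // Assumed meaning of the fixed terms: record id (2) + record length (2)
    // + BOF position (4) + option flags (2) + name length (1) + Unicode flag (1).
    static final int FIXED_BYTES = 2 + 2 + 4 + 2 + 1 + 1;

    // One byte per character for 8-bit names, two bytes for 16-bit names.
    static int recordSize(String sheetName, boolean sixteenBit) {
        int bytesPerChar = sixteenBit ? 2 : 1;
        return FIXED_BYTES + sheetName.length() * bytesPerChar;
    }

    public static void main(String[] args) {
        System.out.println(recordSize("Sheet1", true));   // 24, what the test expects
        System.out.println(recordSize("Sheet1", false));  // 18, what the patched code returns
    }
}
```

With "Sheet1" the 16-bit size is 12 + 6 * 2 = 24 and the 8-bit size is 12 + 6 = 18, which matches expected:<24> but was:<18>.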

When I submitted the patch, the test was not there yet.  So the question now is
whether we want autodetection in setSheetname (and later, in the cell's
setCellValue).  The following are some pros and cons:

In the current situation, the programmer (the user of the POI API) has
to do more work by setting
        record.setCompressedUnicodeFlag((byte)0x01);
for a sheet name or
        cell.setEncoding(org.apache.poi.hssf.usermodel.HSSFCell.ENCODING_UTF_16);
for a cell value.

And if the programmer makes an inconsistent call, e.g. sets
        cell.setEncoding(org.apache.poi.hssf.usermodel.HSSFCell.ENCODING_COMPRESSED_UNICODE);
and then calls
        setCellValue("\u0422\u0435\u0441\u0442\u043E\u0432\u0430\u044F");
what exception should be thrown?

If autodetection is used, the programmer writes less code and no consistency
check is needed.
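
For reference, autodetection of this kind can be sketched as below. This is an assumption about what a check like is16bitString might do, not the code from the patch: it treats any character above 0xFF as requiring 16-bit storage. The class and method names are hypothetical.

```java
// Hypothetical sketch of autodetection; the real is16bitString in the patch
// may use a different rule.
final class SheetNameEncodingSketch {
    // Returns true when at least one character does not fit in a single byte.
    static boolean needs16Bit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the flag convention quoted above: 1 for 16-bit, 0 for 8-bit.
    static byte unicodeFlagFor(String s) {
        return (byte) (needs16Bit(s) ? 1 : 0);
    }
}
```

Under this rule "Sheet1" is stored as 8-bit, which is why a caller's earlier request for 16-bit storage gets overwritten.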

Please consider each alternative and I will change the code accordingly.

Patrick Lee
Comment 9 Andy Oliver 2002-07-29 11:48:40 UTC
I think we decided against autodetection because it doesn't work with Russian
and other languages anyhow.  The default should be 8-bit and using 16-bit should
be optional.  
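
A minimal sketch of the explicit alternative, assuming a default of 8-bit and an opt-in 16-bit flag. The class is hypothetical and not a POI record, and throwing IllegalArgumentException on an inconsistent call is only one possible answer to the question raised in comment 8; the thread does not settle it.

```java
// Hypothetical holder, not a POI class: 8-bit by default, 16-bit on request.
final class ExplicitEncodingSketch {
    private boolean sixteenBit = false; // default: 8-bit compressed Unicode
    private String name = "";

    void setSixteenBit(boolean sixteenBit) {
        this.sixteenBit = sixteenBit;
    }

    boolean isSixteenBit() {
        return sixteenBit;
    }

    // Never changes the flag; rejects names that cannot be stored as 8-bit
    // when 16-bit storage was not requested (an assumed policy).
    void setName(String name) {
        if (!sixteenBit) {
            for (int i = 0; i < name.length(); i++) {
                if (name.charAt(i) > 0xFF) {
                    throw new IllegalArgumentException(
                            "name needs 16-bit encoding; call setSixteenBit(true) first");
                }
            }
        }
        this.name = name;
    }

    String getName() {
        return name;
    }
}
```

With this shape the caller's flag is always respected, at the cost of one extra call for non-Latin names.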
Comment 10 SioLam Patrick Lee 2002-07-29 16:31:16 UTC
Hi Andy, Sergei and all 

  Sergei, I haven't heard from you since I sent you back your modified test 
program about setting a Russian sheetname.  Have you had any success with it?

Patrick Lee
Comment 11 Sergei Kozello 2002-07-29 19:23:20 UTC
I had success and wrote back to you via e-mail, but I just forgot that it 
succeeded with my changes. %)

I have also taken a look at my changes. As I see it, it is a very good idea to 
put the code the Records use into the UnicodeString class.

Let's merge the work we have both done on reading and writing Unicode into 
UnicodeString. 
Could you look at the code in BoundSheetRecord, where I have refined your step 
by using only StringUtil, without SSTSerializer?

What do you think about putting it into UnicodeString, so we can benefit from 
using this class and instantiating it in any record where we need Unicode?

Comment 12 SioLam Patrick Lee 2002-07-30 04:03:30 UTC
Hi Andy, Sergei and all 

>I had success and wrote back to you via e-mail, but I just forgot that it 
>succeeded with my changes. %)


>I have also taken a look at my changes. As I see it, it is a very good idea to 
>put the code the Records use into the UnicodeString class.

Great! Can you send me a copy of the final code that works for you?

>Let's merge the work we have both done on reading and writing Unicode into 
>UnicodeString. 

Sure. Let's fix this bug together.  I have some plans to further
refactor and complete the UnicodeString class.  Let's collaborate on this issue.

>Could you look at the code in BoundSheetRecord, where I have refined your step 
>by using only StringUtil, without SSTSerializer?
Please check 
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10976
for [Patch] Unicode Support for sheetname , refactor SSTDeserializer &
UnicodeString class
as it moves the code for the BIFF8 format from SSTDeserializer to the
UnicodeString class.  What do you think of this idea?

>What do you think about putting it into UnicodeString, so we can benefit from 
>using this class and instantiating it in any record where we need Unicode?

Do you mean the code described in
http://nagoya.apache.org/bugzilla/showattachment.cgi?attach_id=2422 ?

I am really interested in working on this together.
Thanks 

Patrick Lee
Comment 13 Andy Oliver 2002-08-15 14:16:29 UTC
What is the status of this work?  I'd like to get this in ASAP.
Comment 14 Sergei Kozello 2002-08-15 18:37:50 UTC
I was offline for the last week. :(

The work on this bug depends on bug #11010. As I wrote there, I think there 
is no bug in the patch you rolled back.
Please look at the last posts on that bug.

About more Unicode support: while offline I added Unicode support for 
FormatRecord.

I also have some questions about NameRecord. I have made an implementation of it 
for Page Titles, but I am not sure about the way I have done it.

Please look at bug #11010, because the Unicode changes depend on it.
Comment 15 Andy Oliver 2002-08-15 19:57:30 UTC
Sorry, I missed the copy of the factory file...  (*nudge* *nudge* probably
because it wasn't a patch)...

Looks like you nailed it. Thanks!
Comment 16 Andy Oliver 2002-08-15 20:15:41 UTC
Okay, I had more time to review this.  I like the idea but I hate the
implementation (no offense).  My objection:  EVERY method that says "lifted
from..."  Let's fight this code rot right now.  Move those to common utility
functions (perhaps StringUtility for instance) and remove them from any classes
they originate from (SSTRecord for instance).  Cutting and pasting is not a good
thing.  I don't want to fix string functions everywhere, I want one authoritative
routine for EACH way Excel handles them.  Do that and comment it liberally and
I'll reapply it.

The "arraycopy" function seems silly to me since it just calls
System.arraycopy... why do we have that?  

Once you've fixed it, please reattach and reopen this bug.

Thanks,

Andy
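
As an illustration of the consolidation requested in comment 16, the sketch below keeps one routine per string form in a single utility class and calls System.arraycopy directly instead of wrapping it. The class and method names are hypothetical; this is not POI's StringUtil or the code from the patch, and it assumes 8-bit strings map one byte to one character (ISO-8859-1) and 16-bit strings are little-endian UTF-16.

```java
// Hypothetical shared utility, not POI code: one routine per string form.
final class ExcelStringSketch {
    private ExcelStringSketch() {
        // static helpers only
    }

    // Reads an 8-bit ("compressed Unicode") string: one byte per character.
    static String readCompressed(byte[] data, int offset, int charCount) {
        return new String(data, offset, charCount,
                java.nio.charset.StandardCharsets.ISO_8859_1);
    }

    // Reads a 16-bit string: two little-endian bytes per character.
    static String readUncompressed(byte[] data, int offset, int charCount) {
        return new String(data, offset, charCount * 2,
                java.nio.charset.StandardCharsets.UTF_16LE);
    }

    // Copies a byte range with System.arraycopy directly; no pass-through wrapper.
    static byte[] slice(byte[] src, int offset, int length) {
        byte[] dst = new byte[length];
        System.arraycopy(src, offset, dst, 0, length);
        return dst;
    }
}
```

Records that need a string would call these helpers rather than carrying their own "lifted from" copies, so a fix to string handling lands in one place.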