Bug 18846 - [PATCH][RFE]Refactor the transformation between byte array and String object
Summary: [PATCH][RFE]Refactor the transformation between byte array and String object
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall (show other bugs)
Version: 3.0-dev
Hardware: All All
: P5 minor (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-04-09 09:56 UTC by Toshiaki Kamoshida
Modified: 2004-11-16 19:05 UTC (History)
1 user (show)



Attachments
Sample to show that System Encoder/Decoder works like StringUtil want to do.Please execute main and see console. (2.82 KB, text/plain)
2003-04-10 09:18 UTC, Toshiaki Kamoshida
Details
PATCH for this matter. test passed in my env. (39.37 KB, patch)
2003-05-08 12:18 UTC, Toshiaki Kamoshida
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Toshiaki Kamoshida 2003-04-09 09:56:05 UTC
Because there is a danger of mixing any mistakes or problems or confuses about 
the transformation between byte array and String object(because of platform and 
locale and any bugs:D),I request to refactor the operations in sources about it.

The rest is only an idea,maybe there is better one,please discuss about it :).

1.Refactor the class org.apache.poi.util.StringUtil like;

public static String getFromUnicodeHigh(final byte[] string,
@final int offset, final int len){
  try{
    return new String(string,offset,len*2,"UTF-16LE");
  }
  catch(UnsupportedEncodingException e){
    throw new InternalError();/*unreachable*/
  }
}

public static String getFromUnicode(final byte[] string,
@final int offset, final int len){
  try{
    return new String(string,offset,len*2,"UTF-16BE");
  }
  catch(UnsupportedEncodingException e){
    throw new InternalError();/*unreachable*/
  }
}

//add new method
public static String getFromCompressedUnicode(final byte[] string,
@final int offset, final int len){
  try{
    return new String(string,offset,len,"ISO-8859-1");
  }
  catch(UnsupportedEncodingException e){
    throw new InternalError();/*unreachable*/
  }
}

public static void putCompressedUnicode(final String input,
  final byte[] output,final int offset){
  try{
    byte[] bytes = input.getBytes("ISO-8859-1");
    System.arraycopy(output,offset,bytes,0,bytes.length);
  }
  catch(UnsupportedEncodingException e){
    throw new InternalError();/*unreachable*/
  }
}

public static void putUncompressedUnicode(final String input,
  final byte[] output,final int offset){
  try{
    byte[] bytes = input.getBytes("UTF-16LE");
    System.arraycopy(output,offset,bytes,0,bytes.length);
  }
  catch(UnsupportedEncodingException e){
    throw new InternalError();/*unreachable*/
  }
}
public static void putUncompressedUnicodeHigh(final String input,
  final byte[] output,final int offset){
  try{
    byte[] bytes = input.getBytes("UTF-16BE");
    System.arraycopy(output,offset,bytes,0,bytes.length);
  }
  catch(UnsupportedEncodingException e){
    throw new InternalError();/*unreachable*/
  }
}

2.Replace all phrases "new String(args)"
to "StringUtil.getFromCompressedUnicode(args)"
Comment 1 Toshiaki Kamoshida 2003-04-10 09:18:05 UTC
Created attachment 5765 [details]
Sample to show that System Encoder/Decoder works like StringUtil want to do.Please execute main and see console.
Comment 2 Toshiaki Kamoshida 2003-04-11 05:40:54 UTC
We need a 6 way to transform from/to byte array to/from String Object in POI.

1. Encode char seq 16Bit Uncompressed Unicode 
-> byte array with 16Bit Unicode on LittleEndian(like "UTF-16LE"builtin Encoder)
2. Encode char seq 16Bit Uncompressed Unicode 
-> byte array with 16Bit Unicode on BigEndian(like "UTF-16BE"builtin Encoder)
3. Encode char seq 8Bit Compressed Unicode
-> byte array with 8Bit Compressed Unicode(like "ISO-8859-1"builtin Encoder)
4. Decode byte array with 16Bit Unicode on LittleEndian
->char seq 16Bit Uncomressed Unicode(like "UTF-16LE"builtin Decoder)
5. Decode byte array with 16Bit Unicode on BigEndian
->char seq 16Bit Uncomressed Unicode(like "UTF-16BE"builtin Decoder)
6. Decode byte array with 8Bit Compressed Unicode
->char seq 8Bit Comressed Unicode(like "ISO-8859-1"builtin Decoder)

And now,in POI,there are doing with the ways like;

1.call StringUtil#getFromUnicodeHigh()
2.perhaps call StringUtil#getFromUnicode()
3.call constractor of java.lang.String() with no Decode Indicator
4.call StringUtil#putUncompressedUnicode()
5.perhaps call StringUtil#putUncompressedUnicodeHigh()
6.call StringUtil#putCompressedUnicode()

So,It seems for me,there is some confusion and problems.

(1)StringUtil#getFromUnicode()&#putUncompressedUnicodeHigh() has bugs,so these 
are always not works as you expect(see Bug 18837)
(2)To do 1 you should use StringUtil#getFromUnicodeHigh() but to do 4 you 
should use putUncompressedUnicode().the name of metods shows some confuse.
(3)The way to do 3 is isolated from others.
(4)The functions of thease method is already existed at J2SE API(Constractor of 
java.lang.String).The implementations of these methods may be "Reinvention of 
wheel".

I feel,to refine thease places may makes your burden a little lighter.
But the priority of this matter is low,because the change works no effect at 
runtime,but only refine the landscape of sources...
I feel and afraid,this may be one of the "broken window" in POI.
Comment 3 Andy Oliver 2003-04-11 12:13:35 UTC
I totally support refactoring this.  It looks like you know the most about it
and have the know-how to do it.  Send patches...They will be applied!  
Comment 4 Toshiaki Kamoshida 2003-04-14 03:04:17 UTC
After fixing Bug 18947,I'll submit a patch for implementations in StringUtil.

...I could only 4 testcases to find the problems about Bug 18947,please help me!
I found the easy way to change "Default Encoding" at runtime on Sun JRE(see the 
submitted testcase at Bug 18947),so perhaps the way help your works to find and 
solve any "Problems on Encoding" after now...

And,if you can,please change the names of method XXXUnicode and XXXUnicodeHigh.
because of section(2).

Renaming methods is a easy work over any IDE(I can do it easily on Eclipse JDT).
But if I do so,many points will cange by the work,so it's difficurt to submit 
as a patch.And I feel,I shouldn't give new names at these methods,but you do:D
Comment 5 Andy Oliver 2003-04-14 12:21:41 UTC
I disagree.  I think submitting patches would be the perfect way to fix the
problem and that you are the perfect person to rename methods.
Comment 6 Toshiaki Kamoshida 2003-04-21 11:34:11 UTC
OK,I'll make a PATCH.

I'll rename these methods to;
#getFromCompressedUnicode()
#getFromUnicodeLE()
#getFromUnicodeBE()
#putCompressedUnicode()
#putUnicodeLE()
#putUnicodeBE()

If there are opinions against it,please tell me your objections:D
Comment 7 Toshiaki Kamoshida 2003-05-08 12:18:52 UTC
Created attachment 6270 [details]
PATCH for this matter. test passed in my env.
Comment 8 Toshiaki Kamoshida 2003-05-08 12:22:26 UTC
I submitted a PATCH.
Because it is only a refactor,so there is no any new testcases.
Plz check and eval it.

Line feeds in StringUtil has broken now,so I fixed it with this works.
Comment 9 Toshiaki Kamoshida 2003-05-20 05:26:16 UTC
BTW,there is some calling get/setUnicodeBE() in hssf,but there may be no place 
to store the Unicode-character-sequences with BigEndian in XLS format...
I guess the places may contain some problems without showing as any bugs with 
testcases now exist.
But I didn't know what the places do.I made the PATCH only by replacing with 
the "REFACTOR" command of "Eclipse JDT":D
Comment 10 Andy Oliver 2003-07-24 16:11:41 UTC
this patch looks funny...  We should work on this though.  Maybe Tetsuya can help.
Comment 11 Tetsuya Kitahata 2003-07-24 17:02:34 UTC
Fixed Weird LineFeed. Toshiaki, please supply the "cvs diff -u" style patch 
again for other parts.