Summary: | [Patch] Unicode Support for sheetname , refactor SSTDeserializer & UnicodeString class | ||
---|---|---|---|
Product: | POI | Reporter: | SioLam Patrick Lee <patrickl> |
Component: | HSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED INVALID | ||
Severity: | normal | ||
Priority: | P3 | ||
Version: | 2.0-dev | ||
Target Milestone: | --- | ||
Hardware: | Other | ||
OS: | other | ||
Attachments: | Attachment of Unicode Support for sheetname , refactor SSTDeserializer & UnicodeString class |
Description
SioLam Patrick Lee
2002-07-19 07:11:28 UTC
Please resubmit this using "create new attachment" I can't apply patches that are pasted in the file because they wrap. read last paragraph: http://jakarta.apache.org/poi/getinvolved/index.html Thanks, Andy (see previous comments - reopen when submitteD) Created attachment 2431 [details]
Attachment of Unicode Support for sheetname , refactor SSTDeserializer & UnicodeString class
Sorry I didn't notice it. You didn't reopen the bug. (I may sound silly or pedantic, I'm just busy, I scan the list of bugs for [PATCH] and open status when reviewing). I'll review this shortly. Right now, I have to go to work. Hi. I tried to apply this but the unit tests were failing so I'm going to look at it later along with that bug. Could it be due to the newly commited changed to the TestBoundSheetRecord.java glens 2002/07/26 18:45:44 Added: src/testcases/org/apache/poi/hssf/record TestBoundSheetRecord.java Log: Test case for bound sheet record... it seems okay. Revision Changes Path 1.1 jakarta- poi/src/testcases/org/apache/poi/hssf/record/TestBoundSheetRecord.java Thanks for looking at it Patrick Lee Class org.apache.poi.hssf.record.TestBoundSheetRecord Name Tests Errors Failures Time(s) TestBoundSheetRecord 2 0 1 0.581 Tests Name Status Type Time(s) testRecordLength Success 0.020 testWideRecordLength Failure 2 + 2 + 4 + 2 + 1 + 1 + len(str) * 2 expected:<24> but was:<18> junit.framework.AssertionFailedError: 2 + 2 + 4 + 2 + 1 + 1 + len(str) * 2 expected:<24> but was:<18> at org.apache.poi.hssf.record.TestBoundSheetRecord.testWideRecordLength(TestBoundSheetRecord.java:93) 0.020 Fix this and I'll apply it. Hi Andy, Sergei and all I have checked the following unit test failure: Class org.apache.poi.hssf.record.TestBoundSheetRecord Name Tests Errors Failures Time(s) TestBoundSheetRecord 2 0 1 0.581 Tests Name Status Type Time(s) testRecordLength Success 0.020 testWideRecordLength Failure 2 + 2 + 4 + 2 + 1 + 1 + len(str) * 2 expected:<24> but was:<18> junit.framework.AssertionFailedError: 2 + 2 + 4 + 2 + 1 + 1 + len(str) * 2 expected:<24> but was:<18> at org.apache.poi.hssf.record.TestBoundSheetRecord.testWideRecordLength(TestBoundSheetRecord.java:93) 0.020 The test is as follow: public void testWideRecordLength() throws Exception { BoundSheetRecord record = new BoundSheetRecord(); record.setCompressedUnicodeFlag((byte)0x01); record.setSheetname("Sheet1"); record.setSheetnameLength((byte)6); assertEquals(" 2 + 2 + 4 + 2 + 1 + 1 + len(str) * 2", 24, record.getRecordSize()); } The setSheetname of BoundSheetRecord is as follow: public void setSheetname(String sheetname) { boolean is16bit = is16bitString(sheetname); setSheetnameLength((byte) sheetname.length() ); setCompressedUnicodeFlag((byte ) (is16bit?1:0)); field_5_sheetname = sheetname; } The unit test failed because the setSheetname use autodetection in deciding whether the sheetname can be represented in excel compressed unicode format. If yes, than it set the CompressedUnicodeFlag to 0 i.e. 8 bit representation. It simply ignore the following statement in testWideRecordLength(): record.setCompressedUnicodeFlag((byte)0x01); When I submitted the patch, the test was not there yet. So now, the question is whether we want autodetection in the setSheetname( and later, in the cell setCellValue). The following are some pro and cons : It seems for the current situation, the programmer (the user of the POI API)has to do more work by setting record.setCompressedUnicodeFlag((byte)0x01); for sheetname or cell.setEncoding(org.apache.poi.hssf.usermodel.HSSFCell.ENCODING_UTF_16); for cell value. And in case the programmer make a inconsistence mistake, ex. set for cell.setEncoding(org.apache.poi.hssf.usermodel.HSSFCell.ENCODING_COMPRESSED_UNICODE ); and then setCellValue("\u0422\u0435\u0441\u0442\u043E\u0432\u0430\u044F"); What exception should be thrown? If autodetection is used, the programmer code less, and consistence checking is avoided. Please conside each alternative and I will change the code accordingly Patrick Lee I think we decided against autodetection because it doesn't work with russian and other langauges anyhow. The default should be 8-bit and using 16-bit should be optional. Hi Andy, Sergei and all Sergei, I haven't heard from you since I send you back your modified test program about setting a russian sheetname. Do you have any success with it? Patrick Lee I have a succes and wrote it back to you vai e-mail, but juxt forgot, that it succeded on my changes. %) And have took a look on my changes. As I see it is a very good idea to put the code of using in the Records UnicodeString class. Let's merge our work on the unicode getting and putting we have into the UnicodeString. Could you see at the code at BoundSheetRecord, when I have refined you step: using only StringUtil without SSTSerializer? What do you think about putting it into the UnicodeSrting, so we can benefit on using this class and instantiating it in any record we need Unicode? Hi Andy, Sergei and all >I have a succes and wrote it back to you vai e-mail, but juxt forgot, that it >succeded on my changes. %) >And have took a look on my changes. As I see it is a very good idea to put the >code of using in the Records UnicodeString class. Great! can you send me a copy of the final code that works for you. >Let's merge our work on the unicode getting and putting we have into the >UnicodeString. Sure. Let fix this bug together. I have some planning on futher refactor/complete the UnicodeString class. Lets collaborate on this issue. >Could you see at the code at BoundSheetRecord, when I have refined you step: >using only StringUtil without SSTSerializer? Please check http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10976 for [Patch] Unicode Support for sheetname , refactor SSTDeserializer & UnicodeString class as it refactor code regarding BIFF8 format from SSTDeserializer to UnicodeString class. What do you think of this idea. >What do you think about putting it into the UnicodeSrting, so we can benefit on >using this class and instantiating it in any record we need Unicode? Do you mean code described in? http://nagoya.apache.org/bugzilla/showattachment.cgi?attach_id=2422 I am really interesting in working on this together. Thanks Patrick Lee What is the status of this work? I'd like to get this in ASAP. I was away from online for a last week. :( The work on this bug is depends on bug #11010. As I have wrote, I think there is no bug in the patch you have rolled back. Please, look at the last posts on this bug. About more Unicode support: while offline I have done unicode support for FormatRecord. Also I have some questions on NameRecord. I have made the implemetation for it for Page Titles, but I am not sure in the way I have done it. Please look at the bug #11010, because the Unicode changes depend on it. sorry I missed the copy of the factory file... (*nudge* *nudge* probably because it wasn't a patch).. . Looks like you nailed it. Thanks! Okay I had more time to review this. I like the idea but I hate the implementation. (no offense). My objection: EVERY method that says "lifted from..." ... Lets fight this code rot right now. Move those to common utility functions (perhaps StringUtility for instance) and remove them from any classes they originate from (SSTRecord for instance). Cutting and pasting is not a good thing. I don't want to fix string functions everywhere, I want one authorative routine for EACH way excel handles them. Do that and comment it liberally and I'll reapply it.. the "arraycopy" function seems silly to me since it just calls to System.arraycopy... . . why do we have that? Once you've fixed it, please reattach and reopen this bug.... Thanks, Andy |