Bug 55730

Summary: Fix org.apache.poi.ss.usermodel.BuiltinFormats.java for 0x29-0x2c
Product: POI Reporter: Eric Peters <eric>
Component: HSSFAssignee: POI Developers List <dev>
Status: RESOLVED DUPLICATE    
Severity: normal    
Priority: P2    
Version: 3.10-dev   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Fix org.apache.poi.ss.usermodel.BuiltinFormats.java for 0x29-0x2c
Fix org.apache.poi.ss.usermodel.BuiltinFormats.java for 0x29-0x2c w/test fixed

Description Eric Peters 2013-10-31 22:40:44 UTC
Created attachment 30987 [details]
Fix org.apache.poi.ss.usermodel.BuiltinFormats.java for 0x29-0x2c

The documentation is both correct and wrong. "*" is a special character to indicate a repeating character to fit a cell: http://office.microsoft.com/en-us/excel-help/number-format-codes-HP005198679.aspx

At some point in time the documentation removed the <space> character in the *<space> format strings, but it also swapped the 0x2b with the 0x2c format strings in the implementation (Such as an Accounting format)

The org.apache.poi.ss.usermodel.DataFormatter is still bugged when supporting the correct 0x2b, but this at least should get fixed.

I tried using the patch guidelines at: http://poi.apache.org/guidelines.html but the ant command didn't work, and since I checked everything out with git, that didn't work either - so resulted in just using SourceTree to generate a patch file.  Either way the patch isn't too complicated and could be easily applied.
Comment 1 Eric Peters 2013-10-31 22:50:13 UTC
Sorry, I forgot to include the fixes to the MissingBits.xls / etc...hmm will see if I can figure out howto do a legit patch.
Comment 2 Eric Peters 2013-10-31 22:58:57 UTC
Comment on attachment 30987 [details]
Fix org.apache.poi.ss.usermodel.BuiltinFormats.java for 0x29-0x2c

From 3bba25eb4fb67eff700b05a0e11c9a52ef39d6bd Mon Sep 17 00:00:00 2001
From: Eric Peters <eric@peters.org>
Date: Thu, 31 Oct 2013 15:38:56 -0700
Subject: [PATCH] HSSP Formats

---
 src/java/org/apache/poi/ss/usermodel/BuiltinFormats.java    | 13 +++++++------
 .../hssf/eventusermodel/TestFormatTrackingHSSFListener.java |  3 ++-
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/src/java/org/apache/poi/ss/usermodel/BuiltinFormats.java b/src/java/org/apache/poi/ss/usermodel/BuiltinFormats.java
index 5fe6e13..2dad6f8 100644
--- a/src/java/org/apache/poi/ss/usermodel/BuiltinFormats.java
+++ b/src/java/org/apache/poi/ss/usermodel/BuiltinFormats.java
@@ -54,10 +54,10 @@ import java.util.Map;
  *       0x26, "#,##0_);[Red](#,##0)"<br/>
  *       0x27, "#,##0.00_);(#,##0.00)"<br/>
  *       0x28, "#,##0.00_);[Red](#,##0.00)"<br/>
- *       0x29, "_(*#,##0_);_(*(#,##0);_(* \"-\"_);_(@_)"<br/>
- *       0x2a, "_($*#,##0_);_($*(#,##0);_($* \"-\"_);_(@_)"<br/>
- *       0x2b, "_(*#,##0.00_);_(*(#,##0.00);_(*\"-\"??_);_(@_)"<br/>
- *       0x2c, "_($*#,##0.00_);_($*(#,##0.00);_($*\"-\"??_);_(@_)"<br/>
+ *       0x29, "_(* #,##0_);_(* (#,##0);_(* \"-\"_);_(@_)"<br/>
+ *       0x2a, "_($* #,##0_);_($* (#,##0);_($* \"-\"_);_(@_)"<br/>
+ *       0x2b, "_(* #,##0.00_);_(* (#,##0.00);_(* \"-\"??_);_(@_)"<br/>
+ *       0x2c, "_($* #,##0.00_);_($* (#,##0.00);_($* \"-\"??_);_(@_)"<br/>
  *       0x2d, "mm:ss"<br/>
  *       0x2e, "[h]:mm:ss"<br/>
  *       0x2f, "mm:ss.0"<br/>
@@ -69,6 +69,7 @@ import java.util.Map;
  * @author Yegor Kozlov
  *
  * Modified 6/17/09 by Stanislav Shor - positive formats don't need starting '('
+ * Modified 10/31/13 by Eric Peters - * is a repeating/padding character directive, examples needed a space after the asterix (e.i. Accounting format)
  *
  */
 public final class BuiltinFormats {
@@ -137,8 +138,8 @@ public final class BuiltinFormats {
 		putFormat(m, 0x28, "#,##0.00_);[Red](#,##0.00)");
 		putFormat(m, 0x29, "_(\"$\"* #,##0_);_(\"$\"* (#,##0);_(\"$\"* \"-\"_);_(@_)");
 		putFormat(m, 0x2a, "_(* #,##0_);_(* (#,##0);_(* \"-\"_);_(@_)");
-		putFormat(m, 0x2b, "_(\"$\"* #,##0.00_);_(\"$\"* (#,##0.00);_(\"$\"* \"-\"??_);_(@_)");
-		putFormat(m, 0x2c, "_(* #,##0.00_);_(* (#,##0.00);_(* \"-\"??_);_(@_)");
+		putFormat(m, 0x2b, "_(* #,##0.00_);_(* (#,##0.00);_(* \"-\"??_);_(@_)");
+		putFormat(m, 0x2c, "_(\"$\"* #,##0.00_);_(\"$\"* (#,##0.00);_(\"$\"* \"-\"??_);_(@_)");
 		putFormat(m, 0x2d, "mm:ss");
 		putFormat(m, 0x2e, "[h]:mm:ss");
 		putFormat(m, 0x2f, "mm:ss.0");
diff --git a/src/testcases/org/apache/poi/hssf/eventusermodel/TestFormatTrackingHSSFListener.java b/src/testcases/org/apache/poi/hssf/eventusermodel/TestFormatTrackingHSSFListener.java
index f3bc27b..ad144a2 100644
--- a/src/testcases/org/apache/poi/hssf/eventusermodel/TestFormatTrackingHSSFListener.java
+++ b/src/testcases/org/apache/poi/hssf/eventusermodel/TestFormatTrackingHSSFListener.java
@@ -57,7 +57,8 @@ public final class TestFormatTrackingHSSFListener extends TestCase {
 
 		assertEquals("_(\"$\"* #,##0_);_(\"$\"* (#,##0);_(\"$\"* \"-\"_);_(@_)", listener.getFormatString(41));
 		assertEquals("_(* #,##0_);_(* (#,##0);_(* \"-\"_);_(@_)", listener.getFormatString(42));
-		assertEquals("_(\"$\"* #,##0.00_);_(\"$\"* (#,##0.00);_(\"$\"* \"-\"??_);_(@_)", listener.getFormatString(43));
+		assertEquals("_(* #,##0.00_);_(* (#,##0.00);_(* \"-\"??_);_(@_)", listener.getFormatString(43));
+		assertEquals("_(\"$\"* #,##0.00_);_(\"$\"* (#,##0.00);_(\"$\"* \"-\"??_);_(@_)", listener.getFormatString(44));
 	}
 	
 	/**
-- 
1.7.11.1
Comment 3 Eric Peters 2013-10-31 23:00:13 UTC
Created attachment 30988 [details]
Fix org.apache.poi.ss.usermodel.BuiltinFormats.java for 0x29-0x2c w/test fixed
Comment 4 Nick Burch 2013-10-31 23:12:44 UTC
I have a sneaking suspicion that at some point, some of the documentation had a mistake or two in it about the format lists. I wonder if we've either got the wrong one, or if the docs you've seen has the wrong one, or if that was actually something completely different I'm confused about...

Hopefully someone else can confirm!
Comment 5 Eric Peters 2013-10-31 23:41:40 UTC
@Nick I guess this might be more complicated, but the patch holds up.

I am creating the .xls files using Office 2010 in a Save As... environment, I hope that's not the issue.  It would seem to me the patch I have for BuiltinFormats.java is consistent with what's in the original javadoc.


THe XSSF Code seems to re-use the same BuiltinFormats via org.apache.poi.xssf.usermodel.XSSFCellStyle

    public String getFormat(short index) {
        String fmt = stylesSource.getNumberFormatAt(index);
        if(fmt == null) fmt = BuiltinFormats.getBuiltinFormat(index);
        return fmt;
    }

That said once I poked into the file I saw:
<numFmts count="4">
    <numFmt numFmtId="44" formatCode="_(&quot;$&quot;* #,##0.00_);_(&quot;$&quot;* \(#,##0.00\);_(&quot;$&quot;* &quot;-&quot;??_);_(@_)"/>
    <numFmt numFmtId="43" formatCode="_(* #,##0.00_);_(* \(#,##0.00\);_(* &quot;-&quot;??_);_(@_)"/>
    <numFmt numFmtId="171" formatCode="_-[$£-809]* #,##0.00_-;\-[$£-809]* #,##0.00_-;_-[$£-809]* &quot;-&quot;??_-;_-@_-"/>
    <numFmt numFmtId="172" formatCode="_ [$¥-804]* #,##0.00_ ;_ [$¥-804]* \-#,##0.00_ ;_ [$¥-804]* &quot;-&quot;??_ ;_ @_ "/>
</numFmts>

So it looks like Excel 2010 wrote the proper format code into the file directly, so XSSF skips BuiltinFormats in this case.

My Original Unit Test was:

HSSF:
  Vector(Vector($4.79, * 4.79)) did not equal Vector(Vector(4.79, $4.79))  (TestExcelFlatFileReaderCommon.scala:37)
XSSF:
  Vector(Vector(* 4.79, $4.79)) did not equal Vector(Vector(4.79, $4.79)) (TestExcelFlatFileReaderCommon.scala:37)

That makes a lot more sense now why one would show one thing, and the other something else.

The underlying problem here is the DataFormatter isn't supporting the Format Index: 43 properly with that *<space>, but that's a separate issue than correcting the HSSF.
Comment 6 Eric Peters 2013-11-01 00:20:02 UTC
@Nick - second part of the fix:

eric@Erics-MacBook-Pro:~/Work/poi$ git diff
diff --git a/src/java/org/apache/poi/ss/usermodel/DataFormatter.java b/src/java/org/apache/poi/ss/usermodel/DataFormatter.java
index 273dd89..eae651d 100644
--- a/src/java/org/apache/poi/ss/usermodel/DataFormatter.java
+++ b/src/java/org/apache/poi/ss/usermodel/DataFormatter.java
@@ -265,7 +265,8 @@ public class DataFormatter {
 //      int i = cellValue > 0.0 ? 0 : cellValue < 0.0 ? 1 : 2; 
 //      String formatStr = (i < formatBits.length) ? formatBits[i] : formatBits[0];
 
-        String formatStr = formatStrIn;
+        // Strip out repeating characters in the number format
+        String formatStr = formatStrIn.replaceAll("\\*.","");
         // Excel supports positive/negative/zero, but java
         // doesn't, so we need to do it specially
         final int firstAt = formatStr.indexOf(';');
diff --git a/src/testcases/org/apache/poi/ss/usermodel/TestDataFormatter.java b/src/testcases/org/apache/poi/ss/usermodel/TestDataFormatter.java
index 20825e5..d2a975e 100644
--- a/src/testcases/org/apache/poi/ss/usermodel/TestDataFormatter.java
+++ b/src/testcases/org/apache/poi/ss/usermodel/TestDataFormatter.java
@@ -271,7 +271,11 @@ public class TestDataFormatter extends TestCase {
        assertEquals("12.34", dfUS.formatRawCellContents(12.343, -1, "##.##* "));
        assertEquals("12.34", dfUS.formatRawCellContents(12.343, -1, "##.##*1"));
        assertEquals("12.34", dfUS.formatRawCellContents(12.343, -1, "##.##*)"));
+       assertEquals("12.34", dfUS.formatRawCellContents(12.343, -1, "* ##.##"));
        assertEquals("12.34", dfUS.formatRawCellContents(12.343, -1, "*-##.##"));
+       assertEquals("12.34", dfUS.formatRawCellContents(12.343, -1, "_(* #,##0.00_)"));
+
+//"_(\"$\"* #,##0.00_);_(\"$\"* (#,##0.00);_(\"$\"* \"-\"??_);_(@_)"
     }
     
     /**
Comment 7 Eric Peters 2013-12-23 16:56:06 UTC
I finally found a more authoritive reference on this patch: http://support.microsoft.com/kb/147942 - lists the standard build int format strings:


0x2b      _(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(@_)
0x2c      _($* #,##0.00_);_($* (#,##0.00);_($* "-"??_);_(@_)


1) you can see that the format strings do include a space after the repeating character asterixes

2) the 0x2b should not be the accounting $, and the 2x2c should be, exactly like the fix in my patch

-		putFormat(m, 0x2b, "_(\"$\"* #,##0.00_);_(\"$\"* (#,##0.00);_(\"$\"* \"-\"??_);_(@_)");
-		putFormat(m, 0x2c, "_(* #,##0.00_);_(* (#,##0.00);_(* \"-\"??_);_(@_)");
+		putFormat(m, 0x2b, "_(* #,##0.00_);_(* (#,##0.00);_(* \"-\"??_);_(@_)");
+		putFormat(m, 0x2c, "_(\"$\"* #,##0.00_);_(\"$\"* (#,##0.00);_(\"$\"* \"-\"??_);_(@_)");
Comment 8 Yegor Kozlov 2013-12-24 05:59:40 UTC
Thanks for the patch, applied in r1553247

Regards,
Yegor
Comment 9 Bernhard Seebass 2014-04-23 13:29:43 UTC
Sorry, it's only half fixed. 0x29 and 0x2a are still wrong. Anyway this bug is duplicate of 54318

*** This bug has been marked as a duplicate of bug 54318 ***