Bug 48494

Summary: EventBasedExcelExtractor and ExcelExtractor don't process conditional cell formatting correctly
Product: POI Reporter: Phil Varner <philvarner>
Component: HSSFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 3.6-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Mac OS X 10.4   
Attachments: sheet that exhibits issue
runs EBEE and EE on xls

Description Phil Varner 2010-01-06 09:49:49 UTC
Created attachment 24803 [details]
sheet that exhibits issue

>> The EBEE tries to format decimal values before returning them using
>> the method formatNumberDateCell.  I have some xls (unfortunately I
>> can't share) that throw an exception from the DecimalFormat
>> constructor:
>> java.lang.IllegalArgumentException: Unquoted special character ';' in
>> pattern "_(*#,##0_);_(*(#,##0);_(* "-"_);_(@_)"


UserModel copes, but because no formatting is applied. Relevant code
in ExcelExtractor:

case HSSFCell.CELL_TYPE_NUMERIC:
 // Note - we don't apply any formatting!
 text.append(cell.getNumericCellValue());
 break;

However, a date formatted cell (string cell type) with
"hh:mm;hh:mm;hh:mm" fails in both.

I believe this means conditional cell formatting never works (either
not being applied or throwing exception), only static cell formatting.
 I'm not sure what the above format of the 3- or 4-valued conditional
is, though.

I'll file a bug for this. Attached is a doc that exhibits the issue
and a java class that reproduces.
Comment 1 Phil Varner 2010-01-06 09:50:55 UTC
Created attachment 24804 [details]
runs EBEE and EE on xls
Comment 2 Phil Varner 2010-03-05 06:06:13 UTC
Patch (the revision commit numbers may need removing/changing, this was to a private svn repo):

Index: src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java
===================================================================
--- src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java        (revision 99199)
+++ src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java        (revision 99174)
@@ -275,15 +275,8 @@
                        }
 
                        // Format as a number
-                       // some format strings contain unescaped ;s which cause
-                       // problems.  Since this is only for indexing, we
-                       // can ignore these and just return the unformatted string (hacky?)
-                       try {
-                               DecimalFormat df = new DecimalFormat(formatString);
-                               return df.format(value);
-                       } catch (java.lang.IllegalArgumentException e){
-                               return String.valueOf(value);
-                       }
+                       DecimalFormat df = new DecimalFormat(formatString);
+                       return df.format(value);
                }
        }
 }
Comment 3 Nick Burch 2010-06-02 12:13:53 UTC
Fixed in r950616.

We already had a reference to the usermodel style DataFormatter, we just weren't using it, doh! Now changed to call the normal DataFormatter, so the event based extractor should behave more like the usermodel one does
Comment 4 Nick Burch 2010-06-02 13:18:16 UTC
Also, your time formats like HH:MM;HH:MM;HH:MM are also detected as times now too.