Bug 60031

Summary: [PATCH] DataFormatter parses months incorrectly when put at the end of date segment
Product: POI Reporter: Andrzej Witecki <sodoww+bz.apache.org>
Component: SS CommonAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: sodoww+bz.apache.org
Priority: P2 Keywords: PatchAvailable
Version: 3.15-dev   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: solution proposal
Final patch

Description Andrzej Witecki 2016-08-23 08:35:23 UTC
When you have a date format which has two segments (date and time) and the month indicator ("MM") is placed at the end of date segment than it's replaced to minutes indicator ("mm") within DataFormatter#createDateFormat(String, double) for loop.
Eg.: "yyyy\\-dd\\-mm\\ hh:mm:ss" -> "yyyy-dd-mm HH:mm:ss"
and even: "yyyy-dd-MM HH:mm:ss" -> "yyyy-dd-mm HH:mm:ss"

It's obviously caused by the "if 'M' precedes 's' it should be minutes ('m')" part of code (l.565 for POI 3.15b2) which uses obsolete "ms" variable. Wouldn't clearing "ms" variable within last else (for white spaces; l.589 in 3.15b2) be sufficient to solve this issue?
Comment 1 Andrzej Witecki 2016-08-23 08:42:32 UTC
Of course the obvious result of this bug is parsing date from double incorrectly, eg. date 23-08-2016 08:51:01 which is 42605.368761574071 as double will be parsed with format "yyyy-dd-MM HH:mm:ss" into "2016-23-51 08:51:01".

Problem occurred on POI 3.11 but it can also be reproduced on 3.15b2.
Comment 2 Dominik Stadler 2016-10-14 21:40:22 UTC
Unfortunately your proposed simple fix of clearing ms causes other tests to fail, e.g. this one:

assertEquals("57:07.2", dfUS.formatRawCellContents(.123, -1, "mm:ss.0;@"));

A unit-test to verify your bug is:

    @Test
    public void testBug60031() {
        // 23-08-2016 08:51:01 which is 42605.368761574071 as double will be parsed
        // with format "yyyy-dd-MM HH:mm:ss" into "2016-23-51 08:51:01".
        DataFormatter dfUS = new DataFormatter(Locale.US);
        assertEquals("2016-23-08 08:51:01", dfUS.formatRawCellContents(42605.368761574071, -1, "yyyy-dd-MM HH:mm:ss"));
    }
Comment 3 Andrzej Witecki 2016-11-09 14:51:39 UTC
Created attachment 34431 [details]
solution proposal

Made a fix which also tackles the problem mentioned before. I'm attaching the patch with this fix and test case made by Dominik.
Comment 4 Andrzej Witecki 2016-11-10 08:39:56 UTC
Unfortunately my patch introduces a failure in TestHSSFSheet#autoSizeDate. Fortunately it's caused by parsing cell values correctly (IMHO).
Cell values in this test are formatted with pattern "yyyy-mm-dd MMMM hh:mm:ss". There are 2 cells: first with numerical value 1 (1.01.1900) and second with numerical value 123456 (3.01.2238).
Before my patch:
- result pattern is: "yyyy-MM-dd mmmm HH:mm:ss"
- String value of first cell is: 1900-01-01 0000 00:00:00
- String value of second cell is: 2238-01-03 0000 00:00:00
After my patch:
- result pattern is: "yyyy-mm-dd MMMM hh:mm:ss"
- String value of first cell is: 1900-01-01 January 00:00:00
- String value of second cell is: 2238-01-03 January 00:00:00

I am not providing a patch for this issue yet as I would prefer sbd verify my assumptions.

BTW. There're typos within above-mentioned test (TestHSSFSheet#autoSizeDate l.729). Guess there should be 1 instead of 0 for #getColumnWidth invocations arguments.
Comment 5 David North 2017-01-03 17:32:07 UTC
(In reply to Andrzej Witecki from comment #4)

> I am not providing a patch for this issue yet as I would prefer sbd verify
> my assumptions.

Verified - as per the comments in the code, Excel formats both mmmm and MMMM as the full month name. Please go ahead and extend your patch accordingly.
Comment 6 Andrzej Witecki 2017-01-17 16:12:09 UTC
Created attachment 34632 [details]
Final patch

Including new patch which also tackles problems (introduced by my changes) in TestHSSFSheet#autoSizeDate and TestDataFormatter
Comment 7 Javen O'Neal 2017-01-20 05:37:38 UTC
Applied to trunk in r1779564. Thanks for the patch!

This will be included in POI 3.16 beta 2.

Fixed TestHSSFSheet typo in r1779565.