Bug 66087 - [PATCH] D* functions match text case-insensitive
Summary: [PATCH] D* functions match text case-insensitive
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: SS Common (show other bugs)
Version: unspecified
Hardware: All All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords: PatchAvailable
Depends on:
Blocks:
 
Reported: 2022-05-24 15:17 UTC by Patrick Böker
Modified: 2022-05-28 14:19 UTC (History)
0 users



Attachments
case_insensitive_dstar.patch (1.46 KB, patch)
2022-05-24 15:17 UTC, Patrick Böker
Details | Diff
DGet.xls (54.50 KB, application/octet-stream)
2022-05-25 09:53 UTC, Patrick Böker
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Patrick Böker 2022-05-24 15:17:34 UTC
Created attachment 38303 [details]
case_insensitive_dstar.patch

This patch makes textual search in D* case-insensitive and adds respective tests.
The behavior now matches Excel 2016.
Comment 1 Patrick Böker 2022-05-24 15:21:27 UTC
Corporate firewall currently blocks me from uploading the test file (\test-data\spreadsheet\DGet.xls). I'll probably manage to do so tomorrow.
Comment 2 PJ Fanning 2022-05-24 16:18:57 UTC
Comment on attachment 38303 [details]
case_insensitive_dstar.patch

Could you add some test coverage? Could you also provide some links to M$ docs that back up your assertion that this should be case insensitive?
Comment 3 PJ Fanning 2022-05-24 16:21:06 UTC
Any interest is working on the TODO items in DStarRunner? 

 * TODO:
 * - wildcards ? and * in string conditions
 * - functions as conditions
Comment 4 Patrick Böker 2022-05-25 09:53:26 UTC
Created attachment 38304 [details]
DGet.xls

As promised yesterday I managed to upload the test file.

To be placed in test-data\spreadsheet\DGet.xls
Comment 5 Patrick Böker 2022-05-25 10:03:38 UTC
> Could you add some test coverage?

As already promised yesterday I just uploaded the missing test file.

> Could you also provide some links to M$ docs that back up your assertion that this should be case insensitive?

This is the official documentation for DGET:

https://support.microsoft.com/en-us/office/dget-function-455568bf-4eef-45f7-90f0-ec250d00892e

(As expected,) the documentation is lacking and does not at all specify the exact semanticts of the function. Also I don't think it worth much to try to implement by documentation. Most important is to be compatible with Excels real behavior itself. And for that there is the test file I just uploaded. :-)

I verified with Excel 2016, but I don't have access to any other excel versions currently. I would be surprised though if different excel versions would behave differently in this regard.
Comment 6 PJ Fanning 2022-05-25 10:06:18 UTC
Comment on attachment 38303 [details]
case_insensitive_dstar.patch

You have bot provided a java unit test that uses the DGet xlsx - so there will be zero test coverage of this change
Comment 7 Patrick Böker 2022-05-25 10:11:51 UTC
(In reply to PJ Fanning from comment #3)
> Any interest is working on the TODO items in DStarRunner? 
> 
>  * TODO:
>  * - wildcards ? and * in string conditions
>  * - functions as conditions

I'd like to. But currently my POI work is sponsored by my company, thus I'm mostly implementing what we are using there. If the demand for more elaborate D* crops up, I'll definitely be on it.

Just in case you didn't realize, I'm the original implementer of the D* funcions.
Comment 8 Patrick Böker 2022-05-25 10:14:41 UTC
(In reply to PJ Fanning from comment #6)
> Comment on attachment 38303 [details]
> case_insensitive_dstar.patch
> 
> You have bot provided a java unit test that uses the DGet xlsx - so there
> will be zero test coverage of this change

The test-data\spreadsheet\DGet.xls already exists in the repo and the corresponding src\test\java\org\apache\poi\ss\formula\functions\TestDGetFunctionsFromSpreadsheet.java is there already as well. I have just updated the test file.
Comment 9 PJ Fanning 2022-05-25 10:25:40 UTC
In your patch, you use `.toUpperCase().toLowerCase()` - why both?

Also, this will fail our code quality checks - we do not allow toUpperCase or toLowerCase without providing a Locale. Have you run a build of this change? The build target is called `forbiddenapis`.
Comment 10 PJ Fanning 2022-05-25 10:31:49 UTC
I added r1901229 as a preliminary change - I only used toLowerCase in my patch.
Comment 11 Patrick Böker 2022-05-25 10:44:21 UTC
(In reply to PJ Fanning from comment #9)
> In your patch, you use `.toUpperCase().toLowerCase()` - why both?

I looked up what `.equalsIgnoreCase()` is doing (https://javadoc.scijava.org/Java11/java.base/java/lang/String.html#equalsIgnoreCase(java.lang.String)) and tried to mimic that. I don't know the deeper reason behind it though.

> Also, this will fail our code quality checks - we do not allow toUpperCase
> or toLowerCase without providing a Locale. Have you run a build of this
> change? The build target is called `forbiddenapis`.

Actually I didn't do a full build after the change. I only ran the tests. I tried it now. As you said, the build does actually error. Thanks for pointing that out. I'll try to do a full build *before* next time.
Comment 12 Patrick Böker 2022-05-25 10:44:53 UTC
(In reply to PJ Fanning from comment #10)
> I added r1901229 as a preliminary change - I only used toLowerCase in my
> patch.

Thank you!
Comment 13 PJ Fanning 2022-05-25 12:40:43 UTC
I added a DGET test using r1901237 - ignore the DCOUNT stuff in this revision (it's an experiment and the code might be removed).

The DGET test is based on the M$ docs and it fails - so it appears that the existing DGET implementation is not correct.
Comment 14 PJ Fanning 2022-05-25 15:15:34 UTC
I found the issue in the new DGET test so the test is ok now
Comment 15 PJ Fanning 2022-05-25 15:55:18 UTC
I found that the current code does not support providing the field name (2nd param) as a number - Excel supports this.

I added a fix using r1901250
Comment 16 PJ Fanning 2022-05-25 15:57:45 UTC
I've added DCOUNT support - tracked as https://bz.apache.org/bugzilla/show_bug.cgi?id=66090