Issue 32344 - Simple wildcards * and ? as opposed to regular expressions
Simple wildcards * and ? as opposed to regular expressions
Status: CONFIRMED
Product: Calc
Classification: Application
Component: ui
recent-trunk
All All
: P3 Trivial with 5 votes (vote)
: ---
Assigned To: AOO issues mailing list
: ms_interoperability
: 46706 48020 51072 87106 96557 101712 102843 113243 121015 121018 124567 (view as issue list)
Depends on:
Blocks: 15522
  Show dependency treegraph
 
Reported: 2004-07-30 12:03 UTC by ooo
Modified: 2015-03-28 01:42 UTC (History)
11 users (show)

See Also:
Issue Type: FEATURE
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description ooo 2004-07-30 12:03:35 UTC
Follow-up task of Q-PCD issue 20494.
Some competitors know only simple wildcards * and ? as opposed to regular
expressions. Need option how to treat wildcard searches, especially when
importing files.
Comment 1 ooo 2004-07-30 12:04:10 UTC
Accepted.
Comment 2 ooo 2004-07-30 12:29:30 UTC
reassign to myself
Comment 3 ooo 2004-07-30 12:30:05 UTC
Accepted.
Comment 4 watersrw 2004-10-17 14:54:34 UTC
Should this not apply to the other major components (Writer, etc.) as well?

Unless you are an expert user, wildcards are a life-saver.
Comment 5 ooo 2004-10-20 14:15:31 UTC
Watersrw,

This is not about the "Search & Replace" functionality, but the wildcards that
some spreadsheet  functions may take as arguments, producing different results
when executed in different applications.

Eike
Comment 6 colm.smyth 2005-02-10 19:20:00 UTC
This issue has been proposed as a SO9 requirement. Please clarify what kinds of
functions and parameters should be able to use wildcards.
Comment 7 ooo 2005-02-11 11:22:15 UTC
The following spreadsheet functions are currently able to process regular
expressions in the arguments that specify search criteria for data ranges:

DSUM
DCOUNT
DCOUNTA
DAVERAGE
DGET
DMAX
DMIN
DPRODUCT
DSTDEV
DSTDEVP
DVAR
DVARP
MATCH
COUNTIF
SUMIF
LOOKUP
VLOOKUP
HLOOKUP
SEARCH

Similar to the "Enable regular expressions in formulas" option under
Tools.Options.Spreadsheet.Calculate, another option would be needed like "Enable
simple wildcards in formulas". These two are mutually exclusive => radio button
"None / RegExp / Wildcard", where Wildcard would be enabled by the Excel filter
import for interoperable formula results.
Comment 8 frank 2005-04-05 14:00:41 UTC
*** Issue 46706 has been marked as a duplicate of this issue. ***
Comment 9 frank 2005-04-23 18:21:42 UTC
*** Issue 48020 has been marked as a duplicate of this issue. ***
Comment 10 navvy 2005-04-25 10:38:21 UTC
The problem is Excel formulas that do not give the correct result in Open 
Office.  

When I open an Excel spreadsheet in Open Office, I need to either get the 
correct results or to get a message warning that some cells contain formulas 
that cannot be handled correctly.  Open Office does neither.

As an Excel user, I don't want to know about regular expressions - I just want 
Open Office to give the same answers as Excel.

I also don't want to accidentally create Open Office formulas that will give 
the wrong results in Excel.

If there is a radio button in Tools/Options to change the formula-handling 
method, I would expect that to be automatically set correctly to match the way 
that Excel works, whenever an Excel file is opened.  Alternatively, when the 
troublesome formulas are detected, a warning box should open when the file is 
opened, preferably giving a quick way to make the necessary changes.

Comment 11 frank 2005-06-30 12:14:16 UTC
*** Issue 51072 has been marked as a duplicate of this issue. ***
Comment 12 drensink 2005-07-31 01:13:09 UTC
Instead of disabling regular expressions in the options when opening an excel
document that uses wildcard characters or adding a wildcard option to the
options,  wouldn't it be easier to actually convert the ? and * wildcard
characters to regular expressions on import and then convert the regular
expressions back to wildcards when exporting back to excel formats, giving a
warning for any regular expressions that cannot be converted exactly to wildcards.

Comment 13 navvy 2005-08-02 16:35:00 UTC
I agree with drensink.  Automatic translation is what the user needs and 
expects.  

Changing options that affect all workbooks, will cause more problems than it 
solves.

If the user sees a warning that a regular expression could not be translated 
to Excel format, they will learn that Open Office is more powerful than 
Excel.  Currently the inadequate translation gives the opposite impression, if 
the user has never heard of 'regular expressions'.
Comment 14 ooo 2005-08-03 16:30:45 UTC
Drensink, Navvy,

Please note that "automatic translation" is not feasible. It might work in
simple cases of literal string arguments as parameters during import, but as
soon as the search expression is concatenated from substrings or obtained via
formula functions it would fail. Furthermore it wouldn't work with round-trip
documents, i.e. load from Excel and save to Excel file format again. The only
viable way is another wildcard mode, which would also ease use for the not so
experienced user for whom regular expressions are too complicated.

Eike
Comment 15 drensink 2005-08-04 04:08:49 UTC
Eike,

   I did not quite understand your comment at first, because the strings I was
using for the DSUM criteria were a string (substring) in a cell (reference to
cell being a formula). I now see that you could not just go replacing them all
because there is no way to know if they are just an * or ? or actual wildcards
short of scanning the whole sheet to see if each is an input into a formula that
handles wildcards which would not be so good logic and performance wise.

  I then wondered what would happen with a spreadsheet using wildcards was
imported, but the user wanted to use regular expressions on that same sheet and
still have the wildcards calculated.  The obvious hangup with that is the ? and
* do different things depending which option you want.  A good answer to that
would be for the functions like DSUM to have two implementations, one that
calculates with regular expressions and one that calculates with wildcards. 
This would create yet another problem since DSUM used in calc is not the same as
DSUM in excel, but the same formula name has been used by both excel and calc
users, though this might be easy to take care of on import and export by
translating DSUM in excel to something like DSUMWILDCARD in calc and something
similar on export.

This all assumes someone would actually want to use both wildcards and regular
expressions on the same sheet, but that might be a bit crazy and I understand
implementing the option you propose is probably far quicker, especially if you
could have it coded in 2.0 or 2.0.1 instead of later versions.






 
Comment 16 drensink 2005-08-06 05:10:25 UTC
I gave my last comment more thought and realized you can maintain the same
behavior openoffice uses now with regular expressions, make the excel imports
and exports compatible, and mix the use of both regular expressions and
wildcards on the same sheet and in any cells by doing the following.

1.  Provide an optional argument to any formula function that will take regular
expressions in openoffice, but takes wildcards in excel.  

2.  If the argument is set to true, assume the function arguments use ? and * as
wildcards.

3.  If the argument is false or not set, assume the function arguments may
contain regular expressions.

4.  For documents imported from excel, add the argument set to true to the cell
containing the formula.

5.  For documents in openoffice assume the argument is not set by default.

6.  For documents in openoffice to be exported to excel the user can set the
argument to use wildcards.

7.  If the user doesnt set the argument, warn that excel does not fully support
the use of regular expressions.

This would allow the use of the same formula names for both cases and should be
fairly easy to code into the existing regular expression code, since it then
becomes a simple replacement of ? and * in the arguments fed to the functions
with the equivalent regular expression.

  
Comment 17 ooo 2005-08-15 11:13:55 UTC
Hi Drensink,

> 6.  For documents in openoffice to be exported to excel the user can set the
> argument to use wildcards.

What should be exported if that argument is the result of a formula?

> This would allow the use of the same formula names for both cases and should be
> fairly easy to code into the existing regular expression code, since it then
> becomes a simple replacement of ? and * in the arguments fed to the functions
> with the equivalent regular expression.

I'd rather implement a simple wildcard search than fiddling around with
the argument's string content. Bear in mind that all other content that
possibly could form a regular expression would have to be escaped
otherwise.

  Eike
Comment 18 drensink 2005-08-16 03:32:08 UTC
This gets a bit more complicated than it seems at face value, but I will
continue a bit more to see if there is an optimal solution to the problem.

> What should be exported if that argument is the result of a formula?

Keep in mind this would not be the argument with the wildcards/regular
expressions in it, but an additional, optional argument whose value would be
restricted to 0 or 1, True or False.  If it is set to 1, then the formula uses
wildcards and you export the other argument to the function with the wildcards
in it as it is written by the author of the document and strip the 1 during
export since Excel does not use it because by default it uses wildcards.  If it
is not set or set to 0, then you know you are exporting a document that is very
likely not to be compatible with Excel, but could warn and continue exporting
anyway.

  If the additional, optional argument is a result of a formula,  which should
not be necessary, then that formula should result in True or False, 0 or 1
otherwise it would be in error. That still would not matter because you would
just check to see if it results in 0 and 1 and strip the whole formula off just
like you would strip the 1 off. Though it would be easier to just restrict that
argument to only the values 0,1,True,False and anything else would result in a
formula error.

>   I'd rather implement a simple wildcard search than fiddling around with
>the argument's string content. Bear in mind that all other content that
>possibly could form a regular expression would have to be escaped
>otherwise.

    The additional, optional argument would just determine which code path is
taken.  If you want to do a whole new implementation of the formula using
wildcards instead of messing with the string content of the argument containing
wildcards/regular expressions, the same principle would still work.

   The biggest problem I see with the supposedly simple solution, just turn off
regular expressions and turn on wildcards is that it immediately eliminates the
use of more powerful regular expressions in that document, preventing someone
from importing a document containing wildcards from excel for the purpose of
using regular expressions. The continued incompatibility on export of formulas
with the same names using regular expressions would also still have to be dealt
with in the simple case.

   The basis of my approach is almost exactly the same as doing it for the whole
document or sheet, with the exception that it is done for each cell where those
formulas are used, rather than the whole document at once. You still have to
decide whether its a regular expression or wildcard and fix the code for each
formula to handle it if you do it for the whole document.



Comment 19 drensink 2005-08-16 03:53:26 UTC
One more note: It would be an irritatant to a user to have to add the ;1
constantly to a formula. The option for the document could be added anyway so
when it is set to wildcards the ;1 is automatically added upon finishing the
formula input.



  
Comment 20 ooo 2005-08-30 12:02:53 UTC
Hi Drensink,

> > What should be exported if that argument is the result of a formula?

> Keep in mind this would not be the argument with the wildcards/regular
> expressions in it, but an additional, optional argument whose value would be
> restricted to 0 or 1, True or False.

You didn't get my point. The argument doesn't have to be a direct value,
it could be a formula result instead, e.g. FALSE(), or a cell reference
where the user could play with the value, or any sophisticated formula,
even with side effects. In these cases stripping away the parameter
simply can't be done.

  Eike
Comment 21 drensink 2005-09-05 03:31:57 UTC
> You didn't get my point. The argument doesn't have to be a direct value

I think I did get your point and thought I had explained that in the statement
below to be interpreted as (The argument DOES have to be a direct value)

"Though it would be easier to just restrict that argument to only the values
0,1,True,False and anything else would result in a formula error and if it was
not that would be in error."

  I can understand that you might want to or have to generalize from this
specific case because the next complaint would be that a formula result could
not be used.  I can also understand that there may be no existing code in the
current formula handling to force an argument to be only a direct value.

Since it seems all of the above is not a realistic option, would you be willing
to agree a reasonable option would be to make the setting applicable to each
sheet in a workbook, rather than to the entire workbook? 

This would allow an import from Excel using wildcards and then adding a new
sheet to the workbook that uses regular expressions within the same workbook.
The user could reference cells on those sheets within the same workbook that are
results of regular expressions.

Thankyou for being patient in explaining your objections and the reasons for
those objections in detail.



 
Comment 22 ooo 2007-08-13 17:06:09 UTC
Just an overdue update: the OASIS ODF TC approved the proposal to include a
table:use-wildcards attribute in the file format, see
"Proposal for a new calculation setting: table:use-wildcards" under
http://wiki.oasis-open.org/office/OpenDocument_v1%2e2_Action_Items#head-4854119a1f3d7df4f22fc453bb50e8c6e5b0acd7
Comment 23 villeroy 2008-01-11 21:59:26 UTC
Text comparison in Calc occurs in
1. Formulas by operator =A1=B1 
2. Functions like MATCH etc.
3. Filters
4. Find/Replace

It is influenced by following options:
- Calc>Calculation>Case sensitive (affects 1. only)
- Calc>Calculation>Search criteria = and <> must apply to whole cells (affects
2. and 3.)
- Enable regular expressions in formulas (affects 2. only, but not 1.) Better
labeled "Enable regular expressions in function arguments"?

Filters, SortDescriptors and Find/Replace have their own options for regex and
case. There are too many options and implications. The behaviour is
inconsistent. For instance: Did anybody notice that case-insensitive cell values
obsoletes many benefits of regular expressions?

My radical proposal:
One option to toggle between native mode and compatibility mode.
Native mode:
All strings in Calc are case sensitive and support regular expressions, "=" and
"<>" do not apply to whole cells unless you use them with "^" and "$".
Compatibility mode:
Excel files are loaded in compatibility mode:
All strings are case-insensitive except for EXACT and FIND. They support
wildcards. "=" and "<>" apply to whole cells.

Comment 24 ooo 2008-03-17 14:44:44 UTC
*** Issue 87106 has been marked as a duplicate of this issue. ***
Comment 25 gvsa123 2008-10-26 04:03:42 UTC
I was working with an xls file and discovered this problem. From a users 
perspective, wouldn't it be simpler to simply have a message stating that OOo 
is trying to open a file that contains regular expressions/wild cards and asks 
whether the user wants to enable the regular expression option? 

i'm not sure about more complex formulas but it seems that there is no problem 
with OOo understanding and translating into excel's way of handling things, so 
long as the option is turned on. i would think therefore that simply letting 
the user know that the option has to be turned on because there are regular 
expressions in the formula would be good enough, so that the user doesn't get 
shocked to see all zeros in the cells as I did. I mean if there isn't any 
technical problem as regards OOo translating and understanding the way other 
spreadsheet applications handle formulas (OOo can open and save either way), 
then there's really no need to replace and auto translate formulas. if all the 
user needs to do to work with the spreadsheet (whichever format) and to save it 
(in whichever format as well) is to turn the option on or off, then so be it. a 
simple prompt like:

"The spreadsheet file you are trying to open may contain regular expression. Do 
you want to turn these feature on?" 

or something...
Comment 26 Rainer Bielefeld 2008-11-28 16:47:05 UTC
*** Issue 96557 has been marked as a duplicate of this issue. ***
Comment 27 ooo 2009-07-28 12:43:51 UTC
*** Issue 102843 has been marked as a duplicate of this issue. ***
Comment 28 ooo 2010-07-19 13:46:53 UTC
*** Issue 101712 has been marked as a duplicate of this issue. ***
Comment 29 ooo 2010-07-19 13:49:57 UTC
*** Issue 113243 has been marked as a duplicate of this issue. ***
Comment 30 Regina Henschel 2012-09-18 12:51:55 UTC
*** Issue 121015 has been marked as a duplicate of this issue. ***
Comment 31 Regina Henschel 2012-09-18 13:05:43 UTC
*** Issue 121018 has been marked as a duplicate of this issue. ***
Comment 32 Sebastian Sauer 2013-12-01 11:08:27 UTC
Related section at ODF 1.2:
http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1017868_715980110

Introduced on 06/May/10:
https://tools.oasis-open.org/issues/browse/OFFICE-2679?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel

Related AOO.org discussion:
http://mail-archives.apache.org/mod_mbox/incubator-ooo-dev/201207.mbox/%3C73f2681577a4ff873095d71260bf3d27@xaox.net%3E

Related LO bugreport:
http://en.libreofficeforum.org/node/6950

What's needed? AOO and LO need to implement support for HOST-USE-WILDCARDS and on XLS/XLSX import need to set it to true.
Comment 33 Sebastian Sauer 2013-12-01 15:22:38 UTC
Related bugreport in LibreOffice bugzilla:
https://www.libreoffice.org/bugzilla/show_bug.cgi?id=72196
Comment 34 Rainer Bielefeld 2014-03-20 17:47:09 UTC
erack no longer active here, so back to list.
Comment 35 mroe 2014-03-31 11:19:43 UTC
*** Issue 124567 has been marked as a duplicate of this issue. ***