Issue 76156

Summary: Counting number of substrings within a string
Product: Calc Reporter: discoleo <discoleo>
Component: codeAssignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: issues
Version: OOo 2.2   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: ENHANCEMENT Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description Flags
Proposed C++ implementation none

Description discoleo 2007-04-06 17:56:18 UTC
Unfortunately, Calc lacks even basic support for string functions (see my other
issues on strings, like issue 66590). This makes the program unsuitable for any
work that relies heavily on strings.

One of the functionalities that I am missing is a function that counts the
number of occurrences of a substring within a given string.

Lets say, we have the string: "ab cd ab de", and we want to count how often "ab"
occurs inside this string.

I have included in the attached file the C++ code for a new function,
ScInterpreter::ScFindCount(), that performs this task.

This function counts the number of occurences of a given string within another
string
//   - IF string is NOT found, it returns 0 
//   - otherwise it returns the number of occurences of the substring
//   - parameters:
//     1. string: this is the search item
//     2. string: this will be searched for the first string
//     3. is overalpping of search results allowed:
//        - default: FALSE (NO)
//     4. START position: default = begining of string (=1)
//     5. END position: default = LEN(string)

I noticed now, that the function will probably NOT recognise wildcards, so I
will look forward to correct that. I hope that such a function will make it in
the OpenFormula specification, too.
Comment 1 discoleo 2007-04-06 17:58:00 UTC
Created attachment 44257 [details]
Proposed C++ implementation
Comment 2 discoleo 2007-04-06 19:12:26 UTC
1. I just wanted to add a real life example.

My primary work involves auditing the whole medical activity taking place in a
big county. One frequent task is to count the number of different diagnoses for
individual patients.

The string looks like this: "B18.1; G20; I20.9; I12.9". These tokens represent
individual ICD10-coded diagnoses. So, I would like to count how many diagnoses
are there (i.e. FindCount(";",'the_string') + 1).

2. I found the code for *REGEXP* searches inside the ScInterpreter::ScSearch()
function. Unfortunately, I do NOT understand much of it and I fear that there is
NO good documentation on it.

I will try to implement a ScSearchCount() function later, just by cloning the
ScSearch() function.
Comment 3 frank 2007-04-10 09:48:04 UTC
Hi Niklas,

please have a look at this patch.

Frank
Comment 4 niklas.nebel 2007-05-23 14:03:13 UTC
We generally don't want to add many new internal functions. Additional functions
can preferably be added as add-in components.

Also, issue type "patch" should be limited to complete patches, ready to apply.
Comment 5 mrprogrammer 2010-08-19 18:57:19 UTC
If A1 contains "ab cd ab de" and B1 contains "ab", we can use
=(LEN(A1)-LEN(SUBSTITUTE(A1;A2;"")))/LEN(A2)
which I realize is messy but it gets the job done.
Comment 6 mrprogrammer 2010-08-19 18:59:43 UTC
Oops, I meant: If A1 contains "ab cd ab de" and A2 contains "ab" …
Comment 7 Marcus 2017-05-20 11:13:50 UTC
Reset assigne to the default "issues@openoffice.apache.org".