Issue 97808

Summary: sort with advanced option case sensitive doesn't work
Product: Calc Reporter: delorea <ambrogio.de.lorenzo>
Component: formatting Assignee: AOO issues mailing list <issues>
Status: REOPENED --- QA Contact:
Severity: Trivial    
Priority: P2 CC: hanya.runo, issues, kschenk, ooo, oooforum, pescetti, schnell4us
Version: OOo 3.0 Keywords: oooqa
Target Milestone: 4.2.0   
Hardware: All   
OS: All   
Issue Type: PATCH Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description                                                Flags
Patch to set upper first if ignore case is not specified   none
Simple writer document with 4 x 4 table                    none
Simple calc doc with some values                           none

Description delorea 2009-01-06 16:34:39 UTC
Steps to reproduce the bug
1. Uncheck Tools -> Cell Contents -> AutoInput if it is checked
2. Type in A1 the string Io
   Type in A2 the string Tu
   Type in A3 the string io
   Type in A4 the string tu
3. Select the range from A1 to A4
4. Select Data -> Sort
   Leave the option Ascending selected and open the Options tab
   Check Case sensitivity, and uncheck Range contains column labels
The expected result (according to the manual) is: Io io Tu tu

The result I obtain is the same whether the option is checked or unchecked.

Also, the manual should be amended to say that case sensitivity only affects
strings that differ only by case.
Comment 1 wope 2009-06-13 22:36:37 UTC
Can confirm it on OOo 3.1m11 and DEV300 m50
Comment 2 oooforum (fr) 2015-02-18 15:45:10 UTC
Seems to be solved with latest build AOO 4.1.1
Result is now:
io
Io
tu
Tu
Comment 3 Andrea Pescetti 2015-02-19 19:59:18 UTC
oooforum: this is actually the wrong result (and I confirm it happens on 4.1.1 as you describe).

Correct:  Io io Tu tu
In 4.1.1: io Io tu Tu
Comment 4 oooforum (fr) 2015-02-20 10:09:11 UTC
*** Issue 121428 has been marked as a duplicate of this issue. ***
Comment 5 hanya 2015-10-23 17:34:59 UTC
In the ScTable::Sort method, the behavior of the default collator can be reproduced with the following code: 

Sub DefaultCollator
  Dim locale as new com.sun.star.lang.Locale
  locale.Language = "en"
  locale.Country = "US"
  ' Use this value instead to ignore case:
  ' op = com.sun.star.i18n.CollatorOptions.CollatorOptions_IGNORE_CASE
  op = 0 ' case sensitive
  c = CreateUnoService("com.sun.star.i18n.Collator")
  n = c.loadDefaultCollator(locale, op)
  if n = 0 then
    s = "a, A: " & CStr(c.compareString("a", "A")) & chr(10)
    s = s & "A, a: " & CStr(c.compareString("A", "a")) & chr(10)
    s = s & "a, b: " & CStr(c.compareString("a", "b")) & chr(10)
    s = s & "b, a: " & CStr(c.compareString("b", "a")) & chr(10)
    msgbox s
    '1 if the first string is greater than the second string
    '0 if the first string is equal to the second string
    '-1 if the first string is less than the second string 
  end if
End Sub

With the case-sensitive collator: cmp(a, A): -1, cmp(A, a): 1, cmp(a, b): -1, cmp(b, a): 1
In the QuickSort method, when only two cells remain, they are swapped if cmp() > 0.
The pair (b, a) has to be swapped, but the pair (A, a) should not be. 
It's hard to sort correctly with these case-sensitive collator results.
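
A minimal sketch of the two-cell case described above, written in Python rather than Basic. The lookup table is a hypothetical stand-in that only replays the four collator results quoted in this comment; it is not the UNO collator API, and sort_pair is an illustrative name, not the actual QuickSort code.

```python
# Hypothetical stand-in replaying the case-sensitive results quoted above.
COLLATOR_RESULTS = {
    ("a", "A"): -1,
    ("A", "a"): 1,
    ("a", "b"): -1,
    ("b", "a"): 1,
}


def collator_cmp(x, y):
    """Return the reported collator result for the pair (x, y)."""
    if x == y:
        return 0
    return COLLATOR_RESULTS[(x, y)]


def sort_pair(first, second, cmp):
    """Swap two cells when cmp() > 0, as in the two-cell case."""
    if cmp(first, second) > 0:
        return [second, first]
    return [first, second]


print(sort_pair("b", "a", collator_cmp))  # ['a', 'b']
print(sort_pair("A", "a", collator_cmp))  # ['a', 'A']
```

With these results the pair (A, a) is swapped as well, so the lowercase letter ends up first, which is the order reported as wrong earlier in this issue.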
Comment 6 hanya 2015-10-23 17:44:20 UTC
In the case of Python's cmp function, the results are simple.
cmp("a", "A"): 1, cmp("A", "a"): -1, cmp("a", "b"): -1, cmp("b", "a"): 1
Pairs with cmp() > 0 have to be swapped in these cases.
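
The built-in cmp() only exists in Python 2. A rough Python 3 equivalent, assuming plain code point comparison of strings, reproduces the values listed above:

```python
from functools import cmp_to_key


def cmp(x, y):
    """Python 3 replacement for the removed built-in cmp(): -1, 0 or 1."""
    return (x > y) - (x < y)


print(cmp("a", "A"), cmp("A", "a"), cmp("a", "b"), cmp("b", "a"))  # 1 -1 -1 1

# Sorting with this comparator puts every uppercase letter before every
# lowercase one, because it compares code points only.
print(sorted(["b", "a", "B", "A"], key=cmp_to_key(cmp)))  # ['A', 'B', 'a', 'b']
```

Note that this is pure code point order (A B a b), which is simpler than, but not the same as, the A a B b order expected from a locale-aware collator.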
Comment 7 hanya 2015-10-24 05:52:19 UTC
Created attachment 85065 [details]
Patch to set upper first if ignore case is not specified

It seems the default case order for tertiary differences is lower first. 
The patch sets it to upper first for tertiary differences.

Collator::setStrength has been deprecated since ICU 2.6; it can be replaced by 
Collator::setAttribute.
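
To illustrate what "upper first" means for a tertiary (case-only) difference, here is a simplified sketch in Python. It is not the patch and does not use ICU: it assumes ASCII-like data, compares case-folded strings first, and only falls back to case when the letters are otherwise identical. The name upper_first_cmp is hypothetical.

```python
from functools import cmp_to_key


def upper_first_cmp(x, y):
    """Compare ignoring case; break case-only ties with uppercase first."""
    x_folded, y_folded = x.casefold(), y.casefold()
    if x_folded != y_folded:
        # Primary difference: the letters themselves differ.
        return -1 if x_folded < y_folded else 1
    # Case-only difference: code point order puts ASCII uppercase first.
    return (x > y) - (x < y)


cells = ["Io", "Tu", "io", "tu"]
print(sorted(cells, key=cmp_to_key(upper_first_cmp)))  # ['Io', 'io', 'Tu', 'tu']
```

For the data from the original report this gives Io io Tu tu, the order described there as expected.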
Comment 8 oooforum (fr) 2015-10-24 15:59:07 UTC
Thanks hanya

Status changed to PATCH

Maybe targeted to 4.1.2?
Comment 9 Andrea Pescetti 2015-10-24 18:21:30 UTC
OpenOffice 4.1.2 is ready (Release Candidate is being voted upon right now, see dev list) so we won't be able to incorporate this in 4.1.2. But thanks Hanya, and the patch should be reviewed and/or committed to trunk soon.
Comment 10 Kay 2016-01-04 19:05:04 UTC
Add me.
Comment 11 Kay 2016-01-06 23:49:57 UTC
4.2.0 build Rev. 1722749 linux-32.

The patch works correctly for a vertical sort A1:A4, but it doesn't seem to fix the problem for a horizontal sort -- A1:D1 -- assuming I'm doing the selection correctly. I'll be happy to commit this one, however. Would like additional feedback.
Comment 12 hanya 2016-01-07 15:06:08 UTC
(In reply to Kay from comment #11)
> The patch works correctly for a vertical sort A1:A4, but it doesn't seem to
> fix the problem for a horizontal sort -- A1:D1 -- assuming I'm doing the
> selection correctly. 
Works for both column and row sorting in my environment. Could you try again or 
attach an example that gives you trouble?
Since the attached patch affects every function that uses the collator service 
to compare strings, such as sorting anywhere in the office suite, the results 
should be identical everywhere in the office suite.
Comment 13 hanya 2016-01-07 15:35:46 UTC
I found a problem with sorting when using the patch attached in Comment 7. 
In Writer's table, sorting with or without the Match case option gave me the same result.
Comment 14 hanya 2016-01-07 15:36:31 UTC
Comment on attachment 85065 [details]
Patch to set upper first if ignore case is not specified

See Comment 13 for the reason.
Comment 15 hanya 2016-01-07 17:51:34 UTC
I analyzed the sorting behavior in Writer's table. 
Data: b B a A
On 4.1.2, without Match case: a A b B or A a B b
On 4.1.2, with Match case: a A b B (stable but wrong result)
Patched, without Match case: a A b B or A a B b
Patched, with Match case: A a B b (stable, correct order)

If you sort again without the Match case option, you can observe the order change.
The observation described in Comment 13 was wrong. 
The sorting in Writer's table is not stable, but the result is correct. 
The attached patch is not obsolete, but some people might misjudge the result at first glance.
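
A small sketch, in Python, of why both "a A b B" and "A a B b" are valid results when case is ignored. It assumes a case-folded sort key and uses Python's sorted(), which is stable (unlike Writer's sort as described above), so the order of letters that differ only by case simply follows the input order. The helper name sort_ignoring_case is hypothetical.

```python
def sort_ignoring_case(cells):
    """Sort with case ignored; 'a' and 'A' compare as equal."""
    return sorted(cells, key=str.casefold)


# Equal items keep their input order, so either arrangement can appear.
print(sort_ignoring_case(["b", "B", "a", "A"]))  # ['a', 'A', 'b', 'B']
print(sort_ignoring_case(["B", "b", "A", "a"]))  # ['A', 'a', 'B', 'b']
```

With an unstable sort the relative order of "a" and "A" is not guaranteed at all, which matches the "a A b B or A a B b" observations without Match case.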
Comment 16 hanya 2016-01-09 14:14:31 UTC
Comment on attachment 85065 [details]
Patch to set upper first if ignore case is not specified

Not obsolete as per Comment 15.
Comment 17 Kay 2016-01-14 00:17:56 UTC
Created attachment 85249 [details]
Simple writer document with 4 x 4 table

Sorts as expected either across first row or down first column.

Leaving Case sensitive unchecked sorts with caps before lower case as expected.
Using case sensitive sorts with lower case first.
Comment 18 Kay 2016-01-14 00:36:58 UTC
Created attachment 85250 [details]
Simple calc doc with some values

for 4.1.2 on Linux-32

Left to right sort of 1st row does nothing no matter what.

Top to bottom of first column produces the following results for me regardless of whether Case Sensitive is selected or not:
a
A
b
B

If Case Sensitive is NOT selected, the normal collation sequence should produce the following results:

A
a
B
b
Comment 19 Kay 2016-01-14 00:42:25 UTC
Linux-32 on 4.1.2

I've added two little test documents that I'm using.

My finding so far is that the sort of table elements in Writer works correctly, whereas the sort in Calc does not. 

Case insensitive -- the normal collation sort -- should produce capital letters before lower case. Case sensitive should do the opposite.

Writer uses the phrase "Match case" while Calc uses "Case Sensitive" but I think these phrases should mean the same thing to a user.
Comment 20 Alan 2016-01-14 01:41:35 UTC
Comments 17, 18 and 19 are incorrect. In general, the comments all seem to indicate that Case Insensitive means sorting in REVERSE collating order. That is NOT what Case Insensitive means. 

Case Insensitive is NEITHER Upper Case before Lower Case NOR Lower Case before Upper Case. It IGNORES case. Hence the name: Case Insensitive.

Case insensitive: ABC = abc = AbC = aBc

A case sensitive sort takes the case of a character into account. The collating sequence of ASCII and UNICODE for the ASCII characters is identical. Upper case comes first. So a case sensitive sort should sort the above as:
   ABC
   AbC
   aBc
   abc

If Lower Case is to be sorted first, that is a Reverse Case Sensitive sort. It is NOT a Case Insensitive sort.

A Case Insensitive sort is important. Sort: nice, Nicer, nicest, Nicely, jump
   Case insensitive:  jump, nice, Nicely, Nicer, nicest
   Case sensitive (Natural Order): Nicely, Nicer, jump, nice, nicest
   Case sensitive (Reverse Order): nicest, nice, jump, Nicer, Nicely

NOTE: Case Insensitive would be preferred when sorting a list of names where some may be capitalized and others not. It is also the sort that would be used by either a Dictionary or Encyclopedia.
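
The three orderings above can be reproduced with a short Python sketch. It assumes case-insensitive means comparing case-folded strings and case-sensitive means plain code point order; the variable names are illustrative only.

```python
words = ["nice", "Nicer", "nicest", "Nicely", "jump"]

# Case insensitive: compare case-folded strings.
case_insensitive = sorted(words, key=str.casefold)

# Case sensitive (natural order): code point order, uppercase first.
case_sensitive = sorted(words)

# Case sensitive (reverse order): the same comparison, reversed.
case_sensitive_reverse = sorted(words, reverse=True)

print(case_insensitive)        # ['jump', 'nice', 'Nicely', 'Nicer', 'nicest']
print(case_sensitive)          # ['Nicely', 'Nicer', 'jump', 'nice', 'nicest']
print(case_sensitive_reverse)  # ['nicest', 'nice', 'jump', 'Nicer', 'Nicely']
```

The same code point comparison sorts ABC, AbC, aBc, abc in exactly that order.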
Comment 21 Kay 2016-01-14 18:02:55 UTC
(In reply to Alan from comment #20)
> Comments 17, 18 and 19 are incorrect. In general, the comments all seem to
> indicate that Case Insensitive means sorting in REVERSE collating order.
> That is NOT what Case Insensitive means. 
> 
> Case Insensitive is NEITHER Upper Case before Lower Case NOR Lower Case
> before Upper Case. It IGNORES case. Hence the name: Case Insensitive.
> 
> Case insensitive: ABC = abc = AbC = aBc
> 
> A case sensitive sort takes the case of a character into account. The
> collating sequence of ASCII and UNICODE for the ASCII characters is
> identical. Upper case comes first. So a case sensitive sort should sort the
> above as:
>    ABC
>    AbC
>    aBc
>    abc
> 
> If Lower Case is to be sorted first, that is a Reverse Case Sensitive sort.
> It is NOT a Case Insensitive sort.
> 
> A Case Insensitive sort is important. Sort: nice, Nicer, nicest, Nicely, jump
>    Case insensitive:  jump, nice, Nicely, Nicer, nicest
>    Case sensitive (Natural Order): Nicely, Nicer, jump, nice, nicest
>    Case sensitive (Reverse Order): nicest, nice, jump, Nicer, Nicely
> 
> NOTE: Case Insensitive would be preferred when sorting a list of names where
> some may be capitalized and others not. It is also the sort that would be
> used by either a Dictionary or Encyclopedia.

You are correct in your assessment. My comments were in relation to what apparently "case insensitive" means within OpenOffice for an alphabetic sort.