Issue 87688 - Search for kashida positions in Arabic justified text buggy
Summary: Search for kashida positions in Arabic justified text buggy
Status: CLOSED FIXED
Alias: None
Product: Writer
Classification: Application
Component: code (show other issues)
Version: OOo 2.4.0
Hardware: All All
: P3 Trivial (vote)
Target Milestone: ---
Assignee: stefan.baltzer
QA Contact: issues@sw
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-01 11:24 UTC by hennerdrewes
Modified: 2013-08-07 14:43 UTC (History)
2 users (show)

See Also:
Issue Type: PATCH
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
patch proposal for sw/source/core/text/porlay.cxx (11.33 KB, text/plain)
2008-04-01 11:27 UTC, hennerdrewes
no flags Details
Sample Text OOo 2.4 (37.08 KB, application/pdf)
2008-04-01 20:37 UTC, hennerdrewes
no flags Details
Sample Text OOo 2.4 with patches (36.40 KB, application/pdf)
2008-04-01 20:39 UTC, hennerdrewes
no flags Details
Selected words before patch (10.94 KB, application/pdf)
2008-04-01 20:40 UTC, hennerdrewes
no flags Details
Selected words after patch (10.92 KB, application/pdf)
2008-04-01 20:41 UTC, hennerdrewes
no flags Details
Kashida document full text (20.22 KB, application/vnd.oasis.opendocument.text)
2008-04-02 08:15 UTC, hennerdrewes
no flags Details
Kashida selected words document (8.01 KB, application/vnd.oasis.opendocument.text)
2008-04-02 08:16 UTC, hennerdrewes
no flags Details

Note You need to log in before you can comment on or make changes to this issue.
Description hennerdrewes 2008-04-01 11:24:06 UTC
I had a closer look at the search for kashida positions in porlay.cxx. And what
I saw wasn't exactly convincing. When working on issue 77976 I already noticed
that writer places kashidas somewhat differently than usp's ScriptJustify would. 

But then I found out that writer's kashida search intends to mimic the priority
search done by MS, but not very successfully.

OK, here is my version...
Comment 1 hennerdrewes 2008-04-01 11:27:15 UTC
Created attachment 52429 [details]
patch proposal for sw/source/core/text/porlay.cxx
Comment 2 hennerdrewes 2008-04-01 11:43:22 UTC
After applying the patch, more kashida positions are found. The overall
appearance of a justified line is much smoother now. 

But there a still other problems to be solved for kashida justified text. The
gaps inside words and kashidas appended to the end of a word, as reported in
issue 34139 and issue 60594. 

The kashidas appended to the end of a word have been mainly eliminated by the
fix in issue 77976. 

The gaps remain, but I will post a patch to winlayout.cxx at issue 60594. I am
fully aware (discussion with hdu), that overriding writer's character positions
in winlayout.cxx is problematic, but still I could produce a far better result
by this. 

to be continued at issue 60594...
Comment 3 michael.ruess 2008-04-01 12:54:58 UTC
Reassigned to SBA.
Comment 4 frank.meies 2008-04-01 14:30:59 UTC
fme->hennerdrewes: Thank you for your patch. I'll have a look.
Comment 5 frank.meies 2008-04-01 15:03:49 UTC
fme->hennerdrewes: Could you please attach a document showing one or more cases,
that weren't correctly handled by the old algorithm? Thanks.
Comment 6 hennerdrewes 2008-04-01 20:37:45 UTC
Created attachment 52440 [details]
Sample Text OOo 2.4
Comment 7 hennerdrewes 2008-04-01 20:39:24 UTC
Created attachment 52441 [details]
Sample Text OOo 2.4 with patches
Comment 8 hennerdrewes 2008-04-01 20:40:25 UTC
Created attachment 52442 [details]
Selected words before patch
Comment 9 hennerdrewes 2008-04-01 20:41:16 UTC
Created attachment 52443 [details]
Selected words after patch
Comment 10 hennerdrewes 2008-04-01 21:23:36 UTC
I posted some sample documents. The original document was posted in issue 28203:
kashida.sxw. 

For demonstration purposes, I copied some words to short lines, to show the
differences most clearly (kashida_before_patch3.pdf and kashida_after_patch3.pdf).

The differences that you can see are actually the result of three patches: this
one, the one I posted in issue 60594, which is based on previous fixes from
issue 77976. 

The second line in kashida_before_patch3.pdf demonstrates most clearly the
effects of kashida placing. Before the patch, the first word did not receive any
kashida, the second word got a kashida before its last letter. After the patch
both words get a kashida, and notably in the second word it gets inserted
already before the third letter. 

In the first line in the first word, a kashidas get inserted in a places where
they cannot connect. At least in the second word, the cause was in the layout
engine and this was fixed in issue 77976. What you see in the first word is
probably the result of both bugs. 

In the implementation of the kashida search were several mistakes. First I just
noticed the differences to the results of the usp justification, but I thought
that it might be intentionally. Then I found the MS documentation at
http://www.microsoft.com/middleeast/msdn/JustifyingText-CSS.aspx, and the
priorities in the kashida search seemed to match what the OOo code was trying to
do. 

But the OOo code didn't compare the results of the different priority classes.
It simply decided for the first match it found. If it found a match of the 3rd
priority at the 2nd letter, it would not get to a match of 1st priority at the
3rd or 4th letter. 

Also it says: "If more than one connection opportunity have the same highest
value, use the opportunity closest to the end of the word." As I said before, if
it found a match, the game was over, not getting to the end of the word.

And most seriously, there seemed to be a misunderstanding on the letters
referenced in the priority classes. First of all, as far as I understand, the
letter names refer to shaping groups as they are described in
http://unicode.org/book/ch08.pdf in table 8.6. So you should actually check for
a group of unicode chars. Where it states, that a kashida may be inserted before
a final form of a letter, you have to keep in mind that the only-right-joining
characters will have their "final form" also in the middle of a word. 

Of course, I could not be 100% sure of this interpretation, though it just
seemed more logical. But after implementing the changes I just saw that
everything matches exactly the way usp would place the kashidas, so I was
assured that I got it right. 

In the longer document you can occasionally see the gaps inside the words in the
before-patch version. But that is the other issue...
Comment 11 frank.meies 2008-04-02 08:02:05 UTC
fme->hennerdrewes: If I understood correctly, this patch is also related to
issue 60594. To judge from your comments there's still some work left to do for
issue 60594. Anyway, from my point of view this porlay.cxx patch makes sense,
even without any further patches for other issues. If this is correct, I'll add
this patch to cws fmepatches02. Please also attach the odt documents for
kashida_before_patch3.pdf and kashida_after_patch3.pdt. Thank you.
Comment 12 hennerdrewes 2008-04-02 08:15:51 UTC
Created attachment 52448 [details]
Kashida document full text
Comment 13 hennerdrewes 2008-04-02 08:16:48 UTC
Created attachment 52449 [details]
Kashida selected words document
Comment 14 hennerdrewes 2008-04-02 08:37:34 UTC
I posted the requested documents.

The patch can be applied separately from the other patches, of course. From the
user's perspective, however, they are linked. Presently all Arabic (as Hebrew
before 77679 fix) justification text output is so buggy that it cannot be used. 

I think it would be a really a good idea to target issue 60594 for OOo 3.0, so
that this release will be fit to handle this. Otherwise as a user, I would
expect a clear statement: OOo is not able to handle this at present. (not so nice)

As I stated before, I am willing to help with that (need assistance though!)

Further remarks: 

- Does this patch also need to be transfered to the edit engine?
- The lcl_ConnectToPrev function could be a lot nicer with the isXXXChar
functions. Simply check for the right joining classes. If true, return false,
otherwise proceed. 

Also found some other issues with Arabic justification, but this will be in a
separate issue (need some more time for that)



Comment 15 frank.meies 2008-04-02 10:14:45 UTC
fme->henner: Again, thank you for your contribution. I'll add this patch to cws
fmepatches02 and asked Herbert (hdu) to have a look at issue 60594.
Comment 16 frank.meies 2008-04-02 10:20:55 UTC
Added patch to cws fmepatches02.
Comment 17 frank.meies 2008-05-23 18:09:29 UTC
fme->sba: Ready for QA.
Comment 18 stefan.baltzer 2008-06-04 10:13:54 UTC
SBA: Verified in CWS fmepatches02
Comment 19 thorsten.ziehm 2009-07-20 14:54:07 UTC
This issue is closed automatically and wasn't rechecked in a current version of
OOo. The fixed issue should be integrated in OOo since more than half a year. If
you think this issue isn't fixed in a current version (OOo 3.1), please reopen
it and change the field 'Target Milestone' accordingly.

If you want to download a current version of OOo =>
http://download.openoffice.org/index.html
If you want to know more about the handling of fixed/verified issues =>
http://wiki.services.openoffice.org/wiki/Handle_fixed_verified_issues