Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing |
Summary: | Search for kashida positions in Arabic justified text buggy | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Writer | Reporter: | hennerdrewes <hennerd> | ||||||||||||||||
Component: | code | Assignee: | stefan.baltzer | ||||||||||||||||
Status: | CLOSED FIXED | QA Contact: | issues@sw <issues> | ||||||||||||||||
Severity: | Trivial | ||||||||||||||||||
Priority: | P3 | CC: | hdu, issues | ||||||||||||||||
Version: | OOo 2.4.0 | ||||||||||||||||||
Target Milestone: | --- | ||||||||||||||||||
Hardware: | All | ||||||||||||||||||
OS: | All | ||||||||||||||||||
Issue Type: | PATCH | Latest Confirmation in: | --- | ||||||||||||||||
Developer Difficulty: | --- | ||||||||||||||||||
Attachments: |
|
Description
hennerdrewes
2008-04-01 11:24:06 UTC
Created attachment 52429 [details]
patch proposal for sw/source/core/text/porlay.cxx
After applying the patch, more kashida positions are found. The overall appearance of a justified line is much smoother now. But there a still other problems to be solved for kashida justified text. The gaps inside words and kashidas appended to the end of a word, as reported in issue 34139 and issue 60594. The kashidas appended to the end of a word have been mainly eliminated by the fix in issue 77976. The gaps remain, but I will post a patch to winlayout.cxx at issue 60594. I am fully aware (discussion with hdu), that overriding writer's character positions in winlayout.cxx is problematic, but still I could produce a far better result by this. to be continued at issue 60594... Reassigned to SBA. fme->hennerdrewes: Thank you for your patch. I'll have a look. fme->hennerdrewes: Could you please attach a document showing one or more cases, that weren't correctly handled by the old algorithm? Thanks. Created attachment 52440 [details]
Sample Text OOo 2.4
Created attachment 52441 [details]
Sample Text OOo 2.4 with patches
Created attachment 52442 [details]
Selected words before patch
Created attachment 52443 [details]
Selected words after patch
I posted some sample documents. The original document was posted in issue 28203: kashida.sxw. For demonstration purposes, I copied some words to short lines, to show the differences most clearly (kashida_before_patch3.pdf and kashida_after_patch3.pdf). The differences that you can see are actually the result of three patches: this one, the one I posted in issue 60594, which is based on previous fixes from issue 77976. The second line in kashida_before_patch3.pdf demonstrates most clearly the effects of kashida placing. Before the patch, the first word did not receive any kashida, the second word got a kashida before its last letter. After the patch both words get a kashida, and notably in the second word it gets inserted already before the third letter. In the first line in the first word, a kashidas get inserted in a places where they cannot connect. At least in the second word, the cause was in the layout engine and this was fixed in issue 77976. What you see in the first word is probably the result of both bugs. In the implementation of the kashida search were several mistakes. First I just noticed the differences to the results of the usp justification, but I thought that it might be intentionally. Then I found the MS documentation at http://www.microsoft.com/middleeast/msdn/JustifyingText-CSS.aspx, and the priorities in the kashida search seemed to match what the OOo code was trying to do. But the OOo code didn't compare the results of the different priority classes. It simply decided for the first match it found. If it found a match of the 3rd priority at the 2nd letter, it would not get to a match of 1st priority at the 3rd or 4th letter. Also it says: "If more than one connection opportunity have the same highest value, use the opportunity closest to the end of the word." As I said before, if it found a match, the game was over, not getting to the end of the word. And most seriously, there seemed to be a misunderstanding on the letters referenced in the priority classes. First of all, as far as I understand, the letter names refer to shaping groups as they are described in http://unicode.org/book/ch08.pdf in table 8.6. So you should actually check for a group of unicode chars. Where it states, that a kashida may be inserted before a final form of a letter, you have to keep in mind that the only-right-joining characters will have their "final form" also in the middle of a word. Of course, I could not be 100% sure of this interpretation, though it just seemed more logical. But after implementing the changes I just saw that everything matches exactly the way usp would place the kashidas, so I was assured that I got it right. In the longer document you can occasionally see the gaps inside the words in the before-patch version. But that is the other issue... fme->hennerdrewes: If I understood correctly, this patch is also related to issue 60594. To judge from your comments there's still some work left to do for issue 60594. Anyway, from my point of view this porlay.cxx patch makes sense, even without any further patches for other issues. If this is correct, I'll add this patch to cws fmepatches02. Please also attach the odt documents for kashida_before_patch3.pdf and kashida_after_patch3.pdt. Thank you. Created attachment 52448 [details]
Kashida document full text
Created attachment 52449 [details]
Kashida selected words document
I posted the requested documents. The patch can be applied separately from the other patches, of course. From the user's perspective, however, they are linked. Presently all Arabic (as Hebrew before 77679 fix) justification text output is so buggy that it cannot be used. I think it would be a really a good idea to target issue 60594 for OOo 3.0, so that this release will be fit to handle this. Otherwise as a user, I would expect a clear statement: OOo is not able to handle this at present. (not so nice) As I stated before, I am willing to help with that (need assistance though!) Further remarks: - Does this patch also need to be transfered to the edit engine? - The lcl_ConnectToPrev function could be a lot nicer with the isXXXChar functions. Simply check for the right joining classes. If true, return false, otherwise proceed. Also found some other issues with Arabic justification, but this will be in a separate issue (need some more time for that) fme->henner: Again, thank you for your contribution. I'll add this patch to cws fmepatches02 and asked Herbert (hdu) to have a look at issue 60594. Added patch to cws fmepatches02. fme->sba: Ready for QA. SBA: Verified in CWS fmepatches02 This issue is closed automatically and wasn't rechecked in a current version of OOo. The fixed issue should be integrated in OOo since more than half a year. If you think this issue isn't fixed in a current version (OOo 3.1), please reopen it and change the field 'Target Milestone' accordingly. If you want to download a current version of OOo => http://download.openoffice.org/index.html If you want to know more about the handling of fixed/verified issues => http://wiki.services.openoffice.org/wiki/Handle_fixed_verified_issues |