Apache OpenOffice (AOO) Bugzilla – Issue 77976
Vista: Incorrect placing of Hebrew characters with justified text - approved
Last modified: 2013-08-07 14:43:00 UTC
When writing Hebrew text (RTL) with alignment "justified", spacing of the characters is incorrect. Instead of expanding the spacing between the words, very often spacing between the first and the second letter of a word is expanded. The phenomenon may be seen, especially when the line length is rather short (e.g. two column layout).
Created attachment 45547 [details] Sample document. See pages 7 and 8
Created attachment 45548 [details] Page 8 of this document as PDF (created by Export as PDF). In PDF export, some additional punctuation characters are placed incorrectly. This does not happen in regular printing.
Reassigned to SBA.
I checked this issue again with the OOo 2.3 on Windows Vista. The problem still exists as it can be seen in the previously uploaded file "BookOpening10_8.pdf" While editing the document the incorrect character spacing may be observed. However, Export as PDF, produces now correct character spacing. (see BookOpening12_8.pdf) Printing a PS File using the "HP Color Laser Jet 550 PS" printer driver still produces the faulty spacing. I converted the ps file to PDF via Ghostscript (BookOpening12_8ps.pdf) The problem may be specific to Windows Vista. When I opened the same document on a Windows 2000 system, I didn't notice any problems. Please react.
Created attachment 48602 [details] Page 8 of this document as PDF (created by Export as PDF, in OOo 2.3, seems correct)
Created attachment 48603 [details] Page 8 of this document as PDF (created by printing a PS file, converted to PDF by Ghostscript), OOo 2.3. The incorrect character spacing can be most clearly observed in line 8 of the first hebrew paragraph.
I was able to reproduce the problem when exporting to PDF on Windows XP. From what I see, this is only a problem in PDF export, not in the screen display. ->sba: Should I confirm the issue and change the status to "New"?
Screen display affected: Please check screen display again! Try turning on and off "Nonprinting charachters" Running OOo 2.3 on Windows 2000, the problem occurs in screen display when non-printing characters are turned on. Running OOo 2.2.1 on Windows Vista, the problem occurs in screen display when non-printing characters are turned off.
I can confirm that when running OOo 2.2 on Windows XP, the problem occurs in screen display when non-printing characters are turned on.
Why is the status left unconfirmed?
I found the cause of the problem. See attachments.
Created attachment 50443 [details] Patch for file sw\source\core\txtnode\fntcache.cxx
Created attachment 50444 [details] patch for file vcl\win\source\gdi\winlayout.cxx
Changing issue type. @hdu: something for 2.4?
@FME: please check hennerdrewes' attached fntcache.patch. Would it break things e.g. when showing nonprinting characters? The workaround in the fntcache.patch makes winlayout.cxx's manual right justification of glyphs superfluous and I'd be happy to remove the manual justification then. @hennerdrewes: did you check your patches with different versions of USP on XP,Vista and eventually W2K?
I checked the patch with Vista (really big improvement) and W2k (also fine). I don't have an XP machine to test. I tested my Hebrew documents with various OOo versions also on XP and W2k. As far as I could notice related to this issue, I assume that the USPs of XP and W2k behave the same. The patch also fixes character placement when using expanded or condensed character spacing (This was quite unusable with Hebrew before). The only negative side effect is, that non-printing characters for blanks are not centered (as in word underline). Maybe this could be improved, but I consider this rather low priority compared to the previous defect.
@hennerdrewes: Thanks for checking on W2K and WV! I'm waiting for FME's feedback on the Writer patch, but this will most likely not be till next week. In the meantime I'm wondering why the extra halfwidth space would make a difference if the statement is true, that "the glyph is always aligned with the leading edge of its cell". Since the fntcache.patch would no longer be be neccessary if it worked like specified, I'm wondering if there is a simpler way to correct wrong USP layouts.
@hdu: Apparently the Vista interpretaion of which side is the "leading edge" is different than it was on pre-Vista. Or it is yet another Vista bug. Anyhow, even on pre-Vista you could notice the following problems: 1. If I understood your code correctly, the layout class for PDF export is not USP based. I haven't examined everything in the code thoroughly enough to be 100% sure, but I haven't seen the right-align procedure over there. Fact is, that character placement in PDF export is incorrect (see confirmation of ayaniger) 2. When showing non-printing characters, a Hebrew text line is splitted into several items. Right-aligning the glyphs does not seem to work correctly here. You would have to take the adjacent items into account. As for an alternative for the fntcache.patch, I have been thinking of performing a right justification more selectively, but haven't come up with a striking idea. The fntcache.patch corrects all three problems (Vista-USP, PDF, and pre-Vista non-printing characters). An alternative solution probably would have to address the three problems separately.
@hennerdrewes: here are answers to your questions, some suggestions and more questions: 1. the layout class for PDF only handles the layout for the builtin PDF fonts. Please see vcl/source/gdi/base14.cxx about what these builtin fonts are. They don't support any Hebrew... so this issue shouldn't be affected by the PDFSalLayout::LayoutText() code... On the other hand, since mostly punctuation code points are affected that are also supported by the builtin fonts... does the resulting PDF contain a mixture of different fonts on the problematic lines? If yes, then the problem for justified hebrew PDF export could probably be solved by extending PDFSalLayout::LayoutText() to also do the right justification of glyphs in their cells. This should be done in a separate issue then, which would solve the problem across platforms. Could you check the extent of the problem also on non-Windows platforms? 2. when showing non-printing characters a space is divided into a middle-dot and a space. The cell widths are identical and the dot is supposed to stay in the middle of the original space. Maybe the middle-dot confuses the BiDi algorithm or the USP itemization? The two items above address the PDF problem and the non-printing-char problem. If they were solved would there be remaining problem? If I read the original problem report correctly there were also extra spaces within words? Is this still reproducable on recent version of OOo?
@hdu: to 1: From the visual output, I am not able to tell if e.g. a comma originates from built-in Helvetica vs. Arial used in the document. I have no access to non-Windows machines. Sorry. to 2: when I checked the layout process with the debugger, I noticed that ScriptItemize produces several Items, if I recall right one item per word and one per middle dot. If those two problems were solved, the strange behaviour of the Vista-USP would still cause a problem. The extra space between first to the following letters of a word occurs on all OOo Versions. Depending on the USP (Vista vs. Pre-Vista) it just occurs in different situations. I will have a closer look at some details, maybe I can tell you more during the next days.
Created attachment 50679 [details] patch proposal (draft)
more insights: - The vista issue (to be more specific, my findings relate to usp10.dll ver 1.626.6000.16386). ScriptTextOut actually performs the right-justification of glyphs when the fRTL member of SCRIPT_ANALYSIS parameter is set. So if you perform the manual justification, you should reset fRTL to zero before calling ScriptTextOut. If you want to rely on right-justification of ScriptTextOut, you should query the usp version and verify that the functionality is available. - The PDF and non-printing-characters issue relate to the same cause. The Hebrew text is itemized into single words and spaces. To make the right-justification meaningful, you need to perform the justification across items. (see attached patch) The patch corrects the character output in most cases, but when different fonts or font styles are mixed in the text line, you still have some misalignments. Especially expanding inter-character spacing breaks things apart. Compared to this manual justification, the justification of ScriptTextOut is more reliable, so maybe it should be used where available. - Another thing I noticed, maybe it is not relevant to OOo: When I pass a Hebrew text (characters and blanks) to ScriptShape, the Vista usp returns uJustification==SCRIPT_JUSTIFY_CHARACTER for all glyphs, even for the blanks. (except if I used the Arial Unicode MS font) This does not happen on older usps. This affects the output of ScriptJustify, I don't know if it is used by OOo at all.
fme->hdu: I'll attach a smaller bugdoc. The fact, that the placement is only incorrect in case the non-printing characters are shown, let's me think that we should not touch fntcache.cxx.
Created attachment 50698 [details] new smaller bugdoc
Created attachment 50703 [details] patch for file vcl\win\source\gdi\winlayout.cxx
Please disregard previous patch proposal (Jan 6)
@hennerdrewes: thanks for debugging into this, your insights and patches! Is our PDF export impacted when we get rid of manual glyph justification for newer usp versions? I'd be surprised if it was not so, because the glyph adjustment wouldn't be done during the layout phase. If the answer is yes, then UniscribeLayout::GetNextGlyphs() needs to be changed to right align the position of these glyphs. The ScriptJustify() API behaviour is very relevant for the OOo. Thanks to your analysis it seems that checking for (a.fRTL && uJustification==SCRIPT_JUSTIFY_CHARACTER) would identify the glyphs that need that treatment? And vice versa could this check help to identify the glyphs that need to be left alone because they are intended for use in ScriptTextOut()? Then I'd prefer that instead of reacting on the USP dll version. Other than that I'd be happy to apply your patch. Did it fix all the problem symptoms with non- printable chars, the extra space within words, punctuation positions, etc.?
@hdu: PDF Export: You are right, it is impacted. Otherwise, when using the manual justification the patch now corrects most of the original problems described in the issue. One problem remains: ...when different fonts or font styles are mixed in the text line, you still have some misalignments. Especially expanding inter-character spacing breaks things apart. I'll attach a (slightly larger) bugdoc, which clearly demonstrates the problem. When viewing the document, switching non-printing characters repeatedly on and off, will move misaligned items back and forth. I am right-aligning the first glyph in an item by adjusting VisualItem.mnXOffset. However, if we are switching fonts, probably something else has to be adjusted, which I haven't figured out. If you have any clue, where this ought to be done, I'd be happy to try to correct that, too.
Created attachment 50718 [details] new expanded bugdoc
comments to bugdoc hebtest.odt: after applying the proposed patch paragraphs 1. and 3. display correctly (showing and not-showing non-printing characters and PDF) In paragraphs 2 and 4 the words with expanded character spacing are misaligned when not-showing non-printing characters. When showing non-printing characters, they apparently move to correct positions. Paragraphs 2 and 4 also display correctly with newer usp version right align in ScriptTextOut. As stated before, PDF export is pretty much garbled.
@hdu: > "...checking for (a.fRTL && uJustification==SCRIPT_JUSTIFY_CHARACTER) would > identify the glyphs that need that treatment?" I think I didn't quite get you here. My original idea was to perform a more selective right alignment to glyphs by examining uJustification. But it turned out, that the newer usp does not identify the blanks. If you pass this information to ScriptJustify, space is added to all characters evenly, and not to only the blanks as you would expect.
Created attachment 50781 [details] patch for file vcl\win\source\gdi\winlayout.cxx
Posted another patch. It now also fixes the remaining problem, when using expanded character spacing. My previously posted bugdoc displays and prints now correctly in all display modes and PDF export.
Thanks for the updated patch! I noticed the missing nMinGlyphPos in the older patch, too. What is the story with the USE_USP1_6_RTLALIGN define? Do you happen to know if the patch also works for kashida-justied arabic script? I'm trying to come up with a change that wouldn't need the a.fRTL=false hack...
I included the deactivated USE_USP1_6_RTLALIGN define to outline the variations I was experimenting with, using the right-align capability of ScriptTextOut in the vista usp. Of course this is not ready to use, as you would need to verify the usp version and fix the impacted PDF export. Just in case you want to use this feature in the future, the define marks the lines in the code where changes would be necessary. Please remove if not needed. As I don't read any Arabic, It is hard for me to check the influence on that. But as far as I understand: If in the output of Arabic script, expanded cell widths are provided by mpJustifications, the fixed right-justification procedure should also be applied to that.
Created attachment 50789 [details] patch suggestion
Thanks for analyzing the root problem and identifying the USP behaviour change as the culprit. The analysis made it clear that the "manual glyph alignment" we had to do was a workaround for missing functionality in older USP versions. I like the a.fRTL=false hack, because it is minimally invasive against the code base. On the other hand having to rely on a hack to keep an obsoleted workaround working would probably make maintenance in the future more difficult. So I've rewritten your patch to enable/disable the workaround depending on the USP version and get rid of the hack altogether. Please review the patch I attached above. Does it still fix all the symptoms?
Just had a quick glance at the patch. Questions: 1. You haven't related to the problem in PDF export, when using usp's right align. 2. Did you verify, exactly which usp version (out of the many) behaves in which way? If there is any doubt in that, it would be safer to leave the a.fRTL=false hack active, when bManualCellAlign = true. This way you can make sure, that right aligning does not happen twice. I still have some thoughts on possible impact for Arabic justification, but I'll have to think about that a little longer...
Created attachment 50824 [details] patch changes
@hdu: After applying your patch, everything seems to be working fine with the Hebrew output, except for the PDF, as stated before. What solution do you have in mind for that? I also tried to have a look at Arabic justification. (Emphasis is on "tried", as unfortunately I cannot be sure on my findings, as I don't read the language). But nevertheless, I encountered some phenomena, which were mentioned in some issues on Arabic text justification. 1. Words seem to break apart. Cause: a glyph cell gets expanded by value less than the iKashidaWidth value returned by ScriptGetFontProperties. You should call ScriptGetFontProperties in ApplyDxArray and correct mpJustification Values if necessary. 2. Kashidas are appended to words, instead of connecting letters within the words. Cause: The OOo Kashida algorithm decides to expand characters, which receive a SCRIPT_JUSTIFY_NONE uJustification Value by ScriptShape. I attached a patch, where I replaced the SCRIPT_JUSTIFY_NONE, with SCRIPT_JUSTIFY_KASHIDA. This fixes the problem. But maybe important to note: OOo puts Kashidas at different places than USP. I don't know how serious that is regarding Arabic typography, but maybe you should check into that, too. There was this previous hack replacing uJustification values with SCRIPT_JUSTIFY_KASHIDA. Never understood that really, but if it was aimed at this problem, it should concentrate (or at least include) SCRIPT_JUSTIFY_NONE.
Thanks for checking that it works for Hebrew. To get PDF export right in the new scheme of things UniscribeLayout::GetNextGlyphs() needs to adjust the glyph positions. I'll do it soon. The kashida justification in OOo is decided by sw/source/core/text/porlay.cxx. Unfortunately this is a one way decision which doesn't consult the suggestion by the platform's layout engine. Probably the inconsistencies are caused by this. VCL overriding SW's placement of kashidas would cause many problems with cursor movement, background colors, text measurements etc. Looking back I don't remember the exact details of replacing some uJustification values with kashidas either :-/ I could find out with a long investigation though. It was probably massaged to work with several old USP versions. I'd be happy to get rid of this hack entirely for new enough USPs and made it conditional to bManualCellAlign. Your suggestion to replace just SCRIPT_JUSTIFY_NONE sounds good, did you try it on both newer and older USP versions?
As far as I could see new and old USP behave pretty much the same regarding Arabic. They return different suggestions for uJustification, though. But I will double check again. You should respect the minimal kashida width in some respect. Either it should be done in sw, or compensate in a way, that just minimally disrupts sw's character placing. Like remove the kashida width (which is small anyway) from the glyph and add it to the following space.
Retargeting to next release after 2.4, which is on its way.
Created attachment 51331 [details] corrections to patch
@hdu: some corrections to my previous patch, now checked on XP and Vista. The changes are relevant for the kashida issue on older usps.
Thanks! This seems to be ready to get into mainline soon...
Applied the patch to CWS vcl87. Need to adjust UniscribeLayout::GetNextGlyphs() to adjust PDF export to the changes.
Now also added the change mentioned above for UniscribeLayout::GetNextGlyphs() (winlayout.cxx 1.108.206.2)
@sba: please verify in CWS vcl87 Testing hints: - load the bugdoc on XP and Vista - check the display, the printout and PDF export on each platform
reassigning for verification
Verified in CWS vcl87.
Any chance to see this fixed also in 2.4.1? (we already have the fix for the problem, so that should take much time).
SBA-> Kaplan: I just talked to HDU and I got the approval from MH to get this fixed for 2.4.1. too. So after re-verification in OOo 3.0 Beta master build, I will set the target accordingly and reassign this one to HDU.
kaplan -> SBA, HDU: Cool (: Thanks for the extra effort on this.
Will do. @hennerdrewes: Have you had a chance to check these changes in winlayout.cxx DEV300_m6? My plan is to pull this revision into 2.4.1.
@hdu: Haven't had time to check with winlayout.cxx DEV300_m6 yet. I built OOo 2.4 with winlayout.cxx 1.108.206.2 and have been using that for quite a while, and that works fine.
Reopening for the backport to 2.4.1
Applied the fix into CWS rtl4vista for OOo 2.4.1 Winlayout.cxx's revision 1.108.206.2 was an even match better for OOo 2.4.1. Thanks for the feedback!
@sba: please verify in CWS rtl4vista too
Verified in DEV300_m6 on Windows Vista. Thus OK for OOo 3.0 Beta. Not setting to verified because fix must be re-verified in 2.4.1 CWS rtl4vista. Target set to OOo 2.4.1
SBA: Verified in 2.4.1 CWS rtl4vista.
added "approved" to the title, because it will be easier to work with the 2.4.1 meta issue during release status meetings.
Automation possible.
*** Issue 86170 has been marked as a duplicate of this issue. ***
OK in OOO310_m10. Closed.