Apache OpenOffice (AOO) Bugzilla – Issue 37190
neutral characters move on import from PowerPoint
Last modified: 2013-08-07 15:20:06 UTC
When opening the attached file in Impress, the parentheses in slide 4 (in the subheading) move to the wrong location. In both Impress and PowerPoint that area is set as LTR; however, PowerPoint displays it as RTL.
Created attachment 19326 [details] PowerPoint file
Created attachment 19327 [details] Display of slide 4 in PowerPoint 2000
Created attachment 19524 [details] Another example: compare the quotes in the slide titles between PPT and OOo
Created attachment 19527 [details] 3rd example: see slides #4 and #10 (the bottom question mark)
Created attachment 19529 [details] another example- see slide #5
DL-> WG: Could you please handle this?
Reproducible. The parenthesis is displayed outside of the brackets.
Reassigned.
sj->ama: This seems to be a problem the outliner has with weak characters. If the text is pasted from Impress into Writer, the parentheses are displayed correctly.
I've worked around the outliner problem by using the method proposed in Issue 18024, namely by adding an RLM after Hebrew paragraphs that end in a weak character. This solves almost all the problems when importing the attached presentations. Two problems that are not solved seem to stem from old PPT formats:
1) In 2002a.ppt, slide 5, the numbers prefixing each line are on the right in PPT and on the left in OOo. If I load 2002a.ppt into PPT 2002, I see that slide 5 has its text direction set to LTR and its alignment to the right. If you select the text and change the text direction to RTL, the display stays the same. Change it back to LTR, and the display changes. Save it and load it into OOo, and the display is the same as in PPT 2002. Apparently, saving the same content in PPT 2002 changes the format, and OOo handles the new format correctly.
2) In radiohercog.ppt, slide 2, the periods ending the RTL sentences at the bottom appear on the left in PPT and on the right in OOo. Here again, I loaded the file into PPT 2002, made a change to the text in the upper half of the screen, which is a different text object altogether, then saved the file. When I loaded the newly saved file into OOo, it was imported correctly, including the text object at the bottom, to which I had made no changes. Apparently, the problem stemmed from the old format.
All other text in all the attached presentations looks fine when using this method. I'm attaching a patch, which is admittedly not very pretty and is specialized for Hebrew only. Perhaps it can help someone pick up the ball at this point and either solve the problem in the outliner or at least solve it during PPT import, but in a more generalized way that would also work for other RTL languages.
Created attachment 40139 [details] Tacks an RLM after weak chars ending a Hebrew paragraph
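For readers without access to the attachment, the idea behind the workaround can be sketched roughly as follows. This is not the attached patch: the helper names (`isHebrew`, `isWeakOrNeutral`, `appendRlmIfNeeded`) are hypothetical, the set of "weak" characters is a deliberately small hand-picked sample, and treating a paragraph as Hebrew when it contains at least one character from the Hebrew block is an assumption.

```cpp
#include <cassert>
#include <string>

// U+200F RIGHT-TO-LEFT MARK: an invisible character with strong RTL direction.
constexpr char32_t RLM = 0x200F;

// Hypothetical helper: true for code points in the Hebrew block (U+0590..U+05FF).
bool isHebrew(char32_t c) { return c >= 0x0590 && c <= 0x05FF; }

// Hypothetical helper: a small, incomplete sample of characters without
// strong directionality (punctuation, brackets, ASCII digits).
bool isWeakOrNeutral(char32_t c) {
    static const std::u32string kChars = U"()[]{}.,:;!?\"'-";
    return kChars.find(c) != std::u32string::npos || (c >= U'0' && c <= U'9');
}

// Appends an RLM when the paragraph contains Hebrew (assumed definition of a
// "Hebrew paragraph") and its last character is weak or neutral.
std::u32string appendRlmIfNeeded(std::u32string para) {
    if (para.empty()) return para;
    bool hasHebrew = false;
    for (char32_t c : para) {
        if (isHebrew(c)) { hasHebrew = true; break; }
    }
    if (hasHebrew && isWeakOrNeutral(para.back())) para.push_back(RLM);
    return para;
}

// Usage sketch: a Hebrew word followed by ")" gets the mark, Latin text does not.
void usageExample() {
    assert(appendRlmIfNeeded(U"\u05E9\u05DC\u05D5\u05DD)").back() == RLM);
    assert(appendRlmIfNeeded(U"Hello)").back() == U')');
}
```

The trailing mark gives the final weak character a strong RTL neighbour on both sides, which is presumably why it ends up on the expected side of the Hebrew text.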
AMA->FME: Please have a look...
FME: i50657 also deals with bidi in EditEngine. Looks like the bidi algorithm used in the EditEngine differs from the one used in Writer ;-) I'll try to fix these issues for 2.x.
ayaniger->fme (and maybe sj?): Until the edit engine changes are made, I'll be using the patch attached here to deal with ending weak chars by adding an RLM at the end. I have a question about a related problem. Sometimes I also have to add an RLM at the beginning, such as when a Hebrew string is preceded by weak chars. However, this causes a separate problem. The string length has been increased by one because of the inserted RLM, say from 20 to 21, but the font attributes are applied to only 20 chars. As a result, the final char in the string appears in a different font. I tried unsuccessfully to increase "nCharCount" in PPTStyleTextPropReader::PPTStyleTextPropReader according to the extra chars that I inserted. Since that didn't work, how can I overcome this problem?
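The length mismatch described above can be illustrated with a small model. This is only a sketch of the bookkeeping involved, not the actual PPTStyleTextPropReader data structures and not a confirmed fix: `CharRun`, `insertWithRuns` and `totalCount` are hypothetical names, and attribute runs are assumed to be stored as consecutive (count, font) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical attribute run: `count` consecutive characters share `fontId`.
struct CharRun {
    std::size_t count;
    int fontId;
};

// Sum of all run lengths; it should always equal the text length.
std::size_t totalCount(const std::vector<CharRun>& runs) {
    std::size_t total = 0;
    for (const CharRun& run : runs) total += run.count;
    return total;
}

// Inserts `ch` at index `pos` and grows the run that covers `pos`
// (or the last run when inserting at the very end), so that the runs
// keep describing the whole string.
void insertWithRuns(std::u32string& text, std::vector<CharRun>& runs,
                    std::size_t pos, char32_t ch) {
    assert(pos <= text.size());
    text.insert(pos, 1, ch);
    std::size_t start = 0;
    for (CharRun& run : runs) {
        if (pos < start + run.count || &run == &runs.back()) {
            ++run.count;
            return;
        }
        start += run.count;
    }
}
```

With a single run of 20 characters, inserting an RLM at position 0 through `insertWithRuns` leaves a 21-character string and a 21-character run, so the last character no longer falls outside the attributed range.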
fme->alan: Let's not spend too much time on working around the root cause. The new attached document shows the problem: The editengine clearly has a bug in its bidi code. I'm currently investigating the editengine code, I hope I can get back to you soon with a fix for this.
fme->alan: Please have a look at svx/source/editeng/impedit2.cxx. In ImplInitLayoutMode, the results for bR2L are definitely wrong (and therefore so are the layout modes set at the output device). For my new attached document we should get this: nIndex = 0 => bR2L = 0, nIndex = 1 => bR2L = 1, nIndex = 14 => bR2L = 0. So I had a look into ImpEditEngine::GetRightToLeft( ... ) and changed rDirInfos[n].nEndPos >= nPos to rDirInfos[n].nEndPos > nPos. Now the layout mode was set correctly. Unfortunately, ImpEditEngine::GetRightToLeft( ... ) is also called by other clients, which seem to rely on >=. Therefore I changed the call of GetRightToLeft( nPara, nIndex ) in ImplInitLayoutMode to GetRightToLeft( nPara, nIndex + 1 ). This seems to work with my attached sample document, but there is still an issue with slide 4 in radiohercog.ppt. So obviously this needs some deeper digging.
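The effect of `>=` versus `>` can be reproduced with a simplified model of the lookup. This is a sketch, not the code from impedit2.cxx: `DirRun`, `EndTest`, `isRightToLeft` and `sampleRuns` are hypothetical, and the run boundaries (0..1 LTR, 1..14 RTL, 14..20 LTR) are assumptions inferred from the expected bR2L values listed above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical direction run covering positions from nStartPos up to nEndPos.
struct DirRun {
    int nStartPos;
    int nEndPos;
    bool bRtl;
};

// Inclusive models `nEndPos >= nPos`, Exclusive models `nEndPos > nPos`.
enum class EndTest { Inclusive, Exclusive };

// Returns the direction of the first run that matches nPos.
bool isRightToLeft(const std::vector<DirRun>& runs, int nPos, EndTest endTest) {
    for (const DirRun& r : runs) {
        const bool beforeEnd = (endTest == EndTest::Inclusive)
                                   ? r.nEndPos >= nPos
                                   : r.nEndPos > nPos;
        if (r.nStartPos <= nPos && beforeEnd) return r.bRtl;
    }
    return false;  // no run found: fall back to LTR
}

// Assumed runs: one LTR character, thirteen RTL characters, then LTR text.
std::vector<DirRun> sampleRuns() {
    return {{0, 1, false}, {1, 14, true}, {14, 20, false}};
}

// Usage sketch: with the inclusive test, index 1 still matches the first
// (LTR) run; the exclusive test and the "nIndex + 1" call both yield RTL.
void usageExample() {
    assert(!isRightToLeft(sampleRuns(), 1, EndTest::Inclusive));
    assert(isRightToLeft(sampleRuns(), 1, EndTest::Exclusive));
    assert(isRightToLeft(sampleRuns(), 1 + 1, EndTest::Inclusive));
}
```

In this model the inclusive comparison makes adjacent runs overlap at their shared boundary, so the first match wins and a character at the start of a run inherits the direction of the previous one. Shifting the queried position by one sidesteps that overlap without changing the comparison other callers depend on.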
Created attachment 40650 [details] new bugdoc (Writer vs. EditEngine)
FME: Having a second look at the bugdocs, I think we actually have two different problems: 1. The EditEngine is a bit buggy when dealing with bidi text. 2. Word's bidi algorithm differs from the one used by OOo (ICU). Problem 2 clearly has to be fixed (or rather, worked around) in the import filter, so alan, please contact sj about the RLM stuff (ppt import is more his playground). I need to get an overview of all currently open bidi text formatting issues, in both Writer and EditEngine. Looks like this can take quite some time, but I hope I can do this for one of the upcoming versions.
ayaniger->fme: I'm looking now at the edit engine code and your fix. I will contact you directly with an overview of outstanding bidi issues.
set target from 2.x to 3.x according to http://wiki.services.openoffice.org/wiki/Target_3x
@fme: your proposed change to the edit engine code from 17.11.2006 works and should be integrated. The remaining problem on slide 4 of radiohercog.ppt has a different cause. The two lines in the text field are separated by only a line break, which the bidi algorithm treats in the same way as a space character. If you replace the line break with a space character, even Word places the parentheses in this strange way. But after all, this is an LTR paragraph... To summarize, there is a remaining difference. If I write:
"Hebrew letters" "New Paragraph" "(" "Hebrew letters" ")": Output is ok in Writer and edit engine (with patch)
"Hebrew letters" "Line Break" "(" "Hebrew letters" ")": Output is ok in Writer, buggy as before in edit engine
"Hebrew letters" "Space" "(" "Hebrew letters" ")": Output all weird (even in Word)
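Why the paragraph break and the space behave differently can be shown with a heavily simplified model of how neutral characters pick up a direction. This is a sketch in the spirit of the neutral-resolution rules of the Unicode bidi algorithm, not the ICU or edit engine implementation: `resolveNeutrals` is a hypothetical name, only three classes are modelled ('L', 'R', 'N'), and numbers, explicit embeddings and bracket pairing are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Each character is given as a class: 'L' strong LTR, 'R' strong RTL,
// 'N' neutral (spaces, line breaks, parentheses in this model).
// A run of neutrals takes the direction of its strong neighbours when both
// agree, otherwise the paragraph direction. Paragraph start and end count
// as neighbours with the paragraph direction.
std::string resolveNeutrals(std::string classes, char paraDir) {
    std::size_t i = 0;
    while (i < classes.size()) {
        if (classes[i] != 'N') { ++i; continue; }
        std::size_t j = i;
        while (j < classes.size() && classes[j] == 'N') ++j;  // end of neutral run
        const char before = (i == 0) ? paraDir : classes[i - 1];
        const char after = (j == classes.size()) ? paraDir : classes[j];
        const char resolved = (before == after) ? before : paraDir;
        for (std::size_t k = i; k < j; ++k) classes[k] = resolved;
        i = j;
    }
    return classes;
}

// Usage sketch for an LTR paragraph.
void usageExample() {
    // New paragraph: "(" Hebrew ")" stands alone, both parentheses stay LTR.
    assert(resolveNeutrals("NRRN", 'L') == "LRRL");
    // Space or line break: Hebrew, separator, "(" Hebrew ")". The separator
    // and "(" sit between two RTL runs and become RTL, while ")" stays LTR.
    assert(resolveNeutrals("RRNNRRN", 'L') == "RRRRRRL");
}
```

In the second case the opening parenthesis is absorbed into the surrounding RTL text while the closing one is not, which matches the "weird" placement seen when the two lines are separated by a space, and by a line break if the line break is treated like a space.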
fme->hennerdrewes: Thank you for your analysis. fme->tl: EditEngine is your baby now.
ayaniger->fme,tl: How do things stand with fixing bidi issues in the edit engine? Thanks, Alan
In m15, the quote marks in the title of the document "matzeget" are ok. However, all the other bugs described in this issue are still there.