Issue 37190 - neutral characters move on import from PowerPoint
Summary: neutral characters move on import from PowerPoint
Status: CONFIRMED
Alias: None
Product: Impress
Classification: Application
Component: open-import
Version: 680m115
Hardware: All All
Importance: P3 Trivial with 5 votes
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords: ms_interoperability
Depends on:
Blocks:
 
Reported: 2004-11-15 16:57 UTC by sforbes
Modified: 2013-08-07 15:20 UTC
CC List: 5 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
PowerPoint file (28.50 KB, application/vnd.ms-powerpoint)
2004-11-15 16:58 UTC, sforbes
Display of slide 4 in PowerPoint 2000 (27.16 KB, image/png)
2004-11-15 16:58 UTC, sforbes
Another example - compare the quotes in the slide titles between PPT and OOo (88.50 KB, application/vnd.ms-powerpoint)
2004-11-21 04:52 UTC, sforbes
3rd example - see slide #4 and #10 (the bottom question mark) (93.50 KB, application/vnd.ms-powerpoint)
2004-11-21 05:14 UTC, sforbes
another example - see slide #5 (381.00 KB, application/vnd.ms-powerpoint)
2004-11-21 05:36 UTC, sforbes
Tacks an RLM after weak chars ending a Hebrew paragraph (4.46 KB, patch)
2006-10-29 13:26 UTC, alan
new bugdoc (Writer vs. EditEngine) (8.45 KB, application/vnd.oasis.opendocument.text)
2006-11-17 10:43 UTC, frank.meies

Description sforbes 2004-11-15 16:57:37 UTC
When opening the attached file in Impress, the parentheses in slide 4 (in the
subheading) move to the wrong location.

Both in Impress and in PowerPoint that area is set as LTR; however, PowerPoint
displays it as RTL.
Comment 1 sforbes 2004-11-15 16:58:13 UTC
Created attachment 19326
PowerPoint file
Comment 2 sforbes 2004-11-15 16:58:46 UTC
Created attachment 19327
Display of slide 4 in PowerPoint 2000
Comment 3 sforbes 2004-11-21 04:52:41 UTC
Created attachment 19524
Another example - compare the quotes in the slide titles between PPT and OOo
Comment 4 sforbes 2004-11-21 05:14:05 UTC
Created attachment 19527
3rd example - see slide #4 and #10 (the bottom question mark)
Comment 5 sforbes 2004-11-21 05:36:05 UTC
Created attachment 19529
another example- see slide #5
Comment 6 Dieter.Loeschky 2005-07-11 08:10:37 UTC
DL-> WG: Could you please handle this?
Comment 7 wolframgarten 2005-07-11 08:34:04 UTC
Reproducible. The parenthesis is displayed outside of the brackets.
Comment 8 wolframgarten 2005-07-11 08:34:35 UTC
Reassigned.
Comment 9 sven.jacobi 2005-10-31 14:59:02 UTC
sj->ama: This seems to be a problem the outliner is having with weak characters.
If the text is pasted from Impress into Writer, the parentheses are correctly
displayed.
Comment 10 alan 2006-10-29 13:24:55 UTC
I've worked around the outliner problem by using the method proposed in Issue
18024, namely by adding an RLM after Hebrew paragraphs that end in a weak
character. This solves almost all the problems when importing the attached
presentations.

Two problems that are not solved seem to stem from old PPT formats:
1) in 2002a.ppt, slide 5, the numbers prefixing each line are on the right in
PPT and on the left in OOo. If I load 2002a.ppt into PPT 2002, I see that slide
5 has its text direction as LTR and the alignment to the right. If you select
the text, then change the text direction to RTL, the display is the same. Change
it back to LTR, the display changes. Save it and load it into OOo, and the
display is the same as in PPT 2002. Apparently saving the same content in PPT
2002 causes changes in the format, and OOo handles the new format correctly. 

2) in radiohercog.ppt, slide 2, the periods ending the RTL sentences on the
bottom appear on the left in PPT, and on the right in OOo. Here again, I loaded
the file into PPT 2002, made a change to the text in the upper half of the
screen, which is a different text object altogether, then saved the file. When I
loaded the newly saved file into OOo, it was imported correctly, including the
text object on the bottom, to which I had made no changes. Apparently, the
problem stemmed from the old format.

All other text in all the attached presentations looks fine, when using this method.

I'm attaching a patch, which is admittedly not very pretty and is specialized
only for Hebrew. Perhaps it can help someone pick up the ball at this point and
either solve the problem in the outliner, or at least solve it during PPT
import, but in a more generalized way that would work for other RTL languages as
well.
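
The workaround described in this comment can be sketched roughly as follows. This is a minimal C++ illustration, not the attached patch: the helper names (isHebrew, isWeakOrNeutral, appendRlmIfNeeded) are hypothetical, the character classification is a crude ASCII-only approximation of the Unicode bidi classes, and "Hebrew paragraph" is assumed to mean that the last strong character is Hebrew.

```cpp
#include <cstddef>
#include <string>

// U+200F RIGHT-TO-LEFT MARK
constexpr char16_t kRLM = u'\u200F';

// Assumption: only the basic Hebrew block U+0590..U+05FF is considered.
bool isHebrew(char16_t c) { return c >= 0x0590 && c <= 0x05FF; }

// Crude stand-in for weak/neutral characters: any ASCII character
// that is not a Latin letter (digits, punctuation, spaces).
bool isWeakOrNeutral(char16_t c) {
    bool asciiLetter = (c >= u'A' && c <= u'Z') || (c >= u'a' && c <= u'z');
    return c < 0x80 && !asciiLetter;
}

// Appends an RLM when the paragraph ends in one or more weak characters
// and the last character before them is Hebrew.
std::u16string appendRlmIfNeeded(std::u16string para) {
    std::size_t i = para.size();
    while (i > 0 && isWeakOrNeutral(para[i - 1])) --i;  // skip trailing weak chars
    bool endsWeak = i < para.size();
    bool lastStrongIsHebrew = i > 0 && isHebrew(para[i - 1]);
    if (endsWeak && lastStrongIsHebrew) para.push_back(kRLM);
    return para;
}
```

Because the RLM is a strong right-to-left character, the trailing punctuation ends up enclosed between two RTL characters and resolves to RTL. A generalized version would need real bidi class data (for example from ICU) and would cover Arabic and other RTL scripts as well.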
Comment 11 alan 2006-10-29 13:26:48 UTC
Created attachment 40139
Tacks an RLM after weak chars ending a Hebrew paragraph
Comment 12 andreas.martens 2006-10-31 08:38:55 UTC
AMA->FME: Please have a look...
Comment 13 frank.meies 2006-10-31 10:11:58 UTC
FME: i50657 also deals with bidi in EditEngine. Looks like the bidi algorithm
used in the EditEngine differs from the one used in Writer ;-) I'll try to fix
these issues for 2.x.
Comment 14 alan 2006-11-17 08:32:57 UTC
ayaniger->fme(and maybe sj(?)):
Until the edit engine changes are made, I'll be using the patch attached here to
deal with ending weak chars by adding an RLM at the end.
I have a question about a related problem. Sometimes I also have to add an RLM
at the beginning, such as when a Hebrew string is preceded by weak chars.
However, this causes a separate problem. The string length has been increased by
one because of the inserted RLM, say from 20 to 21, but the font attributes are
applied only to 20 chars. As a result, the final char in the string appears in a
different font. I tried unsuccessfully to increase "nCharCount" in
PPTStyleTextPropReader::PPTStyleTextPropReader according to the extra chars
that I inserted. Since that didn't work, how can I overcome this problem?
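
To illustrate the length mismatch described above: if attributes are stored as runs of character counts, every inserted RLM has to be reflected in the run that covers the insertion point, otherwise the last character falls outside all runs. The sketch below is only an assumption about how that bookkeeping could look. AttrRun and growRunForInsert are hypothetical names and do not reflect the actual data structures of PPTStyleTextPropReader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical attribute run: `count` consecutive characters share `fontId`.
struct AttrRun {
    std::size_t count;
    int fontId;
};

// Grows the run that covers `pos` by one, mirroring a single character
// inserted at `pos`. An insert at the very end extends the last run.
void growRunForInsert(std::vector<AttrRun>& runs, std::size_t pos) {
    assert(!runs.empty());  // precondition: the text has at least one run
    std::size_t start = 0;
    for (AttrRun& run : runs) {
        if (pos < start + run.count) {
            ++run.count;
            return;
        }
        start += run.count;
    }
    ++runs.back().count;
}
```

For the case in this comment, a single run of 20 characters becomes a run of 21 after an RLM is inserted at position 0, so the final character keeps the same font.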
Comment 15 frank.meies 2006-11-17 09:16:22 UTC
fme->alan: Let's not spend too much time on working around the root cause. The
new attached document shows the problem: The editengine clearly has a bug in its
bidi code. I'm currently investigating the editengine code, I hope I can get
back to you soon with a fix for this.
Comment 16 frank.meies 2006-11-17 10:42:14 UTC
fme->alan: Please have a look at svx/source/editeng/impedit2.cxx. In
ImplInitLayoutMode, the results for bR2L are definitely wrong (and therefore the
layout modes set at the output device). For my new attached document we should
get this:

nIndex = 0 => bR2L = 0
nIndex = 1 => bR2L = 1
nIndex = 14 => bR2L = 0

So I had a look into ImpEditEngine::GetRightToLeft( ... ) and changed
rDirInfos[n].nEndPos >= nPos to rDirInfos[n].nEndPos > nPos. Now the layout mode
was set correctly. Unfortunately ImpEditEngine::GetRightToLeft( ... ) is also
called by other clients, which seem to rely on >=, therefore I changed the call
of GetRightToLeft( nPara, nIndex ) in ImplInitLayoutMode to GetRightToLeft(
nPara, nIndex + 1 ). This seems to work with my attached sample document, but
there is still an issue with slide 4 in radiohercog.ppt. So obviously this needs
some deeper digging. 
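
The effect of the comparison change can be reproduced with a simplified model. The sketch below is not the OOo source: DirRun, getRightToLeft and sampleRuns are hypothetical stand-ins for the direction info array, and the run boundaries are assumed so that they fall at the indices quoted in this comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical direction run covering positions [startPos, endPos);
// rtl is 1 for right-to-left, 0 for left-to-right.
struct DirRun {
    std::size_t startPos;
    std::size_t endPos;
    int rtl;
};

// Returns the direction of the first run matching pos.
// inclusiveEnd = true models the original `nEndPos >= nPos` test,
// inclusiveEnd = false models the proposed `nEndPos > nPos` test.
int getRightToLeft(const std::vector<DirRun>& runs, std::size_t pos, bool inclusiveEnd) {
    for (const DirRun& run : runs) {
        assert(run.startPos <= run.endPos);  // sanity check on the run
        bool endOk = inclusiveEnd ? run.endPos >= pos : run.endPos > pos;
        if (run.startPos <= pos && endOk) return run.rtl;
    }
    return 0;  // no run found: default to left-to-right
}

// Assumed runs: LTR [0,1), RTL [1,14), LTR [14,20).
std::vector<DirRun> sampleRuns() {
    return {{0, 1, 0}, {1, 14, 1}, {14, 20, 0}};
}
```

With these assumed runs, the inclusive test reports the run that ends at a boundary position (LTR at index 1, RTL at index 14), while the exclusive test, or the inclusive test queried at index + 1, yields 0, 1, 0 for indices 0, 1 and 14.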




Comment 17 frank.meies 2006-11-17 10:43:26 UTC
Created attachment 40650
new bugdoc (Writer vs. EditEngine)
Comment 18 frank.meies 2006-11-17 11:18:58 UTC
FME: Having a second look at the bugdocs, I think we actually have two different
problems:

1. The EditEngine is a bit buggy dealing with bidi text
2. Word's bidi algorithm differs from the one used by OOo (ICU)

Problem 2 clearly has to be fixed (or rather, worked around) in the import
filter, so alan, please contact sj about the RLM stuff (ppt import is more his
playground).

I need to get an overview of all currently open bidi text formatting issues,
both Writer and EditEngine. Looks like this can take quite some time, but I hope
I can do this for one of the upcoming versions.


Comment 19 alan 2006-11-17 11:57:53 UTC
ayaniger->fme:
I'm looking now at the edit engine code and your fix. I will contact you
directly with an overview of outstanding bidi issues.
Comment 20 Martin Hollmichel 2007-11-09 17:18:57 UTC
set target from 2.x to 3.x according to
http://wiki.services.openoffice.org/wiki/Target_3x
Comment 21 hennerdrewes 2008-03-30 10:20:00 UTC
@fme: your proposed change to the edit engine code from 17.11.2006 works and
should be integrated.

The remaining problem on slide 4 of radiohercog.ppt has a different cause. 
The two lines in the text field are separated by only a line break, which is
treated by the bidi algorithm in the same way as a space character. If you
replace the line break with a space character, even Word places the parentheses
in this strange way. But after all, this is a LTR paragraph...

To summarize, there is a remaining difference. If I write

"Hebrew letters" "New Paragraph" "(" "Hebrew letters" ")": Output is ok in
writer and edit engine (with patch)

"Hebrew letters" "Line Break" "(" "Hebrew letters" ")": Output is ok in writer,
buggy as before in edit engine

"Hebrew letters" "Space" "(" "Hebrew letters" ")": Output all weird (even in Word)

Comment 22 frank.meies 2008-03-31 08:33:27 UTC
fme->hennerdrewes: Thank you for your analysis.

fme->tl: EditEngine is your baby now.
Comment 23 alan 2008-10-24 11:47:37 UTC
ayaniger->fme,tl:
How do things stand with fixing bidi issues in the edit engine?

Thanks,
Alan
Comment 24 alan 2009-07-31 06:36:45 UTC
In m15, the quote marks in the title of the document "matzeget" are ok. However,
all the other bugs described in this issue are still there.