29165 – Word break different from this in MS Word

Summary: Word break different from this in MS Word

Status:	CLOSED WONT_FIX

Alias:	None

Product:	App Dev
Classification:	Unclassified
Component:	api
Version:	3.3.0 or older (OOo)
Hardware:	All All

Importance:	P3 Trivial
Target Milestone:	---
Assignee:	stefan.baltzer
QA Contact:	issues@api

URL:
Keywords:

Depends on:
Blocks:

Reported:	2004-05-14 05:28 UTC by pajowett2
Modified:	2013-02-24 21:09 UTC
CC List:	1 user

See Also:
Issue Type:	DEFECT
Latest Confirmation in:	---
Developer Difficulty:	---

Attachments
Java src and template (72.23 KB, application/octet-stream) 2004-05-31 07:59 UTC, pajowett2
Batch script to compile and run in Windows. (76.16 KB, application/x-compressed) 2004-06-02 02:47 UTC, pajowett2
Page displays differently in MSWord vs Writer (109.50 KB, application/msword) 2004-06-03 04:23 UTC, pajowett2

Description pajowett2 2004-05-14 05:28:00 UTC

Using the Java API to modify documents, timing problems become apparent if the 
document is saved on completion of the modifications.  Two cases are inserting 
table rows and updating TOC indexes.  Table insertions can result in rows with 
no data, and index updates can leave indexes that refer to incorrect pages.

This has been evaluated on WindowsXP and Solaris 5.8.

The two API calls most likely to be involved are:
  XDocumentIndex.update()
  XTableRows.insertByIndex()

Unless the program performs a sleep for some arbitrary duration before the save 
or the call to update indexes, data is lost or incorrect in the saved document.

Changing the connection to be synchronous (Negotiate=0;forceSynchronous=1) 
doesn't correct the problem.  The document API either needs to ensure changes 
aren't being performed asynchronously or provide a mechanism to inform the API 
programs that all pending/asynchronous changes have completed.
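
The kind of completion mechanism requested above could look roughly like the following. This is a minimal sketch in plain Java and is not part of the OpenOffice API: the class PendingChanges and its methods begin(), done() and awaitAll() are hypothetical names, and whatever work runs between begin() and done() merely stands in for asynchronous document changes.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical tracker for asynchronous changes; NOT an OpenOffice API. */
final class PendingChanges {
    private int pending = 0;

    /** Record that one asynchronous change has started. */
    synchronized void begin() {
        pending++;
    }

    /** Record that one asynchronous change has finished and wake any waiters. */
    synchronized void done() {
        pending--;
        notifyAll();
    }

    /**
     * Block until no changes are pending or the timeout elapses.
     * Returns true if everything completed in time, false on timeout.
     */
    synchronized boolean awaitAll(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (pending > 0) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // timed out with changes still pending
            }
            wait(remainingMs); // released by notifyAll() in done()
        }
        return true;
    }
}
```

With something of this shape, a client would call awaitAll() before saving instead of sleeping for an arbitrary duration, and could treat a false return as a reason not to save yet.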

Links to forum discussions of the problem:
http://www.oooforum.org/forum/viewtopic.php?t=7322
http://www.oooforum.org/forum/viewtopic.php?t=7826

Comment 1 stephan.wunderlich 2004-05-19 15:08:26 UTC

SW->pajowett2: I couldn't reproduce the described behaviour with my OOo1.1.1.
I loaded a document with a TOC and then added some paragraphs with headings
that should appear in the TOC. Then I called XDocumentIndex.update() followed
by XStorable.storeToURL() ... the stored file contains the updated TOC ...
anything else I need to do to reproduce what you describe? ... Do you have
Java code and a corresponding document to reproduce this?

Comment 2 pajowett2 2004-05-21 02:06:37 UTC

pajowett2->SW: Thanks.  There is a large amount of code surrounding this case, 
but I will produce a minimal case if required.  Can you please try this 
sequence which I suspect will show the problem:
1) create a 16 page MSWord97 doc with a TOC
2) load the doc into star office
3) run several XTextCursor.setString("") to reduce the number of pages down to 8
4) immediately call XDocumentIndex.update() and then storeToURL() as Word97 
format

I have since found that calling XRefreshable.refresh() on the document before 
doing the XDocumentIndex.update() fixes the problem (although I can't find any 
documentation to indicate why).

Comment 3 stephan.wunderlich 2004-05-27 11:48:20 UTC

SW->pajowett2: I tried your suggestion and removed anything from one paragraph
to all of them and saved directly after the update, but in none of the scenarios
could I reproduce the behaviour you describe :-( ... do you by any chance have a
document and a Java program that demonstrate the behaviour you describe?

Comment 4 pajowett2 2004-05-31 07:59:23 UTC

Created attachment 15577
Java src and template

Comment 5 pajowett2 2004-05-31 08:03:20 UTC

I have attached a java class and template which can reproduce the problem 
reliably.  I have tested using the latest patched StarOffice7 under WinXP and 
Solaris.  See the main() routine to get the idea of what is happening.  Three 
copies of the template are produced and most of the time, opening one of these 
copies shows the TOC is incorrect.  The XRefreshable.refresh() call in the 
updateIndexes() method should be commented out (I think I've accidentally left 
it in).

Comment 6 stephan.wunderlich 2004-06-01 11:36:54 UTC

SW->pajowett2: strange ... I commented the "refresh" line out and ran your
program with OOo1.1, OOo1.1.1 and StarOffice 7 Product Update 2 ... in all cases
I got documents with indexes that are up to date ... any ideas what I might be
missing?

Comment 7 pajowett2 2004-06-02 02:47:33 UTC

Created attachment 15629
Batch script to compile and run in Windows.

Comment 8 pajowett2 2004-06-02 02:56:02 UTC

pajowett2->SW:  Thanks again.  I asked another developer to try it for me 
(WinXP OpenOffice1.1.0, JDK 1.3) and he confirmed that he sees the same 
problem.  What operating system did you run your tests on? What did you use to 
view the resulting .doc files?  I should probably explain that the test case I 
sent you RANDOMLY removes sections from the template.doc file so that the TOC 
is expected to be different in different runs.  The fastest way to see if it is 
working is to open each of the resulting documents (3 files in the conv 
directory), find the TOC and update it (right click->update).  If the numbers 
change when you do this, the problem is revealed!
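
The manual check described above (note the TOC page numbers, update the TOC, see whether they change) can be expressed as a small comparison. This is a minimal sketch in plain Java: the class TocCheck, the method staleEntries() and the heading names are hypothetical, and it assumes the page numbers have already been read out of the document by some other means.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: compares TOC page numbers captured before and after an update. */
final class TocCheck {

    /** Returns the headings whose page number after the update differs from the saved one. */
    static List<String> staleEntries(Map<String, Integer> saved, Map<String, Integer> updated) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : updated.entrySet()) {
            Integer savedPage = saved.get(entry.getKey()); // null if the heading was missing
            if (!entry.getValue().equals(savedPage)) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Illustrative values only; not taken from the attached test documents.
        Map<String, Integer> saved = new LinkedHashMap<>();
        saved.put("Introduction", 2);
        saved.put("Details", 5);
        saved.put("Summary", 9);

        Map<String, Integer> updated = new LinkedHashMap<>();
        updated.put("Introduction", 2);
        updated.put("Details", 4);
        updated.put("Summary", 8);

        System.out.println(staleEntries(saved, updated)); // prints [Details, Summary]
    }
}
```

A non-empty result corresponds to "the numbers change when you do this", i.e. the saved TOC did not match the pagination in the viewing application.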

I have attached another zip file which contains the exact run case.  If you 
extract it to c:\temp on a Windows platform, update the variables at the top of 
the run.bat script (to match your hostname, JDK and Office) and then execute 
run.bat, it will compile and run the test case.  The 3 docs produced in conv 
almost always exhibit the problem.  Please run it a few times to confirm since 
the first time I ran it this morning, all 3 documents were fine, but every run 
since, at least one has an incorrect TOC.

Comment 9 pajowett2 2004-06-02 03:21:16 UTC

pajowett2->SW: I just noticed that if I open one of the documents with the bad 
TOC into StarWriter, the TOC matches the document!  This means that the 
pagination for the exported document is different - and sure enough, in 
StarWriter, the doc displays with 1 more page than in MSWord.  I'm still 
looking as to why but it looks like a combination of page height and word 
wrapping.

This can't be acceptable since exporting to PDF shows exactly the same problem 
and the TOC can never be corrected.

Comment 10 pajowett2 2004-06-02 06:34:36 UTC

pajowett2->SW: Problem is down to the fact that MSWord and StarOffice view the 
document differently.  Please ignore my PDF export comment above, since the PDF 
export always matches the StarOffice view (which matches the TOC).  Word is 
breaking the text differently to StarOffice and resulting in more or fewer 
lines and hence different page counts.  Are you aware of any settings to 
control the way words are broken?

Comment 11 stephan.wunderlich 2004-06-02 09:01:39 UTC

SW->pajowett2: The views look different, and hence the page count might differ,
because OOo1.x and StarOffice 7 use the printer metrics of the selected printer
to render the text, whereas Word uses a virtual device. OOo2.0 will use a
virtual device too, and then the results should be more similar :-) ... will
your script discover additional different behaviour or is the different
page-count already what you noticed as bug ?

Comment 12 pajowett2 2004-06-03 04:22:32 UTC

pajowett2->SW: Thanks SW.  I'm not sure what your question means: "will
your script discover additional different behaviour or is the different
page-count already what you noticed as bug ?".  I didn't think Virtual Device 
vs Print Metrics would affect where words are broken.  I will attach a 1 page 
MSWord document which shows the problem.

Comment 13 pajowett2 2004-06-03 04:23:17 UTC

Created attachment 15654
Page displays differently in MSWord vs Writer

Comment 14 stephan.wunderlich 2004-06-09 16:38:17 UTC

SW->MRU: The last added attachment shows a difference in hyphenation between the
way the document looks in Word and the way it looks in OOo Writer ... looks like
a Word import issue, please have a look at this.

Comment 15 michael.ruess 2004-06-17 15:48:06 UTC

Not an import issue (and also not hyphenation)... The break iterator is just a
bit different from the one in Word (best seen in the table on page 4).
MRU->SBA: Will anything be changed here?

Comment 16 stefan.baltzer 2005-01-07 17:09:52 UTC

SBA: I had a look with OOo 1.9.70. The fact that it is DIFFERENT from MS Word
does not mean that OOo is wrong. Example (from page 4, first table row):

".....DETAILS.TITLE-] [-PERSONAL_......"
 - MS Word breaks between "[-" and "PERSONAL" (at the end of the line)
 - OOo breaks between "]" and "[" because there is a space.

IMHO a line break at this point makes much more sense. Being different SOMETIMES
means being better IMHO. Anyway, to clone the break iterator behavior in order
to behave the same is far too costly compared to the problem of "not blindly
photo-copying".

Set to "Wontfix".
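
To see how break opportunities can differ between implementations, the visible fragment from the example above can be fed to a third line-break implementation, the one built into Java. This is a minimal sketch, assuming java.text.BreakIterator as a stand-in: it is neither the OOo break iterator nor the one in MS Word, so its result may match either or neither. The class name LineBreakProbe and the method breakOffsets() are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical probe: lists where Java's own line-break iterator allows a break. */
final class LineBreakProbe {

    /** Returns the offsets (excluding 0) at which a line break is permitted. */
    static List<Integer> breakOffsets(String text) {
        BreakIterator lines = BreakIterator.getLineInstance(Locale.ENGLISH);
        lines.setText(text);
        List<Integer> offsets = new ArrayList<>();
        lines.first(); // positions the iterator at offset 0
        for (int pos = lines.next(); pos != BreakIterator.DONE; pos = lines.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Only the visible part of the table cell text quoted in the comment.
        String text = "DETAILS.TITLE-] [-PERSONAL_";
        for (int offset : breakOffsets(text)) {
            // Show each permitted break as "left part | right part".
            System.out.println(text.substring(0, offset) + " | " + text.substring(offset));
        }
    }
}
```

Running the probe prints one line per permitted break position, which makes it easy to compare against where Word and Writer actually wrap the text.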

Comment 17 stefan.baltzer 2005-01-07 17:10:54 UTC

SBA: Closed. 
Please don't reopen unless you find an OOo developer willing to deal with this.