Apache OpenOffice (AOO) Bugzilla – Issue 29165
Word break different from this in MS Word
Last modified: 2013-02-24 21:09:39 UTC
Using the Java API to modify documents, timing problems become apparent if the document is saved as soon as the modifications complete. Two cases are inserting table rows and updating TOC indexes: table insertions can result in rows with no data, and after index updates the indexes can refer to incorrect pages. This has been evaluated on Windows XP and Solaris 5.8.

The two API calls most likely to be involved are:
XDocumentIndex.update()
XTableRows.insertByIndex()

Unless the program sleeps for some arbitrary duration before the save or the call to update indexes, data is lost or incorrect in the saved document. Changing the connection to be synchronous (Negotiate=0;forceSynchronous=1) doesn't correct the problem.

The document API either needs to ensure changes aren't being performed asynchronously or provide a mechanism to inform API programs that all pending/asynchronous changes have completed.

Links to forum discussions of the problem:
http://www.oooforum.org/forum/viewtopic.php?t=7322
http://www.oooforum.org/forum/viewtopic.php?t=7826
SW->pajowett2: couldn't reproduce the described behaviour with my OOo1.1.1. What I did was load a document with a TOC and then add some paragraphs with headings that should be in the TOC. Then I called XDocumentIndex.update() followed by XStorable.storeToURL() ... the stored file contains the updated TOC ... anything else I need to do to reproduce what you describe? ... Do you have Java code and a corresponding document to reproduce this?
pajowett2->SW: Thanks. There is a large amount of code surrounding this case, but I will produce a minimal case if required. Can you please try this sequence, which I suspect will show the problem:
1) create a 16 page MSWord97 doc with a TOC
2) load the doc into StarOffice
3) run several XTextCursor.setString("") calls to reduce the number of pages down to 8
4) immediately call XDocumentIndex.update() and then storeToURL() as Word97 format

I have since found that calling XRefreshable.refresh() on the document before doing the XDocumentIndex.update() fixes the problem (although I can't find any documentation to indicate why).
SW->pajowett2: I tried your suggestion and removed paragraphs, from one to all of them, and saved directly after the update, but in none of the scenarios could I reproduce the behaviour you describe :-( ... do you by any chance have a document and a Java program that demonstrates the behaviour you describe?
Created attachment 15577 [details] Java src and template
I have attached a Java class and template which can reproduce the problem reliably. I have tested using the latest patched StarOffice7 under WinXP and Solaris. See the main() routine to get the idea of what is happening. Three copies of the template are produced and, most of the time, opening one of these copies shows the TOC is incorrect. The XRefreshable.refresh() call in the updateIndexes() method should be commented out (I think I've accidentally left it in).
SW->pajowett2: strange ... I commented the "refresh" line out and ran your program with OOo1.1, OOo1.1.1 and StarOffice 7 Product Update 2 ... in all cases I got documents with indexes that are up to date ... any ideas what I might be missing?
Created attachment 15629 [details] Batch script to compile and run in Windows.
pajowett2->SW: Thanks again. I asked another developer to try it for me (WinXP, OpenOffice1.1.0, JDK 1.3) and he confirmed that he sees the same problem. What operating system did you run your tests on? What did you use to view the resulting .doc files?

I should probably explain that the test case I sent you RANDOMLY removes sections from the template.doc file, so the TOC is expected to be different in different runs. The fastest way to see if it is working is to open each of the resulting documents (3 files in the conv directory), find the TOC and update it (right click->update). If the numbers change when you do this, the problem is revealed!

I have attached another zip file which contains the exact run case. If you extract it to c:\temp on a Windows platform, update the variables at the top of the run.bat script (to match your hostname, JDK and Office), then execute run.bat, it will compile and run the test case. The 3 docs produced in conv almost always exhibit the problem. Please run it a few times to confirm, since the first time I ran it this morning all 3 documents were fine, but in every run since, at least one has had an incorrect TOC.
pajowett2->SW: I just noticed that if I open one of the documents with the bad TOC in StarWriter, the TOC matches the document! This means that the pagination for the exported document is different, and sure enough, in StarWriter the doc displays with 1 more page than in MSWord. I'm still looking into why, but it looks like a combination of page height and word wrapping. This can't be acceptable, since exporting to PDF shows exactly the same problem and the TOC can never be corrected.
pajowett2->SW: The problem is down to the fact that MSWord and StarOffice lay out the document differently. Please ignore my PDF export comment above, since the PDF export always matches the StarOffice view (which matches the TOC). Word breaks the text differently from StarOffice, resulting in more or fewer lines and hence different page counts. Are you aware of any settings to control the way words are broken?
SW->pajowett2: the views look different, and hence the page count might differ, because OOo1.x and StarOffice 7 use the printer metrics of the selected printer to render the text, whereas Word uses a virtual device. OOo2.0 will use a virtual device too, and then the results should be closer :-) ... will your script discover additional different behaviour, or is the different page count already what you noticed as the bug?
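To illustrate the general point about metrics, here is a toy sketch (not how Writer or Word actually lay out text) of a greedy word wrap in which only the per-character advance width changes. The class name, the fixed advance widths, and the line width are all hypothetical values chosen for illustration.

```java
import java.util.function.IntUnaryOperator;

// Toy model only: real layout engines use font metrics, kerning, hyphenation, etc.
final class MetricsWrapDemo {

    // Counts the lines a greedy word wrap needs when every character's
    // advance width comes from the supplied (hypothetical) metric function.
    static int countLines(String text, int lineWidth, IntUnaryOperator advance) {
        int lines = 1;
        int used = 0; // width already consumed on the current line
        int space = advance.applyAsInt(' ');
        for (String word : text.trim().split("\\s+")) {
            int wordWidth = word.chars().map(advance).sum();
            if (used == 0) {
                used = wordWidth;              // first word on the line always fits
            } else if (used + space + wordWidth <= lineWidth) {
                used += space + wordWidth;     // word fits after a space
            } else {
                lines++;                       // wrap to a new line
                used = wordWidth;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "aaaa bbbb cccc dddd";
        // Two hypothetical devices that differ only in advance width per character.
        System.out.println(countLines(text, 100, c -> 10));
        System.out.println(countLines(text, 100, c -> 12));
    }
}
```

With these assumed numbers the same text needs 2 lines under the first metric and 4 under the second, which is the kind of drift that can accumulate into a different page count.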
pajowett2->SW: Thanks SW. I'm not sure what your question means: "will your script discover additional different behaviour or is the different page-count already what you noticed as bug ?". I didn't think virtual device vs printer metrics would affect where words are broken. I will attach a 1 page MSWord document which shows the problem.
Created attachment 15654 [details] Page displays differently in MSWord vs Writer
SW->MRU: the last added attachment shows a difference in hyphenation between the way the document looks in Word and the way it looks in OOo Writer ... looks like a Word import issue, please have a look at this.
Not an import issue (and also not hyphenation)... The break iterator is just a bit different from the one in Word (best seen in the table on page 4). MRU->SBA: Will anything be changed here?
SBA: I had a look with OOo 1.9.70. The fact that it is DIFFERENT from MS Word does not mean that OOo is wrong. Example (from page 4, first table row): ".....DETAILS.TITLE-] [-PERSONAL_......"
- MS Word breaks between "[-" and "PERSONAL" (at the end of the line)
- OOo breaks between "]" and "[" because there is a space. IMHO a line break at this point makes much more sense.
Being different SOMETIMES means being better IMHO. Anyway, cloning the break iterator behavior in order to behave the same is far too costly compared to the problem of "not blindly photo-copying". Set to "Wontfix".
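For anyone who wants to see how a break iterator enumerates candidate break positions in a string like the one above, here is a minimal sketch using the JDK's java.text.BreakIterator. This is an assumption-laden illustration: the JDK iterator is neither OOo's break iterator nor Word's, so its results need not match either, and the sample is only the visible fragment quoted from the table row. The class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Lists the line-break opportunities the JDK reports for a string.
// Note: this is the JDK's iterator, not the one used by OOo or MS Word.
final class LineBreakProbe {

    // Returns every boundary offset, including 0 and text.length().
    static List<Integer> lineBreakOffsets(String text) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ENGLISH);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Visible fragment of the table-row text quoted in the comment above.
        String sample = "DETAILS.TITLE-] [-PERSONAL_";
        int previous = 0;
        for (int offset : lineBreakOffsets(sample)) {
            if (offset > previous) {
                // Print each unbreakable segment between two break opportunities.
                System.out.println("[" + sample.substring(previous, offset) + "]");
                previous = offset;
            }
        }
    }
}
```

Each printed segment is a run that this particular iterator would keep on one line; comparing such lists across implementations is one way to pin down where two layout engines disagree.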
SBA: Closed. Please don't reopen unless you find an OOo developer willing to deal with this.