Apache OpenOffice (AOO) Bugzilla – Issue 1997
Input Filters for M$Word documents don't work well with templates
Last modified: 2003-09-08 16:56:16 UTC
My office uses M$Word for documents, and I am attempting to run only Linux on my machine. I'm having problems importing and exporting Cisco template documents from Word into and out of OpenOffice. I'm running Build 638c on Red Hat Linux 7.1.

I have a sample document which I sanitized (removed names, Cisco-proprietary comments, etc.). It still has a Cisco logo in the header. The M$Word file is ~175k bytes. If I read it in with OpenOffice, the formatting is off: the Cisco logo is pushed up a bit towards the top of the page. Other, more minor formatting problems exist. I've had problems (in other documents) with the table of contents getting messed up, along with section numbering.

If I read this document in and then do a "save-as", still in M$Word format, the resulting document is over 800k bytes. If I read _that_ document in, the Cisco logo is pushed up even further, making it almost invisible in the header.

I read Brian Proffit's article today where he talked about the size increase (I'm the Neil Cohen referred to in the article). I understand why that might happen, but I'm still a bit concerned about an 8-fold increase in size when no editing takes place... I can live with that if I have to, but I really do need to be able to read and write documents without changing the formatting, so I can exchange them with M$Word users.

This is the first time I've used this report form, so I don't know where to put the sample documents. If someone can get in touch with me, I'll be happy to email them to you.

Summary - there are 2 problems:
1) Document formatting - images in the header, table of contents/section numbering (my sample document does not exhibit the latter problem; I might be able to sanitize another example if necessary).
2) Document grows by a factor of 7-8 without any editing being done.

Please have someone contact me to get my example documents.

thanks, nbc
Created attachment 613: Generic Cisco Template document - M$Word format, gzipped
Created attachment 614: Generic Cisco Template document - M$Word format, gzipped - after save by OpenOffice
2 things - I was able to add my 2 documents to this issue. Let me know if you have problems with them or questions about them. The size increase is actually more like 4-5, not 7-8 - sorry about that... nbc
Reassigned to Michael.
MRU->CMC: Two things I could see. First, we only import the original size of the OLE object in the header. We should scale it down to the percentage given in its "Size" attribute. Second, the increased file size after exporting to WW format. I'll take a closer look at it when time permits and then write an internal bug for it, if necessary.
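As an illustration of the first point, here is a minimal sketch of scaling an object's original size by a percentage. It is not the filter's actual code: the function name, the parameter names, and the use of separate horizontal and vertical percentages are assumptions made for the example.

```python
def scaled_size(width, height, percent_x, percent_y):
    """Return the displayed (width, height) of an embedded object.

    width/height are the object's original dimensions in any unit;
    percent_x/percent_y are the scale factors as percentages
    (hypothetical stand-ins for the "Size" attribute mentioned above).
    """
    return (round(width * percent_x / 100), round(height * percent_y / 100))


# Example: an object stored at 200 x 100 units but displayed at 50%.
print(scaled_size(200, 100, 50, 50))
```

Importing only the original size, as described above, corresponds to skipping this scaling step entirely.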
The image positioning problem has a fix checked in, so that will be fixed in the next release. On the other topic, I can shave almost 200k off the document by checking for some image/object duplication. Note that if you re-edit the header objects in Word 2000, the document gets larger there as well. The duplicate images and objects and the conservative saving scheme are where the bloat in the document comes from. It's worth investigating further.

File sizes so far, in bytes:
last OOo version               824,320
personally hacked OOo version  635,904
re-edited in Word 2000         338,432
original Word                  179,200
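A check for image/object duplication can be sketched as follows. This is only a generic illustration of the idea using content hashes, not the change that was made in the filter; the function name and the representation of objects as byte strings are assumptions.

```python
import hashlib


def deduplicate_blobs(blobs):
    """Keep each distinct blob once.

    Returns (unique_blobs, refs), where refs[i] is the index in
    unique_blobs of the data that input blob i should point to.
    """
    unique_blobs = []
    index_by_digest = {}
    refs = []
    for blob in blobs:
        digest = hashlib.sha256(blob).digest()
        if digest not in index_by_digest:
            # First time this content is seen: store it.
            index_by_digest[digest] = len(unique_blobs)
            unique_blobs.append(blob)
        # Repeated content just reuses the stored copy.
        refs.append(index_by_digest[digest])
    return unique_blobs, refs


# Example: the same logo appearing twice is written only once.
unique, refs = deduplicate_blobs([b"logo", b"chart", b"logo"])
print(len(unique), refs)
```

The saving grows with the size and number of repeated objects, which is why a logo repeated in the header can matter so much.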
Lots of changes checked in. My current size is now 252,928, which is 73k bigger than the original and almost 600k better than our last version. The extra size comes from an extra stream of size 93k, stored by a component separate from the filter, so that's why it's still a little bigger. I don't know yet if it is safe to discard this extra data; if we could, the new file would be smaller than the original. Will have to wait and see what the situation with the extra stream is (http://oi.openoffice.org/)
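The arithmetic behind these figures can be checked with a short sketch. The byte counts are the ones quoted in this issue; treating "93k" as 93 * 1024 bytes is an assumption, since the exact stream size is not given.

```python
# Sizes in bytes, as reported in the comments of this issue.
ORIGINAL_WORD = 179_200
LAST_OOO = 824_320
CURRENT = 252_928
EXTRA_STREAM = 93 * 1024  # assumption: "93k" means 93 KiB

growth_over_original = CURRENT - ORIGINAL_WORD  # how much bigger than the original
saving_vs_last = LAST_OOO - CURRENT             # improvement over the last version
without_stream = CURRENT - EXTRA_STREAM         # size if the extra stream were dropped

print(growth_over_original, saving_vs_last, without_stream)
print(without_stream < ORIGINAL_WORD)
```

Under that assumption, dropping the extra stream would bring the file below the size of the original Word document, which matches the statement above.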
As small as the filter can make it in 650. It could possibly be made smaller, depending on the resolution of the extra stream issue (issue 2179).
Will be fixed for OpenOffice 643 release.
Fixed in OpenOffice 643.