Issue 1997 - Input Filters for M$Word documents don't work well with templates
Summary: Input Filters for M$Word documents don't work well with templates
Status: CLOSED FIXED
Alias: None
Product: Writer
Classification: Application
Component: code
Version: 638
Hardware: PC Linux, all
Importance: P2 Trivial
Target Milestone: ---
Assignee: michael.ruess
QA Contact: issues@sw
URL:
Keywords:
Depends on: 2179
Blocks:
Reported: 2001-10-23 16:18 UTC by nbc
Modified: 2003-09-08 16:56 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Generic Cisco Template document - M$Word format, gzipped (44.99 KB, application/octet-stream)
2001-10-23 16:30 UTC, nbc
Generic Cisco Template document - M$Word format, gzipped - after save by OpenOffice (107.73 KB, application/octet-stream)
2001-10-23 16:32 UTC, nbc

Description nbc 2001-10-23 16:18:03 UTC
My office uses M$Word for documents, and I am attempting to run only Linux on
my machine. I'm having problems importing and exporting Cisco template documents
from Word into and out of OpenOffice. I'm running Build 638c on Red Hat
Linux 7.1.

I have a sample document which I sanitized (removed names, Cisco-proprietary
comments, etc.). It still has a Cisco logo in the header. The M$Word file is
~175k bytes. If I read it in with OpenOffice, the formatting is off - the
Cisco logo is pushed up a bit towards the top of the page. Other more minor
formatting problems exist. I've had problems (in other documents) with table of
contents getting messed up, along with section numbering.

If I read this document in and then do a "save-as", still in M$Word format, the
resulting document is over 800k bytes. If I read _that_ document in, the Cisco
logo is pushed up even further - making it almost invisible in the header.

I read Brian Proffit's article today where he talked about the size increase
(I'm the Neil Cohen referred to in the article). I understand why that might
happen, but I'm still a bit concerned about an 8-fold increase in size when no
editing takes place... I can live with that if I have to, but I really do need
to be able to read and write documents without changing the formatting, so I can
exchange them with M$Word users.

This is the first time I've used this report form - so I don't know where to put
the sample documents. If someone can get in touch with me, I'll be happy to
email them to you.

Summary - there are 2 problems:

1) Document formatting - images in the header, table of contents/section
numbering (my sample document does not exhibit that problem - I might be able to
sanitize another example if necessary).

2) Document grows by a factor of 7-8 without any editing being done

Please have someone contact me to get my example documents

thanks,

nbc
Comment 1 nbc 2001-10-23 16:30:57 UTC
Created attachment 613
Generic Cisco Template document - M$Word format, gzipped
Comment 2 nbc 2001-10-23 16:32:16 UTC
Created attachment 614
Generic Cisco Template document - M$Word format, gzipped - after save by OpenOffice
Comment 3 nbc 2001-10-23 16:35:16 UTC
2 things - I was able to add my 2 documents to this issue. Let me know
if you have problems with them or questions about them.

The size increase is actually more like 4-5, not 7-8 - sorry about that...

nbc
Comment 4 stefan.baltzer 2001-10-23 17:24:01 UTC
Reassigned to Michael.
Comment 5 michael.ruess 2001-10-24 15:19:49 UTC
MRU->CMC: I could see two things. First, we import the OLE object in
the header at its original size only. We should scale it down to the
percentage given in its "Size" attribute.
Second, the file size increases after exporting to WW format. I'll
take a closer look at that when time permits and then write an
internal bug for it, if necessary.
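
The scaling step described in this comment can be illustrated with a minimal sketch. This is not the importer's code: the function name `scale_to_percent`, the unit-less dimensions, and the sample numbers are all assumptions made for illustration. It only shows the arithmetic of applying a stored percentage to an object's original size.

```python
def scale_to_percent(width, height, percent_x, percent_y=None):
    """Return (width, height) scaled to the given percentages of the original.

    Hypothetical helper: the real filter's data structures are not shown
    in this issue, so plain numbers stand in for the object's dimensions.
    """
    if percent_y is None:
        # Assume uniform scaling when only one percentage is supplied.
        percent_y = percent_x
    return (round(width * percent_x / 100), round(height * percent_y / 100))


# Made-up example: a 4000 x 1000 unit logo that the document displays at 25%.
original_size = (4000, 1000)
print(scale_to_percent(*original_size, 25))
```

Importing the object at `original_size` instead of the scaled result would make it larger than the author intended, which is consistent with the logo being displaced in the header.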
Comment 6 caolanm 2001-10-26 14:19:26 UTC
The image positioning problem has a fix checked in. So that will be
fixed in the next release.

On the other topic, I can shave almost 200k off the document by
checking for some image/object duplication. Note that if you re-edit
the header objects in Word 2000, the document gets larger there as
well. The duplicate images and objects and the conservative saving
scheme are where the bloat in the document comes from. It's worth
investigating further.

So the sizes, in bytes, are:

last OOo version                824,320
personal hacked OOo version     635,904
re-edited in Word 2000          338,432
original Word                   179,200
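
For readers unfamiliar with the idea, checking for image/object duplication can be sketched as content hashing: store each distinct blob once and let repeats refer to the stored copy. This is a generic illustration, not the filter's actual implementation; the function `deduplicate` and the sample data are hypothetical.

```python
import hashlib


def deduplicate(blobs):
    """Keep one copy of each distinct blob.

    Returns (unique_blobs, refs), where refs[i] is the index in
    unique_blobs of the copy that input blob i should point to.
    """
    seen = {}    # content digest -> index into unique
    unique = []
    refs = []
    for blob in blobs:
        digest = hashlib.sha256(blob).digest()
        if digest not in seen:
            seen[digest] = len(unique)
            unique.append(blob)
        refs.append(seen[digest])
    return unique, refs


# Made-up data: the same "logo" bytes embedded three times plus one other object.
logo = b"fake-image-bytes" * 1000
objects = [logo, b"other object", logo, logo]
unique, refs = deduplicate(objects)
print(len(objects), "objects ->", len(unique), "stored;", "refs =", refs)
```

With data like this, only two blobs are written instead of four, which is the kind of saving the comment attributes to removing duplicates.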
Comment 7 caolanm 2001-10-31 13:01:37 UTC
Lots of changes checked in. My current size is now 252,928, which is
73k bigger than the original and 600k better than our last version.

The extra size comes from an extra stream, 93k in size, stored by a
component separate from the filter, so that's why it's still a little
bigger. I don't know yet if it is safe to discard this extra data;
if we could, the new file would be smaller than the original. Will
have to wait and see what the situation with the extra stream is
(http://oi.openoffice.org/)
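
The figures quoted in comments 6 and 7 can be checked with a short calculation. The byte counts below are taken from those comments; reading "93k" as 93 KiB is an assumption, and the dictionary labels are descriptive names chosen here.

```python
# Sizes in bytes as reported in comments 6 and 7.
sizes = {
    "original Word": 179_200,
    "re-edited in Word 2000": 338_432,
    "personal hacked build (comment 6)": 635_904,
    "last OOo version": 824_320,
    "current build (comment 7)": 252_928,
}

original = sizes["original Word"]
for label, size in sizes.items():
    # Growth factor relative to the original Word file.
    print(f"{label:36s} {size:>9,d} bytes  x{size / original:.2f}")

# Assumption: the "93k" extra stream is 93 KiB.
extra_stream = 93 * 1024
without_stream = sizes["current build (comment 7)"] - extra_stream
print("without extra stream:", without_stream,
      "smaller than original:", without_stream < original)
```

The last released version comes out at 4.6 times the original, in line with the "4-5" correction in comment 3. Under the 93 KiB assumption, the current build minus the extra stream would fall below the original 179,200 bytes.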
Comment 8 caolanm 2001-11-12 13:37:03 UTC
As small as the filter can do it in 650. It can possibly be made smaller,
depending on the resolution of the extra stream issue 2179.
Comment 9 michael.ruess 2002-09-05 11:22:44 UTC
Will be fixed for OpenOffice 643 release.
Comment 10 michael.ruess 2002-11-07 11:15:02 UTC
Fixed in OpenOffice 643.