Issue 41809 - OOo 1.1 Document Won't Open In M74
Summary: OOo 1.1 Document Won't Open In M74
Status: CLOSED FIXED
Alias: None
Product: Writer
Classification: Application
Component: open-import
Version: 680m74
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: eric.savary
QA Contact: issues@sw
URL:
Keywords: oooqa
Duplicates: 43330
Depends on:
Blocks:
 
Reported: 2005-02-01 13:58 UTC by drichard
Modified: 2013-08-07 14:41 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Document open in 1.1.2 (201.01 KB, image/png)
2005-02-01 13:59 UTC, drichard
Document, will not open in M74 (8.57 KB, application/vnd.sun.xml.writer)
2005-02-01 14:01 UTC, drichard
Test document converted with modified wpd2sxw (6.38 KB, application/vnd.sun.xml.writer)
2005-02-02 07:46 UTC, fridrich.strba
Error that comes up when you attempt to open this kind of document in M77 (5.85 KB, image/gif)
2005-02-07 14:48 UTC, drichard

Description drichard 2005-02-01 13:58:58 UTC
Submitted by one of our beta testers, this document opens fine in OOo 1.1.x but
when you open it in M74 the document is empty and the title bar doesn't indicate
the file is open.

Attaching document and shot of document open in 1.1.2
Comment 1 drichard 2005-02-01 13:59:43 UTC
Created attachment 22088
Document open in 1.1.2
Comment 2 drichard 2005-02-01 14:01:15 UTC
Created attachment 22089
Document, will not open in M74
Comment 3 michael.ruess 2005-02-01 14:34:52 UTC
reassigned to ES.
Comment 4 eric.savary 2005-02-01 14:49:14 UTC
ES->DVO: as discussed, please add a comment on this. Thanx!
Comment 5 eric.savary 2005-02-01 14:50:06 UTC
.
Comment 6 openoffice 2005-02-01 15:00:54 UTC
Several issues: 

1) The doc was obviously generated by external tools, and violates the spec in
several (somewhat subtle) points. In particular, the manifest is missing, and
the mimetype stream is not at the beginning of the file.

2) We *used* to read such files just fine, which *I* thought was a good feature.
My understanding is that in recent versions the type detection became stricter
(mba said as much in xml-dev@ooo), and does not recognize such files any longer.
I'm not sure if and to what extent that is considered a bug or feature, although
my preference is quite clearly on the former.

1) + 2) explains why the file no longer loads.

3) There is no user visible error message. The file simply doesn't load, and
nothing else happens. I guess this would be a usability problem, at least.



dvo->mba: As said above, I don't know how much of the above is bug or feature.
Please decide, and forward/handle as appropriate. Thanks.

dvo->drichard: "our beta testers"???
Comment 7 drichard 2005-02-01 15:13:27 UTC
Comments-

   Our Beta Testers: The City of Largo has put about 30 people live on 680
starting a few milestones ago.   We are using it fulltime.

   We converted hundreds of thousands (really!)  documents from WordPerfect to
Openoffice format 1.5 years ago using libwpd.  It's possible that utility didn't
write out a document perfectly, but most so far seem to work in 680.  We would
be in a bad way if 2.0 doesn't open these documents; I have found others that
will not open either.  I know other organizations have converted thousands of
documents as well using libwpd.   
Comment 8 openoffice 2005-02-01 15:54:08 UTC
dvo->drichard: Well, there's always a way out in that one could write a
converter for this particular type of document. Which shouldn't be particularly
hard in this case. Still, I'll wait for mba's comment before drawing any
conclusions.
Comment 9 openoffice 2005-02-01 15:56:29 UTC
dvo: Uh, also, the mimetype stream is compressed, at -7% compression ratio.
That's odd.
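
A negative ratio is what DEFLATE tends to produce on a very short string with no repeated substrings. The sketch below is only an illustration with Python's zlib module, not the code path libgsf uses; the function names are made up for the example.

```python
import zlib

MIMETYPE = b"application/vnd.sun.xml.writer"


def raw_deflate(data: bytes) -> bytes:
    """Compress with raw DEFLATE (no zlib header), the form used inside ZIP entries."""
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def saved_percent(original: int, compressed: int) -> float:
    """Percentage saved by compression; negative when the output is larger."""
    return 100.0 * (original - compressed) / original


packed = raw_deflate(MIMETYPE)
ratio = saved_percent(len(MIMETYPE), len(packed))
print(len(MIMETYPE), len(packed), round(ratio))
```

The block overhead outweighs any saving on 30 bytes of input, so the "compressed" stream ends up slightly larger than the original.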
Comment 10 Mathias_Bauer 2005-02-01 16:04:07 UTC
The problem is that those files *are* broken, so it would be a bug to open them
without an additional user action. m74 has a bug that it doesn't open documents
that are not detected, m76 will show a filter dialog where you can force OOo to
load the document. This might appear inconvenient but IMHO it's the appropriate
way to treat the documents. Just saving them once after loading fixes the problem.

It was necessary to make this move because it is the only way to detect
documents reliably. Otherwise loading documents without or with a "wrong"
extension can't work.

If we accept documents like the attached one any zip file would be accepted as a
valid OOo document. 

I wasn't aware that a broken tool is used outside to create OOo documents and
I'm still not convinced that we should lower the quality of our type detection.
So I set it to "WontFix". Of course that doesn't mean I can't become convinced
to change something if it doesn't make our type detection less reliable. :-)

m76 should load those documents, but not without a filter dialog.
Comment 11 Mathias_Bauer 2005-02-01 16:14:41 UTC
Ah sorry, I just overlooked that the file contains a "Mime magic" stream. Even
if it is compressed we can use it for detection, but we don't do it currently
because OOo1.0 never wrote any of those streams. But we can implement a fallback
that in case there is no manifest.xml we use the "mimetype" stream.

Sorry for possible confusion.
Comment 12 Mathias_Bauer 2005-02-01 16:16:16 UTC
Mikhail, as discussed please change the Package Component so that it takes the
mimetype stream as a fallback for the "MediaType" property in case there is no
manifest.xml. If there is a manifest.xml we should use the MediaType from there
always, even if it is empty or wrong.
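
A minimal sketch of the precedence rule requested above, written in Python for illustration only; it is not the Package Component code. The function name resolve_media_type and the choice to return None when neither stream exists are assumptions of this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OOo 1.x manifest files.
MANIFEST_NS = "http://openoffice.org/2001/manifest"


def resolve_media_type(package: zipfile.ZipFile):
    """Return the package media type, preferring manifest.xml over mimetype.

    Hypothetical helper: if a manifest exists, its value for the root
    entry ("/") is used even when it is empty; the "mimetype" stream is
    consulted only when there is no manifest at all.
    """
    names = package.namelist()
    if "META-INF/manifest.xml" in names:
        root = ET.fromstring(package.read("META-INF/manifest.xml"))
        for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
            if entry.get(f"{{{MANIFEST_NS}}}full-path") == "/":
                return entry.get(f"{{{MANIFEST_NS}}}media-type", "")
        return ""  # manifest present but no root entry: still no fallback
    if "mimetype" in names:
        # ZipFile.read() decompresses transparently, so a compressed
        # mimetype stream is still usable here.
        return package.read("mimetype").decode("ascii").strip()
    return None  # assumption: signal "not detected" to the caller
```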
Comment 13 openoffice 2005-02-01 17:04:17 UTC
dvo: Thanks. I like this a lot better; particularly with the mimetype fallback.

Just for curiosity: Why the precedence of manifest over mimetype?
Comment 14 drichard 2005-02-01 17:29:30 UTC
I can't thank you enough for allowing these slightly non-standard files to load
into OOo.  Giving them a dialog and having them save the file again to correct
the issue is perfect.  That will allow us to slowly correct these old files as
people open them.  This was a onetime issue related to the WP-->OOo conversion.
Comment 15 lohmaier 2005-02-01 18:29:25 UTC
While you are working on type detection, please have a look at issue 39255 (OOo
crashes when manifest.xml starts with BOM (byte order mark)).
Comment 16 fridrich.strba 2005-02-02 07:44:48 UTC
drichard: I just modified the wpd2sxw. Could you test whether this document
opens correctly in M74?
Comment 17 fridrich.strba 2005-02-02 07:46:01 UTC
Created attachment 22116
Test document converted with modified wpd2sxw
Comment 18 openoffice 2005-02-02 10:54:39 UTC
dvo->fridrich_strba: Loads fine for me, but it's still not fully spec
conforming: The mimetype stream is not uncompressed.

Explanation: Three 'special' properties apply to the mimetype stream: 1) it must
be first, 2) it must be uncompressed, 3) it must not use 'extra data'. The
reason for these being that during the standardization process, several parties
wanted better integration of the format into their (existing) infrastructure.
For example, both KDE and Gnome use file type detection based on the Unix
'magic' tool, which recognizes magic numbers at fixed positions in the file. The
ZIP format itself doesn't guarantee this, which is why we established those
extra rules. If the above conditions are met, you will see the file name
("mimetype") at position 30 in the file, and the actual mime type
("application/vnd.sun.xml.writer") just after. If you look at the file in an
editor, you will see both as a string ("mimetypeapplication/vnd.sun.xml.writer")
at the beginning of the file. Which doesn't work if the mimetype file is
compressed. (Then you will see "mimetype" followed by binary stuff.) Fridrich,
I'd be rather thankful if you could tune the wpd2sxw tool accordingly. Thanks.
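
The fixed-position check described above can be sketched as follows. This is an illustration using Python's zipfile module, not the detection code of KDE, Gnome or OOo; build_package and magic_visible are names invented for the example. The offset of 30 is the size of a ZIP local file header without extra data.

```python
import io
import zipfile

MIMETYPE = b"application/vnd.sun.xml.writer"
NAME_OFFSET = 30  # size of the fixed part of a ZIP local file header


def build_package(compress_mimetype: bool) -> bytes:
    """Build a tiny test package with "mimetype" as the first stream."""
    method = zipfile.ZIP_DEFLATED if compress_mimetype else zipfile.ZIP_STORED
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE, compress_type=method)
        zf.writestr("content.xml", "<office:document-content/>",
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def magic_visible(data: bytes) -> bool:
    """True if "mimetype" plus the media type can be read at fixed offsets."""
    name_end = NAME_OFFSET + len(b"mimetype")
    return (data[NAME_OFFSET:name_end] == b"mimetype"
            and data[name_end:name_end + len(MIMETYPE)] == MIMETYPE)


print(magic_visible(build_package(compress_mimetype=False)))
print(magic_visible(build_package(compress_mimetype=True)))
```

With the stream stored uncompressed the check succeeds; with a compressed stream the name is still found at offset 30 but the bytes after it no longer spell the media type.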


Comment 19 mikhail.voytenko 2005-02-02 11:59:58 UTC
> Just for curiosity: Why the precedence of manifest over mimetype?
From my point of view the "mimetype" substream is just an optional extension that
duplicates information stored in "manifest.xml" for the purposes you have
already mentioned; "manifest.xml" is the main source of document type
information in the package format.
Using the "mimetype" stream as the source of the package media type
seems to me close to using the document URL extension for the same purpose.
It seems acceptable only as a fallback solution. And in case of conflict
with the value stored in "manifest.xml", the manifest value should be used.
Comment 20 drichard 2005-02-02 14:05:42 UTC
Confirmed that document from fridrich_strba opened in M74 just fine.

It was my understanding that once libwpd was integrated into OOo that the
command line utility was going away -- or I would have reported it.

wpd2sxw is wonderful for organizations that want to migrate completely and
remove all WP documents and worked well for us.
Comment 21 fridrich.strba 2005-02-02 22:21:18 UTC
fridrich_strba->dvo: wpd2sxw uses libgsf to write out the sxw document. So far I
have not figured out how to make libgsf change the compression ratio
between two child files. I explored a bit today and discovered that libgsf
actually prevents such behaviour: the function that changes the compression
exits if the zip file is in state "writing=true". I will explore more workarounds,
but that is it for now.
Comment 22 fridrich.strba 2005-02-02 22:22:46 UTC
fridrich_strba->drichard: No, wpd2sxw is not dead :-). It is useful tool for 
migrating documents in archives without having the users import them one by one 
into OOo.
Comment 23 mikhail.voytenko 2005-02-07 09:01:24 UTC
There is a slight problem with mimetype stream based workaround. If there are
substorages ( representing either an own embedded object of our own format or a
possible extension with unknown mediatype ) in the document with no manifest.xml
this document can not be loaded without information loss. There is no way to
repair a possible document extension; it is not even possible to identify
whether the substorage is own object or an extension ( that can look like an own
embedded object ).

So the following approach is chosen for now:
if there is no "manifest.xml" available, the mimetype stream is available and
the document has no substorages (except known ones such as Configuration, Basic,
etc.), a warning about document corruption is shown and the repair feature is
used; if the document has substorages the office will reject opening of the
document. In other words the document without manifest.xml can be opened only if
it contains no embedded objects and document extensions.
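
The decision rule above can be sketched as a small function. This is an illustration in Python, not the mav16 implementation. The known-substorage set contains only the two names mentioned in the comment, and the "undetected" outcome for a package with neither stream is an assumption of this example.

```python
from typing import Iterable, Set

# Illustrative list only: the comment names "Configuration, Basic and etc.".
KNOWN_SUBSTORAGES = {"Configuration", "Basic"}


def substorages(names: Iterable[str]) -> Set[str]:
    """Top-level directory names found among the package entries."""
    return {name.split("/", 1)[0] for name in names if "/" in name}


def load_decision(names: Iterable[str]) -> str:
    """Return "load", "repair", "reject" or "undetected" for a package.

    Hypothetical helper mirroring the rule in the comment above.
    """
    names = list(names)
    if "META-INF/manifest.xml" in names:
        return "load"        # normal case, manifest drives loading
    if "mimetype" not in names:
        return "undetected"  # assumption: nothing to fall back on
    unknown = substorages(names) - KNOWN_SUBSTORAGES
    # Unknown substorages could be embedded objects or extensions whose
    # media type cannot be recovered without a manifest.
    return "reject" if unknown else "repair"


print(load_decision(["mimetype", "content.xml"]))
```

A document with only "mimetype" and "content.xml", as produced by the older converter discussed in this issue, falls into the "repair" branch.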
Comment 24 fridrich.strba 2005-02-07 09:46:14 UTC
fridrich_strba->mav: The documents created by wpd2sxw <= 0.6.1 contained only
two streams "mimetype" and "content.xml". It contains no extensions or whatever.

For wpd2sxw-0.7.x, which will be released in the very near future, the following
modifications were made:
1) The first stream to be written is the "mimetype" stream, which is unfortunately
compressed with a ratio of "-7%" due to current limitations of libgsf used by
wpd2sxw.
2) Second stream to be written is the "META-INF/manifest.xml" that contains
following string:
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\
<!DOCTYPE manifest:manifest PUBLIC \"-//OpenOffice.org//DTD Manifest 1.0//EN\"
\"Manifest.dtd\">\n\
<manifest:manifest xmlns:manifest=\"http://openoffice.org/2001/manifest\">\n\
 <manifest:file-entry manifest:media-type=\"application/vnd.sun.xml.writer\"
manifest:full-path=\"/\"/>\n\
 <manifest:file-entry manifest:media-type=\"text/xml\"
manifest:full-path=\"content.xml\"/>\n\
</manifest:manifest>\n"
3) The third and last stream to be written is the content.xml that is a flat xml
result of the conversion of a WordPerfect file.

Due to the limitations of libgsf (bug filed:
http://bugzilla.gnome.org/show_bug.cgi?id=166139), the document still does not
fully conform to the specs, but it is opened by m74 without any additional
question.
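
For comparison, the same three-stream layout can be sketched with Python's zipfile module, which allows a per-entry compression method. This is not wpd2sxw or libgsf code; write_sxw is a name invented for the example, and the manifest text is the one quoted above.

```python
import io
import zipfile

MEDIA_TYPE = "application/vnd.sun.xml.writer"

MANIFEST = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE manifest:manifest PUBLIC '
    '"-//OpenOffice.org//DTD Manifest 1.0//EN" "Manifest.dtd">\n'
    '<manifest:manifest xmlns:manifest="http://openoffice.org/2001/manifest">\n'
    f' <manifest:file-entry manifest:media-type="{MEDIA_TYPE}" '
    'manifest:full-path="/"/>\n'
    ' <manifest:file-entry manifest:media-type="text/xml" '
    'manifest:full-path="content.xml"/>\n'
    '</manifest:manifest>\n'
)


def write_sxw(content_xml: str) -> bytes:
    """Write mimetype, manifest and content.xml, in that order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # First stream, stored uncompressed so the media type stays readable.
        zf.writestr("mimetype", MEDIA_TYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/manifest.xml", MANIFEST)
        zf.writestr("content.xml", content_xml)
    return buf.getvalue()


package = write_sxw("<office:document-content/>")
print(zipfile.ZipFile(io.BytesIO(package)).namelist())
```

The only difference from the layout described above is that the "mimetype" entry is stored rather than compressed, which is the part libgsf did not permit at the time.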
Comment 25 mikhail.voytenko 2005-02-07 10:51:00 UTC
mav->fridrich_strba: This means that a document produced by wpd2sxw <= 0.6.1
will be opened with a notification that the document is corrupted and the user
will be asked whether it should be recovered.

And a document produced by wpd2sxw-0.7.x will still be opened without any
question, although the integration in some third party applications will not
work since the "mimetype" stream is in the wrong format. I am not sure that it makes
sense to treat a document with a wrong "mimetype" stream as corrupted, at
least in the OOo1.0 file format. I could not find a strict specification for the
"mimetype" stream for the OOo1.0 format so far. The OASIS format is a different story,
but even there the necessity of checking the correctness of this stream is
debatable.

Actually I would recommend getting rid of the "mediatype" stream altogether in the new
version of wpd2sxw if the stream can not be stored according to the
specification. The .sxw document without this stream is a valid document, the
validity of an .sxw document with this stream in wrong format is at least
questionable, besides that "mediatype" stream has no value when it is stored in
wrong way.
Comment 26 mikhail.voytenko 2005-02-07 11:01:48 UTC
Oops, sorry, in the comments above please treat "mediatype" stream as "mimetype"
stream.
Comment 27 drichard 2005-02-07 14:47:36 UTC
Tested this document in M77, and as expected the dialog opened and asked for
information about what kind of document it was, and OpenOffice 1.0 format was at
the top of the list.  This would work fine for us.  However, when I selected
that option and clicked OK, the panel started to open, then halted and displayed
an error message. Attaching shot of error message.
Comment 28 drichard 2005-02-07 14:48:36 UTC
Created attachment 22298
Error that comes up when you attempt to open this kind of document in M77
Comment 29 mikhail.voytenko 2005-02-10 14:13:22 UTC
Fixed in mav16 cws.
Comment 30 mikhail.voytenko 2005-02-16 09:23:18 UTC
MAV->ES: Please verify the issue.
re-open issue and try to reassign to es@openoffice.org
Comment 31 mikhail.voytenko 2005-02-16 09:23:25 UTC
try to reassign to es@openoffice.org
Comment 32 mikhail.voytenko 2005-02-16 09:23:37 UTC
try to reset resolution to FIXED
Comment 33 frank 2005-02-18 13:54:23 UTC
Found fixed on cws mav16 using Solaris, Linux and Windows build
Comment 34 thorsten.ziehm 2005-03-04 13:33:31 UTC
*** Issue 43330 has been marked as a duplicate of this issue. ***
Comment 35 thorsten.ziehm 2005-03-04 13:36:11 UTC
*** Issue 43330 has been marked as a duplicate of this issue. ***
Comment 36 eric.savary 2005-03-05 14:57:06 UTC
Ok in src680m84