Apache OpenOffice (AOO) Bugzilla – Issue 41809
OOo 1.1 Document Won't Open In M74
Last modified: 2013-08-07 14:41:36 UTC
Submitted by one of our beta testers: this document opens fine in OOo 1.1.x, but when you open it in M74 the document is empty and the title bar doesn't indicate the file is open. Attaching the document and a screenshot of it open in 1.1.2.
Created attachment 22088 [details] Document open in 1.1.2
Created attachment 22089 [details] Document, will not open in M74
reassigned to ES.
ES->DVO: as discussed, please add a comment on this. Thanx!
Several issues:
1) The doc was obviously generated by external tools, and violates the spec in several (somewhat subtle) points. In particular, the manifest is missing, and the mimetype stream is not at the beginning of the file.
2) We *used* to read such files just fine, which *I* thought was a good feature. My understanding is that in recent versions the type detection became stricter (mba said as much in xml-dev@ooo), and does not recognize such files any longer. I'm not sure if and to what extent that is considered a bug or a feature, although my preference is quite clearly for the former.
1) + 2) explain why the file no longer loads.
3) There is no user-visible error message. The file simply doesn't load, and nothing else happens. I guess this would be a usability problem, at least.
dvo->mba: As said above, I don't know how much of the above is bug or feature. Please decide, and forward/handle as appropriate. Thanks.
dvo->drichard: "our beta testers"???
Comments- Our Beta Testers: The City of Largo has put about 30 people live on 680 starting a few milestones ago. We are using it full-time. We converted hundreds of thousands (really!) of documents from WordPerfect to OpenOffice format 1.5 years ago using libwpd. It's possible that the utility didn't write out a document perfectly, but most so far seem to work in 680. We would be in a bad way if 2.0 doesn't open these documents; I have found others that will not open either. I know other organizations have converted thousands of documents as well using libwpd.
dvo->drichard: Well, there's always a way out in that one could write a converter for this particular type of document. Which shouldn't be particularly hard in this case. Still, I'll wait for mba's comment before drawing any conclusions.
dvo: Uh, also, the mimetype stream is compressed, at -7% compression ratio. That's odd.
The problem is that those files *are* broken, so it would be a bug to open them without an additional user action. m74 has a bug: it doesn't open documents that are not detected. m76 will show a filter dialog where you can force OOo to load the document. This might appear inconvenient, but IMHO it's the appropriate way to treat the documents. Just saving them once after loading fixes the problem. It was necessary to make this move because it is the only way to detect documents reliably. Otherwise, loading documents without an extension or with a "wrong" extension can't work. If we accepted documents like the attached one, any zip file would be accepted as a valid OOo document. I wasn't aware that a broken tool is used outside to create OOo documents, and I'm still not convinced that we should lower the quality of our type detection. So I set it to "WontFix". Of course that doesn't mean I can't become convinced to change something if it doesn't make our type detection less reliable. :-) m76 should load those documents, but not without a filter dialog.
Ah sorry, I just overlooked that the file contains a "Mime magic" stream. Even if it is compressed we can use it for detection, but we don't do it currently because OOo1.0 never wrote any of those streams. But we can implement a fallback so that, in case there is no manifest.xml, we use the "mimetype" stream. Sorry for possible confusion.
Mikhail, as discussed, please change the Package Component so that it takes the mimetype stream as a fallback for the "MediaType" property in case there is no manifest.xml. If there is a manifest.xml we should always use the MediaType from there, even if it is empty or wrong.
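The precedence rule requested above can be sketched roughly as follows. This is only an illustration in Python, not the Package Component's actual code: the function name `package_media_type` is made up for this example, and the manifest namespace is assumed to be the OOo 1.x one quoted later in this issue.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: OOo 1.x manifest location and namespace.
MANIFEST_PATH = "META-INF/manifest.xml"
MANIFEST_NS = "http://openoffice.org/2001/manifest"


def package_media_type(data: bytes):
    """Hypothetical helper: manifest.xml wins, mimetype is only a fallback."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        if MANIFEST_PATH in names:
            # A manifest exists: use its value even if it is empty or wrong.
            root = ET.fromstring(zf.read(MANIFEST_PATH))
            for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
                if entry.get(f"{{{MANIFEST_NS}}}full-path") == "/":
                    return entry.get(f"{{{MANIFEST_NS}}}media-type", "")
            return ""
        if "mimetype" in names:
            # No manifest: fall back to the mimetype stream.
            # zipfile decompresses transparently, so a compressed stream works too.
            return zf.read("mimetype").decode("ascii", "replace").strip()
        return None  # neither source is available
```

Note that the mimetype stream is never consulted when a manifest is present, so a conflict between the two is always resolved in favour of the manifest.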
dvo: Thanks. I like this a lot better; particularly with the mimetype fallback. Just for curiosity: Why the precedence of manifest over mimetype?
I can't thank you enough for allowing these slightly non-standard files to load into OOo. Giving them a dialog and having them save the file again to correct the issue is perfect. That will allow us to slowly correct these old files as people open them. This was a onetime issue related to the WP-->OOo conversion.
While you are working on type detection, please have a look at issue 39255 (OOo crashes when manifest.xml starts with BOM (byte order mark)).
drichard: I just modified the wpd2sxw. Could you test whether this document opens correctly in M74?
Created attachment 22116 [details] Test document converted with modified wpd2sxw
dvo->fridrich_strba: Loads fine for me, but it's still not fully spec-conforming: the mimetype stream is still compressed.
Explanation: Three 'special' properties apply to the mimetype stream: 1) it must be first, 2) it must be uncompressed, 3) it must not use 'extra data'. The reason is that during the standardization process, several parties wanted better integration of the format into their (existing) infrastructure. For example, both KDE and Gnome use file type detection based on the Unix 'magic' tool, which recognizes magic numbers at fixed positions in the file. The ZIP format itself doesn't guarantee this, which is why we established those extra rules. If the above conditions are met, you will see the file name ("mimetype") at position 30 in the file, and the actual mime type ("application/vnd.sun.xml.writer") just after. If you look at the file in an editor, you will see both as a string ("mimetypeapplication/vnd.sun.xml.writer") at the beginning of the file. This doesn't work if the mimetype file is compressed. (Then you will see "mimetype" followed by binary stuff.)
Fridrich, I'd be rather thankful if you could tune the wpd2sxw tool accordingly. Thanks.
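For anyone who wants to check a file against the three rules without a hex editor, here is a rough sketch in Python. It only reads the first ZIP local file header (30 fixed bytes, then the file name, then the extra field); the function name `mimetype_magic_problems` is made up for this example.

```python
import struct


def mimetype_magic_problems(data: bytes):
    """Hypothetical checker: list which of the three mimetype rules are broken."""
    # The first local file header must start at byte 0 with the PK\x03\x04 signature.
    if len(data) < 30 or data[:4] != b"PK\x03\x04":
        return ["file does not start with a ZIP local file header"]
    (method,) = struct.unpack_from("<H", data, 8)            # compression method
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    name = data[30:30 + name_len]                            # file name starts at offset 30
    problems = []
    if name != b"mimetype":
        problems.append("mimetype is not the first entry")
    if method != 0:                                          # 0 means stored (uncompressed)
        problems.append("first entry is compressed")
    if extra_len != 0:
        problems.append("first entry has extra data")
    return problems
```

An empty list means the string "mimetype" sits at offset 30 and the uncompressed media type follows directly at offset 38, which is what the 'magic' based detection relies on.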
> Just for curiosity: Why the precedence of manifest over mimetype?
From my point of view the "mimetype" substream is just an optional extension that duplicates information stored in "manifest.xml" for the purposes you have mentioned already; "manifest.xml" is the main source of document type information in the package format. Using the "mimetype" stream as a source of the package mediatype looks to me to be close to using the document URL extension for the same purpose. It seems to be acceptable only as a fallback solution. And in case of conflict with the value stored in "manifest.xml", the latter should be used.
Confirmed that document from fridrich_strba opened in M74 just fine. It was my understanding that once libwpd was integrated into OOo that the command line utility was going away -- or I would have reported it. wpd2sxw is wonderful for organizations that want to migrate completely and remove all WP documents and worked well for us.
fridrich_strba->dvo: wpd2sxw uses libgsf to write out the sxw document. So far I have not figured out how to make libgsf change the compression ratio between two child files. I explored a bit today and discovered that libgsf actually prevents such behaviour: the function that changes the compression exits if the zip file is in state "writing=true". I will explore more workarounds, but that is it for now.
fridrich_strba->drichard: No, wpd2sxw is not dead :-). It is a useful tool for migrating documents in archives without having the users import them one by one into OOo.
There is a slight problem with the mimetype stream based workaround. If a document with no manifest.xml contains substorages (representing either an embedded object of our own format or a possible extension with unknown mediatype), this document cannot be loaded without information loss. There is no way to repair a possible document extension; it is not even possible to identify whether a substorage is an own object or an extension (which can look like an own embedded object). So the following approach is chosen for now: if there is no "manifest.xml" available, the mimetype stream is available, and the document has no substorages (except known ones, like Configuration, Basic, etc.), a warning about document corruption is shown and the repairing feature is used; if the document has substorages, the office will refuse to open the document. In other words, a document without manifest.xml can be opened only if it contains no embedded objects and no document extensions.
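The approach described above boils down to a small decision table. The sketch below is only an illustration in Python: the function name, the return strings, and the contents of `KNOWN_SUBSTORAGES` are assumptions (the set simply repeats the two names mentioned in the comment and is not the real list used by the office).

```python
# Assumption: placeholder list taken from the comment above, not the real one.
KNOWN_SUBSTORAGES = frozenset({"Configuration", "Basic"})


def open_decision(entry_names, known=KNOWN_SUBSTORAGES):
    """Hypothetical helper: decide how to treat a package given its entry names."""
    names = list(entry_names)
    if "META-INF/manifest.xml" in names:
        return "open"            # normal case, the manifest describes the package
    if "mimetype" not in names:
        return "not detected"    # this case is not covered by the comment above
    # Treat every top-level folder as a substorage.
    substorages = {n.split("/", 1)[0] for n in names if "/" in n}
    if substorages - set(known):
        return "reject"          # unknown substorage: cannot load without loss
    return "repair"              # warn about corruption, then use the repair feature
```

With this, a package holding only "mimetype" and "content.xml" ends up in the "repair" branch, while one that also carries an unknown folder is rejected.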
fridrich_strba->mav: The documents created by wpd2sxw <= 0.6.1 contained only two streams, "mimetype" and "content.xml". They contain no extensions or anything else. For wpd2sxw-0.7.x, which will be released in the very near future, the following modifications were made:
1) The first stream to be written is the "mimetype" stream, which is unfortunately compressed with a ratio of "-7%" due to current limitations of libgsf used by wpd2sxw.
2) The second stream to be written is "META-INF/manifest.xml", which contains the following string:
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\
<!DOCTYPE manifest:manifest PUBLIC \"-//OpenOffice.org//DTD Manifest 1.0//EN\" \"Manifest.dtd\">\n\
<manifest:manifest xmlns:manifest=\"http://openoffice.org/2001/manifest\">\n\
<manifest:file-entry manifest:media-type=\"application/vnd.sun.xml.writer\" manifest:full-path=\"/\"/>\n\
<manifest:file-entry manifest:media-type=\"text/xml\" manifest:full-path=\"content.xml\"/>\n\
</manifest:manifest>\n"
3) The third and last stream to be written is content.xml, which is the flat XML result of the conversion of a WordPerfect file.
Due to the limitations of libgsf (bug filed: http://bugzilla.gnome.org/show_bug.cgi?id=166139), the document written like this does not completely conform to the specs, but it is opened by m74 without any additional question.
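For comparison, here is a rough sketch of writing the same three streams with the mimetype stream stored uncompressed. It uses Python's standard zipfile module, which allows a compression method per entry; this is not how wpd2sxw works (it uses libgsf), and the function name `write_sxw` is made up for this example.

```python
import io
import zipfile


def write_sxw(content_xml: str, manifest_xml: str,
              media_type: str = "application/vnd.sun.xml.writer") -> bytes:
    """Hypothetical writer: mimetype first and stored, other streams deflated."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # 1) mimetype goes first and must not be compressed.
        zf.writestr("mimetype", media_type, compress_type=zipfile.ZIP_STORED)
        # 2) the manifest, 3) the converted document content.
        zf.writestr("META-INF/manifest.xml", manifest_xml)
        zf.writestr("content.xml", content_xml)
    return buf.getvalue()
```

Because the first entry has an 8-byte name, no extra field, and no compression, the resulting file starts with a 30-byte header followed directly by "mimetype" and the media type string.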
mav->fridrich_strba: This means that a document produced by wpd2sxw <= 0.6.1 will be opened with a notification that the document is corrupted, and the user will be asked whether it should be recovered. A document produced by wpd2sxw-0.7.x will still be opened without any question, although the integration in some third-party applications will not work since the "mimetype" stream is in the wrong format. I am not sure that it makes sense to treat a document with a wrong "mimetype" stream as corrupted, at least in the OOo1.0 file format. I could not find a strict specification for the "mimetype" stream for the OOo1.0 format so far. The OASIS format is a different story, but even there the necessity of checking the correctness of this stream is debatable. Actually, I would recommend getting rid of the "mediatype" stream altogether in the new version of wpd2sxw if the stream cannot be stored according to the specification. An .sxw document without this stream is a valid document; the validity of an .sxw document with this stream in the wrong format is at least questionable. Besides that, the "mediatype" stream has no value when it is stored in the wrong way.
Oops, sorry, in the comments above please treat "mediatype" stream as "mimetype" stream.
Tested this document in M77, and as expected the dialog opened and asked for information about what kind of document it was, and OpenOffice 1.0 format was at the top of the list. This would work fine for us. However, when I selected that option and clicked OK, the panel started to open, then halted and displayed an error message. Attaching a screenshot of the error message.
Created attachment 22298 [details] Error that comes up when you attempt to open this kind of document in M77
Fixed in mav16 cws.
MAV->ES: Please verify the issue. re-open issue and try to reassign to es@openoffice.org
try to reassign to es@openoffice.org
try to reset resolution to FIXED
Found fixed on cws mav16 using Solaris, Linux and Windows build
*** Issue 43330 has been marked as a duplicate of this issue. ***
Ok in src680m84