Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
Summary: | Enhance XMerge to allow access to embedded objects in OpenOffice XML files. | | |
---|---|---|---|
Product: | xml | Reporter: | Unknown <non-migrated> |
Component: | smalldevices | Assignee: | Unknown <non-migrated> |
Status: | CLOSED FIXED | QA Contact: | issues@xml <issues> |
Severity: | Trivial | | |
Priority: | P3 | CC: | issues |
Version: | current | | |
Target Milestone: | --- | | |
Hardware: | All | | |
OS: | All | | |
Issue Type: | ENHANCEMENT | Latest Confirmation in: | --- |
Developer Difficulty: | --- | | |
Description
Unknown
2002-10-29 16:10:44 UTC
Changes are mostly complete. This bug will be used to track changes made to the XMerge API.

EmbeddedObject defines accessor methods for the embedded object's data, as well as for its name/path (within the manifest.xml file) and its MIME type. A number of package-private methods also exist to interact with the OfficeZip and OfficeDocument classes for storage purposes.

Note that flat OpenOffice.org XML files store embedded objects as inline tags/data within the document structure. The EmbeddedObject class and its subclasses are intended to represent embedded objects as stored in the zipped OpenOffice.org file format.

Retrieval of both the EmbeddedObject information and the data for each EmbeddedObject is deferred until explicitly requested through the provided methods. This incurs a performance penalty the first time the data is accessed, but ensures that there is no performance degradation when embedded object data is not needed.

To support the retrieval of data, two new public methods have been added to OfficeDocument. The first returns an Iterator over all the embedded objects in the document. The second returns a specific EmbeddedObject instance representing a named object. An object's name can be found from the xlink:href attribute of the embedded object in the document's content tree.

Tested read and write functionality: embedded objects are read and written successfully when converting. Tests on existing plugins show no impact on existing XMerge functionality. All changes are now committed.

There is a small issue: the code that disables DTD processing doesn't work with Crimson as the parser. Here is a simple fix. In the method getNamedDOM() in EmbeddedXMLDocument, `return builder.parse(domData);` can be replaced with `InputSource is = new InputSource(domData); is.setSystemId(""); return builder.parse(is);`

Also, OfficeDocument uses another trick to avoid reading the DTD (the method hack()).
This code doesn't work with non-ASCII characters (it doesn't decode the bytes as UTF-8). To fix that, it should be replaced with the same code as in EmbeddedXMLDocument.

Another detail: there is some confusion about the trailing "/" in embedded object names. In manifest.xml, an XML object is named with a trailing "/" because it is a directory in the zip file. A binary object has no trailing "/" because it is a file in the zip file. The method getEmbeddedObject(String name) in OfficeDocument uses the name from manifest.xml. However, the xlink:href attributes and the EmbeddedObject objects never have a trailing "/" in the name. So I think the most consistent solution would be for getEmbeddedObject not to require the trailing "/".

The trailing '/' character should not be required by getEmbeddedObject. When the objects are read in, any trailing '/' is chopped off; see getEmbeddedObjects(). The documentation for getEmbeddedObject also states that any '/' or '#' characters should be stripped from the name. These are the extra characters that appear in the xlink:href entry.

Fixed the problem with the trailing '/' character. Also amended the hack() method of OfficeDocument to read the byte stream as UTF-8. This resolves the issue of searching for a DTD. The previous approach, using an EntityResolver, did not work consistently across parsers. Henrik's development and testing indicate that the changes work as they should, and internal testing shows no regressions.

Henrik's e-mail:

Hi Mark,

I've tested the latest version of OfficeDocument and EmbeddedXMLDocument. Everything seems to be perfect! I have no trouble extracting formulas and graphics from a Writer document. Thanks again!

Henrik

Closing this bug.
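This issue describes deferred loading of embedded object data and two ways to reach the objects. The first is an Iterator over all of them. The second is a lookup by name that no longer requires the trailing "/". Both can be illustrated with a minimal Java sketch.

This is a hypothetical example, not XMerge's actual code. The class names EmbeddedRegistry and LazyObject, the add() method and the normalizeName() helper are invented for illustration, as are the sample names and MIME types. The sketch assumes that an xlink:href value differs from a manifest name only by a leading "#", and that XML objects carry only a trailing "/".

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: EmbeddedRegistry, LazyObject, add() and normalizeName()
// are illustrative names, not the actual XMerge OfficeDocument/EmbeddedObject API.
final class EmbeddedRegistry {

    /** An embedded object whose bytes are loaded only on first access. */
    static final class LazyObject {
        private final String name;
        private final String mimeType;
        private final Supplier<byte[]> loader;
        private byte[] data; // stays null until getData() is first called

        LazyObject(String name, String mimeType, Supplier<byte[]> loader) {
            this.name = name;
            this.mimeType = mimeType;
            this.loader = loader;
        }

        String getName() { return name; }

        String getType() { return mimeType; }

        /** Runs the loader on the first call and caches the result. */
        byte[] getData() {
            if (data == null) {
                data = loader.get();
            }
            return data;
        }
    }

    // Insertion order is kept so iteration follows the order of registration.
    private final Map<String, LazyObject> objects = new LinkedHashMap<>();

    /**
     * Assumed normalization: strip a leading '#' (xlink:href style) and a
     * trailing '/' (manifest.xml directory entries for XML objects).
     */
    static String normalizeName(String name) {
        String n = name;
        if (n.startsWith("#")) {
            n = n.substring(1);
        }
        if (n.endsWith("/")) {
            n = n.substring(0, n.length() - 1);
        }
        return n;
    }

    /** Registers an object under its normalized name; its data is not read here. */
    void add(String manifestName, String mimeType, Supplier<byte[]> loader) {
        String key = normalizeName(manifestName);
        objects.put(key, new LazyObject(key, mimeType, loader));
    }

    /** Iterates over all objects without loading any of their data. */
    Iterator<LazyObject> getEmbeddedObjects() {
        return objects.values().iterator();
    }

    /** Accepts either the manifest name or the xlink:href value; null if absent. */
    LazyObject getEmbeddedObject(String name) {
        return objects.get(normalizeName(name));
    }

    public static void main(String[] args) {
        EmbeddedRegistry doc = new EmbeddedRegistry();
        // XML objects are zip directories (trailing '/'); binary objects are files.
        doc.add("Object 1/", "application/vnd.sun.xml.math", () -> new byte[] {1, 2, 3});
        doc.add("Pictures/logo.png", "image/png", () -> new byte[] {4, 5});

        // The manifest form and the href form resolve to the same object.
        System.out.println(doc.getEmbeddedObject("Object 1/").getName()); // Object 1
        System.out.println(doc.getEmbeddedObject("#Object 1").getName()); // Object 1
        System.out.println(doc.getEmbeddedObject("#Object 1").getData().length); // 3
    }
}
```

The sketch follows the trade-off described in the issue. Registering an object only stores a loader, so the cost of loading is paid once, on the first getData() call. The sketch is not thread-safe: two threads hitting the first access at the same time could both run the loader.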
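A second hedged sketch illustrates the UTF-8 fix to hack(). The actual hack() code is not shown in this issue. The sketch assumes its purpose is to remove the DOCTYPE declaration from the XML byte stream before parsing. It shows why the bytes must be decoded as UTF-8 rather than byte by byte.

DoctypeStripper and stripDoctype() are hypothetical names, and the sample DOCTYPE is illustrative. The regular expression also assumes a DOCTYPE without an internal subset.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch of the idea behind the amended hack(); not the actual
// XMerge code. DoctypeStripper and stripDoctype() are illustrative names.
final class DoctypeStripper {

    // Assumption: the declaration has no internal subset ("[...]"), so it ends
    // at the first '>'. Declarations with an internal subset are left unchanged.
    private static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE[^>\\[]*>");

    /** Removes the first DOCTYPE declaration, decoding and re-encoding as UTF-8. */
    static byte[] stripDoctype(byte[] xml) {
        // Decoding with an explicit UTF-8 charset keeps multi-byte characters
        // intact; mapping each byte to one char would garble them.
        String text = new String(xml, StandardCharsets.UTF_8);
        String stripped = DOCTYPE.matcher(text).replaceFirst("");
        return stripped.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE office:document-content PUBLIC "
                + "\"-//OpenOffice.org//DTD OfficeDocument 1.0//EN\" \"office.dtd\">"
                + "<office:document-content>Gr\u00fc\u00dfe</office:document-content>";
        byte[] cleaned = stripDoctype(xml.getBytes(StandardCharsets.UTF_8));

        // With the DOCTYPE gone, the parser has no external DTD to look up.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new ByteArrayInputStream(cleaned));
        System.out.println(dom.getDocumentElement().getTextContent().equals("Gr\u00fc\u00dfe")); // true
    }
}
```

For well-formed UTF-8 input, decoding and re-encoding with the same charset is lossless, so only the DOCTYPE bytes change. Like the fix described in the issue, the sketch assumes the stream is UTF-8.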