Apache OpenOffice (AOO) Bugzilla – Issue 46016
Unopkg cannot parse an XML manifest with leading whitespace
Last modified: 2017-05-20 11:33:46 UTC
Discovered that a manifest.xml file included in a component won't be parsed by unopkg if it has a leading line of whitespace before the XML declaration. Unopkg will install the bundle but will not deploy anything listed in the manifest.xml file. The workaround is to remove the whitespace, but it will catch people off guard who think their component must have done something wrong.
jsc -> dbo: it's yours
As I am using the com.sun.star.packages.manifest.ManifestReader, I think it is a problem in that area. @MAV: you are the best one I know for this, please take over. Martin Gallwey does not contribute/fix anything anymore, right?
MAV->DVO: The ManifestReader implementation uses a SAX parser to parse the stream. So it looks like the parser is confused by the line of whitespace. Could you please take a look?
Hi. I just noticed this issue accidentally. If an XML document contains an XML declaration, that declaration MUST be at the very beginning of the document. In particular, whitespace preceding the XML declaration is not allowed in a well-formed XML document. The only exception is that a single Unicode byte order mark is permitted. I suggest closing this issue as INVALID.
I looked through the specification (current, third edition). It discusses how validating and nonvalidating processors differ when parsing documents, and it discusses how to treat whitespace, but I don't see anywhere in the specification where it calls this out as a problem. I might have missed it, but my sense is that it should be handled in some way. I know other parsers I've used don't like it and will throw an error, but I don't think it should silently fail.
It's in chapter 2.1 in combination with chapter 2.8 of the XML specification (http://www.w3.org/TR/REC-xml/). Chapter 2 defines the physical layout of document. You will find the following productions in the respective chapters:
[1] document ::= prolog element Misc*
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
Note that allowed whitespace is explicit in those productions (the S rule), and there is none allowed in front of the '<?xml' text. Hence, the parser is correct.
dvo->raymondb: I'm tempted to close as INVALID as well. Or, if the problem is actually about the silent failure, send it back the chain to let higher-level components deal with that (if possible).
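The rule in those productions can be checked against any conforming parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree (expat-based) rather than the SAX parser used by the ManifestReader; the helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

DECL = b'<?xml version="1.0" encoding="UTF-8"?>'
BODY = b"<manifest/>"
UTF8_BOM = b"\xef\xbb\xbf"


def is_well_formed(data: bytes) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Declaration at the very start of the document: accepted.
print(is_well_formed(DECL + BODY))
# A blank line before the declaration: rejected, production [22] allows no S there.
print(is_well_formed(b"\n" + DECL + BODY))
# A single byte order mark before the declaration: accepted.
print(is_well_formed(UTF8_BOM + DECL + BODY))
# Whitespace before the root element with no declaration at all: accepted,
# because Misc* in the prolog may contain whitespace.
print(is_well_formed(b"\n" + BODY))
```

The last case shows why the restriction is easy to miss: leading whitespace is only fatal when an XML declaration follows it.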
raymondb -> dvo: Given that I have seen other processors deal with it without failing, I assumed that was a case that should be handled. Your comments and further checking have shown that stricter processors will fail on this; however, they will throw an error. I think that is important for usability, because I imagine most people who stumble over this will take some time before they realize what the problem is.
Redistributing dvo's issues.
The issue is confirmed (OOo doesn't handle a file beginning with whitespace), so this issue should no longer stay unconfirmed. Everything else (whether the issue is closed as invalid/wontfix, or OOo will issue a warning or something) is another story.
MIB->DBO,MAV: If the manifest.xml is invalid, then an error should be displayed to the user.
@MAV: IMO the manifest reader ought to reflect that error somehow (e.g. throwing an exception upon readManifestSequence()). As DVO figured out, the parser is ok not triggering anything. But because of the fact that even a document started event is missing, the manifest reader should signal that exception. The packagemanager cannot distinguish whether the file is corrupt or just does not contain any entries.
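The behaviour requested here, a reader that raises instead of returning an empty result, can be sketched as follows. This is only an illustration in Python's standard xml.sax module, not the actual ManifestReader code: ManifestError, read_manifest_entries, and the sample element and attribute names (manifest:file-entry, manifest:full-path) are assumptions made for the example.

```python
import xml.sax


class ManifestError(Exception):
    """Hypothetical error type: the manifest could not be parsed."""


class _EntryCollector(xml.sax.ContentHandler):
    """Collects the full-path attribute of every file entry seen."""

    def __init__(self):
        super().__init__()
        self.document_started = False
        self.entries = []

    def startDocument(self):
        self.document_started = True

    def startElement(self, name, attrs):
        # Namespace processing is off by default, so names arrive as qnames.
        if name == "manifest:file-entry":
            self.entries.append(attrs.get("manifest:full-path"))


def read_manifest_entries(data: bytes):
    """Return the listed entries, or raise ManifestError if parsing fails."""
    collector = _EntryCollector()
    try:
        xml.sax.parseString(data, collector)
    except xml.sax.SAXParseException as exc:
        # Surface the parser's complaint instead of returning an empty list.
        raise ManifestError(
            "manifest.xml is not well-formed: " + exc.getMessage()
        ) from exc
    # Defensive: treat a run that never reported a document start as a failure too.
    if not collector.document_started:
        raise ManifestError("no document start event was reported")
    return collector.entries
```

With this shape, a caller such as the package manager gets an empty list only for a well-formed manifest that lists nothing, and an exception for a corrupt one, so the two cases are no longer indistinguishable.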
Setting the target to the issue.
Reset assignee to the default "issues@openoffice.apache.org".