Bug 63846 - Handle invalid characters in relationship .rels
Summary: Handle invalid characters in relationship .rels
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: OPC
Version: 4.1.x-dev
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2019-10-14 19:30 UTC by Andreas Beeker
Modified: 2019-11-02 17:14 UTC (History)
Description Andreas Beeker 2019-10-14 19:30:28 UTC
I was trying to figure out rendering issues and converted the file 53446.ppt to .pptx via PowerPoint 2016.
The resulting file contains 0x07 control characters (BEL, not tab, which would be 0x09) in the relationship elements of slide11.xml.rels.
The file can be opened by LibreOffice.

I'm not sure how to handle this bug: either we find a way to tell the SAX parser that this is OK, or we pre-parse/filter the files, or we simply accept that we can't parse those files.
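
To make the pre-parse/filter option concrete, here is a minimal sketch, not a proposed patch. RelsSanitizer and its methods are hypothetical names and not part of POI. It assumes the .rels part is UTF-8 encoded and small enough to buffer in memory. It drops every code point outside the XML 1.0 Char production before handing the text to the standard DocumentBuilder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper for illustration only; not an existing POI class. */
public final class RelsSanitizer {
    private RelsSanitizer() {}

    /** Returns true if the code point matches the Char production of XML 1.0. */
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Removes all code points that XML 1.0 does not allow (e.g. 0x07). */
    static String stripInvalidXml10Chars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(RelsSanitizer::isValidXml10Char)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    /** Reads the whole part, filters it, then parses it with the default parser. */
    static Document parseLenient(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        // Assumption: the part is UTF-8; a real fix would honour the XML declaration.
        String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        // A byte order mark is not stripped when parsing from a character stream.
        if (xml.startsWith("\uFEFF")) {
            xml = xml.substring(1);
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(stripInvalidXml10Chars(xml))));
    }
}
```

The drawback is that the Target value is silently altered, so a filtered relationship may no longer point to what PowerPoint intended.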

The converted file can be found here:
http://people.apache.org/~kiwiwings/bugs/53446.pptx


> org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 911; An invalid XML character (Unicode: 0x7) was found in the value of attribute "Target" and element is "Relationship".
> 	at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:204)
> 	at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:178)
> 	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:399)
> 	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:326)
> 	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1471)
> 	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:979)
> 	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(XMLNSDocumentScannerImpl.java:447)
> 	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:250)
> 	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2706)
> 	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:601)
> 	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
> 	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:531)
> 	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:887)
> 	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:823)
> 	at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
> 	at java.xml/com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
> 	at java.xml/com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
> 	at java.xml/javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:122)
> 	at poi.ooxml@4.1.0/org.apache.poi.ooxml.util.DocumentHelper.readDocument(DocumentHelper.java:166)
> 	at poi.ooxml@4.1.0/org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:304)
> 	at poi.ooxml@4.1.0/org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:163)
> 	at poi.ooxml@4.1.0/org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:133)
> 	at poi.ooxml@4.1.0/org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:570)
> 	at poi.ooxml@4.1.0/org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725)
> 	at poi.ooxml@4.1.0/org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
> 	at kiwiwings.poivisualizer/de.kiwiwings.poi.visualizer.treemodel.opc.OPCTreeModel.load(OPCTreeModel.java:52)
Comment 1 Dominik Stadler 2019-11-02 14:28:01 UTC
Technically this document is broken and not valid XML. Other XML tools complain about it as well:

$ xmllint ppt/slides/_rels/slide11.xml.rels
ppt/slides/_rels/slide11.xml.rels:2: parser error : invalid character in attribute value
t%20&amp;%20Cntrl\DASHTEMPLATE.doc#_Hlk440858950        1,3370,3673,0,,      APPROVALS:
                                                                                      ^
...

So it will be hard to convince the standards-compliant XML parser in Java to accept it.
Comment 2 Andreas Beeker 2019-11-02 17:14:04 UTC
I haven't tried it yet, but I think it's possible to parse those characters in XML 1.1.

see https://stackoverflow.com/a/28152666/2066598
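
A small probe to try the XML 1.1 idea with the JDK parser could look like the sketch below. Xml11Probe is a hypothetical name and the test documents are made up for illustration. Note that the XML 1.1 specification allows #x1 to #x1F only as character references such as &#x7;, so a literal 0x07 byte, as found in the converted file, may still be rejected even with a 1.1 declaration. The probe prints the result for each variant instead of assuming one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Hypothetical probe for illustration only; not part of POI. */
public final class Xml11Probe {
    private Xml11Probe() {}

    /** Returns true if the default JAXP parser accepts the document. */
    static boolean parses(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // Well-formedness errors such as invalid characters end up here.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String bel = String.valueOf((char) 0x07);
        String[][] cases = {
            {"1.0, literal 0x07", "<?xml version=\"1.0\"?><r t=\"a" + bel + "b\"/>"},
            {"1.0, &#x7;       ", "<?xml version=\"1.0\"?><r t=\"a&#x7;b\"/>"},
            {"1.1, literal 0x07", "<?xml version=\"1.1\"?><r t=\"a" + bel + "b\"/>"},
            {"1.1, &#x7;       ", "<?xml version=\"1.1\"?><r t=\"a&#x7;b\"/>"},
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " -> " + parses(c[1]));
        }
    }
}
```

If only the character-reference variant is accepted, switching to XML 1.1 alone would not help with this file. The literal bytes would still need to be rewritten as references or filtered first.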