Bug 66878 - Invalid URL entered by user as hyperlink target causes exception when parsing.
Summary: Invalid URL entered by user as hyperlink target causes exception when parsing.
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: OPC
Version: unspecified
Hardware: Other Linux
Importance: P2 minor
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-11 00:30 UTC by thill
Modified: 2024-03-07 05:26 UTC
CC List: 0 users

Description thill 2023-08-11 00:30:27 UTC
An invalid value entered into a document as a hyperlink's destination (the hyperlink target) causes an exception.

As this is a user-entered value, I'm not sure why it should ever be examined. I don't believe there is any validation, so this field can contain any garbage. It should be ignored by POI.

Version: whatever POI version ships with Apache Tika 2.7.0.


org.apache.poi.openxml4j.opc.PackageRelationshipCollection  - Cannot convert https://cloud.google.com/bigtable/docs/backups#what-for%20https://cloud.google.com/bigtable/docs/release-notes#December_08_2022 in a valid relationship URI-> dummy-URI used
java.net.URISyntaxException: Illegal character in fragment at index 110: https://cloud.google.com/bigtable/docs/backups#what-for%20https://cloud.google.com/bigtable/docs/release-notes#December_08_2022
	at java.base/java.net.URI$Parser.fail(URI.java:2974)
	at java.base/java.net.URI$Parser.checkChars(URI.java:3145)
	at java.base/java.net.URI$Parser.parse(URI.java:3189)
	at java.base/java.net.URI.<init>(URI.java:623)
	at org.apache.poi.openxml4j.opc.PackagingURIHelper.toURI(PackagingURIHelper.java:723)
	at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:358)
	at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:160)
	at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:130)
	at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:565)
	at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:751)
	at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:322)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:123)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:115)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:195)
	at org.apache.tika.Tika.parseToString(Tika.java:525)
	at org.apache.tika.Tika.parseToString(Tika.java:495)
	at com.purato.index.documenthandler.TikaDocumentHandler.getText(TikaDocumentHandler.java:52)
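The "Illegal character in fragment at index 110" message can be reproduced with plain java.net.URI, outside POI and Tika. Index 110 is the second '#' in the target: the value is two URLs joined by "%20", so everything after the first '#' is treated as the fragment, and java.net.URI does not accept a bare '#' inside a fragment. The following is a minimal, dependency-free sketch; the class FragmentCheck and the helper illegalIndex are hypothetical names used only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class FragmentCheck {
    // Hyperlink target copied from the log above: two URLs joined by "%20",
    // so the text after the first '#' contains a second '#'.
    static final String TARGET =
            "https://cloud.google.com/bigtable/docs/backups#what-for%20"
            + "https://cloud.google.com/bigtable/docs/release-notes#December_08_2022";

    /** Returns the index java.net.URI reports as invalid, or -1 if the string parses. */
    static int illegalIndex(String s) {
        try {
            new URI(s);
            return -1;
        } catch (URISyntaxException e) {
            // getIndex() is the position of the offending character in the input
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        int i = illegalIndex(TARGET);
        // prints: index 110: '#'
        System.out.println("index " + i + ": '" + TARGET.charAt(i) + "'");
    }
}
```

Percent-encoding that second '#' as "%23" makes the same string parse, which is one way such a value could be repaired before it is written into a document.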
Comment 1 大雨哗啦啦 2024-03-07 05:22:24 UTC
(In reply to thill from comment #0)

This is just an error log entry; the exception is caught and handled inside POI itself.
The comment in the code describes it like this:

// when parsing of the given uri fails, we can either
// ignore this relationship, which leads to IllegalStateException
// later on, or use a dummy value and thus enable processing of the
// package

It is in the org.apache.poi.openxml4j.opc.PackageRelationshipCollection class, at line 357.
Comment 2 大雨哗啦啦 2024-03-07 05:24:08 UTC
// Target converted in URI
URI target = PackagingURIHelper.toURI("http://invalid.uri"); // dummy url
String value = element.getAttribute(PackageRelationship.TARGET_ATTRIBUTE_NAME);
try {
    // when parsing of the given uri fails, we can either
    // ignore this relationship, which leads to IllegalStateException
    // later on, or use a dummy value and thus enable processing of the
    // package
    target = PackagingURIHelper.toURI(value);
} catch (URISyntaxException e) {
    LOG.atError().withThrowable(e).log("Cannot convert {} in a valid relationship URI-> dummy-URI used", value);
}
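The fallback in this snippet can be illustrated with a small standalone sketch. It is an approximation under stated assumptions, not POI's code: it calls new URI(...) directly instead of PackagingURIHelper.toURI, which may preprocess the value first, and it logs through java.util.logging instead of the Log4j API that POI uses. The class RelationshipTargetFallback, the method parseTargetOrDummy and the constant DUMMY_TARGET are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.logging.Logger;

public class RelationshipTargetFallback {
    private static final Logger LOG =
            Logger.getLogger(RelationshipTargetFallback.class.getName());

    // Hypothetical placeholder, mirroring the "http://invalid.uri" dummy in the snippet above.
    static final URI DUMMY_TARGET = URI.create("http://invalid.uri");

    /** Parses a relationship target, returning DUMMY_TARGET instead of throwing. */
    static URI parseTargetOrDummy(String value) {
        try {
            return new URI(value);
        } catch (URISyntaxException e) {
            // Log and continue, so one bad hyperlink does not abort processing.
            LOG.warning("Cannot convert " + value + " to a URI, using dummy: " + e.getReason());
            return DUMMY_TARGET;
        }
    }

    public static void main(String[] args) {
        // prints: https://example.com/docs#intro
        System.out.println(parseTargetOrDummy("https://example.com/docs#intro"));
        // prints: http://invalid.uri (a warning is also logged to stderr)
        System.out.println(parseTargetOrDummy("https://example.com/a#one#two"));
    }
}
```

The design mirrors the trade-off described in the code comment: ignoring the relationship would lead to an IllegalStateException later, while a placeholder target keeps processing going at the cost of losing the original link text.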
Comment 3 大雨哗啦啦 2024-03-07 05:26:41 UTC
This usually happens when your Excel cell value contains "@" or a hyperlink.