An invalid value entered into the document for a hyperlink causes an exception. This is the destination of the hyperlink.

As this is a user entered value, I'm not sure why it should ever be looked at. I don't believe there is any validation, so this field can have any garbage in it. It should be ignored by POI.

Version is whatever comes with Apache Tika 2.7.0

org.apache.poi.openxml4j.opc.PackageRelationshipCollection - Cannot convert https://cloud.google.com/bigtable/docs/backups#what-for%20https://cloud.google.com/bigtable/docs/release-notes#December_08_2022 in a valid relationship URI-> dummy-URI used
java.net.URISyntaxException: Illegal character in fragment at index 110: https://cloud.google.com/bigtable/docs/backups#what-for%20https://cloud.google.com/bigtable/docs/release-notes#December_08_2022
    at java.base/java.net.URI$Parser.fail(URI.java:2974)
    at java.base/java.net.URI$Parser.checkChars(URI.java:3145)
    at java.base/java.net.URI$Parser.parse(URI.java:3189)
    at java.base/java.net.URI.<init>(URI.java:623)
    at org.apache.poi.openxml4j.opc.PackagingURIHelper.toURI(PackagingURIHelper.java:723)
    at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:358)
    at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:160)
    at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:130)
    at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:565)
    at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:751)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:322)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:123)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:115)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:195)
    at org.apache.tika.Tika.parseToString(Tika.java:525)
    at org.apache.tika.Tika.parseToString(Tika.java:495)
    at com.purato.index.documenthandler.TikaDocumentHandler.getText(TikaDocumentHandler.java:52)
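The failure in the trace can be reproduced with java.net.URI alone, without POI or Tika. The sketch below is only an illustration: the class name UriFragmentCheck and the method firstIllegalIndex are made up for this example and are not part of POI. It parses the hyperlink target from the log and reports the index at which the parser gives up, which is the second '#' (a '#' is not allowed inside a fragment).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class for illustration; not part of POI or Tika.
class UriFragmentCheck {

    // Returns the index of the first character java.net.URI rejects,
    // or -1 if the string parses as a URI.
    static int firstIllegalIndex(String candidate) {
        try {
            new URI(candidate);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // Hyperlink target copied from the log message in the report.
        String target = "https://cloud.google.com/bigtable/docs/backups#what-for%20"
                + "https://cloud.google.com/bigtable/docs/release-notes#December_08_2022";
        int index = firstIllegalIndex(target);
        System.out.println("rejected at index " + index);
        if (index >= 0) {
            System.out.println("offending character: " + target.charAt(index));
        }
    }
}
```

The index reported here should match the "index 110" in the exception message above, since both come from the same parser.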
(In reply to thill from comment #0)

This is just an error log entry. POI catches the exception itself and carries on processing the package. The comment in the code describes it like this:

    // when parsing of the given uri fails, we can either
    // ignore this relationship, which leads to IllegalStateException
    // later on, or use a dummy value and thus enable processing of the
    // package

It is in the org.apache.poi.openxml4j.opc.PackageRelationshipCollection class, at line 357.
    // Target converted in URI
    URI target = PackagingURIHelper.toURI("http://invalid.uri"); // dummy url
    String value = element.getAttribute(PackageRelationship.TARGET_ATTRIBUTE_NAME);
    try {
        // when parsing of the given uri fails, we can either
        // ignore this relationship, which leads to IllegalStateException
        // later on, or use a dummy value and thus enable processing of the
        // package
        target = PackagingURIHelper.toURI(value);
    } catch (URISyntaxException e) {
        LOG.atError().withThrowable(e).log("Cannot convert {} in a valid relationship URI-> dummy-URI used", value);
    }
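The same fallback pattern can be sketched in isolation. This is an assumption-laden illustration, not POI's code: it calls java.net.URI directly instead of PackagingURIHelper.toURI (which may do additional preprocessing), and the class RelationshipTargetFallback and method toUriOrDummy are hypothetical names. It shows why the caller never sees the exception: the unparsable target is replaced by the dummy value and processing continues.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical class for illustration; POI's real logic lives in
// PackageRelationshipCollection and PackagingURIHelper.
class RelationshipTargetFallback {

    // Same dummy value as in the POI snippet above.
    static final URI DUMMY = URI.create("http://invalid.uri");

    // Parses the relationship target; falls back to the dummy URI
    // instead of propagating the exception to the caller.
    static URI toUriOrDummy(String value) {
        try {
            return new URI(value);
        } catch (URISyntaxException e) {
            // POI logs the exception at this point; here we only print a note.
            System.err.println("Cannot convert " + value + ", dummy URI used");
            return DUMMY;
        }
    }

    public static void main(String[] args) {
        // Valid target: returned unchanged.
        System.out.println(toUriOrDummy("https://cloud.google.com/bigtable/docs/backups#what-for"));
        // Invalid target (second '#' inside the fragment): replaced by the dummy.
        System.out.println(toUriOrDummy("https://cloud.google.com/bigtable/docs/backups#what-for%20"
                + "https://cloud.google.com/bigtable/docs/release-notes#December_08_2022"));
    }
}
```

In other words, the stack trace in the log is diagnostic output from the catch block, not an exception that escapes to the calling code.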
This usually happens when an Excel cell value contains "@" or a hyperlink.