Please see the following stack trace:

org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@62980adb
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286) ~[tika-core-1.17.jar:1.17]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.17.jar:1.17]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[tika-core-1.17.jar:1.17]
    at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[mcf-tika-connector.jar:?]
    at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) [mcf-tika-connector.jar:?]
Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
    at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150) ~[?:?]
    at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
    ... 12 more
Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_171]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_171]
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_171]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_171]
    at org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222) ~[?:?]
    at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148) ~[?:?]
    at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
    ... 12 more

This occurs in Apache ManifoldCF, which uses Tika to extract content from many different document types. Tika in turn uses Apache POI. ManifoldCF uses a fairly common tiered classloader setup and places all the POI jars at the same level, along with many others. The basic expectation is that the class loader that loaded a class should also be used to load any class it depends on, whether directly or via reflection. Unfortunately, there appears to be a place in POI where this isn't honored and the wrong class loader is used instead. The trace shows the lookup going through sun.misc.Launcher$AppClassLoader, the JVM's system class loader, which evidently cannot see the POI jars.

A similar ticket was opened against POI about a year ago. It concerned how xmlbeans loaded classes via reflection, and it was successfully resolved. This seems to be a similar situation.

The failure occurred on a production system whose documents I do not have access to, so I can't tell which document type triggered it. It would be good to know that as well.
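The expectation described above can be illustrated with a minimal, self-contained sketch. The class `DefiningLoaderDemo` and its nested `Dependency` are hypothetical stand-ins, not POI classes. The sketch shows that a direct reference to a class and a reflective lookup that explicitly passes the caller's defining loader both resolve to the same `Class` object.

```java
public class DefiningLoaderDemo {
    // Hypothetical dependency that lives in the same jar/tier as DefiningLoaderDemo.
    static class Dependency {}

    // Direct reference: the JVM resolves Dependency through the loader that defined DefiningLoaderDemo.
    static Class<?> direct() {
        return Dependency.class;
    }

    // Reflective lookup that follows the same rule by passing the defining loader explicitly.
    static Class<?> reflective(String binaryName) throws ClassNotFoundException {
        return Class.forName(binaryName, true, DefiningLoaderDemo.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        String name = DefiningLoaderDemo.class.getName() + "$Dependency";
        Class<?> viaReflection = reflective(name);
        // Both lookups should yield the same Class object from the same loader.
        System.out.println(direct() == viaReflection);
        System.out.println(viaReflection.getClassLoader() == DefiningLoaderDemo.class.getClassLoader());
    }
}
```

The one-argument `Class.forName(String)` also resolves through the caller's defining loader. Code that instead passes `Thread.currentThread().getContextClassLoader()` depends on whatever loader the hosting container has attached to the thread.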
I think the old ticket was #61478. I guess we don't need the document that caused this issue, as we can easily provide an encrypted file. As last time, we/I need a testbed to see why the thread context class loader can't access the poi-ooxml classes. I haven't yet checked whether the #61478 testbed also provokes this error. If not, I would need to fiddle around with ManifoldCF first. If you have a ready-to-run example at hand, that would make my life easier. We have a few references to Thread.currentThread().getContextClassLoader(), and it seems we need to find workarounds for OSGi-based applications: https://stackoverflow.com/questions/2198928/better-handling-of-thread-context-classloader-in-osgi
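One workaround pattern for containers such as OSGi is to try the loader that defined a known "anchor" class first, and to consult the thread context class loader only as a fallback. The sketch below is a hypothetical helper (`FallbackClassLoading.load` is not a POI API). It illustrates that pattern and is not meant as the fix POI should adopt.

```java
public final class FallbackClassLoading {
    private FallbackClassLoading() {}

    /**
     * Loads {@code name} via the loader that defined {@code anchor}; if that fails,
     * falls back to the thread context class loader when one is set.
     */
    static Class<?> load(String name, Class<?> anchor) throws ClassNotFoundException {
        try {
            // A null loader (anchor defined by bootstrap) makes Class.forName use the bootstrap loader.
            return Class.forName(name, true, anchor.getClassLoader());
        } catch (ClassNotFoundException primary) {
            ClassLoader tccl = Thread.currentThread().getContextClassLoader();
            if (tccl == null) {
                throw primary;
            }
            try {
                return Class.forName(name, true, tccl);
            } catch (ClassNotFoundException secondary) {
                // Keep both failures visible for diagnosis.
                primary.addSuppressed(secondary);
                throw primary;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(load("java.util.ArrayList", FallbackClassLoading.class).getName());
    }
}
```

Putting the anchor's loader first keeps the normal case deterministic. The context loader is consulted only when the class genuinely lives outside the anchor's tier.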
Another SO link about the context class loader: https://stackoverflow.com/a/36228195/2066598 ... "Short answer: never use the context class loader!" So we should also add Thread.getContextClassLoader() to the forbidden-apis check.
Replaced calls to the context class loader with getClass().getClassLoader() via r1836590. Please check whether the next successful nightly build after 25.07.2018 works for you: https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
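The kind of change made in r1836590 can be sketched with a hypothetical class, `LoaderSwitchDemo`, which does not reproduce POI's actual code. The sketch simulates a thread whose context class loader sees only bootstrap classes, similar to a tiered container where POI sits in a child loader. Under that assumption, the context-loader lookup fails while the `getClass().getClassLoader()` lookup succeeds.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderSwitchDemo {
    // Hypothetical stand-in for a class looked up by name that lives in the same tier as this code.
    public static class Builder {}

    // Before: resolve through the thread context class loader.
    Class<?> loadViaContext(String name) throws ClassNotFoundException {
        return Class.forName(name, true, Thread.currentThread().getContextClassLoader());
    }

    // After: resolve through the loader that defined this object's class.
    Class<?> loadViaOwnLoader(String name) throws ClassNotFoundException {
        return Class.forName(name, true, getClass().getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        LoaderSwitchDemo demo = new LoaderSwitchDemo();
        String name = Builder.class.getName();
        // Simulate a container thread whose context loader has no URLs and only the bootstrap parent.
        Thread.currentThread().setContextClassLoader(new URLClassLoader(new URL[0], null));
        try {
            demo.loadViaContext(name);
            System.out.println("context loader: found");
        } catch (ClassNotFoundException e) {
            System.out.println("context loader: " + e);
        }
        System.out.println("own loader: " + demo.loadViaOwnLoader(name).getName());
    }
}
```

`getClass()` returns the runtime class. If a subclass defined in a different loader calls this method, that subclass's loader is used. A class literal such as `LoaderSwitchDemo.class.getClassLoader()` avoids that ambiguity.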
Thanks for the quick turnaround! I'll have the user try this out.
The user picked up the 4.0.0-SNAPSHOT build as instructed, but still reports the following exception:

Error tossed: org/apache/poi/POIXMLTextExtractor
java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?]
    at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

This is a Tika classloading issue, which unfortunately I will need to chase down in a Tika ticket. But the fix did seem to address the original problem.
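When chasing a `NoClassDefFoundError` like this, it can help to check which loader, if any, can see the class from a given starting point. The helper below is a hypothetical diagnostic and is not part of Tika or POI. Its default class name comes from the trace above. It resolves the class without initializing it and reports the defining loader.

```java
public class ClassVisibility {
    /**
     * Reports which loader defines the named class when resolved starting from
     * {@code start} (null means bootstrap), without running static initializers.
     */
    static String whoDefines(String binaryName, ClassLoader start) {
        try {
            Class<?> c = Class.forName(binaryName, false, start);
            ClassLoader definer = c.getClassLoader();
            return definer == null ? "bootstrap" : definer.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers cases such as a missing superclass during loading.
            return "not visible (" + e + ")";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.poi.POIXMLTextExtractor";
        System.out.println("context loader: " + whoDefines(name, Thread.currentThread().getContextClassLoader()));
        System.out.println("caller's loader: " + whoDefines(name, ClassVisibility.class.getClassLoader()));
    }
}
```

Running this from the same tier as the Tika parser shows whether the class is missing from every loader or only from the one being consulted. If it is missing everywhere, the cause is not class-loader selection.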
Any chance you could use org.apache.poi.POIXMLTextExtractor directly instead of going through Tika? That would simplify your classloading setup.
I can't see how. Tika depends on POI, but ManifoldCF does not. The problems we see are because we're trying to use Tika.
Sorry for mentioning this only now, but the Tika -> POI 4.0.0 (nightly) integration doesn't work because of #62355.