Bug 62564

Summary: A classloader issue looking up org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
Product: POI
Reporter: Karl Wright <kwright>
Component: POI Overall
Assignee: POI Developers List <dev>
Status: NEW ---    
Severity: normal    
Priority: P2    
Version: 3.17-FINAL   
Target Milestone: ---   
Hardware: Other   
OS: Linux   

Description Karl Wright 2018-07-24 11:11:29 UTC
Please see the following stack trace:

org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@62980adb
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286) ~[tika-core-1.17.jar:1.17]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.17.jar:1.17]
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[tika-core-1.17.jar:1.17]
        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[mcf-tika-connector.jar:?]
        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) [mcf-tika-connector.jar:?]
Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150) ~[?:?]
        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
        ... 12 more
Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_171]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_171]
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_171]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_171]
        at org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222) ~[?:?]
        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148) ~[?:?]
        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
        ... 12 more


This occurs in Apache ManifoldCF, which uses Tika to extract content from a variety of document types; Tika in turn uses Apache POI.  ManifoldCF uses a fairly common tiered classloader setup and has all the POI jars at the same level, along with many others.  The basic expectation is that the classloader that loaded a class should also be used to load any class it depends on, whether directly or via reflection.  Unfortunately, there appears to be a place in POI where this isn't honored and the wrong classloader is used instead.
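
To illustrate the expectation described above, here is a minimal, self-contained sketch. It is not POI or ManifoldCF code: the class `LoaderLookupDemo`, its nested `Builder`, and the "isolated" loader are all hypothetical stand-ins. The isolated loader only simulates a thread context classloader that cannot see the classes in question; in the real setup the context loader was the application classloader, which could not see classes held by a lower tier.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderLookupDemo {

    /** Hypothetical stand-in for a class that is only looked up by name. */
    public static class Builder {}

    /** Tries to resolve the class through the thread context classloader. */
    static boolean visibleViaContextLoader(String className) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        try {
            Class.forName(className, false, cl);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Tries to resolve the class through the loader that defined this class. */
    static boolean visibleViaDefiningLoader(String className) {
        ClassLoader cl = LoaderLookupDemo.class.getClassLoader();
        try {
            Class.forName(className, false, cl);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Runs both lookups while an isolated loader is installed as the context loader. */
    static boolean[] runWithIsolatedContextLoader() {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        // Assumption for the demo: no URLs and a null parent, so only bootstrap classes are visible.
        ClassLoader isolated = new URLClassLoader(new URL[0], null);
        thread.setContextClassLoader(isolated);
        try {
            String name = Builder.class.getName();
            return new boolean[] {
                visibleViaContextLoader(name),
                visibleViaDefiningLoader(name)
            };
        } finally {
            // Always restore the original context loader.
            thread.setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        boolean[] result = runWithIsolatedContextLoader();
        System.out.println("context loader sees Builder:  " + result[0]);
        System.out.println("defining loader sees Builder: " + result[1]);
    }
}
```

The point of the sketch is only that the two lookups can disagree for the same class name, depending on which loader happens to be installed on the calling thread.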

A similar ticket was opened about a year ago for the POI project, related to how xmlbeans did its reflection class loading, and that was successfully resolved.  This seems to be another similar situation.

This occurred in a production system whose documents I do not have access to.  It would be good to know what type of document it was as well.
Comment 1 Andreas Beeker 2018-07-24 22:06:02 UTC
I think the old ticket was #61478

I guess we don't need the document causing this issue, as we can easily provide an encrypted file.
As last time, I need a testbed to see why the thread context classloader can't access the poi-ooxml classes. I haven't yet checked whether the #61478 testbed also provokes this error.
If not, I would need to fiddle around with ManifoldCF first.
If you have a ready-to-run example at hand, that would make my life easier.

We have a few references to Thread.currentThread().getContextClassLoader(), and it seems we need to find workarounds for OSGi-based applications:
https://stackoverflow.com/questions/2198928/better-handling-of-thread-context-classloader-in-osgi
Comment 2 Andreas Beeker 2018-07-24 22:18:51 UTC
Another SO link about context class loader:
https://stackoverflow.com/a/36228195/2066598

... "Short answer: never use the context class loader!"

So we should also add it to the forbidden-apis-check.
Comment 3 Andreas Beeker 2018-07-24 23:00:54 UTC
Replaced calls to Thread.currentThread().getContextClassLoader() with getClass().getClassLoader() via r1836590.

Please check whether the next successful nightly build after 25.07.2018 works for you:
https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
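
For readers unfamiliar with the pattern, the kind of change described in this comment might look roughly like the sketch below. This is not the code from r1836590: `BuilderLookup`, `InfoBuilder`, and `AgileLikeBuilder` are hypothetical names invented for the example. It only shows a by-name lookup that goes through getClass().getClassLoader() rather than the thread context classloader.

```java
public class BuilderLookup {

    /** Hypothetical builder interface; not POI's actual type. */
    public interface InfoBuilder {
        String describe();
    }

    /** Hypothetical implementation that is referenced only by its class name. */
    public static class AgileLikeBuilder implements InfoBuilder {
        @Override
        public String describe() {
            return "agile-like";
        }
    }

    /**
     * Instantiates a builder by class name using the loader that defined
     * this object's class, independent of the calling thread's context loader.
     */
    public InfoBuilder newBuilder(String className) {
        ClassLoader cl = getClass().getClassLoader();
        try {
            Class<?> type = Class.forName(className, true, cl);
            return type.asSubclass(InfoBuilder.class)
                       .getDeclaredConstructor()
                       .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load builder " + className, e);
        }
    }

    public static void main(String[] args) {
        InfoBuilder builder =
            new BuilderLookup().newBuilder(AgileLikeBuilder.class.getName());
        System.out.println(builder.describe());
    }
}
```

With this shape, whichever loader found the calling class is also the one asked for the builder, so the lookup no longer depends on how the host application configured its threads.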
Comment 4 Karl Wright 2018-07-25 11:07:52 UTC
Thanks for the quick turnaround!

I'll have the user try this out.
Comment 5 Karl Wright 2018-07-26 13:00:48 UTC
The user picked up the 4.0.0-SNAPSHOT build as instructed, but still reports the following exception:

Error tossed: org/apache/poi/POIXMLTextExtractor
java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?]
        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

This is a Tika classloading issue which I will need to chase down as a Tika ticket, unfortunately.  But the fix did seem to address the original problem.
Comment 6 PJ Fanning 2018-07-26 13:38:45 UTC
Any chance you could use org.apache.poi.POIXMLTextExtractor directly instead of going through Tika? That would simplify your classloading setup.
Comment 7 Karl Wright 2018-07-26 13:57:24 UTC
I can't see how.  Tika depends on POI, but ManifoldCF does not.  The problems we see are because we're trying to use Tika.
Comment 8 Andreas Beeker 2018-07-26 20:34:35 UTC
Sorry to mention it only now, but the Tika -> POI 4.0.0 (nightly) integration doesn't work because of #62355.