Bug 61478 - POI OOXML-Schema lookup uses wrong classloader
Summary: POI OOXML-Schema lookup uses wrong classloader
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: OPC (show other bugs)
Version: 3.17-dev
Hardware: All All
: P2 major (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-02 08:48 UTC by Karl Wright
Modified: 2017-09-05 21:53 UTC (History)
0 users



Attachments
Testbed for exercising POI classloader (27.85 KB, application/x-zip-compressed)
2017-09-02 12:21 UTC, Karl Wright
Details
User stack trace with all jars loaded by connector classloader (809.77 KB, image/jpeg)
2017-09-02 14:00 UTC, Karl Wright
Details
Image of user stacktrace when poi-ooxml-schema.jar moved to root classloader (621.11 KB, image/jpeg)
2017-09-02 14:02 UTC, Karl Wright
Details
Get Classloader from SchemaType instead of ContextClassLoader/preset CL (8.88 KB, patch)
2017-09-03 20:00 UTC, Andreas Beeker
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Karl Wright 2017-09-02 08:48:16 UTC
When poi-ooxml uses reflection to locate classes in poi-ooxml-schema, it apparently does not use the classloader of the current class or thread to do this.  As a result, you cannot get poi-ooxml to work properly in a non-root classloader.

Furthermore, even if you move poi-ooxml-schema to the root level, it has callbacks into poi-ooxml, so you are forced to move THAT jar to root level as well.  And once you do that, poi-ooxml calls back to org.apache.poi.util, which is in poi.jar, so that jar also needs to be run at root level.

(A side note: having poi and poi-ooxml and poi-ooxml-schemas be separate makes little sense if they all depend on each other in this way.)

We discovered this trying to integrate the latest Tika (which uses POI version 3.9) with ManifoldCF.  ManifoldCF runs tika-parsers at the connector level, which has its own classloader.  We were forced to move all of POI, and its dependencies, to the root classloader level, which greatly increases the size of our binary image.
Comment 1 Nick Burch 2017-09-02 09:40:12 UTC
Any chance you could work up a small junit unit test (probably with dummy classloader) to show the problem, and/or a fix?
Comment 2 Karl Wright 2017-09-02 10:27:43 UTC
I verified that this occurs also with POI 3.15.

I should be able to come up with a classloader code snippet that demonstrates the problem.  It will occur when trying to parse any Windows Office file, e.g. xlsx or docx.
Comment 3 Karl Wright 2017-09-02 10:52:58 UTC
I looked at providing an example but unfortunately, this occurs under the execution of Tika, which has many dozen dependent jars. If you want a code snippet, you'll either need to set up directory with all Tika dependent jars in it, or you will need to provide me a snippet of code which parses a Microsoft Office file.  Alternatively I can upload a many-megabyte zip file containing the Tika parser with all dependencies that you can just unpack.  Please let me know what you prefer.

Another way forward is to discuss how you use reflection in POI.  If you use the following method to locate your classes, all should be well:

https://docs.oracle.com/javase/8/docs/api/java/lang/Class.html#forName-java.lang.String-

But if you use this method, then you will have to be very certain you know what you are doing to get the right class loader:

https://docs.oracle.com/javase/8/docs/api/java/lang/Class.html#forName-java.lang.String-boolean-java.lang.ClassLoader-

I suspect it is the latter, and perhaps you are using the thread class loader rather than the current class's class loader?
Comment 4 Nick Burch 2017-09-02 11:24:01 UTC
Does using XWPFWordExtractor <http://poi.apache.org/apidocs/org/apache/poi/xwpf/extractor/XWPFWordExtractor.html> trigger the problem in your environment? That's the easiest way to do roughly what Tika does in just a few lines

XWPFWordExtractor doc = new XWPFWordExtractor(OPCPackage.open("input.docx"));
doc.getText();
doc.close();

If not, just send over a unit test that triggers the problem with Tika, we'll pop the test in the Tika codebase + fix here + check with Tika after our next release

(There's a good overlap in POI and Tika committers!)
Comment 5 Karl Wright 2017-09-02 12:21:11 UTC
Created attachment 35277 [details]
Testbed for exercising POI classloader

This testbed needs xmlbeans-2.6.0.jar, poi-3.15.jar, poi-ooxml-3.15.jar, and poi-ooxml-schemas-3.15.jar added to the lib directory after unzipping.
Comment 6 Andreas Beeker 2017-09-02 12:22:01 UTC
Please check if #60226, which was applied after 3.15, makes any difference
Comment 7 Karl Wright 2017-09-02 12:30:01 UTC
I've uploaded the testbed.  The testbed does not seem to cause the failure we were seeing, however.

If the codepath for a docx document goes through poi-ooxml-schemas, that means that we're seeing something that's being triggered somehow by Tika.

In the MCF setup, we have tika-core at the root level, and tika-parsers (and most of its dependencies, including poi) at the connector level.  This has worked in the past, at least until POI started using reflection to look up classes in poi-ooxml-schemas.

The classloader setup I'm using in the testbed is cribbed directly from ManifoldCF classes that set the class loaders up, so that's clearly not the issue.  Any ideas?
Comment 8 PJ Fanning 2017-09-02 13:13:18 UTC
Could you provide some stack traces?
Comment 9 Karl Wright 2017-09-02 14:00:52 UTC
Created attachment 35278 [details]
User stack trace with all jars loaded by connector classloader

Image of user stacktrace when all Tika parser dependencies (including POI) loaded with connector classpath attached.
Comment 10 Karl Wright 2017-09-02 14:02:38 UTC
Created attachment 35279 [details]
Image of user stacktrace when poi-ooxml-schema.jar moved to root classloader

This is the user stacktrace when poi-ooxml-schema and poi-ooxml both moved to root classloader.
Comment 11 Karl Wright 2017-09-02 14:04:26 UTC
Mr. Beeker, I cannot readily modify Tika to call the new method.
Comment 12 PJ Fanning 2017-09-02 15:44:42 UTC
Could you call the setClassLoader method before calling Tika code?
https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/POIXMLTypeLoader.java?view=markup&pathrev=1763922
Comment 13 Karl Wright 2017-09-02 16:01:10 UTC
Is the patched POI binary available on Maven yet?
Comment 14 PJ Fanning 2017-09-02 16:04:01 UTC
Try poi 3.16 or 3.17-beta1. Both are in maven central.
Comment 15 Karl Wright 2017-09-02 16:19:12 UTC
OK -- I am certain that this workaround would solve the problem.  But it is pretty ugly, and we already have a workaround implemented and a patch release is up for a vote.

May I ask if the POI team attends to address this in a more official manner?  If not, perhaps the Tika team should?  If neither team wishes to address the issue, I will put this fix into ManifoldCF.
Comment 16 Andreas Beeker 2017-09-02 21:38:15 UTC
> But it is pretty ugly, and we already have a workaround implemented and a patch release is up for a vote.

Could you recommend how to use/identify the correct classloader?
(as we need access the .xsb files, it's neither the thread context classload, nor the classloader of any generated OOXML xml beans class)

As I had not much of an idea what the OSGi classloader does, I thought that this is the KISS way to solve the problem, but I'm happy to learn ... and if you ask nicely, we could postpone our final version which I wanted to prepare the release candidate today. 

> May I ask if the POI team attends to address this in a more official manner?

Official manner, as in a technical solution, if you know how, we will adapt it.
Otherwise it's officially in our FAQ:

http://poi.apache.org/faq.html#faq-N1029C

> We discovered this trying to integrate the latest Tika (which uses POI version 3.9) ...

Seriously? I always rant about users not able to use a recent version, but I never would
imagine that a PMC chair would write something like that
Comment 17 Karl Wright 2017-09-02 21:59:07 UTC
>>>
Seriously? I always rant about users not able to use a recent version, but I never would
imagine that a PMC chair would write something like that
<<<

My apologies -- the actual POI version we were using was 3.15, not 3.9, as I explained elsewhere.  I hope that is clear now.

>>>
Could you recommend how to use/identify the correct classloader?
<<<

Yes, as I explained above, what you really want to emulate is what happens when you do Class.forName(String classname).  There is a Class.forName() variant which accepts a passed-in class loader, which is what you use.  So you need to do this:

>>>
Invoking this method is equivalent to:
Class.forName(className, true, currentLoader)
where currentLoader denotes the defining class loader of the current class.
<<<

The defining class loader of the current class is:

xxx.Class.getClassLoader();

That should be the default behavior, I believe.  What do you think?
Comment 18 Andreas Beeker 2017-09-02 22:34:29 UTC
Just a short feedback ... as I'm not sure, if I can finish it this weekend.
When I tested #60226 / #57857, I used a sample project [1]
Additional to Glucazeaus project, one needs the additional service mix patch in #57857

If the Class.forName approach works in this and manifoldCF, it should be hopefully sufficient to render the workaround obsolete.


[1] https://github.com/glucazeau/test-poi-sling
Comment 19 Greg Woolsey 2017-09-03 05:32:48 UTC
From the attached stacktrace, and the sources, it can be seen that the error originates from 

org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument.Factory.parse(*)

which is found in

ooxml-schemas-1.3.jar

This is a generated XMLBeans class.  Further, it isn't doing any explicit class loading, it is just importing a class.  So it looks to me like the issue is down in XMLBeans somewhere, or in the user's ClassLoader hierarchy.  

I searched the entire POI codebase, and there are no calls to *.forName(String, ClassLoader) or anything similar.  All calls are using the static Class.forName(String classname) method, which should be fine.

I'm fairly certain the issue for the end user is the placement of ooxml-schemas-1.3.jar in their class loader library structure.  It could be defined at too high a level, and thus inaccessible to a lower classloader, in which case that library should be pushed down rather than POI pulled up.  Note that this is a standard dependency for POI.
Comment 20 Greg Woolsey 2017-09-03 05:41:48 UTC
Argh, I should go to bed.  I see now the issue was calling from DocumentDocument.Factory in ooxml-schemas-1.3.jar to POIXMLTypeLoader in poi-ooxml.jar.

So it still could be the placement of ooxml-schemas-1.3.jar, but in the reverse case - it could be that the client code can see that jar, but that jar can't see POI.  It could be that ooxml-schemas.1.3.jar is loaded too early, and thus can't see POI.  Perhaps some library at a higher-order classloader also loads that jar somehow.

(In reply to Greg Woolsey from comment #19)
> From the attached stacktrace, and the sources, it can be seen that the error
> originates from 
> 
> org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument.
> Factory.parse(*)
> 
> which is found in
> 
> ooxml-schemas-1.3.jar
> 
> This is a generated XMLBeans class.  Further, it isn't doing any explicit
> class loading, it is just importing a class.  So it looks to me like the
> issue is down in XMLBeans somewhere, or in the user's ClassLoader hierarchy.
> 
> 
> I searched the entire POI codebase, and there are no calls to
> *.forName(String, ClassLoader) or anything similar.  All calls are using the
> static Class.forName(String classname) method, which should be fine.
> 
> I'm fairly certain the issue for the end user is the placement of
> ooxml-schemas-1.3.jar in their class loader library structure.  It could be
> defined at too high a level, and thus inaccessible to a lower classloader,
> in which case that library should be pushed down rather than POI pulled up. 
> Note that this is a standard dependency for POI.
Comment 21 Karl Wright 2017-09-03 08:37:42 UTC
Hi Greg,

As I've explained before, there are exactly two classloader levels in ManifoldCF - the root level, and the "connectors" level.  In previous (working) Tika integrations, tika-core was at the root level, and tika-parsers (and all its dependencies, including xmlbeans) was at the "connectors" level.

When we went to the most recent version of Tika we discovered that having poi-ooxml-schemas at the higher level meant that the Tika connector could not find anything in poi-ooxml-schemas, even though that also was at the higher level, as was xmlbeans.  We needed to move ALL poi jars and their dependencies to the lower level in order to get it to work again.  (I'm in the process of voting on a point release for this as we speak so that our users will not be impacted.)

We'd much prefer having these jars once again live at the higher level, since putting them at the lower level means they must be included multiple times (in multiple webapps etc.), and that bloats our binary considerably.

Hope this helps.
Comment 22 Karl Wright 2017-09-03 08:56:54 UTC
If there are no invocations of Class.forName(String, boolean, ClassLoader), then there's no way the POI libraries can be a problem.  But a glance at the code shows that that's not entirely true; the POIXMLTypeLoader class does the latter.

It may be the case that the only reason for that code was written was to work around this problem when it was discovered by others (e.g. #60226).  But, in that case, the problem might well be that some other dependency, e.g. xmlbeans, is doing the wrong thing and you guys need to request a fix from them.

Here are some data points:

- Running all poi jars and their dependencies at the "connectors" level with poi-3.15 *definitely* uses the wrong classloader at some point -- probably the thread classloader
- The testbed I constructed and uploaded, on the other hand, *succeeds* - and that setup mimics MCF's classloader setup, but without Tika in between the MCF "connector" and the poi jars

Maybe the right approach is to analyze the difference in code paths between these two ways of calling into poi and see what differences there are, if any?  The stack traces are helpful here, yes, but maybe also looking at the testbed I uploaded could provide some insight into a way of getting into poi that does not seem to have the problem?

The only thing I'm pretty sure of is that it probably isn't Tika that does this, because it *does* manage to find the poi classes, just not those in poi-ooxml-schemas.
Comment 23 Andreas Beeker 2017-09-03 20:00:10 UTC
Created attachment 35283 [details]
Get Classloader from SchemaType instead of ContextClassLoader/preset CL

I've tested that patch against the OSGi example mentioned above (the latest servicemix bundle for POI 3.16 is also incomplete as far as I can say).
I wasn't sure (and I'm still not), if the cached SchemaTypeLoader would result  in a failing resource loading. At least I've added some kind of OOM-check to see if the ThreadLocals would result in memory leaks.
Comment 24 Andreas Beeker 2017-09-03 20:02:17 UTC
I should add ... if the schema type loader is not cached, this will result in OOM in the tests.
Comment 25 Andreas Beeker 2017-09-04 20:19:47 UTC
Karl (or should I say Mr. Wright?), I'll wait for your response, before I commit the patch and release(-candidate) POI 3.17 final - the last version to support Java 6.

Please apply the patch to the POI trunk, build it and give us feedback if it works for you.
Comment 26 Karl Wright 2017-09-04 23:04:38 UTC
Will do, as time permits.  Hopefully you should have an answer by tomorrow evening at the latest.
Comment 27 Karl Wright 2017-09-05 00:09:48 UTC
I was able to confirm that this fix corrected the problem in the MCF environment with Tika extraction.

Thanks to all for your help on this!

When do you expect this code to "go live" and be available for download from Maven?
Comment 28 Nick Burch 2017-09-05 00:22:49 UTC
Re-opening the bug, as I think Andreas still needs to commit the proposed fix to svn!

Release wise - we're holding off doing the 3.17 release candidate for this fix, as we want to play nicely with other ASF projects :)
Comment 29 Andreas Beeker 2017-09-05 21:48:50 UTC
Applied via r1807418