Bug 58617

Summary: Add custom safe XmlBeans type loader / rename vendor specific schema packages
Product: POI Reporter: Andreas Beeker <kiwiwings>
Component: POI OverallAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal Keywords: PatchAvailable
Priority: P2    
Version: 3.14-dev   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: Update XmlBeans to new poi type loader

Description Andreas Beeker 2015-11-16 22:23:25 UTC
Currently the XmlBeans Factory methods allow parsing of raw data without safe 
limits, i.e. with XmlOption element.
To prevent future usage without the XmlOption element (as I temporarily did 
...), I thought about adding a forbidden-apis check [1],
but this is currently not possible.
So instead I've modified the ooxml-schema sources to point to a custom wrapper 
[2].
I don't think, someone uses the ooxml-schemas without POI, but in this rare 
case they would need to copy&paste [2] into their classes.

Apart of the wrapper, I've added an XsdConfig for the vendor specific schema 
extension.
The former package name was something like schemasMicrosoftComVml or 
schemasMicrosoftComOfficeOffice, ...
now they are called com.microsoft.schemas.vml or 
com.microsoft.schemas.office.office, ...
this goes better along the other similar named packages for Visio or 
encryption/signing.
There are only very few places in the code which reference VML stuff and 
therefore user code shouldn't be affected much.

If no-one objects until 22.11.15, I'll apply that patch.

Andi.

[1] https://github.com/policeman-tools/forbidden-apis/issues/88
[2] org.apache.poi.POIXMLTypeLoader
Comment 1 Andreas Beeker 2015-11-16 22:23:41 UTC
Created attachment 33279 [details]
Update XmlBeans to new poi type loader
Comment 2 Uwe Schindler (ASF) 2015-11-17 09:07:23 UTC
I like this approach, because it allows to use XMLBeans with its default XMLOptions and you are still safe. The only thing you change is that you replace the factory using a regex-replace on all generated xmlbeans java files?
Comment 3 Uwe Schindler (ASF) 2015-11-17 09:13:16 UTC
One question I had is: How do you handle the version numbering in Maven Central? I am not sure if the JAR file names containing the ooxml schema name are handled correctly (sorry for the question, I don't understand the maven deployment here).
Comment 4 Nick Burch 2015-11-17 12:25:15 UTC
The full ooxml-schema files are released infrequently, as and when we decide there needs to be a change. The POI builds (ant, maven pom) + site are updated after one of those to reflect the new full schema jar version number

The smaller poi-ooxml-schemas jar is build from the full one, and shipped every release
Comment 5 Andreas Beeker 2015-11-17 12:42:54 UTC
Although I've tried to keep everything consistent, i.e. updating the ant/maven(sonar) scripts ... I forgot about bumping the version number.
So yes, this is missing and on my todo list before committing.
I'll upload it to central.

I guess, I need a voting process for this, too?

btw. I haven't removed the references to the global XmlOptions, which is now mostly obsolete, so user code is still safe when using the older schema jar.
... but the vendor specific parts (e.g. VML,...) would still fail ...
Comment 6 Andreas Beeker 2015-11-21 20:03:06 UTC
Applied with r1715555
Comment 7 Uwe Schindler (ASF) 2015-11-22 12:01:11 UTC
Hi,
I reviewed the whole stuff. Looks fine. But there is one problem:

You are explicitely excluding the POITypeLoader from the schema.jar file, but this causes the ooxml-schema.jar file not to be used on its own (without poi). In that case it would need a dependency (circular) to POI itsself (also in Maven Central's POM files).

Maybe move the type loader to the schema jar and don't have it in POI? The problem is only that you would have the same class file in multiple JARs, which may make the Ealsticsearch "jarhell" checker crazy (it does not allow same class file names in multiple jars when booting up elasticsearch; to prevent "jar hell").

Second, do you not also need to update the ooxml-security version to 1.1?
Comment 8 Uwe Schindler (ASF) 2015-11-22 12:02:51 UTC
In addition you should use the following pattern instead of a fileset for coping the file:
<copy todir="${xmlbean.sources.dir}">
  <file basedir="${ooxml.src}" file="${ooxml.src}/org/apache/poi/POIXMLTypeLoader.java"/>
</copy>

This fails if file does not exists, whereas the fileset would just do nothing. for single files <file/> should be preferred over <fileset/>.
Comment 9 Andreas Beeker 2015-11-22 13:50:21 UTC
Hi Uwe,

@ooxml-security-1.1:
you are right 

@type loader location:
I don't want to merge the ooxml-schemas and ooxml-security and ideally split 
ooxml-schemas in the format dependent jars in the future.
So currently there's the use case, that someone uses ooxml-schemas and 
ooxml-security together, ending in "jar hell".
I don't think that someone uses ooxml-schema without POI and in that rare case, 
one can copy the POIXMLTypeLoader source into the project sources.

@POI dependency in the maven POM:
I would preferably skip it, because any kind of version-circular-reference 
(e.g. "RELEASE") would cause problems in the user packaging.

Would there be a problem for OSGI packaging? (I don't much about osgi, only 
that's it picky in such conditions...)

@fileset vs. file
ok ... I haven't thought about this. I will change it ... of course you can 
always feel free to change something like this right away ...

Andi.
Comment 10 Uwe Schindler (ASF) 2015-11-22 14:04:21 UTC
> @fileset vs. file
> ok ... I haven't thought about this. I will change it ... of course you can 
> always feel free to change something like this right away ...

Done.
Comment 11 Uwe Schindler (ASF) 2015-11-22 14:10:45 UTC
Hi,

> @POI dependency in the maven POM:
> I would preferably skip it, because any kind of version-circular-reference 
> (e.g. "RELEASE") would cause problems in the user packaging.

The problem would be that you depend on a outdated "older version" of POI indirectly.

One solution would be to have a ooxml "parent jar" with that class which is a dependency of every ooxml package. Not sure if it is worth the trouble.

Another solution is (I am not sure) to add it as "provided" dependency in Maven central (without version!?). But this would not really help users that want to use the ooxml jars alone.

> Would there be a problem for OSGI packaging? (I don't much about osgi, only 
> that's it picky in such conditions...)

OSGI is very picky, but I have no idea... I think it already complains if multiple JAR files use the same package name. In Lucene we refused to create OSGI packages, we dropped support for that completely (especially as OSGI does not work with Java's SPI).
Comment 12 Dominik Stadler 2015-11-22 18:02:49 UTC
Hi, FYI, the Maven build in director sonar seems to fail now, see https://builds.apache.org/job/POI-Maven/161/org.apache.poi$poi-ooxml/console, not sure about some of the changes but it seems the resulting xml-schemas-jar is now empty.
Comment 13 Andreas Beeker 2015-11-22 20:52:46 UTC
When reapplying the patch-file, I've missed the marker-files - now they are 
non-empty and added to svn ... i.e. the build is ok now.