Bug 59268 - Work on providing an updated version of XMLBeans
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.15-dev
Hardware: PC Windows NT
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
Duplicates: 54084 58247 59195 59428 61494 61921 61988 62004
Depends on: 61949
Blocks: 54084 55149 58247 58925 59195 59428 61494
Reported: 2016-04-04 07:27 UTC by Dominik Stadler
Modified: 2018-07-09 19:41 UTC
CC List: 8 users
Description Dominik Stadler 2016-04-04 07:27:30 UTC
We have a number of issues with the current version of xmlbeans, but unfortunately the project is not maintained any more. We should try to do a non-maintainer fix-upload from the attic. Let's start discussing this.

Information about attic non-maintainer uploads:
* Discussion should be started on the general-mailing list: general@attic.apache.org, http://mail-archives.apache.org/mod_mbox/attic-general/
* Page for XMLBeans: http://attic.apache.org/projects/xmlbeans.html
* JIRA: https://issues.apache.org/jira/browse/ATTIC/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

Current issues with the XMLBeans jar:

* It contains duplicate classes, which cause issues on Android, see e.g. https://github.com/andruhon/android5xlsx
* It includes classes from package javax, which cause issues in application servers, see e.g. http://stackoverflow.com/questions/18507605/caused-by-java-lang-linkageerror-loader-constraint-violation-when-resolving-m/36396611#36396611
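
One way to check a jar for the first problem is sketched below. It assumes that "duplicate classes" means the same entry name appearing more than once in the archive; `DuplicateEntryScanner` and `findDuplicates` are hypothetical names for illustration and are not part of XMLBeans or POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: lists entry names that occur more than once in a jar/zip stream.
final class DuplicateEntryScanner {

    static List<String> findDuplicates(InputStream jarStream) throws IOException {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        // ZipInputStream walks the entries sequentially, so repeated names show up
        // one after another instead of being collapsed into a single entry.
        try (ZipInputStream zip = new ZipInputStream(jarStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!seen.add(entry.getName())) {
                    duplicates.add(entry.getName());
                }
            }
        }
        return duplicates;
    }
}
```

Passing a stream for the jar in question returns the repeated entry names; an empty list means no duplicates were found.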
Comment 1 Dominik Stadler 2016-04-04 07:29:17 UTC
Comment from Nick in bug #59195: "Maybe we could get together with Tika and Lucene, and ask the Attic PMC to let us put together a 2.6.1 release with the packaging fix and some others, if someone fancies spending a little bit of time to lead that?"
Comment 2 Dominik Stadler 2016-04-04 08:00:51 UTC
Bonus points for looking at the following locking issue as well: http://apache-poi.1045710.n5.nabble.com/Alternative-Replacement-for-xmlbeans-tt5722053.html
Comment 3 Dominik Stadler 2016-04-04 08:02:32 UTC
Another thing that we can look at: bug 55149 - "Usage of XmlBeans triggers "clearThreadLocalMap" warnings"
Comment 5 Dominik Stadler 2016-04-15 16:14:13 UTC
The javax... classes are actually included via the stax-api dependency of XMLBeans. This dependency can probably simply be removed if we require Java 6 or higher.
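
As a rough illustration of why a separate stax-api jar should not be needed on Java 6 or newer, the sketch below uses only the `javax.xml.stream` classes that ship with the JDK. `JdkStaxCheck` and `firstElementName` are hypothetical names; the snippet does not touch XMLBeans itself.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical check: parses XML with the StAX API bundled in the JDK,
// with no stax-api jar on the classpath.
final class JdkStaxCheck {

    static String firstElementName(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null; // no element found
        } finally {
            reader.close();
        }
    }
}
```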
Comment 6 Dominik Stadler 2016-05-04 14:54:55 UTC
*** Bug 59428 has been marked as a duplicate of this bug. ***
Comment 7 Dominik Stadler 2016-08-13 20:45:42 UTC
*** Bug 59195 has been marked as a duplicate of this bug. ***
Comment 8 Dominik Stadler 2017-05-27 21:01:40 UTC
We also see some bugs related to Unicode handling inside XmlBeans; these are also candidates for this effort, see e.g. bug 54084.
Comment 9 PJ Fanning 2017-06-04 12:24:28 UTC
I had a little trouble building xmlbeans so I made some small changes and committed them to https://github.com/pjfanning/xmlbeans.
Comment 10 PJ Fanning 2017-06-04 23:45:26 UTC
I made a change on a fork that I have of xmlbeans and it allows the testBug54084Unicode to pass (https://github.com/pjfanning/xmlbeans/commit/b4dda3837421835b7a378c4ab83ce48a4c49fe59).

I have other branches on the xmlbeans fork where I go further and start removing the Piccolo parser, the JSR173 classes and the deprecated code supporting XMLInputStream. These changes break 2 CDATA xmlbeans unit tests but this doesn't seem to adversely affect the poi usage. They can probably be fixed without too much effort.
The removal of the Piccolo parser would allow some tidying up in POI code (as it has some workarounds to avoid using Piccolo).
Comment 11 PJ Fanning 2017-06-24 09:25:45 UTC
Do we think it would be possible to issue an xmlbeans patch based on the changes I've made in github?
The longer term solution is probably to replace xmlbeans.
Comment 12 PJ Fanning 2017-06-29 18:22:42 UTC
I have published a patched version of xmlbeans to Maven Central.

I have a sample that shows the patched jar in action at https://github.com/pjfanning/poi-xmlbeans-patch-test
Comment 13 Nick Burch 2017-06-29 21:53:30 UTC
It'd be good to go through the xmlbeans bugzilla, and see if there are any other bug fix patches there that'd be worth including in a fix release.

(Not to mention the duplicated classes in the 2.6.0 binary jar that ought to be solved by a fresh build!)
Comment 14 PJ Fanning 2017-06-29 22:44:51 UTC
My jar has the https://issues.apache.org/jira/browse/XMLBEANS-499 issue with duplicate ReferenceResolver classes. I'll see if I can work out what in the xmlbeans build is causing this.
Comment 15 PJ Fanning 2017-06-29 23:32:24 UTC
I added a fix for the duplicate classes in https://github.com/pjfanning/xmlbeans - I will publish a 2.6.2 jar in the coming days if there are no other xmlbeans issues to fix.

Feel free to highlight any issues from XMLBeans in the issue tracker for the github fork.
Comment 16 Dominik Stadler 2017-06-30 21:06:49 UTC
Thanks for the work, I have tested it a bit and so far it looks quite good.

One thing we could try is to remove the stax-api dependency from the Maven deployment; I don't think this is still necessary with Java 6 or newer.

As far as I can see, the other most pressing items are in place now, and I would rather release a first piece instead of adding more aggressive stuff. We can always do this later in a second release.

Should we start discussion on general@attic.apache.org on required next steps for publishing an official bugfix-release of xmlbeans?

In fact with your release on Maven Central we could even technically switch to this without a release from the attic, but it would probably require some due diligence on licensing of the additional patches, ...
Comment 17 Nick Burch 2017-06-30 21:17:21 UTC
I think best practice would be to speak with the Attic PMC about this
Comment 18 PJ Fanning 2017-07-03 19:34:36 UTC
I agree that getting an official apache xmlbeans patch done is preferable to having people use my forked jar. This jar solves some issues with writing XSSF workbooks that have unicode surrogate chars.

I have found a related issue in the SXSSF code base: https://bz.apache.org/bugzilla/show_bug.cgi?id=61246. This means that for writing SXSSF workbooks, users affected by the unicode surrogate chars issues will need the forked xmlbeans jar and the latest POI code (or 3.17 beta2 when it is released).
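
For readers unsure whether their data is affected, the sketch below shows what "surrogate chars" means in Java: characters outside the Basic Multilingual Plane occupy two `char` values. It only detects such text and does not reproduce the XmlBeans or SXSSF bug; `SurrogateCheck` and `hasSupplementaryChars` are hypothetical names.

```java
// Hypothetical helper: reports whether a string contains characters that Java
// stores as a surrogate pair (two char values for one code point).
final class SurrogateCheck {

    static boolean hasSupplementaryChars(String text) {
        // Each valid surrogate pair is two chars but one code point, so the
        // code point count drops below the char count when any pair is present.
        return text.codePointCount(0, text.length()) < text.length();
    }
}
```

A cell value for which this returns true, such as one containing an emoji, is the kind of input the surrogate handling issues are about.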
Comment 19 Dominik Stadler 2017-09-21 16:10:10 UTC
*** Bug 61494 has been marked as a duplicate of this bug. ***
Comment 20 Dominik Stadler 2017-09-21 16:11:47 UTC
*** Bug 58247 has been marked as a duplicate of this bug. ***
Comment 21 Dominik Stadler 2017-09-21 16:13:25 UTC
*** Bug 54084 has been marked as a duplicate of this bug. ***
Comment 22 Scott Coldwell 2017-12-11 18:37:10 UTC
I think the more appropriate course of action would be to remove xmlbeans as a dependency.  The project was terminated over 4 years ago.
Comment 23 Nick Burch 2017-12-11 19:07:28 UTC
(In reply to Scott Coldwell from comment #22)
> I think the more appropriate course of action would be to remove xmlbeans as
> a dependency.  The project was terminated over 4 years ago.

Removing XMLBeans is probably something like 3-6 months of work, and would have the added disadvantage of probably breaking something like half of all the POI examples for XLSX / PPTX / DOCX on the internet, and probably most of the big projects which make deep/custom use of POI for XLSX/PPTX/DOCX processing....

The first step, for which volunteers are very very much welcome(!) is to locate all the places in your favourite bit of POI (eg XSSF) that leak the CT* xmlbeans objects, search public examples / github projects / mailing lists etc to see how those beans are used, ensure we have a proper Usermodel wrapper for doing the same thing "properly", then deprecate that CT accessor. Once everything that's commonly done via the low-level xmlbeans classes in other people's code can be done cleanly with proper POI classes, we can get people to migrate, then we're in a position to swap out the POI innards to use something else. 
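
A first pass at locating such leaks could be automated with reflection. The sketch below is only a starting point under stated assumptions: it looks at public method return types and treats any type whose simple name starts with "CT" as an xmlbeans bean. `CtLeakFinder`, `CTExample` and `ExampleSheet` are hypothetical stand-ins, not real POI classes.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a generated xmlbeans type.
class CTExample {
}

// Hypothetical stand-in for a usermodel class that exposes a CT* object.
class ExampleSheet {
    public CTExample getCTExample() {
        return new CTExample();
    }

    public String getSheetName() {
        return "Sheet1";
    }
}

final class CtLeakFinder {

    // Returns the sorted names of public methods whose return type looks like a CT* bean.
    static List<String> findLeakingAccessors(Class<?> type) {
        List<String> leaks = new ArrayList<>();
        for (Method method : type.getMethods()) {
            if (method.getReturnType().getSimpleName().startsWith("CT")) {
                leaks.add(method.getName());
            }
        }
        Collections.sort(leaks);
        return leaks;
    }
}
```

Run against a real usermodel class, the resulting list is a checklist of accessors that would each need a proper wrapper before they can be deprecated. Parameter types and fields are not covered by this sketch.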

It's quite a lot of work though, which is why more volunteers are needed to help drive it! :)
Comment 24 Greg Woolsey 2017-12-12 17:07:55 UTC
Another complication is that none of the replacement technologies that have been discussed so far can handle the arbitrary XML allowed as "extensions" in the OOXML spec. One benefit of XMLBeans is that it keeps the raw XML around and returns any properties/attributes present when asked.  That way POI can "pass through" unknown content while allowing access to all known content in a file.  This is used quite often by downstream projects, especially ones that start with custom "template" files containing bits like OLE components, custom Office extensions, VBA macros, complex pivot tables, etc. and fill in live data on the fly.  

In the spec this is handled by referencing additional namespaces on the fly by URL. Any replacement in POI would need to allow similar retention of unknown XML elements and content.  This completely breaks systems like JAXB, which need all possible namespaces and how they fit together to be defined up front.  There is no dynamic model building or class loading.
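
The "pass through" requirement can be illustrated with plain DOM from the JDK. This is not how XMLBeans does it internally; it is a minimal sketch of the behaviour any replacement would need: change a known element while content from an unknown extension namespace survives untouched. The namespace URIs and the `PassThroughDemo` class are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class PassThroughDemo {

    // Hypothetical namespace standing in for the schema the library knows about.
    static final String KNOWN_NS = "urn:example:known";

    // Renames the first known <sheet> element and re-serializes the whole document,
    // leaving any elements from other namespaces as they were.
    static String renameSheet(String xml, String newName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Element sheet = (Element) doc.getElementsByTagNameNS(KNOWN_NS, "sheet").item(0);
        sheet.setAttribute("name", newName);

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

A binding that maps only the known schema onto classes would have nowhere to keep the extension element, which is the problem described above.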
Comment 25 PJ Fanning 2017-12-15 09:24:38 UTC
I found some other projects based on POI, that have some usage of spreadsheetml classes. I raised issues in these projects.
Comment 26 Dominik Stadler 2017-12-24 11:41:29 UTC
*** Bug 61921 has been marked as a duplicate of this bug. ***
Comment 27 Dominik Stadler 2018-01-10 18:13:50 UTC
*** Bug 61988 has been marked as a duplicate of this bug. ***
Comment 28 PJ Fanning 2018-01-16 10:19:27 UTC
*** Bug 62004 has been marked as a duplicate of this bug. ***
Comment 29 Scott Coldwell 2018-04-13 16:04:31 UTC
I'm happy to help with the actual conversion when it comes time for that.  My company just went through a pretty large xmlbeans -> JAXB conversion involving many 3rd party schemas in various states of schema authoring best practices. It's really not that bad once you get the classes compiled properly and we've been through a bunch of the gotchas with making that happen.
Comment 30 Andreas Beeker 2018-04-13 19:24:56 UTC
I have some ideas of how JAXB could be integrated into POI, but currently even the basics don't work.

If you want to have something to get started with, you could try to solve [1]

This would be only one technical POC. We would also need to support older ECMA versions, which are not backward compatible.

I thought about having only the xml fragment attached to the usermodel, parsing it on the fly and using the jaxb binder to preserve the xml infoset.

The current issue is about a new xmlbeans version - so we should discuss a JAXB related solution in a different bugzilla entry.

[1] https://stackoverflow.com/questions/46869482
Comment 31 Andreas Beeker 2018-06-22 21:26:49 UTC
Applied PJ Fanning's pull request [1] via r1834165
[1] https://github.com/apache/poi/pull/113

Leaving it open until XmlBeans 3.0.0 is released