Summary: | Work on providing an updated version of XMLBeans | ||
---|---|---|---|
Product: | POI | Reporter: | Dominik Stadler <dominik.stadler> |
Component: | POI Overall | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | CC: | 578372374, ak.azad, david, ijuan.rm, kurthuwig, l_alexandra2010, prabinshr007123, sakurotawa |
Priority: | P2 | ||
Version: | 3.15-dev | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Windows NT | ||
Bug Depends on: | 61949 | ||
Bug Blocks: | 54084, 55149, 58247, 58925, 59195, 59428, 61494 |
Description
Dominik Stadler
2016-04-04 07:27:30 UTC
Comment from Nick in bug #59195: "Maybe we could get together with Tika and Lucene, and ask the Attic PMC to let us put together a 2.6.1 release with the packaging fix and some others, if someone fancies spending a little bit of time to lead that?"

Bonus points for looking at the following locking issue as well: http://apache-poi.1045710.n5.nabble.com/Alternative-Replacement-for-xmlbeans-tt5722053.html

Another thing that we can look at: bug 55149, "Usage of XmlBeans triggers 'clearThreadLocalMap' warnings". There are a few commits on trunk after 2.6.0 that might be interesting here:

* https://github.com/apache/xmlbeans/commit/4e918d97a03f6aa4355a4580933dca9bf23e80a0
* https://github.com/apache/xmlbeans/commit/0d66e61968149809030bce123ea7a0ef58bccb83
* https://github.com/apache/xmlbeans/commit/511d8ae119a7a0cc0d822e5997df1ffad3fe7677
* https://github.com/apache/xmlbeans/commit/6cd06c9c9e0174190b593de7dec372d5cc123305

The javax... classes are actually included via the stax-api dependency of XMLBeans. This dependency should probably simply be removed, together with requiring Java 6 or higher.

*** Bug 59428 has been marked as a duplicate of this bug. ***

*** Bug 59195 has been marked as a duplicate of this bug. ***

We also see some bugs related to Unicode handling inside XmlBeans; these are also candidates for this effort, see e.g. bug 54084.

I had a little trouble building xmlbeans, so I made some small changes and committed them to https://github.com/pjfanning/xmlbeans.

I made a change on my fork of xmlbeans that allows testBug54084Unicode to pass (https://github.com/pjfanning/xmlbeans/commit/b4dda3837421835b7a378c4ab83ce48a4c49fe59). I have other branches on the fork where I go further and start removing the Piccolo parser, the JSR173 classes and the deprecated code supporting XMLInputStream. These changes break 2 CDATA xmlbeans unit tests, but this doesn't seem to adversely affect POI's usage. They can probably be fixed without too much effort.
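The Unicode problems mentioned above (bug 54084, and what the thread later calls "unicode surrogate chars") involve characters outside the Basic Multilingual Plane, which Java stores as a pair of UTF-16 `char`s. The sketch below is illustrative only and is not XmlBeans' serializer code; the class and method names are hypothetical. It contrasts an escaper that walks code points with one that walks individual `char`s: the latter emits a character reference for each half of the pair, and references to lone surrogates such as `&#xD83D;` are not legal XML 1.0 characters.

```java
public class SurrogateEscapeSketch {

    // Escapes non-ASCII characters as hex character references, iterating by
    // Unicode code point so a surrogate pair becomes ONE reference.
    // (Markup characters such as '<' and '&' are out of scope for this sketch.)
    static String escapeByCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return sb.toString();
    }

    // Naive variant: iterates by UTF-16 char, so each half of a surrogate
    // pair is escaped on its own, producing references to lone surrogates.
    static String escapeByChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append("&#x").append(Integer.toHexString(c).toUpperCase()).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A\uD83D\uDE00"; // 'A' followed by U+1F600 (GRINNING FACE)
        System.out.println(s.length());                      // 3 UTF-16 chars
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        System.out.println(escapeByCodePoint(s));            // A&#x1F600;
        System.out.println(escapeByChar(s));                 // A&#xD83D;&#xDE00;
    }
}
```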
The removal of the Piccolo parser would allow some tidying up in POI code (as POI has some workarounds to avoid using Piccolo). Do we think it would be possible to issue an xmlbeans patch based on the changes I've made on GitHub? The longer-term solution is probably to replace xmlbeans.

I have published a patched version of xmlbeans to Maven Central (com.github.pjfanning:xmlbeans:2.6.1). I have a sample that shows the patched jar in action at https://github.com/pjfanning/poi-xmlbeans-patch-test

It'd be good to go through the xmlbeans bugzilla and see if there are any other bug fix patches there that would be worth including in a fix release. (Not to mention the duplicated classes in the 2.6.0 binary jar, which ought to be solved by a fresh build!)

My jar has the issue with duplicate ReferenceResolver classes (https://issues.apache.org/jira/browse/XMLBEANS-499). I'll see if I can work out what in the xmlbeans build is causing this.

I added a fix for the duplicate classes in https://github.com/pjfanning/xmlbeans. I will publish a 2.6.2 jar in the coming days if there are no other xmlbeans issues to fix. Feel free to highlight any XMLBeans issues in the issue tracker for the GitHub fork.
https://issues.apache.org/jira/browse/XMLBEANS-513?jql=project%20%3D%20XMLBEANS%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)

Thanks for the work. I have tested it a bit and so far it looks quite good. One thing we could try is to remove the stax-api dependency from the Maven deployment; I don't think it is still necessary with Java 6 or newer. As far as I can see, the other most pressing items are in place now, and I would rather release a first piece than add more aggressive changes; we can always do those later in a second release. Should we start a discussion on general@attic.apache.org about the next steps required for publishing an official bugfix release of xmlbeans?
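The stax-api suggestion above rests on the fact that the StAX API (JSR 173, package `javax.xml.stream`) has been part of Java SE since Java 6, so code that only needs those interfaces should run on a plain JDK without the separate jar. A minimal check, assuming nothing beyond the JDK (the class name is made up for this example), that parses XML with the bundled `XMLInputFactory`:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class JdkStaxCheck {

    // Counts start-element events using only the StAX API shipped with the JDK.
    static int countStartElements(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                int count = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close(); // XMLStreamReader is not AutoCloseable
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        // An empty element such as <c/> still produces a START_ELEMENT event.
        System.out.println(countStartElements("<sheet><row><c/></row></sheet>")); // 3
    }
}
```

If this runs with stax-api removed from the classpath, the jar is redundant for that code path; whether anything in XMLBeans itself still needs it is a separate thing to verify.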
In fact, with your release on Maven Central we could even technically switch to it without a release from the Attic, but that would probably require some due diligence on the licensing of the additional patches, ...

I think best practice would be to speak with the Attic PMC about this.

I agree that getting an official Apache xmlbeans patch done is preferable to having people use my forked jar. This jar solves some issues with writing XSSF workbooks that contain Unicode surrogate chars. I have found a related issue in the SXSSF code base (https://bz.apache.org/bugzilla/show_bug.cgi?id=61246). This means that for writing SXSSF workbooks, users affected by the Unicode surrogate chars issues will need the forked xmlbeans jar and the latest POI code (or 3.17 beta2 when it is released).

*** Bug 61494 has been marked as a duplicate of this bug. ***

*** Bug 58247 has been marked as a duplicate of this bug. ***

*** Bug 54084 has been marked as a duplicate of this bug. ***

I think the more appropriate course of action would be to remove xmlbeans as a dependency. The project was terminated over 4 years ago.

(In reply to Scott Coldwell from comment #22)
> I think the more appropriate course of action would be to remove xmlbeans as a dependency. The project was terminated over 4 years ago.

Removing XMLBeans is probably something like 3-6 months of work, and would have the added disadvantage of probably breaking something like half of all the POI examples for XLSX / PPTX / DOCX on the internet, and probably most of the big projects which make deep/custom use of POI for XLSX/PPTX/DOCX processing...

The first step, for which volunteers are very, very much welcome(!), is to locate all the places in your favourite bit of POI (e.g. XSSF) that leak the CT* xmlbeans objects, search public examples / GitHub projects / mailing lists etc. to see how those beans are used, ensure we have a proper usermodel wrapper for doing the same thing "properly", and then deprecate that CT accessor.
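The migration step described above (wrap what callers do with a CT* bean in a proper usermodel method, then deprecate the raw accessor) could look roughly like the following. `CTWidget` and `XWidget` are hypothetical stand-ins, not real POI or ooxml-schemas classes, and `CTWidget` is a plain Java class here rather than generated XMLBeans code.

```java
public class DeprecationSketch {

    // Hypothetical stand-in for an XMLBeans-generated CT* bean.
    static class CTWidget {
        private String name;

        String getName() { return name; }
        void setName(String name) { this.name = name; }
        boolean isSetName() { return name != null; }
    }

    // Hypothetical usermodel wrapper: callers use getName()/setName()
    // instead of reaching into the CT bean directly.
    public static class XWidget {
        private final CTWidget ct;

        public XWidget() {
            this(new CTWidget());
        }

        XWidget(CTWidget ct) {
            this.ct = ct;
        }

        public String getName() {
            return ct.isSetName() ? ct.getName() : "";
        }

        public void setName(String name) {
            ct.setName(name);
        }

        /**
         * Low-level accessor kept only for backwards compatibility.
         *
         * @deprecated use {@link #getName()} and {@link #setName(String)} instead
         */
        @Deprecated
        public CTWidget getCTWidget() {
            return ct;
        }
    }

    public static void main(String[] args) {
        XWidget widget = new XWidget();
        // Old style, now flagged by javac deprecation warnings:
        //   widget.getCTWidget().setName("Sheet1");
        // New style, through the usermodel:
        widget.setName("Sheet1");
        System.out.println(widget.getName()); // Sheet1
    }
}
```

Keeping the deprecated accessor working means existing code still compiles (with a warning) while callers migrate; once usage has moved to the wrapper, what sits behind `XWidget` can change without touching callers.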
Once everything that's commonly done via the low-level xmlbeans classes in other people's code can be done cleanly with proper POI classes, we can get people to migrate, and then we're in a position to swap out the POI innards to use something else. It's quite a lot of work though, which is why more volunteers are needed to help drive it! :)

Another complication is that none of the replacement technologies discussed so far can handle the arbitrary XML allowed as "extensions" in the OOXML spec. One benefit of XMLBeans is that it keeps the raw XML around and returns any properties/attributes present when asked. That way POI can "pass through" unknown content while allowing access to all known content in a file. This is used quite often by downstream projects, especially ones that start with custom "template" files containing bits like OLE components, custom Office extensions, VBA macros, complex pivot tables, etc. and fill in live data on the fly. In the spec this is handled by referencing additional namespaces on the fly by URL. Any replacement in POI would need to allow similar retention of unknown XML elements and content. This completely breaks systems like JAXB, which needs all possible namespaces, and how they fit together, to be defined up front. There is no dynamic model building or class loading.

I found some other projects based on POI that make some use of the spreadsheetml classes, and raised issues in these projects:

* https://github.com/norbert-radyk/spoiwo/issues/28
* https://github.com/monitorjbl/excel-streaming-reader/issues/125

*** Bug 61921 has been marked as a duplicate of this bug. ***

*** Bug 61988 has been marked as a duplicate of this bug. ***

*** Bug 62004 has been marked as a duplicate of this bug. ***

I'm happy to help with the actual conversion when it comes time for that. My company just went through a pretty large xmlbeans -> JAXB conversion involving many 3rd-party schemas in various states of schema-authoring best practice.
It's really not that bad once you get the classes compiled properly, and we've been through a bunch of the gotchas with making that happen.

I have some ideas of how JAXB could be integrated into POI, but currently even the basics don't work. If you want something to get started with, you could try to solve [1]. This would only be a technical POC; we would also need to support older ECMA versions, which are not backwards compatible. I thought about attaching only the XML fragment to the usermodel, parsing it on the fly and using the JAXB binder to preserve the XML infoset. The current issue is about a new xmlbeans version, so we should discuss a JAXB-related solution in a different bugzilla entry.

[1] https://stackoverflow.com/questions/46869482

Applied PJ Fanning's pull request [1] via r1834165.

[1] https://github.com/apache/poi/pull/113

Leaving it open until XmlBeans 3.0.0 is released.
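The pass-through requirement discussed in this thread (edit the parts of a file that are understood while leaving unknown extension XML untouched) can be illustrated with plain JDK DOM, which, like XMLBeans, keeps the whole XML tree around. This is a sketch under stated assumptions, not a proposed POI design: the namespaces `urn:example:known` and `urn:example:ext` and the element names are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class PassThroughSketch {

    // Hypothetical namespace for the part of the schema we "know" about.
    static final String KNOWN_NS = "urn:example:known";

    // Replaces the text of the known <title> element and re-serializes the
    // whole tree. Elements from unknown namespaces are never inspected, so
    // they are written back out as part of the retained tree.
    static String updateTitle(String xml, String newTitle) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Element title = (Element) doc.getElementsByTagNameNS(KNOWN_NS, "title").item(0);
            if (title == null) {
                throw new IllegalArgumentException("No title element");
            }
            title.setTextContent(newTitle);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException("XML processing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc xmlns=\"urn:example:known\" xmlns:x=\"urn:example:ext\">"
                + "<title>Old</title><x:widget x:id=\"7\"/></doc>";
        // The x:widget extension element survives the round trip.
        System.out.println(updateTitle(xml, "New"));
    }
}
```

The trade-off is that a raw tree like this offers no typed access to the known elements; XMLBeans layers generated, typed classes over its retained XML store.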