Bug 62355

Summary: Unsplit packages to conform Jigsaw (Java 9) module standard
Product: POI Reporter: Andreas Beeker <kiwiwings>
Component: POI OverallAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 4.0.x-dev   
Target Milestone: ---   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 60427, 62151    
Attachments: Files with more than just import changes
Files with only changed imports
Open up HPSF classes, because package injection doesn't work in Jigsaw

Description Andreas Beeker 2018-05-06 09:18:36 UTC
Currently it's not possible to include POI as an automatic module [3], i.e. users can't use module-info to depend on POI [1][2].

Although a real module would be nice, we should at least unsplit the packages.

I'm currently working on it and would like to provide it in 4.0.0.
Of course there will be breaking changes, but mostly in some base classes, which I guess, users code would hardly use.


[1] http://poi.apache.org/faq.html#faq-N102B0
[2] https://stackoverflow.com/questions/44463552
[3] http://blog.joda.org/2017/05/java-se-9-jpms-automatic-modules.html
Comment 1 Andreas Beeker 2018-05-07 21:42:58 UTC
Find attached three patches to be reviewed.
I've split them up, so reviewing is easier - i.e. basically you need to look at 
the *changes*-patch.

I had to find a few compromises, especially with using OOXML classes in Main 
packages.
Furthermore I've unified the extractor classes to stay in "extractor" or 
"ooxml/extractor"
and opened the HPSF classes, as it's not possible to inject a class in a 
foreign package namespace anymore.

I'm using an non-committed version of the POI-Visualizer to test the changes, 
which is now converted (mostly) to Java9.
In case you want to test the Java 9 automatic module, I can provide you with a 
branch.
Comment 2 Andreas Beeker 2018-05-07 21:43:01 UTC
Created attachment 35916 [details]
Files with more than just import changes
Comment 3 Andreas Beeker 2018-05-07 21:43:05 UTC
Created attachment 35917 [details]
Files with only changed imports
Comment 4 Andreas Beeker 2018-05-07 21:43:08 UTC
Created attachment 35918 [details]
Open up HPSF classes, because package injection doesn't work in Jigsaw
Comment 5 Nick Burch 2018-05-14 15:10:14 UTC
This does seem to have a lot of breaking changes for users, for something which I'm not sure many people are using (and which has had a very mixed response at best), and which either needs us to move everything to Java 9, or potentially breaks loads of tools including Android ones 

What if we just produced one additional jar, poi-all-java9, which has all the classes from the different jars "smooshed" in together, along with a proper module-info file? We'd then just need to make a smaller number of changes for things like the package injection (and any other Java 9/10 fixes, if there are any?), but would be able to properly detail what we export / what we use / etc. 

Given how many people we see on the mailing list and on stackoverflow using very old versions of POI, just the move from 3.x to 4.x is going to put off some people upgrading, and a whole bunch of broken imports is probably going to mean loads of users don't upgrade. 

If we can avoid lots of breaking changes, but let the handful of people for whom Java 9 modules actually works properly be able to use POI there with just a java-9-specific jar, I'd think that'd be a good compromise. Though I'm not sure I have much time to help just now either way...
Comment 6 Mark Murphy 2018-05-14 15:28:30 UTC
(In reply to Nick Burch from comment #5)
> This does seem to have a lot of breaking changes for users, for something
> which I'm not sure many people are using (and which has had a very mixed
> response at best), and which either needs us to move everything to Java 9,
> or potentially breaks loads of tools including Android ones 
> 
> What if we just produced one additional jar, poi-all-java9, which has all
> the classes from the different jars "smooshed" in together, along with a
> proper module-info file? We'd then just need to make a smaller number of
> changes for things like the package injection (and any other Java 9/10
> fixes, if there are any?), but would be able to properly detail what we
> export / what we use / etc. 
> 
> Given how many people we see on the mailing list and on stackoverflow using
> very old versions of POI, just the move from 3.x to 4.x is going to put off
> some people upgrading, and a whole bunch of broken imports is probably going
> to mean loads of users don't upgrade. 
> 
> If we can avoid lots of breaking changes, but let the handful of people for
> whom Java 9 modules actually works properly be able to use POI there with
> just a java-9-specific jar, I'd think that'd be a good compromise. Though
> I'm not sure I have much time to help just now either way...

If we provide a single jar for Java 9 in 4.0, how will that affect our ability to provide a proper module in the future? Will people try to use that as a non-module Java 9 solution because there are less jars to manage? Could we somehow provide both a proper automatic module jar, and the typical set of jars for Java 8 from the same code base. I have to admit, I have not looked at the Jigsaw requirements and restrictions, so I don't know if these are even good questions. But it feels to me like a stopgap measure could set us up for more work down the road.
Comment 7 Nick Burch 2018-05-14 15:34:03 UTC
I'm proposing we offer a single "proper" module jar for people who want to use Java 9 modules, whilst leaving things unchanged for everyone else. I'm not sure there's many (any?) people wanting Java 9 Modules who won't also want the newer file formats. Uses could then either carry on as they do now and stick with the classpath, or swap to the j9 single jar if they wanted to use the modulepath
Comment 8 Andreas Beeker 2018-05-15 20:39:58 UTC
TL;DR: thank you for the comments; why we should prepare for Java 9 usage; why I think the one-big-jar approach is not good


First of all I want to thank you for your comments. Although this kind of discussion is recuring over time, I think it is often necessary to gather the current opinions and not anticipate what everyone could think. Think of it, as a kind of test balloon like we had it with Java 8.


(In reply to Nick Burch from comment #5)
> This does seem to have a lot of breaking changes for users...

As I have the impression, that you might be overwhelmed by the amount of changed classes, I'd like to give you a short overview of what has changed.

I've tried to minimize the effects, so it's mainly an organize imports in user code - which the usual IDE is able to do, which we can explain in therelease notes and which might be not so suprising for a new major version

If you look at the package moves there are the following notable changes:
- extractor and converter classes:
  - There were some cross references between hwpf and hssf -> now in common
  - Ooxml, Main, Scratchpad related extractors were in the same package.
    In other places we separate those - why should we put all of them in the same "pot"?
- POIXML* classes:
  search for "org.apache.poi.POIXML" replace with "org.apache.poi.ooxml.POIXML" ... a no-brainer


> ... for something, which I'm not sure many people are using (and which has had a very mixed
> response at best)

... but more and more will use, and as soon as you added a module-info.java to the Java 9 user module, my IDE doesn't compile anymore because of the split packages.

So we effectively block developers to use the module-info.
I don't understand why we wouldn't clean-up the packages on one hand, but on the other hand try to stay compatible with Java 10+, i.e. the commons compress change.
If the user code is also a library and even wouldn't use module-info, the add-opens need to be passed to their users.


> and which either needs us to move everything to Java 9, or potentially breaks loads of tools including Android ones

With automatic modules we don't need to move everything now. If we want to stay (?) or become Android compatible, we need to check anyway for the feature set (https://developer.android.com/studio/write/java8-support)


> What if we just produced one additional jar, poi-all-java9 ... ?

If we (would) have the three options ...
- keep the packaging
- have format specific modules
- "the-one-to-rule-them-all" :)

... is this really the prefered option for the future?

I'd like to aim for format specific modules + extras and try to minimize or straighten the package dependencies in future releases - as far as I understand, this is also the goal of Jigsaw, i.e. to have a stronger encapsulation and better maintainability. Providing the one big jar is a step back for me - so for the time being, I would rather keep the packaging than introducing another artifact. So last time we argued about format-specific modules, one argument was, that it is already so confusing to have main, scratchpad, poi-ooxml, poi-ooxml-schemas. So I would say, introducing an additional package doesn't make it better.



> We'd then just need to make a smaller number of changes for ... Java 9/10 fixes ...

I think we probably don't go the way of multi release jars, so as far as the changes are compatible with the choosen Java version, we will be ok ... otherwise we need to decide about upgrading.


> Given how many people we see on the mailing list and on stackoverflow using
> very old versions of POI, just the move from 3.x to 4.x is going to put off
> some people upgrading, and a whole bunch of broken imports is probably going to
> mean loads of users don't upgrade.

Actually my hope is, the 4 in front might wake up a few of those lingering on 3.9. As we saw on the stackoverflow posts, mostly the 3.9 users don't even try to use a newer version first. They don't complain about incompatible changes ... they simply don't try it ...
If I compare the problems of fixing the imports vs. the inability to use POI in jigsaw user project, I rather would like to help the second group.


(In reply to Mark Murphy from comment #6)
> If we provide a single jar for Java 9 in 4.0, how will that affect our ability to provide a proper module in the future?

I guess we can change that from major to major version. But there was a reason for jigsaw as mentioned above, and I don't want to work against it.
Comment 9 Mark Murphy 2018-05-16 13:14:37 UTC
Andreas, I think we are on the same page WRT automatic modules. I think that the best way forward is a repackaging by document type (WP, SS, PP) with a core module and one or more support modules for things like charts, drawings, and equations. If I understand things correctly, the module-info piece works as a dependency manager for jigsaw modules. But if we go all in on that, would it interfere with Java 8 compatibility?

Automatic modules provide an intriguing attempt at a workaround to real modules, but it appears only half baked for us and other open source projects [1]. I suggest that we should not make the move to modules yet because none of our dependencies have made the switch. And we should only move POI to modules when all of our dependencies are modularized, or the issues discussed in the afore mentioned blog are addressed. I think that any other strategy, or attempt to push forward regardless of dependencies will cause much more grief down the road than the few who want modules now.

I do believe that modules are the way to go, but we must be intentional about it if we are to avoid unintended consequences. I have seen this in Javascript applications where some open source libraries (jQuery) shift suddenly to Require.js, and now I can't properly upgrade a site because there are too many dependencies on libraries that don't support Require.js. There may be a way to do it, but I am not sufficiently skilled, and I don't have time to go through all the dependencies to correct all the necessary libraries. So I am stuck with an old unsupported version of jQuery, and an unsupported version of jqGrid, and any questions are met with "you are on a very old version of jqGrid, try upgrading first".

Unless we want to leave folks in the lurch, or we want to support very old version of POI, we need to be cautious and intentional about how we approach modularization. Maybe wait for v5.0 or even v6.0 to do that.

[1] http://blog.joda.org/2017/05/java-se-9-jpms-automatic-modules.html
Comment 10 Andreas Beeker 2018-05-16 19:28:14 UTC
Maybe I should I explain a bit shorter, to make my self clear:

- for POI 4.0.0 I want to change the java-package of around 10 classes and leave
  the maven-packaging as is. This is sufficient to make module-info work in user code ...
  at least I didn't get any compile/execution errors

- for POI 5 *), I want a repackaging by document type

- for POI 6 *), I want to upgrade to real modules / Java 9
  and we also need to fix XmlBeans then

*) just a rough plan, but in that order


> If I understand things correctly, the module-info piece works as a
> dependency manager for jigsaw modules. But if we go all in on that,
> would it interfere with Java 8 compatibility?

I guess so -> https://github.com/google/guava/issues/2970



> I suggest that we should not make the move to modules yet because none of
> our dependencies have made the switch. ...

Regarding the modules, I'd go in the steps above


> And we should only move POI to modules when all of our dependencies are modularized ...

ACK, but we might ask them about their Jigsaw plans and about adding the
Automatic-Module-Name manifest entry




> I do believe that modules are the way to go, but we must be intentional about
> it if we are to avoid unintended consequences.

If we go in two steps, we don't have two building sites at once ...
For the time being, I don't think that adding an Automatic-Module-Name entry
would make sense for us.



> ... and any questions are met with "you are on a very old version of jqGrid,
> try upgrading first".

ha ha ... I like that twist, i.e. the implicit comparision to our typical SO answers.
Looking at our dependencies, I don't think commons-codec/collection would
upgrade before us, but currently I don't think that's a problem
Comment 11 Andreas Beeker 2018-05-27 22:21:47 UTC
Add the patches via:
- r1832361 | unsplit packages - 4 - open HPSF
- r1832360 | unsplit packages - 3 - only imports
- r1832359 | unsplit packages - 2 - modified classes (not only imports)
- r1832358 | unsplit packages - 1 - moved classes

I'm leaving the issue open, in case further modifications are necessary until 4.0.0 is released
Comment 12 Dominik Stadler 2018-12-22 17:41:37 UTC
Seems to be fixed, no more changes seem necessary here for now.