Bug 66644 - Make Apache POI binaries reproducible
Summary: Make Apache POI binaries reproducible
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: unspecified
Hardware: PC All
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-12 08:51 UTC by Dominik Stadler
Modified: 2023-07-03 21:22 UTC
CC List: 0 users



Description Dominik Stadler 2023-06-12 08:51:20 UTC
There are various efforts underway to make binaries of open source software reproducible.

For Java, there is https://github.com/jvm-repo-rebuild/reproducible-central, which provides tools and procedures to check whether Java libraries are reproducible and to identify any changes needed to make them reproducible.

Let's do a first run with current Apache POI binaries and the tooling to see where we include non-reproducible content.

At least the following might pop up:
* Version number/build-date injected into sources at build time
* Generated JavaDoc?
* Generated ooxml-sources

See the following resources for getting started:
* https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/doc/TOOLS.md
* https://reproducible-builds.org/docs/jvm/#configuring-build-tools-for-reproducible-builds
Comment 1 PJ Fanning 2023-06-12 08:57:42 UTC
In Gradle, we have https://docs.gradle.org/current/userguide/working_with_files.html#sec:reproducible_archives

Enabling the configs seems relatively easy. Then, we will need to see if the output jars are reproducible and work through the issues.

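tasks.withType(AbstractArchiveTask).configureEach {
    // Do not copy file modification times into the archive entries
    preserveFileTimestamps = false
    // Add entries in a stable order, independent of the file system
    reproducibleFileOrder = true
}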
Comment 2 PJ Fanning 2023-06-12 10:12:40 UTC
I added r1910364

That is just the tip of the iceberg. We now need to go through all the jars after clean builds and check whether they are identical and, if they are not, what the differences are.

With our slow and complicated build, that is going to be time-consuming and quite tedious.
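
As a rough sketch of how that check could be done before reaching for a full diff tool, the shell function below compares the jars from two clean builds by SHA-256 checksum. The function name `compare_jars` and the two-directory layout are assumptions for illustration, not part of the POI build. It also assumes GNU coreutils (`sha256sum`) and jar paths without whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not part of the POI build): compare the jars found
# under DIR_A with the jars of the same name found under DIR_B.
# Usage: compare_jars DIR_A DIR_B
# Prints one line per jar: "SAME name", "DIFF name" or "MISSING name".
# Returns non-zero if any jar differs or has no counterpart.
# Assumes jar paths contain no whitespace.
compare_jars() {
    dir_a=$1
    dir_b=$2
    result=0
    # Sort so the report order does not depend on file system order.
    for jar in $(find "$dir_a" -name '*.jar' | sort); do
        name=$(basename "$jar")
        other=$(find "$dir_b" -name "$name" | head -n 1)
        if [ -z "$other" ]; then
            echo "MISSING $name"
            result=1
        elif [ "$(sha256sum < "$jar")" = "$(sha256sum < "$other")" ]; then
            echo "SAME $name"
        else
            echo "DIFF $name"
            result=1
        fi
    done
    return $result
}
```

Jars reported as DIFF are the ones worth inspecting in detail. Identical checksums mean the files are byte-for-byte equal, so no further comparison is needed for those.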
Comment 3 Dominik Stadler 2023-06-24 09:59:55 UTC
FYI, a simple re-build did not show any difference when using diffoscope:


for i in $(find build -name '*.jar'); do echo "$i" "$(basename "$i")"; diffoscope --progress --text - "$i" "/tmp/$(basename "$i")"; done
build/dist/maven/poi-excelant/poi-excelant-5.2.4-SNAPSHOT.jar poi-excelant-5.2.4-SNAPSHOT.jar
build/dist/maven/poi-ooxml-lite-agent/poi-ooxml-lite-agent-5.2.4-SNAPSHOT.jar poi-ooxml-lite-agent-5.2.4-SNAPSHOT.jar
build/dist/maven/poi/poi-5.2.4-SNAPSHOT.jar poi-5.2.4-SNAPSHOT.jar
build/dist/maven/poi-examples/poi-examples-5.2.4-SNAPSHOT.jar poi-examples-5.2.4-SNAPSHOT.jar
build/dist/maven/poi-integration/poi-integration-5.2.4-SNAPSHOT.jar poi-integration-5.2.4-SNAPSHOT.jar
build/dist/maven/poi-ooxml/poi-ooxml-5.2.4-SNAPSHOT.jar poi-ooxml-5.2.4-SNAPSHOT.jar
build/dist/maven/poi-ooxml-full/poi-ooxml-full-5.2.4-SNAPSHOT-sources.jar poi-ooxml-full-5.2.4-SNAPSHOT-sources.jar
build/dist/maven/poi-ooxml-full/poi-ooxml-full-5.2.4-SNAPSHOT.jar poi-ooxml-full-5.2.4-SNAPSHOT.jar
build/dist/maven/poi-scratchpad/poi-scratchpad-5.2.4-SNAPSHOT.jar poi-scratchpad-5.2.4-SNAPSHOT.jar
build/dist/maven/poi-ooxml-lite/poi-ooxml-lite-5.2.4-SNAPSHOT.jar poi-ooxml-lite-5.2.4-SNAPSHOT.jar

As we control the version of Java that is used, even building with Java 17 did not result in any change in the binaries.

Remaining issues that I can think of:

1) There will be a change when the build is run on a different date because of this line in poi/build.gradle:
    content = content.replace("@DSTAMP@", new Date().format('yyyyMMdd'))
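
One common way to keep a date stamp while staying reproducible is the SOURCE_DATE_EPOCH convention described on reproducible-builds.org: when that environment variable is set, the stamp is derived from it instead of from the current clock. The sketch below shows the idea in shell. The function name `build_stamp` is hypothetical, the `date -d @...` form assumes GNU date, and POI's build.gradle would need the equivalent logic in Groovy.

```shell
#!/bin/sh
# Hypothetical helper: print a yyyyMMdd stamp in UTC.
# If SOURCE_DATE_EPOCH (seconds since the Unix epoch) is set, the stamp is
# derived from it, so repeated builds of the same source give the same value.
# Otherwise it falls back to the current date, which is not reproducible.
build_stamp() {
    if [ -n "${SOURCE_DATE_EPOCH:-}" ]; then
        # "-d @SECONDS" is GNU date syntax (assumption: GNU coreutils).
        date -u -d "@${SOURCE_DATE_EPOCH}" +%Y%m%d
    else
        date -u +%Y%m%d
    fi
}
```

With SOURCE_DATE_EPOCH exported by the release tooling, for example set to the timestamp of the last commit, the stamp stays the same no matter when the build is run.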
Comment 4 PJ Fanning 2023-06-24 10:08:05 UTC
We can live without that DSTAMP in the Version.java template.

With xmlbeans, I created a new buildinfo file that gets published separately to Maven. See the June 17-19 commits in https://github.com/apache/xmlbeans/commits/trunk.

I'm reluctant to do the same in POI because of the multiple modules.
Comment 5 PJ Fanning 2023-07-03 21:22:27 UTC
Added r1910760