Bug 60153 - [PATCH] Use ZipEntrySource in SXSSF module
Summary: [PATCH] Use ZipEntrySource in SXSSF module
Alias: None
Product: POI
Classification: Unclassified
Component: SXSSF (show other bugs)
Version: 3.14-FINAL
Hardware: PC Mac OS X 10.1
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2016-09-20 12:40 UTC by PJ Fanning
Modified: 2016-10-10 12:58 UTC (History)
0 users

[PATCH] open up SXSSF classes so that they can be subclassed (25.12 KB, patch)
2016-09-20 23:23 UTC, PJ Fanning
Details | Diff
[PATCH] open up SXSSF classes so that they can be subclassed (27.42 KB, patch)
2016-09-21 09:30 UTC, PJ Fanning
Details | Diff
Extra test cases for encrypted temp data (10.60 KB, patch)
2016-10-09 10:32 UTC, PJ Fanning
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description PJ Fanning 2016-09-20 12:40:18 UTC
The current SXSSF implementation uses temp files and this is problematic for companies who want all their data at rest to be encrypted.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59841 added support for developers to use their own implementations of ZipEntrySource when reading inro XSSF workbook. I have my own subclass of this that encrypts the temp file data.
The ZipEntrySource interface is read only but it would be feasible to have support for a writer to create ZipEntrySources too (eg something like org.apache.poi.poifs.crypt.TestSecureTempZip).
org.apache.poi.xssf.streaming.SheetDataWriter exposes the temp file via its public API. If we were able to deprecate this and just provide Readers and Writers, then this could also be subclassed to have the data in the temp file encrypted. We could also allow a pattern where the readers and writers could be decorated with encryptors/decryptors.
Comment 1 PJ Fanning 2016-09-20 21:47:30 UTC
I have a sample of what I'd like to change in https://github.com/pjfanning/poi/commit/72f0ada28f17b0079d296fa6dbc98d3b76b058a3

I can add test cases to demonstrate the ability to plug in custom temp file implementations to override the use of simple temp files.
By subclassing this version of SXSSFWorkbook, developers can replace the temp file behaviour in write(OutputStream stream) with a custom ZipEntrySource (see org.apache.poi.poifs.crypt.TestSecureTempZip) before calling injectData.
createSheetDataWriter() can be overriden to return a subclass of SheetDataWriter that overrides the decorateInputStream snd decorateOutputStream methods I've added.
Comment 2 PJ Fanning 2016-09-20 23:23:57 UTC
Created attachment 34284 [details]
[PATCH] open up SXSSF classes so that they can be subclassed
Comment 3 Javen O'Neal 2016-09-21 03:42:44 UTC
The patch from comment 2 looks good to me, including the relevant unit tests :) I committed SXSSFWorkbook#flushSheets() to get the ball rolling in r1761668.

Can someone else review this patch to make sure this implementation doesn't write unencrypted data to disk and is unlikely to be the source of a security vulnerability?
Comment 4 PJ Fanning 2016-09-21 09:30:26 UTC
Created attachment 34286 [details]
[PATCH] open up SXSSF classes so that they can be subclassed

updated patch based on Javen's commit and tidied up some issues
Comment 5 Javen O'Neal 2016-10-09 04:51:35 UTC
I am assuming a lazy consensus that there are no security issues with this implementation. Most of the crypto code has been around in TestSecureTempZip for a while anyway. This patch just makes that code available to developers by moving it into the main library.

I applied your patch from comment 4 with a one minor change in r1763943.
1. provide protected or public accessor methods rather than elevating visibility of private variables. This gives us more freedom in the future to consolidate code between HSSF, XSSF, and SXSSF.

Remaining questions:

1. In AesZipFileZipEntrySource, should the close method do nothing if the object has already been closed (guard the code with an if (closed)?
2. TestSecureTempZip demonstrates how to read an AES-encrypted XSSFWorkbook and TestSXSSFWorkbookWithCustomZipEntrySource demonstrates how to write an SXSSFWorkbook using encrypted temporary files, but we don't have an example for writing an SXSSFWorkbook where both temporary files and the saved workbook are AES-encrypted. Would you be willing to provide a code example or unit test for this?
Comment 6 Javen O'Neal 2016-10-09 04:54:19 UTC
3. Can you write a unit test that demonstrates that temporary files created by SXSSFWorkbook are encrypted?
Comment 7 PJ Fanning 2016-10-09 08:27:02 UTC
(In reply to Javen O'Neal from comment #6)
> 3. Can you write a unit test that demonstrates that temporary files created
> by SXSSFWorkbook are encrypted?

Thanks Javen. I can look at the 3 topics you highlighted and I can attach a new patch file over the coming days.
Comment 8 PJ Fanning 2016-10-09 10:32:11 UTC
Created attachment 34346 [details]
Extra test cases for encrypted temp data
Comment 9 Javen O'Neal 2016-10-09 13:00:30 UTC
Thanks for the prompt patches! Applied in r1763969.

If you have any other unit tests you would like to add, please reopen this bug.
Comment 10 Javen O'Neal 2016-10-09 23:58:10 UTC
Would you be interested in writing a few sentences to add to the Encryption documentation https://poi.apache.org/encryption.html

The documentation source lives here: https://svn.apache.org/viewvc/poi/site/src/documentation/content/xdocs/encryption.xml?view=log
Comment 11 PJ Fanning 2016-10-10 12:58:00 UTC
Hi Javen,
I can look at putting together some doc and related sample code.
Could take a few days.