Bug 63100 - Streaming data for browsers
Summary: Streaming data for browsers
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: SXSSF
Version: 3.17-FINAL
Hardware: PC Mac OS X 10.1
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
 
Reported: 2019-01-22 20:35 UTC by Matija Obreza
Modified: 2021-10-08 18:36 UTC



Description Matija Obreza 2019-01-22 20:35:24 UTC
SXSSF works as designed and keeps a small memory footprint when generating large files from a database. However, it only starts writing to the output stream once all rows have been written to SXSSF. This is problematic in web applications:

In our use case (our website lets users generate Excel files from the database), generating the SXSSF workbook on the server takes about 5 minutes. Most clients give up within a minute (or the browser does so automatically), or the proxy times out because no data is being sent. Some users also retry the download, which starts a new request while the server is still busy generating the workbook for a client that has already given up. This can potentially lead to a denial of service (DoS).

To work around this issue, I've implemented a super-streaming version of SXSSF, a `SuperSXSSF`, that relies on a `rowWriter` callback to generate row data.
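The callback idea can be illustrated with a minimal sketch that uses only the JDK. It is not the `SuperSXSSF` code and does not produce a valid `.xlsx` file; `RowSink`, `RowWriter`, `stream` and the entry name `sheet1.xml` are hypothetical names chosen for illustration. The point is only that the container is opened first and the callback then emits rows straight into the ZIP stream, so nothing waits for the whole sheet to exist.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class CallbackStreamingSketch {

    /** Hypothetical sink handed to the callback; each call emits one row. */
    public interface RowSink {
        void row(String... cells) throws IOException;
    }

    /** Hypothetical callback, loosely modelled on the rowWriter idea in this report. */
    @FunctionalInterface
    public interface RowWriter {
        void writeRows(RowSink sink) throws IOException;
    }

    /** Writes one sheet-like XML entry into a ZIP container while rows are being produced. */
    public static void stream(OutputStream out, RowWriter rowWriter) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("sheet1.xml"));
            Writer xml = new OutputStreamWriter(zip, StandardCharsets.UTF_8);
            xml.write("<sheetData>");
            rowWriter.writeRows(cells -> {
                xml.write("<row>");
                for (String cell : cells) {
                    xml.write("<c>" + escape(cell) + "</c>");
                }
                xml.write("</row>");
                // Hand the row to the ZIP stream; compressed bytes are forwarded
                // to 'out' as the deflater's buffer fills.
                xml.flush();
            });
            xml.write("</sheetData>");
            xml.flush();
            zip.closeEntry();
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response stream of a web application.
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        stream(response, sink -> {
            for (int i = 0; i < 1000; i++) {
                sink.row(Integer.toString(i), "value " + i);
            }
        });
        System.out.println("bytes written: " + response.size());
    }
}
```

In a web application the `ByteArrayOutputStream` would be replaced by the response's output stream, and the callback would iterate over a database cursor instead of a counter.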

With this approach our service streams the generated Excel file directly to the client and, best of all, generation is terminated if the user cancels the download request.
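Why a cancelled download stops the work can be sketched as follows, again with JDK classes only. `DisconnectingStream` and `generate` are hypothetical names, and the stream simulates a client that goes away after accepting a fixed number of bytes. Because each row is written as soon as it is produced, the failed write surfaces as an `IOException` inside the generation loop and ends it. In a real servlet container the failure may arrive somewhat later because of response buffering.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CancelOnDisconnectSketch {

    /** Simulates a client that disconnects after accepting a limited number of bytes. */
    public static final class DisconnectingStream extends OutputStream {
        private long remaining;

        public DisconnectingStream(long acceptBytes) {
            this.remaining = acceptBytes;
        }

        @Override
        public void write(int b) throws IOException {
            if (remaining-- <= 0) {
                throw new IOException("client disconnected");
            }
        }
    }

    /**
     * Produces up to totalRows rows, writing each one immediately.
     * Returns the number of rows fully written before the stream failed.
     */
    public static int generate(OutputStream out, int totalRows) {
        int produced = 0;
        try {
            for (int i = 0; i < totalRows; i++) {
                byte[] row = ("row " + i + "\n").getBytes(StandardCharsets.UTF_8);
                out.write(row); // throws once the peer is gone
                produced++;
            }
        } catch (IOException e) {
            // The client is gone: stop generating instead of finishing the whole export.
        }
        return produced;
    }

    public static void main(String[] args) {
        int produced = generate(new DisconnectingStream(100), 1_000_000);
        System.out.println("rows produced before disconnect: " + produced);
    }
}
```

With a buffer-everything approach the same loop would run to completion for all rows before the first write, so the disconnect would only be noticed after the expensive work was already done.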

The `SuperSXSSF` prevents both download timeouts and potential DoS, while still allowing developers to perform all other XSSF actions (e.g. defining styles) that don't take much processing time.

Now what?



Modifications at: https://gitlab.croptrust.org/genesys-pgr/genesys-server/tree/master/src/main/java/org/apache/poi/xssf/streaming

Use case: https://gitlab.croptrust.org/genesys-pgr/genesys-server/blob/master/src/main/java/org/genesys2/server/service/impl/DownloadServiceImpl.java
Comment 1 PJ Fanning 2019-01-22 22:22:08 UTC
Your project seems useful. Could you open source your own jar (i.e. publish it to Maven Central)?

We can link to your page from our https://poi.apache.org/related-projects.html
Comment 2 Matija Obreza 2019-01-22 22:54:03 UTC
(In reply to PJ Fanning from comment #1)
> Your project seems useful. Could you open source your own jar (i.e. publish it
> to Maven Central)?
> 
> We can link to your page from our
> https://poi.apache.org/related-projects.html

The changes are implemented directly in the project https://gitlab.croptrust.org/genesys-pgr/genesys-server (Apache v2 licensed) because it is much simpler to trick the Java classloader into using our `.class` files than to build a whole new jar just for the few updates I needed.

Changes to original SXSSF code are at https://gitlab.croptrust.org/genesys-pgr/genesys-server/commits/master/src/main/java/org/apache/poi/xssf/streaming, specifically at https://gitlab.croptrust.org/genesys-pgr/genesys-server/commit/bac27c01a997ff8cfc4352018639e685712f3136

I've been on git for a long time; how do I make a merge request to your code?
Comment 3 Matija Obreza 2019-01-22 23:02:17 UTC
Our Maven artifacts are in Central: http://central.maven.org/maven2/org/genesys-pgr/
Comment 4 PJ Fanning 2019-01-22 23:45:54 UTC
You could fork https://github.com/apache/poi and submit a pull request there.
Comment 5 Matija Obreza 2019-01-23 02:40:03 UTC
https://github.com/apache/poi/pull/141
Comment 6 PJ Fanning 2021-10-08 18:36:42 UTC
The DeferredGeneration example in poi-examples is based on this.