Bug 57882 - allow appending to large workbooks without reading
Summary: allow appending to large workbooks without reading
Status: NEEDINFO
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.10-FINAL
Hardware: PC All
: P2 enhancement (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-01 17:39 UTC by geoff.groos
Modified: 2016-09-11 00:50 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description geoff.groos 2015-05-01 17:39:43 UTC
As a developer and POI user I want to be able to append sheets to an existing very large workbook as quickly as possible so that I can allow my application users to update existing workbooks easily

Specifically I am looking for the ability to point POI at a very large workbook and say 'add these sheets to it' without having to deal with any of the already existing content in that workbook. If you would humor a little bit of feature creep and speculative development: I suspect many users would appreciate a mode whereby they can append rows to an existing sheet.

Currently the only method (given a small heap space) to do this would be to use the XSSF workbook with a SAX driver and lazy reading scheme, streaming its content to 'the front' of an SXSSF workbook, such that all the contents of the existing workbook can be put into the SXSSF instances file buffer. Once that (long running) task is finished, I can append any new data to the back of that SXSSF workbook.

Worth noting, that while I'm pretty sure this strategy would work, after a 20 minute implementation I haven't got it working: creating my XSSF workbook with the package object in READ mode appears to do some amount of fairly aggressive prefetching, which is causing heap space errors.
Comment 1 Nick Burch 2015-05-01 17:47:21 UTC
If you just want to add sheets, then as long as those sheets have no external references, it should be fairly easy. Open the destination OPCPackage, and read the workbook stream in via xmlbeans (it's small). Open the OPCPackage holding your sheets to add, create a part in the destination, then copy the contents over of the sheet part. Add the relation for the new sheet, and use xmlbeans to add a reference to the new sheet with that relation. Save!

I'm not sure if this counts as in-scope for SXSSF or not, but I could certainly see value in writing it up as something for the examples directory for now at least. If you do get it working, please contribute it back as such!
Comment 2 geoff.groos 2015-05-01 18:14:22 UTC
I've never worked with the OPCP package so I'm only understanding your suggestion vaguely, but I'll see what I can do and if I come up with something elegant I'll get it back to you guys.

Right now the issue that would implement this feature on my side is blocked by another feature request I've got with JavaFX (https://javafx-jira.kenai.com/browse/RT-40703), so I'm not going to make any forward progress on this for some time.

In the mean time you I'd be happy to turn this issue into a 'add documentation on how to do appending sheets to large workbooks' and leave it open until I get around to implementing it, or simply closing it as WONT-FIX.
Comment 3 Javen O'Neal 2016-06-15 11:01:47 UTC
Any updates on the JavaFX bug or suggested documentation (or example code https://svn.apache.org/viewvc/poi/trunk/src/examples/)
Comment 4 Dominik Stadler 2016-08-18 14:20:38 UTC
Keeping this on needinfo based on previous comment.
Comment 5 Javen O'Neal 2016-09-11 00:50:48 UTC
A more general solution to the memory problem is to lazy load sheets and allow sheets to be swapped between disk and memory. I'm guessing this is how Microsoft Excel keeps its memory footprint from exploding on workbooks with thousands of sheets.